Post 302940524 by pathunkathunk on Monday 6th of April 2015 11:26:55 PM
Failure using regex with awk in 'while read file' loop

I have a file, file1.txt, with several hundred thousand lines; column 9 of each line contains one of 60 "label" identifiers. Using a labels.txt file that lists the labels of interest, I'd like to extract 200 random lines from file1.txt for each label in labels.txt.

Using a contrived mini-example:
Code:
$ cat file1.txt 
H	0	328	100.0	-	0	0	38D150M140D	M01433:68:000000000-AAT0D:1:1111:13371:3239;barcodelabel=c8;	OTU_1;size=17947;
H	1	325	100.0	+	0	0	150M175D	M01433:68:000000000-AAT0D:1:1105:27659:19941;barcodelabel=c12;	OTU_2;size=101;
H	4	411	99.3	+	0	0	24D150M237D	M01433:68:000000000-AAT0D:1:2107:16393:23698;barcodelabel=g10;	OTU_5;size=64;
H	2	283	98.7	+	0	0	150M133D	M01433:68:000000000-AAT0D:1:2104:21919:3018;barcodelabel=c12;	OTU_3;size=80;
H	1	277	98.5	-	0	0	15I135M142D	M01433:68:000000000-AAT0D:1:2108:12616:12185;barcodelabel=c12;	OTU_2;size=101;
H	0	295	100.0	+	0	0	14D150M131D	M01433:68:000000000-AAT0D:1:1108:4978:15986;barcodelabel=g10;	OTU_1;size=17947;
H	29	312	97.6	-	0	0	25I125M187D	M01433:68:000000000-AAT0D:1:1109:20934:22671;barcodelabel=g15;	OTU_30;size=8;
H	0	315	99.3	-	0	0	88D150M77D	M01433:68:000000000-AAT0D:1:2114:17509:23920;barcodelabel=g10;	OTU_1;size=17947;

$ cat labels.txt
c12
g10

This is what I'm trying, but it results in empty files:
Code:
$ while read file
> do
> awk '/${file}/' file1.txt | gshuf -n 200 > ${file}.txt
> done < labels.txt

Desired output: two random lines for each label in labels.txt (the particular lines may vary, but each must contain "label=c12" or "label=g10", respectively):
Code:
$ cat c12.txt
H	1	325	100.0	+	0	0	150M175D	M01433:68:000000000-AAT0D:1:1105:27659:19941;barcodelabel=c12;	OTU_2;size=101;
H	2	283	98.7	+	0	0	150M133D	M01433:68:000000000-AAT0D:1:2104:21919:3018;barcodelabel=c12;	OTU_3;size=80;

$ cat g10.txt
H	0	295	100.0	+	0	0	14D150M131D	M01433:68:000000000-AAT0D:1:1108:4978:15986;barcodelabel=g10;	OTU_1;size=17947;
H	0	315	99.3	-	0	0	88D150M77D	M01433:68:000000000-AAT0D:1:2114:17509:23920;barcodelabel=g10;	OTU_1;size=17947;

It seems like the problem is with the awk '/${file}/' pattern. I say this because I can extract lines when I hard-code the label regex, although then every output file gets the same label (in this case g10.txt also ends up with two random "label=c12" lines instead of g10 lines):
Code:
$ while read file
> do
> awk '/c12/' file1.txt | gshuf -n 2 > ${file}.txt
> done < labels.txt
$ cat c12.txt 
H	1	277	98.5	-	0	0	15I135M142D	M01433:68:000000000-AAT0D:1:2108:12616:12185;barcodelabel=c12;	OTU_2;size=101;
H	1	325	100.0	+	0	0	150M175D	M01433:68:000000000-AAT0D:1:1105:27659:19941;barcodelabel=c12;	OTU_2;size=101;

Thanks for any pointers.
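
A minimal sketch of one possible fix, not a definitive answer. Inside single quotes the shell never expands ${file}, so awk receives the literal text ${file} as its regex and matches nothing, which would explain the empty files. The sketch passes each label to awk with -v instead and tests column 9 for the exact string "barcodelabel=<label>;". The sample rows, the scratch directory, and the sample helper (which falls back from shuf to gshuf to a plain awk/sort shuffle) are assumptions made for illustration; they are not from the original data.

```shell
#!/bin/sh
# Sketch: hand the shell variable to awk with -v rather than writing
# ${file} inside single quotes, where the shell leaves it unexpanded.

# Scratch directory with a small tab-separated sample (made-up rows;
# the label sits in column 9, as in file1.txt).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf 'H\t%s\t300\t99.0\t+\t0\t0\t150M\tread%s;barcodelabel=%s;\tOTU_%s;\n' \
    0 1 c8  1 \
    1 2 c12 2 \
    4 3 g10 5 \
    2 4 c12 3 \
    1 5 c12 2 \
    0 6 g10 1 \
    9 7 g15 30 \
    0 8 g10 1 > file1.txt

printf '%s\n' c12 g10 > labels.txt

# Hypothetical helper: print up to $1 random lines from stdin.
# Uses shuf or gshuf when installed, otherwise a portable awk/sort shuffle.
sample() {
    if command -v shuf >/dev/null 2>&1; then
        shuf -n "$1"
    elif command -v gshuf >/dev/null 2>&1; then
        gshuf -n "$1"
    else
        awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' |
            sort -n | cut -f2- | head -n "$1"
    fi
}

while IFS= read -r label; do
    [ -n "$label" ] || continue          # skip blank lines in labels.txt
    # index() does a literal substring test on column 9; the trailing ";"
    # keeps a shorter label such as c1 from matching c12.
    awk -F '\t' -v label="$label" \
        'index($9, "barcodelabel=" label ";")' file1.txt |
        sample 2 > "$label.txt"
done < labels.txt
```

For the real data, sample 2 would become sample 200. Matching on column 9 rather than the whole line is a design choice: it avoids accidental hits if a label string happens to appear in another column.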
 
