run awk on one file for each line in a second file


 
# 1  
Old 10-14-2008

I have a file with a list of 'samples', all in one column, and a second file with a list of 'results' for these samples. I am trying to use a 'for' loop to go through each sample and run gawk against the second file to return the first field of every result for that sample.

File 1:
BCM51
CNC11
CNC41
MCW11
UMN51
UWA51
BCM161
BCM211
...
...

File 2:
RS2297516 BCM 5 1 BCM51 A/A C/A FALSE
RS254255 BCM 5 1 BCM51 G/G C/C Mis-Match HOMO
RS254255 CNC 1 1 CNC11 0/0 G/G MISSING
RS1106839 CNC 4 1 CNC41 G/G G/A FALSE
RS2294942 CNC 4 1 CNC41 C/T C/C FALSE
RS3736890 CNC 4 1 CNC41 G/G G/A FALSE
...
...

Desired Result:
BCM51 RS2297516 RS254255
CNC11 RS254255
CNC41 RS1106839 RS2294942 RS3736890
...
...

Code:
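#!/bin/ksh
# Usage: errorsummary.sh samples-file results-file

# Word-splitting the cat output works here, assuming sample names contain no whitespace.
for line in `cat "$1"`
do
    # echo "$line"    # debugging aid; would print each sample name on its own line
    # Print the sample name, then field 1 of every result whose field 5 matches it.
    gawk -v sample="$line" '
        BEGIN { printf "%s", sample }
        sample == $5 { printf " %s", $1 }
        END { printf "\n" }' "$2"
done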

All I am getting is errors so far. I have deduced (via commenting out gawk) that it generates an error for each line in the first file and then there is an error on my gawk statement. I can't make sense out of any of them:

Code:
samples.txt: CNC11:  not found
samples.txt[2]: MCW11:  not found
samples.txt[3]: UMN51:  not found
samples.txt[4]: UWA51:  not found
samples.txt[5]: BCM161:  not found
samples.txt[6]: BCM211:  not found
samples.txt[7]: BCM291:  not found
...
...
...
errorsummary.sh[5]: syntax error at line 7 : `'' unmatched

# 2  
Old 10-14-2008
Hello all,
I realized that I was actually executing the wrong script. I had two versions and was editing one but executing the other. No wonder I kept getting the same error no matter what I changed.
# 3  
Old 10-14-2008
Heh. And when you find out about fgrep, you might kill yourself.
# 4  
Old 10-14-2008
Hmmm. I take it from your response that fgrep would have provided a much easier way to perform the task, but after looking it up I don't see how. (I found this definition: "use fgrep to find all the lines of a file that contain a particular word.")
Seems that I would have just been doing the same 'for' loop except to fgrep instead of gawk. In fact, there would have been an extra step because that would have just given me the lines, but I'd still need to parse out the first field from the rest of the line.
Am I missing something?
# 5  
Old 10-14-2008
The -f option to grep provides a facility for finding all lines in a file that match any of the strings in the -f file. Historically this was only available with fgrep, but with GNU grep, you can always do it. So
Code:
fgrep -f file1 file2

Granted, it doesn't ensure that the match is in the 5th field. But you can add the -w option so that a pattern only matches a whole word, which for this data means an entire field rather than part of one.
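To get from the matching lines to just the sample and result ID, the grep output can be piped through awk. This is only a minimal sketch using the sample data from the first post; the temporary directory and the variable name pairs are made up for illustration, and grep -F is used as the equivalent of fgrep.

```shell
# Sample data copied from the first post (lists shortened).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
BCM51
CNC11
CNC41
EOF
cat > "$tmp/file2" <<'EOF'
RS2297516 BCM 5 1 BCM51 A/A C/A FALSE
RS254255 BCM 5 1 BCM51 G/G C/C Mis-Match HOMO
RS254255 CNC 1 1 CNC11 0/0 G/G MISSING
RS1106839 CNC 4 1 CNC41 G/G G/A FALSE
EOF

# -F: treat patterns as fixed strings, -w: match whole words only,
# -f: read one pattern per line from file1.
# awk then keeps only the sample (field 5) and the result ID (field 1).
pairs=$(grep -F -w -f "$tmp/file1" "$tmp/file2" | awk '{ print $5, $1 }')
printf '%s\n' "$pairs"

rm -rf "$tmp"
```

This prints one "sample result" pair per line; the IDs are still not grouped onto a single line per sample, so some awk is needed either way.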
# 6  
Old 10-15-2008
Code:
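nawk '
# First file (file1): remember each sample name.
NR == FNR { arr[$1]++; next }
# Second file (file2): append the result ID (field 1) to the list kept for its sample (field 5).
{ brr[$5] = brr[$5] " " $1 }
END {
	# "for (i in arr)" visits the samples in no guaranteed order.
	for (i in arr)
		if (brr[i] != "")
			print i brr[i]
}
' file1 file2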
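Because a "for (i in arr)" loop does not visit keys in any particular order, the samples may come out in a different order than they appear in file1. The following is a sketch of one way to keep the file1 order, not the poster's method: the array names order, want and ids, the temporary directory and the variable result are all invented here, and plain awk is assumed to behave like nawk for this script.

```shell
# Sample data copied from the first post (lists shortened).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
BCM51
CNC11
CNC41
MCW11
EOF
cat > "$tmp/file2" <<'EOF'
RS2297516 BCM 5 1 BCM51 A/A C/A FALSE
RS254255 BCM 5 1 BCM51 G/G C/C Mis-Match HOMO
RS254255 CNC 1 1 CNC11 0/0 G/G MISSING
RS1106839 CNC 4 1 CNC41 G/G G/A FALSE
RS2294942 CNC 4 1 CNC41 C/T C/C FALSE
RS3736890 CNC 4 1 CNC41 G/G G/A FALSE
EOF

result=$(awk '
# file1: record each sample and the position at which it was seen.
NR == FNR { order[++n] = $1; want[$1] = 1; next }
# file2: collect result IDs (field 1) only for samples listed in file1.
($5 in want) { ids[$5] = ids[$5] " " $1 }
END {
    # Walk the samples in file1 order, skipping those with no results.
    for (k = 1; k <= n; k++)
        if (ids[order[k]] != "")
            print order[k] ids[order[k]]
}
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this data the output matches the desired result from the first post, and MCW11 is skipped because it has no results.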

# 7  
Old 10-15-2008
Thanks otheus & summer cherry.
I got it all figured out and learned something while I was at it.