Some Awk Getline help?


 
# 1  
06-02-2010

Greetings,

I have about 3000 files that I want to search. The first column in all of these 3000 files has a unique serial number on each line. The subsequent columns have lots of data.
I also have a master file with three columns that should let me find any data I need at a moment's notice:

col1      col2  col3
serial.1  row#  file1
serial.2  row#  file345
serial.3  row#  file1023


What I want to do is take a list of serial numbers, look up the file name and row number where each one's data sits, and then write all of that data out to one file. I figured awk's getline might help open the file and go to the right row, but I am really confused by this command.
Another thought was to write a shell script that goes down each row of the master file, sets one variable for the row number and one for the file name, and then runs awk '(NF=rownumbervariable){print $0}' filenamevariable >>outputfile

but I can't get this working either.
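As written, that awk call has two problems: the shell expands nothing inside single quotes, so awk sees `rownumbervariable` as its own (empty) awk variable, and `NF=...` assigns the number of fields instead of comparing the record (line) number `NR`. A minimal corrected sketch of the loop, assuming the list file has the three space-separated columns shown above (serial, row, file); `fetch_rows_loop` is a hypothetical name:

```sh
# Hypothetical helper: for each "serial row file" line in the list file ($1),
# print that row of that data file.
fetch_rows_loop() {
    while read -r serial row file; do
        # -v passes the shell value into awk; NR is the line number,
        # "==" compares, and exit stops reading once the row is printed.
        awk -v r="$row" 'NR == r { print; exit }' "$file"
    done < "$1"
}
```

Usage: `fetch_rows_loop subset.txt > outputfile`. This starts one awk process per wanted row and rereads each file from the top every time, so it suits a few hundred rows better than many thousands.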


Any suggestions?

Thanks,
jeeplou
# 2  
06-02-2010
Welcome to the forum,

Can you please post some sample input and the expected output (perhaps two or three files)?
# 3  
06-02-2010
1. Large data files to be queried, called "file.dat*" (column one is the serial number, the other columns are data):


file.dat1
rs10001 900 900 100 100
rs10002 800 300 200 100

file.dat2
rs10003 222 111 333 444
rs10004 999 121 232 434



2. Small master list/cheat sheet called "master.list" with three columns: serial#, row#, filename

rs10001 1 file.dat1
rs10002 2 file.dat1
rs10003 1 file.dat2
rs10004 2 file.dat2


3. If I want to recover the rs10001 and rs10004 data, I grep my master list to create a subset of interest called subset.txt:

subset.txt
rs10001 1 file.dat1
rs10004 2 file.dat2
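The grep step might look like this, assuming a hypothetical file `wanted.txt` with one serial number per line and a hypothetical function name `make_subset`. `-F` treats the serials as fixed strings rather than regular expressions, `-f` reads them from a file, and `-w` (supported by GNU and BSD grep, not required by POSIX) demands whole-word matches so that rs1000 would not also pull in rs10001:

```sh
# Hypothetical helper: keep the lines of the master list ($2) that contain
# one of the serial numbers listed in $1 as a whole word.
make_subset() {
    grep -wFf "$1" "$2"
}
```

Usage: `make_subset wanted.txt master.list > subset.txt`. Note that grep matches a serial anywhere on the line, not only in the first column; with serials like these that is unlikely to matter.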


Now, I need to recover the original data for each serial number with the expected output:

rs10001 900 900 100 100
rs10004 999 121 232 434

Thanks,
jeeplou

Last edited by jeeplou; 06-02-2010 at 01:04 PM..
# 4  
06-02-2010
Try this:
Code:
awk 'NR==FNR{a[$1]=$1;next}a[$1]' subset.txt file.dat*
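An annotated version of the one-liner above, wrapped in a hypothetical `match_serials` shell function. One deliberate change: `$1 in want` replaces the bare `a[$1]` pattern, because in awk merely referencing `a[$1]` creates that element, so the original also stores every serial it reads from the data files, which can add up over a large data set.

```sh
# Hypothetical helper: first argument is the list, the rest are data files.
match_serials() {
    awk '
        # NR counts lines across all input files; FNR restarts at 1 for each
        # file. They are equal only while awk reads the first file (the list).
        NR == FNR {
            want[$1]      # remember this serial number as an array key
            next          # skip the rule below for list lines
        }
        # For data-file lines: print the line if its first field is a wanted
        # serial. "in" tests for the key without creating it.
        $1 in want
    ' "$@"
}
```

Usage: `match_serials subset.txt file.dat* > outputfile`. One known caveat of the NR==FNR idiom: if the first file is empty, NR and FNR stay equal through the second file, so that file would be mistaken for the list.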

# 5  
06-02-2010
Thanks for the speedy reply. I understand each individual element in your code, but since I'm not sure what it is "saying", I'm not sure how to set it up for my data. Can you elaborate briefly, please?

Also, wouldn't it help to use the file name and row number to tell awk where to look?
# 6  
06-02-2010
Have you tried the command?

Place file.dat1, file.dat2 and subset.txt in a directory and run the command to see what happens.
# 7  
06-02-2010
It most definitely works!
What I understand from the code is that you build an array holding all of the serial numbers and then look up the first column of every line in every data file in that array.

My concern is how long this will take over 64GB of data. I haven't tried it yet, but I will get the LSF farm going to see what happens. Is there a way to use the file name and row number to speed things up?


Thanks again.
jonah
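On the row-number question: awk cannot seek to a line, but getline can read a named file forward only as far as needed. A minimal sketch, assuming the list has the `serial row file` layout of subset.txt and that row numbers start at 1 (`fetch_rows_getline` is a hypothetical name). Sorting by file and then row lets each file be read in one forward pass, stopping at its last wanted row (a repeated row forces a reopen); files not named in the list are never opened, and only one file is open at a time:

```sh
# Hypothetical helper: fetch rows by (file, row number) from a
# "serial row file" list ($1) using getline.
fetch_rows_getline() {
    # Sort by file name, then numerically by row, so each file is read forward.
    sort -k3,3 -k2,2n "$1" |
    awk '
    {
        row = $2 + 0; file = $3
        # New file, or a row at/behind the current position: start over.
        if (file != cur || row <= pos) {
            if (cur != "") close(cur)
            cur = file; pos = 0
        }
        # Skip forward with getline until we reach the wanted row (or EOF).
        while (pos < row && (getline line < cur) > 0)
            pos++
        if (pos == row)
            print line
        else
            printf "%s: row %d not found in %s\n", $1, row, cur > "/dev/stderr"
    }'
}
```

Usage: `fetch_rows_getline subset.txt > outputfile`. Rows come out in file and row order rather than list order, and missing rows are reported on stderr. Whether this beats the NR==FNR one-liner depends on how many files the subset touches and how deep the rows sit; if the wanted rows are near the end of large files, both approaches end up reading most of the data anyway.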