Need awk to extract lines and sort


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Need awk to extract lines and sort
# 1  
Old 10-11-2009
Need awk to extract lines and sort

Hi,
My data looks like this.
Code:
 CHR                           SNP         BP   A1       TEST    NMISS         OR         STAT            P
   0                 SNP_A-8282315          0    2        ADD     1530      1.074       0.7707       0.4409
   0                 SNP_A-8282315          0    2       COV1     1530  1.771e+04        4.764    1.898e-06
   0                 SNP_A-8282315          0    2       COV2     1530  1.513e+04        4.645    3.402e-06
   0                 SNP_A-8282315          0    2       COV3     1530      14.16        1.306       0.1915
   0                 SNP_A-8282315          0    2       COV4     1530      1.264       0.1139       0.9093
   0                 SNP_A-8282315          0    2       COV5     1530      2.389       0.4268       0.6695
   0                 SNP_A-8338258          0    4        ADD     1528     0.9498      -0.6824        0.495
   0                 SNP_A-8338258          0    4       COV1     1528  1.846e+04        4.777    1.783e-06
   0                 SNP_A-8338258          0    4       COV2     1528  1.374e+04          4.6    4.224e-06
   0                 SNP_A-8338258          0    4       COV3     1528      14.82         1.33       0.1836
   0                 SNP_A-8338258          0    4       COV4     1528      1.251       0.1087       0.9134
   0                 SNP_A-8338258          0    4       COV5     1528      2.431       0.4354       0.6633

.
.
  1. I want to extract only lines with "ADD" in col. 5 (TEST).
  2. I want to sort ascending on Col 9 (P). This column as shown above has both regular and scientific formats.
  3. How can I keep the headers of the file intact in the outfile?
What I tried:
Code:
gawk '{if(NR>1 && $5="ADD") {print $0}}' infile>outfile

Everything in col5 is changed to ADD. Smilie

Last edited by genehunter; 10-11-2009 at 02:33 AM.. Reason: spelling ALL to ADD
# 2  
Old 10-11-2009
Quote:
Originally Posted by genehunter
...
What I tried:
Code:
gawk '{if(NR>1 && $5="ADD") {print $0}}' infile>outfile

Everything in col5 is changed to ALL. Smilie
That's because you assigned the value "ADD" to the fifth field ($5). You will have to compare the fifth field ($5) with "ADD" instead.

=> Operator for assignment is "=". It changes the value.
=> Operator for comparison is "==". It doesn't change the value.

Code:
$
$ cat f1
 CHR                           SNP         BP   A1       TEST    NMISS         OR         STAT            P
   0                 SNP_A-8282315          0    2        ADD     1530      1.074       0.7707       0.4409
   0                 SNP_A-8282315          0    2       COV1     1530  1.771e+04        4.764    1.898e-06
   0                 SNP_A-8282315          0    2       COV2     1530  1.513e+04        4.645    3.402e-06
   0                 SNP_A-8282315          0    2       COV3     1530      14.16        1.306       0.1915
   0                 SNP_A-8282315          0    2       COV4     1530      1.264       0.1139       0.9093
   0                 SNP_A-8282315          0    2       COV5     1530      2.389       0.4268       0.6695
   0                 SNP_A-8338258          0    4        ADD     1528     0.9498      -0.6824        0.495
   0                 SNP_A-8338258          0    4       COV1     1528  1.846e+04        4.777    1.783e-06
   0                 SNP_A-8338258          0    4       COV2     1528  1.374e+04          4.6    4.224e-06
   0                 SNP_A-8338258          0    4       COV3     1528      14.82         1.33       0.1836
   0                 SNP_A-8338258          0    4       COV4     1528      1.251       0.1087       0.9134
   0                 SNP_A-8338258          0    4       COV5     1528      2.431       0.4354       0.6633
$
$ gawk '{if(NR>1 && $5=="ADD") {print $0}}' f1
   0                 SNP_A-8282315          0    2        ADD     1530      1.074       0.7707       0.4409
   0                 SNP_A-8338258          0    4        ADD     1528     0.9498      -0.6824        0.495
$
$

Rookie programmer mistake. Smilie

In fact, since printing the line is the default action of awk and its variants, you could slim down your one-liner to:

Code:
gawk '$5=="ADD"' <filename>

tyler_durden
# 3  
Old 10-11-2009
Hey! Thanks for pointing it out.
I corrected it and worked!
Any ideas on how to sort that mixed data type col9.?
I also want to keep that header intact. But cant figure why NR>1 doesnt work it to keep it.
Thanks
# 4  
Old 10-11-2009
Hi,

Try this....

Code:
awk '{if(NR==1) {print $0}};{if(NR>1 && $5=="ADD") {print $0}}' test_forum | sort -nk9

# 5  
Old 10-11-2009
And a bit shorter version:

Code:
awk 'NR==1 || $5=="ADD"' filename | sort -nk9

tyler_durden
# 6  
Old 10-11-2009
OP requires to sort on both regular and scientific format. The g modifier would then be more appropriate.

Code:
 ... | sort -k9,9g

# 7  
Old 10-11-2009
Thank you all

Hi,
That worked!
Thanks to durden_tyler, pravin27 and ripat.
Once again, thanks for helping me out. I am not a programmer, but a biologist, kind of masquerading as a bioinformatician; purely due to necessity rather than choice.
This forum has been of so much use to me.
Code:
awk 'NR==1 || $5=="ADD"'

Can you please tell me what the two pipes do?
Is that the (OR) operator?

Last edited by genehunter; 10-11-2009 at 02:20 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

awk - (URGENT!) Print lines sort and move lines if match found

URGENT HELP IS NEEDED!! I am looking to move matching lines (01 - 07) from File1 and 77 tab the matching string from File2, to File3.txt. I am almost done but - Currently, script is not printing lines to File3.txt in order. - Also the matching lines are not moving out of File1.txt ... (1 Reply)
Discussion started by: High-T
1 Replies

2. UNIX for Dummies Questions & Answers

awk - Extract 4 lines in Column to Rows Tab Delimited between tags

I have tried the following to no avail. xargs -n8 < test.txt awk '{if(NR%6!=0){p=""}else{p="\n"};printf $0" "p}' Mod_Alm_log.txt > test.txt I have tried different variations of the above, the problem is mixes lines together. And it includes the tags "%a and %A" I need them to be all tab... (16 Replies)
Discussion started by: mytouchsr
16 Replies

3. UNIX for Dummies Questions & Answers

Extract lines in awk

Hi, I have one file of the following format: TBCD, 1521, 14585236, NSDFC XSDF, 1845, 14525426, SDFFF SDFC, 4524, 14523655, SDNCV ASBC, 1845, 48754251, SDFFC ASBC, 1845, 54542512, SDFFF ASBC, 1845, 34212512, NSDFC ASBC, 1845, 16890234, ASFCH MNDG, 1896, 15842642, SFTDD SDFC, 8524,... (4 Replies)
Discussion started by: alex2005
4 Replies

4. Shell Programming and Scripting

Search for a pattern,extract value(s) from next line, extract lines having those extracted value(s)

I have hundreds of files to process. In each file I need to look for a pattern then extract value(s) from next line and then search for value(s) selected from point (2) in the same file at a specific position. HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V TITLE CYTOCHROME... (7 Replies)
Discussion started by: AshwaniSharma09
7 Replies

5. Shell Programming and Scripting

awk: sort lines by count of a character or string in a line

I want to sort lines by how many times a string occurs in each line (the most times first). I know how to do this in two passes (add a count field in the first pass then sort on it in the second pass). However, can it be done more optimally with a single AWK command? My AWK has improved... (11 Replies)
Discussion started by: Michael Stora
11 Replies

6. Shell Programming and Scripting

Awk to extract lines with a defined number of characters

This is my problem, my file (file A) contains the following information: Now, I would like to create a file (file B) containing only the lines with 10 or more characters but less than 20 with their corresponding ID: Then, I need to compare the entries and determine their frequency. Thus, I... (7 Replies)
Discussion started by: Xterra
7 Replies

7. Shell Programming and Scripting

Need to extract some lines from output via AWK

Hello Friends, I have got, this output below and i want to extract the name of symlink which is highlighted in red and the path above it highlighted in blue. At the end i want to append path and symlink. /var/tmp/asirohi/jdk/jre /var/tmp/asirohi/jdk/jre/.systemPrefs... (3 Replies)
Discussion started by: asirohi
3 Replies

8. Shell Programming and Scripting

AWK: How to extract text lines between two strings

Hi. I have a text test1.txt file like:Receipt Line1 Line2 Line3 End Receipt Line4 Line5 Line6 Canceled Receipt Line7 Line8 Line9 End (9 Replies)
Discussion started by: TQ3
9 Replies

9. Shell Programming and Scripting

AWK or KSH : Sort, Group and extract from 3 files

Hi, I've the following two CSV files: File1.csv File2.csv Class,Student# Student#,Marks 1001,6001 6002,50 1001,6002 6001,60 1002,7000 ... (3 Replies)
Discussion started by: Matrix2682
3 Replies

10. Shell Programming and Scripting

Remove lines, Sorted with Time based columns using AWK & SORT

Hi having a file as follows MediaErr.log 84 Server1 Policy1 Schedule1 master1 05/08/2008 02:12:16 84 Server1 Policy1 Schedule1 master1 05/08/2008 02:22:47 84 Server1 Policy1 Schedule1 master1 05/08/2008 03:41:26 84 Server1 Policy1 ... (1 Reply)
Discussion started by: karthikn7974
1 Replies
Login or Register to Ask a Question