awk to retain header lines in output


# 1  

The awk below runs and produces the current output, which is correct, except that I cannot seem to include the header lines starting with # and ## in the output as well. I tried adding !/^#/, thinking that it would skip the lines starting with # and output them, but the entire file prints as is. Thank you.

file
Code:
##bcftools_normVersion=1.9+htslib-1.9
##bcftools_normCommand=norm --do-not-normalize -m -both /path/to/xxxxx.vcf; Date=Tue Feb 26 12:59:30 2019
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxx
chr1	11174372	MTOR	A	<CNV>	100	PASS	FR=.;PRECISE=FALSE;SVTYPE=CNV;END=11217311;LEN=42939;NUMTILES=7;SD=0.47;CDF_MAPD=0.01:1.480581,0.025:1.544948,0.05:1.602554,0.1:1.671659,0.2:1.759366,0.25:1.793881,0.5:1.94,0.75:2.098021,0.8:2.139179,0.9:2.251416,0.95:2.348502,0.975:2.436069,0.99:2.541976;REF_CN=2;CI=0.05:1.60255,0.95:2.3485;RAW_CN=1.94;FUNC=[{'gene':'MTOR'}]	GT:GQ:CN	./.:0:1.94
chr1	11174383	COSM1161896	A	G	264.674	PASS	AF=0;AO=0;DP=4229;FAO=0;FDP=2000;FDVR=5;FR=.;FRO=2000;FSAF=0;FSAR=0;FSRF=1166;FSRR=834;FWDB=-0.0180893;FXX=0;HRUN=1;HS_ONLY=0;LEN=1;MLLD=127.881;OALT=G;OID=COSM1161896;OMAPALT=G;OPOS=11174383;OREF=A;PB=.;PBP=.;QD=0.529347;RBI=0.0955223;REFB=9.08841e-06;REVB=0.0937938;RO=4212;SAF=0;SAR=0;SRF=2442;SRR=1770;SSEN=0;SSEP=0;SSSB=3.6281e-08;STB=0.5;STBP=1;TYPE=snp;VARB=0;HS;FUNC=[{'transcript':'NM_004958.3','gene':'MTOR','location':'exonic','exon':'53'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/0:264:4229:2000:4212:2000:0:0:0:0:0:2442:1770:0:0:1166:834
chr1	43814978	COSM1342796;COSM86963	A	G	231.262	PASS	AF=0.0010005;AO=4;DP=3351;FAO=2;FDP=1999;FDVR=10;FR=.,.;FRO=1997;FSAF=1;FSAR=1;FSRF=944;FSRR=1053;FWDB=0.00987233;FXX=0.000499998;HRUN=1;HS_ONLY=0;LEN=1,1;MLLD=106.81;OALT=G,T;OID=COSM1342796,COSM86963;OMAPALT=G,T;OPOS=43814978,43814978;OREF=A,A;PB=.;PBP=.;QD=0.462755;RBI=0.014386;REFB=4.80559e-05;REVB=0.010464;RO=3338;SAF=1;SAR=3;SRF=1576;SRR=1762;SSEN=0;SSEP=0;SSSB=-0.0113679;STB=0.526994;STBP=0.848;TYPE=snp;VARB=-0.0370454;HS;FUNC=[{'transcript':'NM_005373.2','gene':'MPL','location':'exonic','exon':'10'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/0:231:3351:1999:3338:1997:4:2:0.0010005:3:1:1576:1762:1:1:944:1053
chr1	43814978	COSM1342796;COSM86963	A	G	231.262	PASS	AF=0.05;AO=4;DP=3351;FAO=2;FDP=1999;FDVR=10;FR=.,.;FRO=1997;FSAF=1;FSAR=1;FSRF=944;FSRR=1053;FWDB=0.00987233;FXX=0.000499998;HRUN=1;HS_ONLY=0;LEN=1,1;MLLD=106.81;OALT=G,T;OID=COSM1342796,COSM86963;OMAPALT=G,T;OPOS=43814978,43814978;OREF=A,A;PB=.;PBP=.;QD=0.462755;RBI=0.014386;REFB=4.80559e-05;REVB=0.010464;RO=3338;SAF=1;SAR=3;SRF=1576;SRR=1762;SSEN=0;SSEP=0;SSSB=-0.0113679;STB=0.526994;STBP=0.848;TYPE=snp;VARB=-0.0370454;HS;FUNC=[{'transcript':'NM_005373.2','gene':'MPL','location':'exonic','exon':'10'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/0:231:3351:1999:3338:1997:4:2:0.0010005:3:1:1576:1762:1:1:944:1053

current output
Code:
chr1	43814978	COSM1342796;COSM86963	A	G	231.262	PASS	AF=0.05;AO=4;DP=3351;FAO=2;FDP=1999;FDVR=10;FR=.,.;FRO=1997;FSAF=1;FSAR=1;FSRF=944;FSRR=1053;FWDB=0.00987233;FXX=0.000499998;HRUN=1;HS_ONLY=0;LEN=1,1;MLLD=106.81;OALT=G,T;OID=COSM1342796,COSM86963;OMAPALT=G,T;OPOS=43814978,43814978;OREF=A,A;PB=.;PBP=.;QD=0.462755;RBI=0.014386;REFB=4.80559e-05;REVB=0.010464;RO=3338;SAF=1;SAR=3;SRF=1576;SRR=1762;SSEN=0;SSEP=0;SSSB=-0.0113679;STB=0.526994;STBP=0.848;TYPE=snp;VARB=-0.0370454;HS;FUNC=[{'transcript':'NM_005373.2','gene':'MPL','location':'exonic','exon':'10'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/0:231:3351:1999:3338:1997:4:2:0.0010005:3:1:1576:1762:1:1:944:1053

desired output
Code:
##bcftools_normVersion=1.9+htslib-1.9
##bcftools_normCommand=norm --do-not-normalize -m -both /path/to/xxxxx.vcf; Date=Tue Feb 26 12:59:30 2019
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxx
chr1	43814978	COSM1342796;COSM86963	A	G	231.262	PASS	AF=0.05;AO=4;DP=3351;FAO=2;FDP=1999;FDVR=10;FR=.,.;FRO=1997;FSAF=1;FSAR=1;FSRF=944;FSRR=1053;FWDB=0.00987233;FXX=0.000499998;HRUN=1;HS_ONLY=0;LEN=1,1;MLLD=106.81;OALT=G,T;OID=COSM1342796,COSM86963;OMAPALT=G,T;OPOS=43814978,43814978;OREF=A,A;PB=.;PBP=.;QD=0.462755;RBI=0.014386;REFB=4.80559e-05;REVB=0.010464;RO=3338;SAF=1;SAR=3;SRF=1576;SRR=1762;SSEN=0;SSEP=0;SSSB=-0.0113679;STB=0.526994;STBP=0.848;TYPE=snp;VARB=-0.0370454;HS;FUNC=[{'transcript':'NM_005373.2','gene':'MPL','location':'exonic','exon':'10'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/0:231:3351:1999:3338:1997:4:2:0.0010005:3:1:1576:1762:1:1:944:1053


awk
Code:
awk -F'[\t;]' '
  {
    split(x,V)
    for(i=1; i<=NF; i++) {
      split($i,F,/=/)
      V[F[1]]=F[2]
    }
  }
  (V["AF"]+0 > .03) && 
  (V["DP"]+0 > 20)
' file

# 2  
Hi, try:
Code:
  (V["AF"]+0 > .03) && 
  (V["DP"]+0 > 20) ||
  /^#/

or
Code:
awk -F'[\t;]' '
  /^#/ {
    print
    next
  }
  {
    split(x,V)
    for(i=1; i<=NF; i++) {
      split($i,F,/=/)
      V[F[1]]=F[2]
    }
  }
  (V["AF"]+0 > .03) && 
  (V["DP"]+0 > 20)
' file
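
For anyone who wants to try the second form without the full VCF, here is a minimal sketch. The three-line input and the temporary file are made up for illustration and are not the original data; the comments describe what each statement does.

```shell
# Hypothetical miniature input: one header line and two records.
tmp=$(mktemp)
printf '#CHROM\tPOS\tINFO\n'          >  "$tmp"
printf 'chr1\t100\tAF=0;DP=4229\n'    >> "$tmp"
printf 'chr1\t200\tAF=0.05;DP=3351\n' >> "$tmp"

out=$(awk -F'[\t;]' '
  /^#/ { print; next }          # header line: print it and skip the other rules
  {
    split(x, V)                 # x is never set, so this empties V for each record
    for (i = 1; i <= NF; i++) { # walk the fields of the current line
      split($i, F, /=/)         # "AF=0.05" gives F[1]="AF" and F[2]="0.05"
      V[F[1]] = F[2]            # store the value under its tag name
    }
  }
  (V["AF"]+0 > .03) && (V["DP"]+0 > 20)   # pattern with no action: print if true
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

With this input, only the header and the record with AF=0.05 should come out. In the first form, && binds more tightly than ||, so the pattern reads as "(AF and DP pass) or the line is a header".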

# 3  
Code:
awk -F'[\t;]' '
/^#/ { print;next}
  {
    split(x,V)
....
}

# 4  
Works great, thank you. I am currently learning Python (or trying to) and was going to use the awk as practice, that is, try rewriting it in Python. Could I post back comments on each line to see if my thinking is correct? Thank you.

awk

Code:
awk -F'[\t;]' ' # call awk script and define FS as pattern of tab and semi-colon
  {
    split(x,V) # split each tab and ; and read into array V
    for(i=1; i<=NF; i++) {  # start loop iterating over each line
      split($i,F,/=/)  # split on the = and store in array F
      V[F[1]]=F[2]  # each V is tag=value (example AF=0.05)
    }
  }
  (V["AF"]+0 > .03) && # check AF is greater than 3% and
  (V["DP"]+0 >= 20) || # check DP is greater than or equal to 20
  /^#/  # retain header lines (if AF and DP criteria are met, print the line(s) and the header)
' file  # define output file

/^#/ { print;next} # retains header as well

Last edited by cmccabe; 02-27-2019 at 02:18 PM.. Reason: commented awk
# 5  
Quote:
Originally Posted by cmccabe
Could I post back comments on each line to see if my thinking is correct?
Of course you can do that - in fact you are explicitly encouraged to do so. This forum is all about self-empowerment and learning to help yourself. But you probably knew that already, didn't you?

A major difference between awk and sed is that the latter outputs every line, changed or not, by default. i.e.

Code:
sed 's/old/NEW/g' /some/file

will output not only the lines containing "old" (with "old" changed to "NEW") but also all other lines, unchanged. awk works differently: it outputs only what it is explicitly told to output, through the print command or some other means. Therefore, if there is no rule to print lines starting with a "#", these lines will not be printed.
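
A small sketch of that difference, using three made-up lines rather than the file from this thread (the variable names are only for illustration):

```shell
# Three made-up lines; only the second one contains "old".
input=$(printf 'first\nold value\nlast')

# sed prints every line, whether or not the substitution applied.
sed_out=$(printf '%s\n' "$input" | sed 's/old/NEW/g')

# awk with only a pattern prints just the lines that match it.
awk_out=$(printf '%s\n' "$input" | awk '/old/')

# To get the sed behaviour in awk, every line has to be printed explicitly.
awk_all=$(printf '%s\n' "$input" | awk '{ gsub(/old/, "NEW"); print }')

printf 'sed:\n%s\n\nawk pattern only:\n%s\n\nawk with print:\n%s\n' \
  "$sed_out" "$awk_out" "$awk_all"
```

The same reasoning applies to the header lines: they pass through only once a rule such as /^#/ asks for them.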

Quote:
Originally Posted by cmccabe
define FS as pattern of tab and semi-colon
Not quite: FS is defined as either a tab or a semicolon. [....] is a so-called "character class", often used in regexps. It always means "one of the enclosed characters", e.g. d[ae]n would match either "dan" or "den" but neither "dean" nor "daen". It is also possible to give a range of characters instead of enumerating them, e.g. [a-z] is "any lowercase letter a-z" and [a-zA-Z] is "any letter a-z, capitalised or not".

You can also negate these classes by using "^" as first character: [^0-9] is "anything but a digit".
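
These classes can be tried out quickly on the command line. A minimal sketch, with made-up test words and illustrative variable names:

```shell
# d[ae]n needs exactly one "a" or "e" between "d" and "n".
matched=$(printf 'dan\nden\ndean\ndaen\ndin\n' | awk '/d[ae]n/')

# [^0-9] matches any single character that is not a digit,
# so only the line containing a letter is selected.
nondigit=$(printf '123\n12a\n' | awk '/[^0-9]/')

# With the FS from this thread, a tab and a semicolon each end a field.
nfields=$(printf 'a\tb;c\n' | awk -F'[\t;]' '{ print NF }')

printf '%s\n---\n%s\n---\n%s\n' "$matched" "$nondigit" "$nfields"
```

The first command should select only "dan" and "den", the second only "12a", and the third should report three fields.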

I hope this helps.

bakunin

Last edited by bakunin; 02-27-2019 at 02:28 PM..
These 2 Users Gave Thanks to bakunin For This Post:
# 6  
Thank you.
# 7  
Only the last 2 lines are correctly compared with this separator, -F'[\t;]':
Code:
awk -F "AF=|DP=" '
/^#/    {print; next}
        {split($2 $3, V, ";")}
( V[1] > 0.03 ) && ( V[3] > 20 )
' file
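
As far as I can tell, the separator "AF=|DP=" relies on the tag order in this particular file: AF has to be followed by exactly one other tag before DP. A sketch that looks the tags up by name instead, assuming INFO is the eighth tab-separated column as in the sample. The info() helper and the miniature input are hypothetical, made up for illustration:

```shell
# Hypothetical miniature input; the first record has DP before AF on purpose.
tmp=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'     >  "$tmp"
printf 'chr1\t1\tx\tA\tG\t9\tPASS\tDP=3351;SAF=1;AF=0.05\n' >> "$tmp"
printf 'chr1\t2\ty\tA\tG\t9\tPASS\tAF=0;AO=0;DP=4229\n'     >> "$tmp"

out=$(awk -F'\t' '
  # info(key): value of "key=..." in the INFO column, or "" if the tag is absent.
  # The extra parameters n, i, kv, p are only there to act as local variables.
  function info(key,    n, i, kv, p) {
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++) {
      p = index(kv[i], "=")
      if (p && substr(kv[i], 1, p - 1) == key)   # exact tag name, so SAF is not AF
        return substr(kv[i], p + 1)
    }
    return ""
  }
  /^#/ { print; next }                            # keep header lines
  info("AF") + 0 > 0.03 && info("DP") + 0 > 20    # print records passing both tests
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Because the tag name is compared exactly, tags such as SAF or FDP do not get mistaken for AF or DP, whatever their position in the column.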
