Sponsored Content
Top Forums Shell Programming and Scripting awk to remove field and match strings to add text Post 302974338 by cmccabe on Saturday 28th of May 2016 12:08:58 PM
Old 05-28-2016
awk to remove field and match strings to add text

In file1 field $18 is removed.... column header is "Otherinfo", then each line in file1 is used to search file2 for a match. When a match is found the last four strings in file2 are copied to file1.

Maybe:
cut -f1-17 file1 and then match each line to file2

file1
Code:
Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    GeneDetail.refGene    ExonicFunc.refGene    AAChange.refGene    PopFreqMax    CLINSIG    CLNDBN    CLNACC    CLNDSDB    CLNDSDBID    common    Otherinfo
chr1    949654    949654    A    G    exonic    ISG15    .    synonymous SNV    ISG15:NM_005101:exon2:c.294A>G:p.V98V    0.96    .    .    .    .    .    .    1    3825.28    624    chr1    949654    .    A    G    3825.28    PASS    AF=1;AO=621;DP=624;FAO=399;FDP=399;FR=.;FRO=0;FSAF=225;FSAR=174;FSRF=0;FSRR=0;FWDB=0.00425236;FXX=0.00249994;HRUN=1;LEN=1;MLLD=97.922;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=0.5;PBP=1;QD=38.3487;RBI=0.0367904;REFB=0.0353003;REVB=-0.0365438;RO=2;SAF=335;SAR=286;SRF=0;SRR=2;SSEN=0;SSEP=0;SSSB=0.00332809;STB=0.5;STBP=1;TYPE=snp;VARB=-3.42335e-05;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:171:624:399:2:0:621:399:1:286:335:0:2:174:225:0:0 GOOD 399 reads
chr1    977330    977330    T    C    intronic    AGRN    .    .    .    1.    Benign    not_specified    RCV000116254.1    MedGen    CN169374    Common    1    1555.72    163    chr1    977330    .    T    C    1555.72    PASS    AF=1;AO=163;DP=163;FAO=163;FDP=163;FR=.;FRO=0;FSAF=65;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0181869;FXX=0;HRUN=1;LEN=1;MLLD=117.006;OALT=C;OID=.;OMAPALT=C;OPOS=977330;OREF=T;PB=0.5;PBP=1;QD=38.1771;RBI=0.0262515;REFB=0;REVB=-0.0189308;RO=0;SAF=65;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=6.62803e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-1.32905e-06;ANN=AGRN    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:74:163:163:0:0:163:163:1:98:65:0:0:98:65:0:0 GOOD 163 reads
chr1    981931    981931    A    G    exonic    AGRN    .    synonymous SNV    AGRN:NM_198576:exon18:c.3066A>G:p.S1022S    0.96    Benign    not_specified    RCV000116259.1    MedGen    CN169374    Common    1    915.076    96    chr1    981931    .    A    G    915.076    PASS    AF=1;AO=96;DP=96;FAO=96;FDP=96;FR=.;FRO=0;FSAF=34;FSAR=62;FSRF=0;FSRR=0;FWDB=0.0308666;FXX=0;HRUN=1;LEN=1;MLLD=62.5559;OALT=G;OID=.;OMAPALT=G;OPOS=981931;OREF=A;PB=0.5;PBP=1;QD=38.1282;RBI=0.0605651;REFB=0;REVB=-0.0521094;RO=0;SAF=34;SAR=62;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=-4.95992e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-0.000155929;ANN=AGRN    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:43:96:96:0:0:96:96:1:62:34:0:0:62:34:0:0 GOOD 96 reads

file2
Code:
##.....
##.....
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NS12911_BC1
chr1    949654    .    A    G    3825.28    PASS    AF=1;AO=621;DP=624;FAO=399;FDP=399;FR=.;FRO=0;FSAF=225;FSAR=174;FSRF=0;FSRR=0;FWDB=0.00425236;FXX=0.00249994;HRUN=1;LEN=1;MLLD=97.922;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=0.5;PBP=1;QD=38.3487;RBI=0.0367904;REFB=0.0353003;REVB=-0.0365438;RO=2;SAF=335;SAR=286;SRF=0;SRR=2;SSEN=0;SSEP=0;SSSB=0.00332809;STB=0.5;STBP=1;TYPE=snp;VARB=-3.42335e-05;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:171:624:399:2:0:621:399:1:286:335:0:2:174:225:0:0 GOOD 399 reads hom
chr1    977330    .    T    C    1555.72    PASS    AF=1;AO=163;DP=163;FAO=163;FDP=163;FR=.;FRO=0;FSAF=65;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0181869;FXX=0;HRUN=1;LEN=1;MLLD=117.006;OALT=C;OID=.;OMAPALT=C;OPOS=977330;OREF=T;PB=0.5;PBP=1;QD=38.1771;RBI=0.0262515;REFB=0;REVB=-0.0189308;RO=0;SAF=65;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=6.62803e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-1.32905e-06;ANN=AGRN    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:74:163:163:0:0:163:163:1:98:65:0:0:98:65:0:0 GOOD 163 reads hom
chr1    981931    .    A    G    915.076    PASS    AF=1;AO=96;DP=96;FAO=96;FDP=96;FR=.;FRO=0;FSAF=34;FSAR=62;FSRF=0;FSRR=0;FWDB=0.0308666;FXX=0;HRUN=1;LEN=1;MLLD=62.5559;OALT=G;OID=.;OMAPALT=G;OPOS=981931;OREF=A;PB=0.5;PBP=1;QD=38.1282;RBI=0.0605651;REFB=0;REVB=-0.0521094;RO=0;SAF=34;SAR=62;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=-4.95992e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-0.000155929;ANN=AGRN    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:43:96:96:0:0:96:96:1:62:34:0:0:62:34:0:0 GOOD 96 reads hom

desired output
Code:
Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    GeneDetail.refGene    ExonicFunc.refGene    AAChange.refGene    PopFreqMax    CLINSIG    CLNDBN    CLNACC    CLNDSDB    CLNDSDBID    common    Otherinfo
chr1    949654    949654    A    G    exonic    ISG15    .    synonymous SNV    ISG15:NM_005101:exon2:c.294A>G:p.V98V    0.96    .    .    .    .    .    .     GOOD 399 reads hom
chr1    977330    977330    T    C    intronic    AGRN    .    .    .    1.    Benign    not_specified    RCV000116254.1    MedGen    CN169374    Common    GOOD 163 reads hom
chr1    981931    981931    A    G    exonic    AGRN    .    synonymous SNV    AGRN:NM_198576:exon18:c.3066A>G:p.S1022S    0.96    Benign    not_specified    RCV000116259.1    MedGen    CN169374    Common    GOOD 96 reads hom


Last edited by cmccabe; 05-28-2016 at 01:35 PM.. Reason: fixed format, added cut
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove Carriage returns between strings in a field

Is there any way to remove carriage retuns between the records? We have input records separated by TABS and have carriage returns as below: 123 456 789 ABC "1952.00" 678 "abcdef ghik lmno" Above we... (10 Replies)
Discussion started by: acheepi
10 Replies

2. Shell Programming and Scripting

awk or sed to add field in a text file

Hi there, I have a csv file with some columns comma sepated like this : 4502-17,PETER,ITA2,LEGUE,92,ME - HALF,23/05/10 15:00 4502-18,CARL,ITA2,LEGUE,96,ME - HALF,20/01/09 14:00 4502-19,OTTO,ITA2,LEGUE,97,ME - MARY,23/05/10 15:00 As you can see the column n. 7 is a timestamp column, I need... (23 Replies)
Discussion started by: capnino
23 Replies

3. Shell Programming and Scripting

awk, comma as field separator and text inside double quotes as a field.

Hi, all I need to get fields in a line that are separated by commas, some of the fields are enclosed with double quotes, and they are supposed to be treated as a single field even if there are commas inside the quotes. sample input: for this line, 5 fields are supposed to be extracted, they... (8 Replies)
Discussion started by: kevintse
8 Replies

4. Shell Programming and Scripting

AWK: Pattern match between 2 files, then compare a field in file1 as > or < field in file2

First, thanks for the help in previous posts... couldn't have gotten where I am now without it! So here is what I have, I use AWK to match $1 and $2 as 1 string in file1 to $1 and $2 as 1 string in file2. Now I'm wondering if I can extend this AWK command to incorporate the following: If $1... (4 Replies)
Discussion started by: right_coaster
4 Replies

5. Shell Programming and Scripting

awk to parse field and include the text of 1 pipe in field 4

I am trying to parse the input in awk to include the |gc= in $4 but am not able to. The below is close: awk so far: awk '{sub(/\|]+]++/, ""); print }' input.txt Input chr1 955543 955763 AGRN-6|pr=2|gc=75 0 + chr1 957571 957852 AGRN-7|pr=3|gc=61.2 0 + chr1 970621 ... (7 Replies)
Discussion started by: cmccabe
7 Replies

6. Shell Programming and Scripting

Using awk to remove lines from file that match text

I am trying to remove each line in which $2 is FP or RFP. I believe the below will remove one instance but not both. Thank you :). file 12 123 FP 11 10 RFP awk awk -F'\t' ' $2 != "FP"' file desired output 12 11 (6 Replies)
Discussion started by: cmccabe
6 Replies

7. Shell Programming and Scripting

awk to match field between two files and use conditions on match

I am trying to look for $2 of file1 (skipping the header) in $2 of file2 (skipping the header) and if they match and the value in $10 is > 30 and $11 is > 49, then print the line from file1 to a output file. If no match is foung the line is not printed. Both the input and output are tab-delimited.... (3 Replies)
Discussion started by: cmccabe
3 Replies

8. UNIX for Beginners Questions & Answers

Use strings from nth field from one file to match strings in entire line in another file, awk

I cannot seem to get what should be a simple awk one-liner to work correctly and cannot figure out why. I would like to use patterns from a specific field in one file as regex to search for matching strings in the entire line ($0) of another file. I would like to output the lines of File2 which... (1 Reply)
Discussion started by: jvoot
1 Replies

9. Shell Programming and Scripting

awk to add text to matching pattern in field

In the awk I am trying to add :p.=? to the end of each $9 that matches the pattern NM_. The below executes andis close but I can not seem to figure out why the :p.=? repeats in the split as in the green in the current output. I have added comments as well. Thank you :). file ... (4 Replies)
Discussion started by: cmccabe
4 Replies

10. Shell Programming and Scripting

awk to print text in field if match and range is met

In the awk below I am trying to match the value in $4 of file1 with the split value from $4 in file2. I store the value of $4 in file1 in A and the split value (using the _ for the split) in array. I then strore the value in $2 as min, the value in $3 as max, and the value in $1 as chr. If A is... (6 Replies)
Discussion started by: cmccabe
6 Replies
All times are GMT -4. The time now is 01:53 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy