Modify text file using awk


 
# 1  
Old 05-23-2013

Quote:
chr1 412573 . A C 2758.77 . AC=2;AF=1.00;AN=2;DP=71;Dels=0.00;FS=0.000;HaplotypeScore=2.8822;MLEAC=2;MLEAF=1.00;MQ=58.36;MQ0=0;QD=38.86;resource.EFF=INTERGENIC(MODIFIER||||||||) GT:AD:DP:GQ:PL 1/1:0,71:71:99:2787,214,0 GATKSAM

chr1 602567 rs21953190 A G 5481.77 . AC=2;AF=1.00;AN=2;DB;DP=152;Dels=0.00;FS=0.000;HaplotypeScore=6.8385;MLEAC=2;MLEAF=1.00;MQ=59.09;MQ0=0;QD=36.06;resource.EFF=SYNONYMOUS_CODING(LOW|SILENT|gaT/gaC|D1034|ADNP2|protein_coding|CODING|ENSCAFT00000000008|5) GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0 GATKSAM
I have a text file with lines like the ones shown here. Each row has 11 tab-separated columns. In each row, I want to split the 8th column so that the output looks like the example below: the 8th column holds the DP value, the 9th column holds the MQ value, and the columns after that hold the values that follow resource.EFF=.
Quote:
chr1 412573 . A C 2758.77 . 71 58.36 INTERGENIC MODIFIER GT:AD:DP:GQ:PL 1/1:0,71:71:99:2787,214,0 GATKSAM

chr1 602567 rs21953190 A G 5481.77 . 152 59.09 SYNONYMOUS_CODING LOW SILENT gaT/gaC D1034 ADNP2 protein_coding CODING ENSCAFT00000000008 5 GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0 GATKSAM


In other words, the 8th column has to be cleaned up so that only the DP value, the MQ value and the information after resource.EFF= remain, separated by tabs.

Could anyone help?
# 2  
Old 05-23-2013
This sure looks a lot like another thread you started in this forum earlier today: "Split column using awk in a text file".

Am I missing something, or is this a duplicate posting?
# 3  
Old 05-23-2013
Indeed, the question looks similar, but the format of the 8th column differs from the one in the previous thread. To avoid confusion, I have posted it as a new thread.
# 4  
Old 05-23-2013
You can use the solution from the previous thread to solve this :)
# 5  
Old 05-23-2013
The previous solution

Quote:
awk '{n=split ($8,TMP,";"); $8=""; for (i=1; i<=n; i++) if (match (TMP[i], /^DP=|^MQ=|^resource.EFF/)) {sub (/^.*=/,"",TMP[i]); $8=$8 ($8?"\t":"") TMP[i]}}1' FS="\t" file
gives the output as

Quote:
chr1 412573 . A C 2758.77 . 71 58.36 INTERGENIC(MODIFIER||||||||) GT:AD:DP:GQ:PL 1/1:0,71:71:99:2787,214,0

chr1 602567 rs21953190 A G 5481.77 . 152 59.09 SYNONYMOUS_CODING(LOW|SILENT|gaT/gaC|D1034|ADNP2|protein_coding|CODING|ENSCAFT00000000008|5) GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0
Here I want to split further:

Quote:
chr1 412573 . A C 2758.77 . 71 58.36 INTERGENIC(MODIFIER||||||||)
in the first row into
Quote:
chr1 412573 . A C 2758.77 . 71 58.36 INTERGENIC MODIFIER
Similarly, the data in the second row
Quote:
chr1 602567 rs21953190 A G 5481.77 . 152 59.09 SYNONYMOUS_CODING(LOW|SILENT|gaT/gaC|D1034|ADNP2|protein_coding|CODING|ENSCAFT00000000008|5) GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0
into
Quote:
chr1 602567 rs21953190 A G 5481.77 . 152 59.09 SYNONYMOUS_CODING LOW SILENT gaT/gaC D1034 ADNP2 protein_coding CODING ENSCAFT00000000008 5 GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0
In other words, I want to replace the ( ) and | characters with tabs.

Last edited by mehar; 05-23-2013 at 08:52 AM..
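One possible way to get this in a single pass is to adapt the previous solution quoted above so that it also cleans up the resource.EFF value. The sketch below is untested against the full data set and rests on a few assumptions: the input really is tab-separated with the INFO string in column 8, the file names sample.txt and sample.out are made up for the demonstration, and runs of empty | sub-fields can be collapsed (as in the expected output in the first post, where INTERGENIC(MODIFIER||||||||) becomes just two columns).

```shell
# Hypothetical sample file: the two rows from the first post, INFO column shortened.
printf 'chr1\t412573\t.\tA\tC\t2758.77\t.\tAC=2;DP=71;MQ=58.36;MQ0=0;QD=38.86;resource.EFF=INTERGENIC(MODIFIER||||||||)\tGT:AD:DP:GQ:PL\t1/1:0,71:71:99:2787,214,0\tGATKSAM\n' > sample.txt
printf 'chr1\t602567\trs21953190\tA\tG\t5481.77\t.\tAC=2;DB;DP=152;MQ=59.09;MQ0=0;QD=36.06;resource.EFF=SYNONYMOUS_CODING(LOW|SILENT|gaT/gaC|D1034|ADNP2|protein_coding|CODING|ENSCAFT00000000008|5)\tGT:AD:DP:GQ:PL\t1/1:0,151:151:99:5510,430,0\tGATKSAM\n' >> sample.txt

awk 'BEGIN { FS = OFS = "\t" }            # read and write tab-separated columns
{
    n = split($8, info, ";")              # break column 8 into KEY=value items
    dp = mq = eff = ""
    for (i = 1; i <= n; i++) {
        if      (info[i] ~ /^DP=/)            dp  = substr(info[i], 4)
        else if (info[i] ~ /^MQ=/)            mq  = substr(info[i], 4)
        else if (info[i] ~ /^resource\.EFF=/) eff = substr(info[i], 14)
    }
    gsub(/[()|]+/, OFS, eff)              # each run of ( ) | becomes one tab
    sub(/\t$/, "", eff)                   # drop the tab left by the closing ")"
    $8 = dp OFS mq OFS eff                # column 8 now expands into several columns
    print
}' sample.txt > sample.out
cat sample.out
```

Setting OFS as well as FS matters here: once $8 is assigned, awk rebuilds the whole line with OFS, so leaving it at the default would join the other columns with spaces. If the empty sub-fields need to keep their positions, drop the + from the gsub pattern so that every single character is replaced by its own tab.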
# 6  
Old 05-23-2013
If you want to make a lot of modifications, it's better to split the work up: do some of it in one run and the rest in another run.
This reduces the complexity of the program.

Running this on the output above
Code:
awk '/INT/ {split($10,t,"[(|]"); $10=t[1]" "t[2]}{gsub(/[|()]/," ")}1'

gives
Code:
chr1 412573 . A C 2758.77 . 71 58.36 INTERGENIC MODIFIER GT:AD:DP:GQ:PL 1/1:0,71:71:99:2787,214,0

chr1 602567 rs21953190 A G 5481.77 . 152        59.09    SYNONYMOUS_CODING LOW SILENT gaT/gaC D1034 ADNP2 protein_coding CODING ENSCAFT00000000008 5  GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0


Last edited by Jotne; 05-23-2013 at 09:03 AM.. Reason: Fixed both lines
# 7  
Old 05-23-2013
This code works only on the first row; the second row remains the same. I want to replace the ( ) and | characters in the 10th column in every row, i.e.
the data in the second row should also look like this:
Quote:
chr1 602567 rs21953190 A G 5481.77 . 152 59.09 SYNONYMOUS_CODING LOW SILENT gaT/gaC D1034 ADNP2 protein_coding CODING ENSCAFT00000000008 5 GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0

Last edited by mehar; 05-23-2013 at 09:56 AM..
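If the two-run approach is kept, a second pass with no row filter at all should treat every line the same way. This is only a sketch under some assumptions: the input is the output of the first run (columns separated by a mix of spaces and tabs), the file names pass1.txt and pass2.txt are invented for the example, and no other column contains ( ) or |. That last point is worth checking, because phased genotypes such as 0|1 in the sample column would be split as well.

```shell
# Hypothetical file holding one row of the first-run output
# (space-separated, with tabs inside the rewritten 8th column).
printf 'chr1 602567 rs21953190 A G 5481.77 . 152\t59.09\tSYNONYMOUS_CODING(LOW|SILENT|gaT/gaC|D1034|ADNP2|protein_coding|CODING|ENSCAFT00000000008|5) GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0\n' > pass1.txt

awk -v OFS='\t' '{
    gsub(/[()|]+/, " ")   # no row filter: every line is cleaned
    $1 = $1               # force awk to rebuild the line with OFS
    print
}' pass1.txt > pass2.txt
cat pass2.txt
```

Because gsub works on the whole line here, awk re-splits it on whitespace afterwards, and the $1 = $1 assignment then joins all columns with a single tab, which also evens out the irregular spacing seen in the output above.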