Split column using awk in a text file


 
# 8  
Old 05-23-2013
Code:
awk '{n=split ($8,TMP,";"); $8=""; for (i=1; i<=n; i++) if (match (TMP[i], /^DP=|^MQ=|^SNPEFF/)) {sub (/^.*=/,"",TMP[i]); $8=$8 ($8?"\t":"") TMP[i]}}1' FS="\t"
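Spread over several lines, the one-liner above might read as follows. This is only a sketch: the sample record is a hypothetical, shortened version of the poster's data, and setting `OFS` to a tab is an addition (the one-liner sets only `FS`) made on the assumption that the output should stay tab-separated.

```shell
# Hypothetical sample record: 8 tab-separated columns, INFO string in column 8.
line=$(printf 'chr1\t602567\trs21953190\tA\tG\t5481.77\t.\tAC=2;DP=152;MQ=59.09;MQ0=0;SNPEFF_IMPACT=LOW')

split_info=$(printf '%s\n' "$line" | awk '
BEGIN { FS = OFS = "\t" }            # OFS is an addition, not in the one-liner
{
    n = split($8, TMP, ";")          # break column 8 into KEY=VALUE items
    $8 = ""
    for (i = 1; i <= n; i++)
        if (TMP[i] ~ /^DP=|^MQ=|^SNPEFF/) {
            sub(/^.*=/, "", TMP[i])  # drop the key, keep the value
            $8 = $8 ($8 != "" ? "\t" : "") TMP[i]
        }
}
1')
printf '%s\n' "$split_info"
```

Note that `MQ0=0` is skipped because the pattern requires `MQ=` right at the start of the item.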

# 9  
Old 05-23-2013
Quote:
Originally Posted by mehar
. . .
Hi,

The original question is slightly modified with slightly different input in the 8th column. Could you help? Thanks in advance.
Can't see what has been modified. What's going wrong?

BTW - why do you need a one liner?
# 10  
Old 05-23-2013

In the initial question, the 8th column looks like this:

Code:
chr1	   602567	rs21953190	A	G	5481.77	.	AC=2;AF=1.00;AN=2;DB;DP=152;Dels=0.00;FS=0.000;HaplotypeScore=6.8385;MLEAC=2;MLEAF=1.00;MQ=59.09;MQ0=0;QD=36.06;SNPEFF_AMINO_ACID_CHANGE=D1034;SNPEFF_CODON_CHANGE=gaT/gaC;SNPEFF_EFFECT=SYNONYMOUS_CODING;SNPEFF_EXON_ID=5;SNPEFF_FUNCTIONAL_CLASS=SILENT;SNPEFF_GENE_BIOTYPE=protein_coding;SNPEFF_GENE_NAME=ADNP2;SNPEFF_IMPACT=LOW;SNPEFF_TRANSCRIPT_ID=ENSCAFT00000000008 1/1:0,151:151:99:5510,430,0

In the modified question, the data looks like this:
Quote:
chr1 602567 rs21953190 A G 5481.77 . AC=2;AF=1.00;AN=2;DB;DP=152;Dels=0.00;FS=0.000;HaplotypeScore=6.8385;MLEAC=2;MLEAF=1.00;MQ=59.09;MQ0=0;QD=36.06;resource.EFF=SYNONYMOUS_CODING(LOW|SILENT|gaT/gaC|D1034|ADNP2|protein_coding|CODING|ENSCAFT00000000008|5) GT:AD:DP:GQ:PL 1/1:0,151:151:99:5510,430,0
---------- Post updated at 01:01 PM ---------- Previous update was at 12:27 PM ----------

Anyway, I have modified the code and it works now. But I have a new problem:

Code:
chr1	    901534	rs21932296	   T	G	34.77	0/1:3,2:5:63:63,0,64	GATKSAM	5	55.21	INTRON	MODIFIER				CTDP1	protein_coding	CODING	ENSCAFT00000000012	11

Note that after MODIFIER there is a series of empty tab-separated fields. When I pipe this input to awk to perform some other action with the command
Code:
awk 'BEGIN{OFS="\t"}{split ($7,TMP,":"); $7= TMP[1]}1'

it collapses the run of empty tabs into a single tab and produces the output below:

Code:
chr1	    901534	rs21932296	  T	G	34.77	 0/1	GATKSAM	5	55.21	INTRON	MODIFIER	CTDP1	protein_coding	CODING	ENSCAFT00000000012	11


I don't want the multiple tabs to be replaced by a single tab. Could you help me see where I'm going wrong?
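A likely cause, offered as a sketch rather than an answer taken from this thread: the command sets `OFS` but leaves `FS` at its default, so awk splits the record on runs of whitespace. Consecutive tabs then count as one separator, and the empty columns disappear when the assignment to `$7` rebuilds the record. Setting `FS` to a tab as well should preserve them. The record below is the one from the post with the stray spaces removed.

```shell
# Record from the post (stray spaces removed); three empty columns follow MODIFIER.
row=$(printf 'chr1\t901534\trs21932296\tT\tG\t34.77\t0/1:3,2:5:63:63,0,64\tGATKSAM\t5\t55.21\tINTRON\tMODIFIER\t\t\t\tCTDP1\tprotein_coding\tCODING\tENSCAFT00000000012\t11')

fixed=$(printf '%s\n' "$row" | awk '
BEGIN { FS = OFS = "\t" }   # split on every single tab, join with tabs
{
    split($7, TMP, ":")     # break the genotype column at the colons
    $7 = TMP[1]             # keep only the part before the first colon
}
1')
printf '%s\n' "$fixed"
```

With a single tab as `FS`, every tab delimits a field, so the record keeps all 20 columns, including the empty ones.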