Post 302999704 by cmccabe on Monday 26th of June 2017 03:17:19 PM
awk to format file with conditional split

In the awk below I split $7 on the : (colon) and then the - (hyphen) into array a. The string chr is prepended to the first field of every output line.
Next, $4 is split on the > (greater than) into array b. I am not sure how to account for the two other possibilities in $4 so that the correct output is printed. Every line in the input file will contain either a del, an ins, or a > in $4. The split of $4 captures the last condition, but I am not sure about the first two. I added comments to each line in the awk and hope it is close. Thank you :).
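
To make the first step concrete, here is a minimal sketch of what the split of $7 produces. The function name split_coord is hypothetical and only used for this illustration; the coordinate string is the one from the first data line of the sample file.

```shell
# split_coord is a hypothetical helper name, not part of the original awk.
# It splits one coordinate string on : or - and prints the number of pieces
# followed by the chr-prefixed chromosome, the start, and the end.
split_coord() {
  printf '%s\n' "$1" |
    awk '{ n = split($0, a, ":|-"); print n, "chr" a[1], a[2], a[3] }'
}

split_coord "22:24133967-24133967"   # prints: 3 chr22 24133967 24133967
```

The separator ":|-" is a regular expression, so a single split call yields all three pieces in a[1], a[2] and a[3].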

The --- are not part of the file; they are only there to mark the example notes.

Code:
1. if $4 has del, then use the letter(s) to the right of del as REF; the format of that line is REF=C;OBS=      --- line 3 --- OBS is a null string
2. if $4 has ins, then use the letter(s) to the right of ins as OBS; the format of that line is REF=;OBS=ACA      --- line 4 --- REF is a null string

file tab-delimited
Code:
Gene	Accession_number	COSMIC_id	CDS_mut_syntax	AA_mut_syntax	Strand	HG19_coordinates	Amplicon_id	Insert	Target_url
SMARCB1	NM_003073.2	1002	c.118C>T	p.R40*	+	22:24133967-24133967	CHP2_SMARCB1_1	CCTCCGTATGTTCCGAGGTTCTCTGTACAAGAGATACCCCTCACTCTGGAGGCGACTAGCCACTGTGGAAGAGAGGAAGAAAATAGTTGCATCGTCACATGGTAAAAAAAC	http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=1002
RB1	NM_000321	1042	c.2107-2A>G	p.?	+	13:49037865-49037865	CHP2_RB1_9	TACATCAATTTATTTACTAGATTATGATGTGTTCCATGTATGGCATATGCAAAGTGAAGAATATAGACCTTAAATTCAAAATCATT	http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=1042
SMARCB1	NM_003073.2	1057	c.1148delC	p.P383fs	+	22:24176357-24176357	CHP2_SMARCB1_4	CCTTGGGAAGGGCAGCGCCCAGGCTGGGAGCTGGCCCCGACTCATTGCCCTCCCCACTCCTCTTCCAGGCGGATGAGGCGTCTTGCCAACACGGCCCCGGCCTGGTAACCAGCCCATCAGCACACGGCTCCC	http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=1057
BRAF	NM_004333	144982	c.1797_1798insACA	p.T599_V600insT	-	7:140453137-140453138	CHP2_BRAF_2	TCAAACTGATGGGACCCACTCCATCGAGATTTCACTGTAGCTAGACCAAAATCACCTATTTTTACTGTGAGGTCTTCATGAAGAAATATATCTGAGGTGTAGTAAGTAAAGGAAAACAG	http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=144982

desired output tab-delimited
Code:
chr22	24133967	24133967	c.118C>T	REF=C;OBS=T	SMARCB1
chr13	49037865	49037865	c.2107-2A>G	REF=A;OBS=G	RB1
chr22	24176357	24176357	c.1148delC	REF=C;OBS=	SMARCB1
chr7	140453138	140453138	c.1797_1798insACA	REF=;OBS=ACA	BRAF

awk
Code:
awk 'BEGIN{ FS=OFS="\t" }                                   # define FS and OFS as tab
     NR>1{ split($7,a,":|-"); k="chr"                       # skip header, split $7 on : or - reading values into array a, set k equal to chr
           split($4,b,">")                                  # split $4 on > reading values into array b (only the > case is handled here)
           print k a[1],a[2],a[3],$4,"REF="substr(b[1],length(b[1]))";OBS="b[2],$1 }' file  # REF is the last letter before >, OBS is the text after >
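
One possible way to cover all three cases is sketched below. This is an assumption-laden illustration, not a tested answer from the thread: the function name format_variants is hypothetical, the del and ins branches assume the bases after del or ins are plain A, C, G or T letters at the end of $4, and the start and end are printed exactly as they appear in $7.

```shell
# format_variants is a hypothetical helper name; it reads the tab-delimited
# file named as its argument (or stdin when no file is given).
format_variants() {
  awk 'BEGIN { FS = OFS = "\t" }
  NR > 1 {                                # skip the header line
    split($7, a, "[-:]")                  # a[1]=chromosome, a[2]=start, a[3]=end
    ref = obs = ""                        # reset both values for every record
    if (match($4, /del[ACGT]+$/))         # del: bases after del are REF, OBS stays empty
      ref = substr($4, RSTART + 3)
    else if (match($4, /ins[ACGT]+$/))    # ins: REF stays empty, bases after ins are OBS
      obs = substr($4, RSTART + 3)
    else if (split($4, b, ">") == 2) {    # substitution: last letter before > is REF
      ref = substr(b[1], length(b[1]))
      obs = b[2]                          # text after > is OBS
    }
    print "chr" a[1], a[2], a[3], $4, "REF=" ref ";OBS=" obs, $1
  }' "$@"
}
```

It would be called as format_variants file. The del and ins tests come before the > test so that each line falls into exactly one branch; a line matching none of the three is still printed, with both REF and OBS empty.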


Last edited by cmccabe; 06-26-2017 at 04:20 PM.. Reason: added details
 
