awk to reformat lines based on condition


 
Old 05-31-2018

The awk below reads a tab-delimited file and reformats each line according to one of three conditions (rules). The 3 rules are for deletion (id1 and id3), snv (id2), and insertion (id4 and id5). The sample includes every combination of lines that occurs in my actual data, which is very large. The awk includes comments but does not produce the desired output. I think my logic is correct, but maybe I am missing something or have left something out. Thank you :).


file (tab-delimited)
Code:
id1	1	101702547     AG	A
id2	15	48782104     G	C
id3	1	116268178     GAAA	G
id4	1	116268178     GAAA	GAAAA
id5	2	228197304     A	AATCC

current output
Code:
id1	1	101702548	101702547	-
id3	1	116268179	116268178	-

desired output (tab-delimited)
Code:
id1	1	101702548	101702549     G	-
id2	15	48782104	48782104     G	C
id3	1	116268179	116268182     AAA	-
id4	1	116268179	116268179     -	A
id5	2	228197305	228197305     -	TCC

rules
Code:
line1: since the length of $5 is greater than the length of $6, the value matching $6 is removed from $5 and a - is placed in $6. The value in $3 has 1 added to it, and the original $3 plus the length of the original $5 is copied to $4 (condition 1)

line2: since the lengths of $5 and $6 are both equal to 1, the value in $3 is copied to $4 (condition 2)

line3: since the length of $5 is greater than the length of $6, the value matching $6 is removed from $5 and a - is placed in $6. The value in $3 has 1 added to it, and the original $3 plus the length of the original $5 is copied to $4 (condition 1)

line4: since the length of $5 is less than the length of $6, the value matching $5 is removed from $6 and a - is placed in $5. The value in $3 has 1 added to it, and the new $3 is copied to $4 (condition 3)

line5: since the length of $5 is less than the length of $6, the value(s) matching $5 are removed from $6 and a - is placed in $5. The value in $3 has 1 added to it, and the new $3 is copied to $4 (condition 3)
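
The rules say the matching value is removed but not how. The sketch below compares two possible readings on pairs taken from the sample lines: deleting every occurrence with gsub, or stripping only a leading prefix. `trim_demo` is a hypothetical helper made up for this illustration, and it assumes the sequences are plain letters (no regex metacharacters).

```shell
# Hypothetical helper: compare two readings of "remove the matching value".
# $1 = the longer sequence, $2 = the shorter sequence.
trim_demo() {
    awk -v seq="$1" -v pat="$2" 'BEGIN {
        all = seq
        gsub(pat, "", all)                  # reading 1: delete every occurrence of pat
        pre = seq
        if (index(seq, pat) == 1)           # reading 2: delete pat only as a leading prefix
            pre = substr(seq, length(pat) + 1)
        print all, pre
    }'
}

trim_demo AATCC A      # id5: prints "TCC ATCC"
trim_demo GAAAA GAAA   # id4: prints "A A"
trim_demo GAAA G       # id3: prints "AAA AAA"
```

The two readings agree for id3 and id4 but differ for id5. Only the gsub reading gives the TCC shown in the desired output.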

awk
Code:
awk 'BEGIN{FS=OFS="\t"}  # set input and output field separators to tab
     FNR==NR{  # always true with a single input file, so every line is processed
         if(length($5) > length($6)) {  # condition 1 for deletion
             gsub($5,"",$6)  # removing matching
             print $1,$2,$3+1,$3+length($4),"-"  # print desired output
             next
         }
         if(length($5) == length($6)) {  # condition 2 for snv
             print $1,$2,$3,$3,$5,$6  # print desired output
             next
         }
         if(length($5) < length($6)) {  # condition 3 for insertion
             gsub($5,"",$6)  # removing matching
             print $1,$2,$3+1,"-",$3+1  # print desired output
         }
     }' file
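
One possible way to get from the sample input to the desired output is sketched below. It rests on several assumptions rather than being a definitive fix: the two sequences are always the last two fields, no field contains a blank (so the default whitespace splitting works whether or not an empty end column is present), and the end position for a deletion is the original $3 plus the length of the longer sequence. `sample.txt` and `reformatted.txt` are hypothetical file names. Compared with the script above, it strips the shorter sequence out of the longer one (the gsub in condition 1 does the reverse) and prints all six fields in every branch.

```shell
# Hypothetical sample file holding the five lines from the post.
printf '%s\t%s\t%s\t%s\t%s\n' \
    id1 1  101702547 AG   A \
    id2 15 48782104  G    C \
    id3 1  116268178 GAAA G \
    id4 1  116268178 GAAA GAAAA \
    id5 2  228197304 A    AATCC > sample.txt

awk 'BEGIN { OFS = "\t" }                 # default FS splits on any run of blanks or tabs
{
    pos = $3; a = $(NF-1); b = $NF        # position and the last two (sequence) fields
    if (length(a) > length(b)) {          # condition 1: deletion
        stop = pos + length(a)            # computed before a is trimmed
        gsub(b, "", a)                    # remove every occurrence of the shorter sequence
        print $1, $2, pos + 1, stop, a, "-"
    } else if (length(a) == length(b)) {  # condition 2: snv
        print $1, $2, pos, pos, a, b
    } else {                              # condition 3: insertion
        gsub(a, "", b)                    # remove every occurrence of the shorter sequence
        print $1, $2, pos + 1, pos + 1, "-", b
    }
}' sample.txt > reformatted.txt
cat reformatted.txt
```

On the five sample lines this prints the same six tab-separated fields as the desired output. Because gsub treats its first argument as a regular expression and removes every match, not just a leading one, this should be checked against the real data before relying on it.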


Last edited by cmccabe; 05-31-2018 at 12:08 PM.. Reason: fixed format, added current output