awk to parse field and include the text of 1 pipe in field 4


# 1  
Old 11-07-2015
I am trying to parse the input with awk so that $4 keeps only the name and the |gc= part (dropping |pr=), but I have not been able to. The command below is close.
awk so far:
Code:
awk '{sub(/\|[^[:blank:]]+[[:blank:]]+[0-9]+/, ""); print }' input.txt

Input
Code:
chr1    955543  955763  AGRN-6|pr=2|gc=75   0   + 
chr1    957571  957852  AGRN-7|pr=3|gc=61.2 0   + 
chr1    970621  970740  AGRN-8|pr=1|gc=57.1 0   +

Current Output
Code:
chr1    955543  955763  AGRN-6  + 
chr1    957571  957852  AGRN-7  + 
chr1    970621  970740  AGRN-8  +

Desired Output (each field separated by a tab)
Code:
chr1    955543  955763  AGRN-6|gc=75    + 
chr1    957571  957852  AGRN-7|gc=61.2  + 
chr1    970621  970740  AGRN-8|gc=57.1  +

# 2  
Old 11-07-2015
Code:
awk '{ printf("%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $6) }' oldfile > newfile

Just do not print column #5, assuming your input examples are correct. You can also use the awk OFS variable to get tab separation.
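
For the tab separation mentioned above, here is a minimal sketch that sets OFS in a BEGIN block. The function name drop_col5 is only illustrative (it is not from the thread), and the row is taken from the sample input. Note that this only drops column 5 and leaves $4 untouched:

```shell
# drop_col5 is a hypothetical helper name, used here for illustration only.
# It prints fields 1-4 and 6; each comma in print becomes OFS (a tab).
drop_col5() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, $6 }' "$@"
}

# One row from the sample input, fed on stdin.
printf 'chr1    955543  955763  AGRN-6|pr=2|gc=75   0   + \n' | drop_col5
```

With no file argument the function reads stdin. A file name can be passed instead, e.g. drop_col5 input.txt.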
# 3  
Old 11-07-2015
That `awk` produces:

Code:
chr1    955543    955763    AGRN-6|pr=2|gc=75    +    
 
chr1    957571    957852    AGRN-7|pr=3|gc=61.2    +  
   
chr1    970621    970740    AGRN-8|pr=1|gc=57.1    +

The |pr=2, |pr=3, and |pr=1 parts are not needed, and there appears to be a blank line after each row, which may be problematic for later analysis.

Thank you.
# 4  
Old 11-07-2015
Code:
awk '{n=split($4, a, "|"); print $1, $2, $3, a[1]"|"a[n], $6}' cmccabe.file

or:
Code:
awk '{n=split($4, a, "|"); print $1,$2,$3,a[1]"|"a[n],$6}' OFS="\t" cmccabe.file
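
To spell out what the split() call is doing: it breaks $4 on the pipe into the array a and returns the number of pieces n, so a[1] is the name and a[n] is the last piece (the gc= part). Below is a small sketch of the same idea wrapped in a function. The name keep_gc is an assumption made for this example:

```shell
# keep_gc is a hypothetical helper name, used here for illustration only.
# split() returns the number of pieces; a[1] is the name, a[n] the last piece.
keep_gc() {
    awk 'BEGIN { OFS = "\t" }
         { n = split($4, a, "|"); print $1, $2, $3, a[1] "|" a[n], $6 }' "$@"
}

# One row from the sample input, fed on stdin.
printf 'chr1    957571  957852  AGRN-7|pr=3|gc=61.2 0   + \n' | keep_gc
```

Using a[n] rather than a[3] keeps the last piece even if a row has a different number of pipe-separated parts in $4.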


Last edited by Aia; 11-07-2015 at 11:31 AM. Reason: Add alternative tab output separator
# 5  
Old 11-07-2015
I had something similar @Aia

Code:
awk '{split($4,a,"|"); print $1,$2,$3,a[1],"|",a[3],$6}' input
chr1 955543 955763 AGRN-6 | gc=75 + 
chr1 957571 957852 AGRN-7 | gc=61.2 + 
chr1 970621 970740 AGRN-8 | gc=57.1 +

but that outputs everything on one line. Your awk is much better, thank you.
# 6  
Old 11-07-2015
Quote:
Originally Posted by cmccabe
I had something similar @Aia

Code:
awk '{split($4,a,"|"); print $1,$2,$3,a[1],"|",a[3],$6}' input
chr1 955543 955763 AGRN-6 | gc=75 + 
chr1 957571 957852 AGRN-7 | gc=61.2 + 
chr1 970621 970740 AGRN-8 | gc=57.1 +

but that outputs everything on one line. Your awk is much better, thank you.

Yes, the commas in that print statement get translated into OFS (a single space by default), which is why there are spaces around the | in your output.
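
A quick way to see the difference is to compare a comma-separated print with plain concatenation. The function names and the two-field test line below are made up for this illustration:

```shell
# Hypothetical helper names, for illustration only.
# Commas in print: awk puts OFS (default: one space) between the items.
join_commas() { awk '{ print $1, "|", $2 }'; }

# No commas: adjacent expressions are concatenated with nothing in between.
join_concat() { awk '{ print $1 "|" $2 }'; }

printf 'AGRN-6 gc=75\n' | join_commas   # AGRN-6 | gc=75
printf 'AGRN-6 gc=75\n' | join_concat   # AGRN-6|gc=75
```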

Here's a Perl alternative:
Code:
 perl -pe 's/(\|\w+=[\w\.]+){1,2}\s+\d+/$1/' cmccabe.file

# 7  
Old 11-07-2015
How about
Code:
awk '{sub(/\|.*\|/, "|")}1' file
chr1    955543  955763  AGRN-6|gc=75   0   + 
chr1    957571  957852  AGRN-7|gc=61.2 0   + 
chr1    970621  970740  AGRN-8|gc=57.1 0   +

?
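
The output above still carries the fifth column. As a rough sketch, the same sub() idea can be limited to $4 and combined with leaving out $5, which matches the tab-separated output asked for in the first post. The function name strip_middle is an assumption for this example, and the pattern is written as a regex literal so the backslashes are not interpreted as string escapes first:

```shell
# strip_middle is a hypothetical helper name, used here for illustration only.
# sub() replaces everything from the first | to the last | in $4 with one |.
strip_middle() {
    awk 'BEGIN { OFS = "\t" }
         { sub(/\|.*\|/, "|", $4); print $1, $2, $3, $4, $6 }' "$@"
}

# One row from the sample input, fed on stdin.
printf 'chr1    970621  970740  AGRN-8|pr=1|gc=57.1 0   +\n' | strip_middle
```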