awk to parse field and include the text of 1 pipe in field 4


 
# 1  
Old 11-07-2015

I am trying to parse the input with awk so that $4 keeps only the |gc= value, but I have not been able to. The command below is close:
awk so far:
Code:
awk '{sub(/\|[^[:blank:]]+[[:blank:]]+[0-9]+/, ""); print }' input.txt

Input
Code:
chr1    955543  955763  AGRN-6|pr=2|gc=75   0   + 
chr1    957571  957852  AGRN-7|pr=3|gc=61.2 0   + 
chr1    970621  970740  AGRN-8|pr=1|gc=57.1 0   +

Current Output
Code:
chr1    955543  955763  AGRN-6  + 
chr1    957571  957852  AGRN-7  + 
chr1    970621  970740  AGRN-8  +

Desired Output (each field separated by a tab)
Code:
chr1    955543  955763  AGRN-6|gc=75    + 
chr1    957571  957852  AGRN-7|gc=61.2  + 
chr1    970621  970740  AGRN-8|gc=57.1  +

# 2  
Old 11-07-2015
Code:
awk '{
    printf("%s\t%s\t%s\t%s\t%s\n", $1, $2, $3, $4, $6)
}' oldfile >newfile

Just do not print column #5, assuming your input examples are correct. You can also set the awk OFS variable to get tab separation.
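A minimal sketch of the tab-separation idea, assuming the first input row from this thread fed on stdin (the `row` and `result` names are only for illustration). It drops column 5 and leaves $4 untouched, so the |pr= part is still present:

```shell
# One sample row from the thread (whitespace-separated, as posted).
row='chr1    955543  955763  AGRN-6|pr=2|gc=75   0   +'

# -v OFS='\t' sets the output field separator before any input is read,
# so every comma in print is written as a tab. Field 5 is simply left out.
result=$(printf '%s\n' "$row" | awk -v OFS='\t' '{ print $1, $2, $3, $4, $6 }')
printf '%s\n' "$result"
```

Setting OFS this way avoids repeating `\t` in a printf format string, which gets tedious as the number of fields grows.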
# 3  
Old 11-07-2015
That `awk` produces:

Code:
chr1    955543    955763    AGRN-6|pr=2|gc=75    +    
 
chr1    957571    957852    AGRN-7|pr=3|gc=61.2    +  
   
chr1    970621    970740    AGRN-8|pr=1|gc=57.1    +

The |pr=2, |pr=3, and |pr=1 parts are not needed, and there appears to be a blank line after each row, which may be problematic for later analysis.

Thank you :).
# 4  
Old 11-07-2015
Code:
awk '{n=split($4, a, "|"); print $1, $2, $3, a[1]"|"a[n], $6}' cmccabe.file

or:
Code:
awk '{n=split($4, a, "|"); print $1,$2,$3,a[1]"|"a[n],$6}' OFS="\t" cmccabe.file


Last edited by Aia; 11-07-2015 at 12:31 PM.. Reason: Add alternative tab output separator
# 5  
Old 11-07-2015
I had something similar @Aia

Code:
awk '{split($4,a,"|"); print $1,$2,$3,a[1],"|",a[3],$6}' input
chr1 955543 955763 AGRN-6 | gc=75 + 
chr1 957571 957852 AGRN-7 | gc=61.2 + 
chr1 970621 970740 AGRN-8 | gc=57.1 +

but that outputs everything on one line. Your awk is much better, thank you :).
# 6  
Old 11-07-2015
Quote:
Originally Posted by cmccabe
I had something similar @Aia

Code:
awk '{split($4,a,"|"); print $1,$2,$3,a[1],"|",a[3],$6}' input
chr1 955543 955763 AGRN-6 | gc=75 + 
chr1 957571 957852 AGRN-7 | gc=61.2 + 
chr1 970621 970740 AGRN-8 | gc=57.1 +

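but that outputs everything on one line. Your awk is much better, thank you :).
Yes, the commas on either side of "|" in your print statement each get translated into OFS (a single space by default), which is why spaces appear around the pipe.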
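To make the comma-versus-concatenation difference concrete, here is a small sketch using one row from this thread (the shell variable names are only for illustration):

```shell
# One sample row from the thread, fed on stdin.
row='chr1 955543 955763 AGRN-6|pr=2|gc=75 0 +'

# Commas: "|" is its own print argument, so OFS (a space by default)
# is written on both sides of it.
with_commas=$(printf '%s\n' "$row" | awk '{ split($4, a, "|"); print a[1], "|", a[3] }')

# No commas: adjacent expressions are concatenated with nothing in between.
joined=$(printf '%s\n' "$row" | awk '{ split($4, a, "|"); print a[1] "|" a[3] }')

printf '%s\n%s\n' "$with_commas" "$joined"
```

The first form prints `AGRN-6 | gc=75` and the second prints `AGRN-6|gc=75`.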

Here's a Perl alternative:
Code:
 perl -pe 's/(\|\w+=[\w\.]+){1,2}\s+\d+/$1/' cmccabe.file

# 7  
Old 11-07-2015
How about
Code:
awk '{sub(/\|.*\|/, "|")}1' file
chr1    955543  955763  AGRN-6|gc=75   0   + 
chr1    957571  957852  AGRN-7|gc=61.2 0   + 
chr1    970621  970740  AGRN-8|gc=57.1 0   +

?
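The output above still carries the 0 column. One possible way to combine this substitution with the column drop and tab separation discussed earlier in the thread is sketched below. It assumes the gc= value is always the last pipe-delimited part of $4, and the `row` and `result` names are only for illustration:

```shell
# One sample row from the thread (whitespace-separated, as posted).
row='chr1    955543  955763  AGRN-6|pr=2|gc=75   0   +'

# sub() with a third argument edits only $4: everything from the first pipe
# through the last pipe collapses to a single pipe. Using a regex literal
# avoids differences in how awk implementations treat "\|" inside a string.
result=$(printf '%s\n' "$row" |
  awk -v OFS='\t' '{ sub(/\|.*\|/, "|", $4); print $1, $2, $3, $4, $6 }')
printf '%s\n' "$result"
```

Because `.*` is greedy, any number of middle attributes between the name and the last one are removed, not just a single pr= entry.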