Parse file for fields and specific text


 
# 8  
Old 09-17-2015
That code works great.
What if I wanted the output to look like this:
Code:
chr1    11868    12227    DDX11L1:1
chr1    12009    12057    DDX11L1:1
chr1    12178    12227    DDX11L1:2

Basically, just gene:exon in field 4.

Code:
 
cmccabe@DTV-A5211QLM:~/Desktop/NGS$ awk '
>         {match ($0, /gene_name [^ ]*/)
>          T1=substr ($0, RSTART+11, RLENGTH-13)
>          match ($0, /exon_number [^ ]*/)
>          T2=substr ($0, RSTART+11, RLENGTH-12)
>          print $1, $2, $3, T1:T2
>         }
> ' FS="\t" OFS="\t" /home/cmccabe/Desktop/NGS/bed/gencode.exons.bed > /home/cmccabe/Desktop/NGS/bed/parse2_gencode.bed

awk: line 6: syntax error at or near :

Thank you.

Last edited by cmccabe; 09-17-2015 at 02:01 PM.. Reason: fixed formatting
# 9  
Old 09-17-2015
T1:T2 is not valid awk syntax. Did you mean T1":"T2?
# 10  
Old 09-17-2015
Use quotes: T1":"T2
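
For anyone following along: awk has no concatenation operator, so strings are joined simply by writing them next to each other, and the colon has to be a quoted string literal. A minimal sketch with made-up values for T1 and T2 (they are placeholders, not taken from the real file):

```shell
# T1 and T2 are set by hand here purely for illustration.
joined=$(awk 'BEGIN { T1 = "DDX11L1"; T2 = "1"; print T1 ":" T2 }')
printf '%s\n' "$joined"
```

The spaces around ":" are optional; they only make the three pieces easier to see.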
# 11  
Old 09-17-2015
Thank you. I modified the awk slightly, as shown below:

Code:
cmccabe@DTV-A5211QLM:~/Desktop/bed$ awk '
>         {match ($0, /gene_name [^ ]*/)
>          T1=substr ($0, RSTART+11, RLENGTH-13)
>          match ($0, /exon_number [^ ]*/)
>          T2=substr ($0, RSTART+11, RLENGTH-12)
>          print $1, $2, $3, T1":""exon"T2
>         }
> ' FS="\t" OFS="\t" /home/cmccabe/Desktop/NGS/bed/gencode.exons.bed > /home/cmccabe/Desktop/NGS/bed/parse2_gencode.bed

There appears to be a space between exon and 1 that may cause an issue later on. I'm not sure why the space is there or how to remove it. Thank you.

parse2.txt
Code:
chr1    11868    12227    DDX11L1:exon 1
chr1    11871    12227    DDX11L1:exon 1
chr1    11873    12227    DDX11L1:exon 1

# 12  
Old 09-17-2015
I suspect the space is actually part of T2.

I think the regex /exon_number [^ ]*/ should be /exon_number *[^ ]*/, in case there are multiple spaces.
# 13  
Old 09-17-2015
You'd need to count the spaces, then, as they would increase the RSTART+X value.
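
To make the offset arithmetic concrete, here is a small sketch. The sample line is an assumption about how the attribute text looks (the real file is not shown in this thread). "exon_number" is 11 characters long, so RSTART+11 points at the space that follows it, and that space ends up in T2. Moving the start one character further, and trimming one more character from the length, leaves only the number:

```shell
# Hypothetical attribute text; the real file's layout is assumed, not confirmed.
line='gene_name "DDX11L1"; exon_number 1;'

# Original offsets: RSTART+11 lands on the space after exon_number.
with_space=$(printf '%s\n' "$line" | awk '{
    match($0, /exon_number [^ ]*/)
    print "exon" substr($0, RSTART + 11, RLENGTH - 12)
}')

# Shifted by one: start after the space and drop one more character.
without_space=$(printf '%s\n' "$line" | awk '{
    match($0, /exon_number [^ ]*/)
    print "exon" substr($0, RSTART + 12, RLENGTH - 13)
}')

printf '[%s]\n[%s]\n' "$with_space" "$without_space"
```

These offsets only hold for exactly one space after exon_number; every extra space would need another +1 on the start and another -1 on the length.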
# 14  
Old 09-18-2015
Adjusting the RSTART+X offset removed the space, but now the output looks like:

Code:
chr1    11868    12227    DDX11L1:exonex
chr1    11871    12227    DDX11L1:exonex
chr1    11873    12227    DDX11L1:exonex

It should be:

Code:
chr1    11868    12227    DDX11L1:exon1
chr1    11871    12227    DDX11L1:exon1
chr1    11873    12227    DDX11L1:exon1

Thank you.
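
One possible way to get the exon1 form without hand-counting offsets for the exon number is to match the label together with its digits and then strip the label with sub(). This is only a sketch under assumptions: the sample line is made up, gene names are assumed to be double-quoted, and exon numbers are assumed to be plain digits. It tolerates any number of spaces after exon_number:

```shell
# Hypothetical sample line; the attribute layout of the real file is assumed.
sample=$(printf 'chr1\t11868\t12227\tgene_name "DDX11L1"; exon_number  1; level 2;')

parsed=$(printf '%s\n' "$sample" | awk -F'\t' -v OFS='\t' '
{
    gene = exon = ""
    # Capture the quoted gene name, then drop the 11-character
    # prefix and the closing quote.
    if (match($0, /gene_name "[^"]*"/))
        gene = substr($0, RSTART + 11, RLENGTH - 12)
    # Capture the label, any run of spaces, and the digits,
    # then strip everything except the digits.
    if (match($0, /exon_number +[0-9]+/)) {
        exon = substr($0, RSTART, RLENGTH)
        sub(/^exon_number +/, "", exon)
    }
    print $1, $2, $3, gene ":exon" exon
}')

printf '%s\n' "$parsed"
```

To run it on a real file, the printf pipeline would be replaced by the file name as the last argument to awk, with the output redirected as in the earlier posts.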