Parse two patterns and print next few characters following the pattern


 
# 1  
01-31-2012

Hi all,

I have many large files in which each line contains data like the following:
Code:
1    822381    rs116091741    C    T    .    PASS    ASP;G5;G5A;GMAF=0.014308426073132;KGPilot123;RSPOS=822381;SAO=0;

I want output like this:
rs116091741 0.014308426073132

I tried a few commands without success, for example:

Code:
sed -n 's/.*rs//p' dbsnp_132.b37.vcf

I am very new to Linux and still learning the basics, so please forgive me if this is a simple question. I can't work out how to parse both patterns: print the characters that follow 'rs', including 'rs' itself, and print the value that follows 'GMAF=', excluding 'GMAF=' itself.
# 2  
01-31-2012
Try:
Code:
perl -lane '/GMAF=([^;]+)/;print "$F[2] $1"' dbsnp_132.b37.vcf
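
For comparison, the sed attempt from the first post can be adapted to give the same output. As written, `s/.*rs//` deletes everything up to and including the last "rs" on the line and prints whatever is left, which is the opposite of what is wanted. The following is only a sketch and was not posted in the thread: `extract_gmaf_sed` is a made-up function name, and it assumes the rs ID is always the third whitespace-separated column and that the columns in the sample are tab-separated.

```shell
# Hypothetical helper (not from the thread): keep column 3 and the GMAF value.
# \(...\) groups capture text; \1 and \2 put the captures back in the output.
# Lines without "GMAF=" do not match, so -n plus the p flag skips them.
extract_gmaf_sed() {
    sed -n 's/^[^[:space:]]*[[:space:]][[:space:]]*[^[:space:]]*[[:space:]][[:space:]]*\([^[:space:]]*\).*GMAF=\([^;]*\).*/\1 \2/p' "$@"
}

# Sample record from the first post (tab separators assumed).
printf '1\t822381\trs116091741\tC\tT\t.\tPASS\tASP;G5;G5A;GMAF=0.014308426073132;KGPilot123;RSPOS=822381;SAO=0;\n' |
    extract_gmaf_sed
# prints: rs116091741 0.014308426073132
```

The function also accepts file names, e.g. `extract_gmaf_sed dbsnp_132.b37.vcf`. The perl version is easier to read because `-a` does the column splitting for you.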

# 3  
01-31-2012

Thanks bartus11,

Your code worked like a charm. It would be really helpful if you could explain the code a little. I am learning these languages and would like to know more.
# 4  
01-31-2012
perl
-l  automatically add a newline to the print commands (among other things)
-a  split every line ($_) into the @F array, using one or more whitespace characters (space, tab) as the delimiter
-n  load each line of the file into $_ for further processing; don't print the line at the end of the processing (-p works similarly, but it prints the line)
-e  execute the code that follows
'/GMAF=([^;]+)/;  match the string between "GMAF=" and ";" and store it in the variable "$1"
print "$F[2] $1"'  print the 3rd field of each line (see the -a option), followed by the string matched by the previous regular expression
# 5  
01-31-2012
Thank you very much bartus11, that is really helpful.
Can you also suggest how to learn Perl like this? Is there a book or anything else that starts from the very basics and gets to this level?
# 6  
01-31-2012
I started with this book: Learning Perl, 3rd Edition (O'Reilly Media)