awk regular expression


 
# 1  
Old 03-24-2013

Hello,

I have big files that I want to filter based on the first column.

The first column should be exactly one of these strings: chr2L || chr2R || chr3L || chr3R || chr4 || chrX
Anything else, such as chr2Lh, chrY or chrM3L, should be rejected.

I used the following command:
Code:
 awk '{ if ($1=="chr2L" || $1=="chr2R" || $1=="chr3L" || $1=="chr3R" || $1== "chr4" || $1=="chrX") print $0 }' input | sort -k1,1 -k2,2n > output

Is there an easier way to do it? And is what I have correct anyway?
By the way, I also want to sort on two columns, the first being a string and the second a number, as you can see in the command above. Please let me know whether that part is correct too.

Thanks in advance.
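On the sort part of the question, here is a minimal sketch of what those keys do. The sample rows and the function name sort_by_chr_then_pos are made up for illustration, and LC_ALL=C is an assumption added here so that the string ordering does not depend on the locale; the original command does not set it.

```shell
# -k1,1  : sort on field 1 only, as a string
# -k2,2n : break ties on field 2 only, compared numerically
sort_by_chr_then_pos() {
  LC_ALL=C sort -k1,1 -k2,2n
}

# Hypothetical input rows, not taken from the poster's files.
printf '%s\n' 'chrX 50' 'chr2L 1000' 'chr2L 200' 'chr4 7' | sort_by_chr_then_pos
```

With the n modifier, 200 sorts before 1000 within chr2L; a plain string comparison on field 2 would put 1000 first.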
# 2  
Old 03-24-2013
You could do:
Code:
awk '$1 ~ /^chr[2-4]*[LRX]*$/' input | sort -k1,1 -k2,2n

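EDIT: But if you want to match strictly only chr2L || chr2R || chr3L || chr3R || chr4 || chrX, then the above regexp is wrong.

It will match other combinations as well, for example chr, chr2X or chr23LL, because both bracket expressions may repeat zero or more times. I think what you did is correct.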
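A quick way to see the over-matching is to feed a few names through that regexp. This is only a sketch: the helper name loose_match and the sample names are hypothetical.

```shell
# Print each argument that the loose regexp accepts as a first field.
loose_match() {
  printf '%s\n' "$@" | awk '$1 ~ /^chr[2-4]*[LRX]*$/'
}

# chr, chr2X and chr23LL get through; chr2Lh and chrY do not.
loose_match chr2L chr4 chrX chr chr2X chr23LL chr2Lh chrY
```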

Last edited by Yoda; 03-24-2013 at 11:40 AM..
# 3  
Old 03-24-2013
Quote:
Originally Posted by Yoda
You could do:
Code:
awk '$1 ~ /^chr[2-4]*[LRX]*$/' input | sort -k1,1 -k2,2n

EDIT: But if you want to match strictly only chr2L || chr2R || chr3L || chr3R || chr4 || chrX, then the above regexp is wrong.

It will match other combinations as well. I think what you did is correct.
I was about to respond to this post to say exactly what you added in your edit. I agree completely.

Shorter isn't always better. That said, the original version can be simplified a little bit:
Code:
 awk '$1=="chr2L" || $1=="chr2R" || $1=="chr3L" || $1=="chr3R" || $1=="chr4" || $1=="chrX"' input | sort -k1,1 -k2,2n > output

Regards,
Alister
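The simplification relies on awk's default action: a rule that has a pattern but no action prints every record the pattern matches. A small sketch of the equivalence, shortened to two names to keep it readable; the function names and sample rows are invented for illustration.

```shell
# Explicit if/print inside an action block.
long_form() {
  awk '{ if ($1 == "chr4" || $1 == "chrX") print $0 }'
}

# Same test used as a bare pattern; the default action prints the record.
short_form() {
  awk '$1 == "chr4" || $1 == "chrX"'
}

sample='chr4 10
chrY 3
chrX 8'

printf '%s\n' "$sample" | long_form
printf '%s\n' "$sample" | short_form
```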
# 4  
Old 03-24-2013
Thank you, Yoda and alister.

As you said, shorter is not always better! I was not sure whether what I used does the matching strictly or not. Now I am.
Once again, thanks to both of you.

Cheers!
# 5  
Old 03-24-2013
I'm not sure this will work with all awk versions, but you might want to try:
Code:
awk '$1 ~ /^(chr2L|chr2R|chr3L|chr3R|chr4|chrX)$/' input | sort -k1,1 -k2,2n

It's a regex constant made of ORed alternatives; the ^ and $ anchors force it to match the whole of $1 rather than part of it.
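To check that the anchored alternation rejects the names from the original question, something like the following could be used. The helper name strict_match is hypothetical.

```shell
# Print each argument that the anchored alternation accepts as a first field.
strict_match() {
  printf '%s\n' "$@" | awk '$1 ~ /^(chr2L|chr2R|chr3L|chr3R|chr4|chrX)$/'
}

# Only chr2L, chr4 and chrX are printed; chr2Lh, chrY and chrM3L are dropped.
strict_match chr2L chr2Lh chrY chrM3L chr4 chrX
```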
# 6  
Old 03-24-2013
Factoring out the common string chr:
Code:
awk '$1 ~ /^chr(2L|2R|3L|3R|4|X)$/' input | sort -k1,1 -k2,2n

Try to avoid regexes where you can, to improve efficiency. Also, if you know roughly how often each of those strings occurs, you could use simple string comparisons joined by the short-circuit logical OR (||) operator, testing the most frequent strings first so that most lines are decided after one or two comparisons.
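Another regex-free variant, not proposed by anyone in the thread, is to load the accepted names into an awk array and test membership with the in operator. This is only a sketch: the function name filter_chr, the array names and the sample rows are assumptions for illustration, and whether it is faster than the || chain on real data would need measuring.

```shell
# Keep only lines whose first field is one of the six accepted names.
filter_chr() {
  awk 'BEGIN {
         n = split("chr2L chr2R chr3L chr3R chr4 chrX", names, " ")
         for (i = 1; i <= n; i++) keep[names[i]] = 1
       }
       $1 in keep'
}

# Hypothetical input rows.
printf '%s\n' 'chr2L 5' 'chr2Lh 6' 'chrY 1' 'chr4 2' | filter_chr
```

The match is exact by construction, so chr2Lh and chrY are dropped without any anchoring.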

Last edited by elixir_sinari; 03-24-2013 at 01:13 PM..