Grep for a range of numbers?

# 1  
Old 02-11-2013

I am trying to extract specific information from a large *.sam file (originally 28 GB).

I want to extract all lines that are on chr3 somewhere in the range of 112,937,439-113,437,438.

Here is a sample line from my file so you can get a feel for what each line looks like:

seq.4 0 chr10 82951725 25 50M * 0 0 GCCACTTCATTATTTTGGGGACTATCTCCCTAGTCATCACAAGAAATTAA bbbeeeeegggggiiiiiiiihiiiiihiiihighiiihiiiihhiehii XO:A:F MD:Z:50 NM:i:0 IH:i:1 HI:i:1

Any suggestions?
# 2  
Old 02-11-2013
If what you're saying is that you want to extract lines that have the 3rd field set to "chr3", try:
awk '$3 == "chr3"' file.sam

If you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
# 3  
Old 02-11-2013
I want chr3 and additionally ONLY those in the range 112,937,439 - 113,437,438
# 4  
Old 02-11-2013
Not checked, but you may try the following:

grep -P '^.*ch3*[1][1][2[9][3-9][7-9][4-9][3-9][8-9]|3[0-4][0-3][0-7][0-4][0-3][0-8]]*'
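
A pattern for a numeric range has to enumerate digit-prefix sub-ranges, so it gets long quickly. As a rough sketch only, assuming whitespace-separated fields with the chromosome in field 3 and the position in field 4, an extended regular expression for 112937439 through 113437438 could look like the one below. The names `range_re` and `match_region` are made up for illustration.

```shell
# Sketch only: ERE alternatives that together cover 112937439..113437438.
# Each alternative handles one sub-range sharing a common digit prefix.
range_re='112937439|1129374[4-9][0-9]|112937[5-9][0-9]{2}|11293[89][0-9]{3}|1129[4-9][0-9]{4}|113[0-3][0-9]{5}|1134[0-2][0-9]{4}|11343[0-6][0-9]{3}|113437[0-3][0-9]{2}|1134374[0-2][0-9]|11343743[0-8]'

# match_region FILE (hypothetical helper, not part of grep)
# Skips fields 1 and 2, requires field 3 to be chr3,
# and requires field 4 to match range_re exactly.
match_region() {
    grep -E "^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+chr3[[:space:]]+(${range_re})[[:space:]]" "$1"
}
```

Any change to either end of the range means rebuilding the whole alternation, which is why a numeric comparison is usually the easier route.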
# 5  
Old 02-11-2013
Contrary to the venerable Don's assumption, I suppose field 4 is the number you want to analyze. If I understand you correctly, your requirement is:

1. only lines with field3="chr3" as Don already mentioned
2. The value of field 4 is in the mentioned range

Is that correct?

I suggest you explain your problem with a bit more detail, because your enthusiasm in explaining what you are after is generally directly proportional to our enthusiasm in providing the desired answer. This is a fundamental truth in computer science, known as the "dialectic of goodwill".

This User Gave Thanks to bakunin For This Post:
# 6  
Old 02-11-2013
My apologies, I didn't intend to be vague. That summary is correct. This is an aligned next-gen sequencing file that I am attempting to manipulate. There is a family of repeat genes on chr3 in the range of 112,937,439-113,437,438 that I am interested in exploring within my data. The file will have millions of "reads" on chr3, but I want to extract only the reads that meet both conditions:

1. only lines with field3="chr3" as Don already mentioned
2. The value of field 4 is in the mentioned range

Then I would like to have the whole line printed to a separate file.

Thank you for all of your help.
# 7  
Old 02-11-2013
awk '$3 == "chr3" && ($4 + 0) >= 112937439 && $4 <= 113437438' file.sam > subset.sam
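
If the same filter is needed for other regions, the chromosome and bounds can be passed in as awk variables rather than hard-coded. The following is a minimal sketch under a few assumptions: `extract_region` is a made-up helper name, fields are whitespace-separated as in the sample line, and any SAM header lines (which start with `@`) should be skipped.

```shell
# extract_region FILE CHROM START END
# Hypothetical helper: prints every record whose 3rd field equals CHROM
# and whose 4th field lies in START..END (both ends inclusive).
extract_region() {
    awk -v chrom="$2" -v lo="$3" -v hi="$4" '
        /^@/ { next }    # skip SAM header lines, which start with "@"
        # adding 0 forces numeric rather than string comparison
        $3 == chrom && ($4 + 0) >= (lo + 0) && ($4 + 0) <= (hi + 0)
    ' "$1"
}
```

For the range in this thread, it would be called as `extract_region file.sam chr3 112937439 113437438 > subset.sam`.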

These 2 Users Gave Thanks to Don Cragun For This Post: