Extract Big and continuous regions


# 1  

Hi all,
I have a file like the one below, and I want to extract only the big, continuous regions, i.e. drop every region that lies entirely within another one:


Code:
chr1    3280000 3440000
chr1    3440000 3920000
chr1    3600000 3920000 # lies within 3440000 3920000, so I don't want it printed
chr1    3920000 4800000
chr1    4080000 4800000 # lies within 3920000 4800000, so I don't want it printed
chr1    4160000 4360000 # lies within 3920000 4800000, so I don't want it printed
chr1    4160000 4760000 # lies within 3920000 4800000, so I don't want it printed
chr1    4360000 4760000 # lies within 3920000 4800000, so I don't want it printed
chr1    4800000 4920000 # lies within 4800000 6160000, so I don't want it printed
chr1    4800000 5200000 # lies within 4800000 6160000, so I don't want it printed
chr1    4800000 5280000 # lies within 4800000 6160000, so I don't want it printed
chr1    4800000 6160000
chr1    4920000 5200000 # lies within 4800000 6160000, so I don't want it printed
chr1    5280000 6160000 # lies within 4800000 6160000, so I don't want it printed



In the end, only the following regions should be printed:
Code:
chr1    3280000 3440000
chr1    3440000 3920000
chr1    3920000 4800000
chr1    4800000 6160000

Can anyone help me get this output?
# 2  
The following seems to do what you want:
Code:
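awk '
# Pass 1 (assumes the input is sorted by start): keep a line only if its
# end lies beyond the largest end seen so far (LH). This drops regions
# nested inside an earlier one. D = line, L = start, H = end.
$3 > LH {
	D[++c] = $0
	L[c] = $2
	H[c] = LH = $3
}
# Pass 2: walk the kept lines backwards and delete any line whose end runs
# past the start of the line after it (e.g. 4800000 4920000 is dropped in
# favour of 4800000 6160000), shifting the later lines down one slot.
END {	for(i = c - 1; i > 0; i--)
		if(H[i] > L[i + 1]) {
			for(j = i; j < c; j++) {
				D[j] = D[j + 1]
				L[j] = L[j + 1]
				H[j] = H[j + 1]
			}
			c--
		}
	# Print whatever survived both passes, in input order.
	for(i = 1; i <= c; i++)
		print D[i]
}' file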

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
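The awk script above relies on the input already being sorted by start position, and it treats every line as belonging to one chromosome. A minimal sketch that relaxes both assumptions, assuming whitespace-separated chromosome/start/end columns and POSIX `sort` and `awk`: sort by chromosome, then by start ascending and end descending, and keep a line only when its end lies beyond every end seen so far on that chromosome. The function name `keep_outer` is hypothetical. This is a plain containment filter: unlike the END-block pass of the script above, it does not drop a region that only partly overlaps the next one.

```shell
#!/bin/sh
# Sketch: drop every region that lies entirely inside another region on
# the same chromosome. Assumes whitespace-separated columns:
# chromosome, start, end (anything after column 3 is ignored).
# keep_outer is a hypothetical helper name.
keep_outer() {
    # Sort by chromosome, then start ascending, then end descending, so the
    # widest region starting at a given position comes first.
    sort -k1,1 -k2,2n -k3,3nr "$@" |
    awk '
    # New chromosome: reset the running maximum end position.
    $1 != chr { chr = $1; maxend = -1 }
    # Keep a region only if it reaches past every region seen so far on
    # this chromosome; otherwise an earlier region already contains it.
    $3 > maxend { print; maxend = $3 }
    '
}
```

Usage: `keep_outer file`. The output comes out in sorted order, which for the sample data above is the same as the input order.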
# 3  
Code:
perl -anle 'if($F[2]>$h){ $s{$F[1]} = $_; $h=$F[2] } END{ for(sort { $a <=> $b } keys %s){ print $s{$_} } }' amrutha_sastry.input

Code:
chr1    3280000 3440000
chr1    3440000 3920000
chr1    3920000 4800000
chr1    4800000 6160000



