Print the overlapping entries in 2 files to separate file


 
# 1  
Old 01-21-2014
Print the overlapping entries in 2 files to separate file

I have two files that contain overlapping positions. I want to collect each set of overlapping entries from both files into a new file (the entries from the first file first, then the entries from the second file), followed by a blank line, then the next set of overlapping entries, and so on.

Code:
input1
chr1    22      25      rbc     nmb
chr1    23      25      bbc     ddd
chr1    23      25      rds     jkj
chr1    28      36      rds     jkj

Code:
input2
chr1    24      25      qws     bbv
chr1    21      26      nbn     mnm
chr1    27      32      rds     jkj

Code:
output
chr1    22      25      rbc     nmb
chr1    23      25      bbc     ddd
chr1    23      25      rds     jkj
chr1    24      25      qws     bbv
chr1    21      26      nbn     mnm

chr1    28      36      rds     jkj
chr1    27      32      rds     jkj

# 2  
Old 01-21-2014
Sorry, but I did not really get the logic. Can you specify, in terms of specific fields, how you separated the data sets?
In short, please elaborate.
# 3  
Old 01-21-2014
Hi,

I'd suggest that you might want to use the 'comm' command for this - try 'man comm' and see if what you want is there.

Regards

Gull04
# 4  
Old 01-22-2014
In file1 and file2 we have to check for overlapping regions based on columns 2 and 3. If overlaps are found, the whole lines have to be printed: the lines from the first file followed by the lines from the second file.
E.g.:
Code:
22	25      
23	25      
23	25  

24	25
21	26

so
output
Code:
chr1    22      25      rbc     nmb
chr1    23      25      bbc     ddd
chr1    23      25      rds     jkj
chr1    24      25      qws     bbv
chr1    21      26      nbn     mnm

Please let me know if there is still any confusion.

Last edited by raj_k; 01-22-2014 at 06:38 AM..
# 5  
Old 01-22-2014
Quote:
Originally Posted by raj_k
In file1 and file2 we have to check for overlapping regions based on columns 2 and 3. If overlaps are found, the whole lines have to be printed: the lines from the first file followed by the lines from the second file.
E.g.:
Code:
22    25      
23    25      
23    25  

24    25
21    26

so
output
Code:
chr1    22      25      rbc     nmb -->file1
chr1    23      25      bbc     ddd -->file1
chr1    23      25      rds     jkj -->file1
chr1    24      25      qws     bbv -->file2
chr1    21      26      nbn     mnm -->file2

Please let me know if there is still any confusion.
I still don't understand; please explain. Where do they overlap?
# 6  
Old 01-22-2014
We have to check for overlaps based on the 2nd and 3rd columns.

In file1, the 2nd and 3rd columns of the first line are 22 and 25. In file2, the first line contains 24 and 25. That is, we have to check
Code:
if (($column2[file1] <= $column2[file2]) && ($column3[file1] <= $column3[file2]))

as well as
Code:
if (($column2[file2] <= $column2[file1]) && ($column3[file2] <= $column3[file1]))

In either case, if the condition is true, the lines have to be printed in the format mentioned above. I hope this is clear.
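
A note on the test itself: taken literally, the two conditions in this post do not seem to cover the case where one range lies inside the other (for example 22-25 from file1 inside 21-26 from file2), although the requested output puts those lines in the same group. A minimal sketch of the usual closed-interval overlap test, which does cover it, might look like this. The function name `overlaps` is made up for illustration, and treating the ranges as closed (end position included) is an assumption.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True when two closed ranges share at least one position.

    Two ranges overlap exactly when each one starts no later than the
    other one ends. This is an assumed reading of "overlap", not the
    literal condition from the post.
    """
    return start_a <= end_b and start_b <= end_a


# Ranges taken from the sample data in the thread.
print(overlaps(22, 25, 24, 25))  # file1 line 1 vs file2 line 1
print(overlaps(22, 25, 21, 26))  # file1 line 1 vs file2 line 2 (containment)
print(overlaps(23, 25, 27, 32))  # file1 line 2 vs file2 line 3 (disjoint)
```

The first two calls should report an overlap and the third should not, which agrees with how the lines are grouped in the requested output.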
# 7  
Old 01-22-2014
Are the input files sorted somehow?
E.g. is column3 sorted in ascending order?
Or does the overlap happen only between neighboring lines, i.e. never between distant lines? (Then the resulting algorithm can become a merge.)
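
If the inputs cannot be assumed to be sorted, one option is to sort all lines by chromosome and start position first and then do a single merge-like sweep. The following is only a sketch under several assumptions that the thread does not confirm: ranges are closed, overlaps chain transitively (A overlaps B and B overlaps C puts all three in one group), groups that contain lines from only one file are dropped, and within a group the original line order of each file is kept. The names `parse`, `overlap_groups`, `INPUT1` and `INPUT2` are hypothetical.

```python
INPUT1 = """\
chr1\t22\t25\trbc\tnmb
chr1\t23\t25\tbbc\tddd
chr1\t23\t25\trds\tjkj
chr1\t28\t36\trds\tjkj
"""

INPUT2 = """\
chr1\t24\t25\tqws\tbbv
chr1\t21\t26\tnbn\tmnm
chr1\t27\t32\trds\tjkj
"""


def parse(text, source):
    """Return (chrom, start, end, source, order, line) tuples for each line."""
    records = []
    for order, line in enumerate(text.strip().splitlines()):
        fields = line.split()
        records.append((fields[0], int(fields[1]), int(fields[2]), source, order, line))
    return records


def overlap_groups(text1, text2):
    """Group overlapping lines from both inputs; file1 lines come first."""
    records = sorted(parse(text1, 1) + parse(text2, 2), key=lambda r: (r[0], r[1]))
    clusters = []
    chrom, reach = None, None  # chromosome and furthest end of the open cluster
    for rec in records:
        if rec[0] != chrom or rec[1] > reach:
            clusters.append([])           # no overlap with the open cluster
            chrom, reach = rec[0], rec[2]
        else:
            reach = max(reach, rec[2])    # extend the open cluster
        clusters[-1].append(rec)
    groups = []
    for cluster in clusters:
        # Keep only clusters that contain lines from both files.
        if {r[3] for r in cluster} == {1, 2}:
            ordered = sorted(cluster, key=lambda r: (r[3], r[4]))
            groups.append([r[5] for r in ordered])
    return groups


groups = overlap_groups(INPUT1, INPUT2)
# One blank line between groups, as requested in the first post.
print("\n\n".join("\n".join(group) for group in groups))
```

On the sample data from the first post this should produce the two groups shown in the requested output. If the files are already sorted by start position, the sort step becomes unnecessary and the sweep alone is the merge mentioned above.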