Compare intervals (columns) from two files (awk, grep, Perl?)


 
# 1  
Old 01-17-2012

Hi dear users,

I need to compare numeric columns in two files. These files have the following structure.

K.txt (4 columns)

Code:
A001      chr21      9805831      9846011
A002      chr21      9806202      9846263
A003      chr21      9887188      9988593
A003      chr21      9887188      9988593
A004      chr21      9895249      9988593
......
......

In K.txt, columns 3 and 4 are the start and end positions of the interval for the gene named in column 1.

S.txt (4 columns)

Code:
chr21    9411326    9411327    rs75025155
chr21    9411409    9411410    rs71235072
chr21    9805830    9805831    rs78200054
chr21    9887190    9887191    rs71235073
chr21    9895220    9895221    rs78302045
chr21    9988593    9988594    rs71220654
......
......

In S.txt, columns 2 and 3 are also the start and end of an interval, but these intervals are shorter than those in K.txt. S.txt is also the larger file.

These are the possible outcomes (that is, the ways an S.txt interval can intersect a K.txt interval):

S$3 < K$3 (don't print to output)
S$2 <= K$3 AND S$3 >= K$3 (print to output)
S$2 >= K$3 AND S$3 <= K$4 (print to output)
S$2 <= K$4 AND S$3 >= K$4 (print to output)
S$2 > K$4 (don't print to output)
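
The three "print" cases above come down to a single test: the two intervals share at least one position. A minimal Python sketch of that test, assuming the intervals are closed, so touching endpoints count as a match (the rs78200054 / A001 pairing in the expected output suggests this). The function name `overlaps` is made up for illustration.

```python
def overlaps(s_start, s_end, k_start, k_end):
    """Return True when [s_start, s_end] and [k_start, k_end] share a position.

    Assumption: both intervals are closed, so equal endpoints count as overlap.
    """
    # The intervals miss each other only if S ends before K starts
    # or S starts after K ends; anything else is one of the "print" cases.
    return s_end >= k_start and s_start <= k_end


# rs78200054 (9805830-9805831) touches the start of A001 (9805831-9846011).
print(overlaps(9805830, 9805831, 9805831, 9846011))  # True
# rs75025155 (9411326-9411327) lies entirely before A001.
print(overlaps(9411326, 9411327, 9805831, 9846011))  # False
```

Writing the check as the negation of the two "don't print" cases avoids testing the three overlapping cases separately.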


The output should have two tab-separated columns: the first is column 4 from S.txt (S$4) and the second is column 1 from K.txt (K$1). If an S.txt row matches several K.txt rows, as in the example, the gene names should be separated by commas.

Code:
rs71235073    A003
rs78200054    A001,B001
rs78302045    A004
.....
.....
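
The output format described above could be produced roughly as follows. This is only a sketch in Python: it assumes the matching step has already yielded (S$4, K$1) pairs, and the name `format_matches` is hypothetical. Repeated gene names are dropped, since K.txt can contain the same row twice (A003 in the sample).

```python
def format_matches(pairs):
    """Collapse (snp_id, gene) pairs into 'snp_id<TAB>gene1,gene2' lines."""
    grouped = {}  # dicts keep insertion order in Python 3.7+
    for snp_id, gene in pairs:
        genes = grouped.setdefault(snp_id, [])
        if gene not in genes:  # skip duplicates caused by repeated K.txt rows
            genes.append(gene)
    return ["{}\t{}".format(snp_id, ",".join(genes))
            for snp_id, genes in grouped.items()]


# Illustrative pairs only; B001 is taken from the expected output above.
example_pairs = [
    ("rs71235073", "A003"),
    ("rs71235073", "A003"),
    ("rs78200054", "A001"),
    ("rs78200054", "B001"),
]
for line in format_matches(example_pairs):
    print(line)
```

Lines come out in the order each S$4 value is first seen, so sort them afterwards if a particular order is needed.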

Any suggestions would be very welcome.
Thank you!
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 01-17-2012 at 03:40 PM. Reason: code tags, please!
# 2  
Old 01-17-2012
They both have the same number of rows then, to be read one by one and compared?
# 3  
Old 01-17-2012
Quote:
Originally Posted by Corona688
They both have the same number of rows then, to be read one by one and compared?
They have different numbers of rows, but I'm afraid that either K.txt or S.txt will have to be read line by line.
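
Since S.txt is the larger file, one possible approach is to hold K.txt in memory and read S.txt one line at a time. A rough Python sketch follows. It assumes whitespace-separated columns and closed intervals, and it only compares an S.txt row with K.txt rows on the same chromosome (the thread does not say this, so treat it as an assumption). `load_intervals` and `stream_matches` are illustrative names.

```python
def load_intervals(k_lines):
    """Read all K.txt lines into memory: chromosome -> [(start, end, gene)]."""
    by_chrom = {}
    for line in k_lines:
        fields = line.split()
        # Skip blank or malformed lines (e.g. the "......" placeholders).
        if len(fields) != 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        gene, chrom, start, end = fields
        by_chrom.setdefault(chrom, []).append((int(start), int(end), gene))
    return by_chrom


def stream_matches(s_lines, by_chrom):
    """Walk S.txt lines one by one, yielding (snp_id, gene) for each overlap."""
    for line in s_lines:
        fields = line.split()
        if len(fields) != 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        chrom, s_start, s_end, snp_id = fields
        s_start, s_end = int(s_start), int(s_end)
        # Assumption: only intervals on the same chromosome are compared.
        for k_start, k_end, gene in by_chrom.get(chrom, []):
            if s_end >= k_start and s_start <= k_end:
                yield snp_id, gene
```

Both functions accept any iterable of lines, so an open file handle works: pass `open("K.txt")` to `load_intervals` and `open("S.txt")` to `stream_matches`. Each S.txt row is checked against every K.txt interval on its chromosome, which is simple but slow for large inputs; sorting the intervals and using a binary search would be the usual next step. Applied literally to the sample rows, these rules pair rs78302045 with A003 rather than A004, so the sample output may be illustrative only.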