Comparing two files and counting the lines that match

Posted 04-20-2010

Hello all,

I have always found help for my problems using the search function, but this time my request is too specific. I have two files that I want to compare. File1 is the index and File2 contains the data:

File1:

Code:
chr1    protein_coding    exon 500 600    .    +    .     gene_id "20532"; transcript_id "278"; exon_number "1"; gene_name "K"; transcript_name "K"; exon_cluster "9";
chr1    protein_coding    exon    203426072    203426162    .    +    .     gene_id "20532"; gene_id "20532"; transcript_id "278"; exon_number "2"; gene_name "K";  transcript_name "K"; exon_cluster "9";

File2:
Code:
chr1    wtp    read 100 125    35    +    .    ID=read18_1254_1296
chr1    wtp    read 150 175    43    +    .    ID=read199_1254_1252
chr1    wtp    read 580 600    43    +    .    ID=read200_1234_444
chr1    wtp    read 900 915    43    +    .    ID=read200_1234_444
chr1    wtp    read 500 525    35    +    .    ID=read18_1254_1296
chr1    wtp    read 700 725    43    +    .    ID=read199_1254_1252

In File2, the two lines with the same ID in the last column always belong together as a pair. The files are tab-separated, and the attribute (info) section in the last column is semicolon-separated.
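A minimal Python sketch of the pairing step, assuming each File2 line is tab-separated, the ninth column holds semicolon-separated `key=value` attributes including `ID=...`, and every ID occurs exactly twice. The function name `read_pairs` is hypothetical.

```python
from collections import defaultdict


def read_pairs(lines):
    """Group File2 lines into pairs keyed by the ID= value in column 9.

    Returns a dict mapping ID -> list of (chrom, start, end, strand) tuples.
    Assumes tab-separated lines and ';'-separated key=value attributes.
    """
    pairs = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cols = line.split("\t")
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Column 9 holds ';'-separated attributes; build a dict of them.
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        # Raises KeyError if a line has no ID= attribute (assumed not to happen).
        pairs[attrs["ID"]].append((chrom, start, end, strand))
    return dict(pairs)
```

Because lines are grouped by ID rather than by position, the two members of a pair do not need to be adjacent in File2 (in the example, `read18_1254_1296` appears on the first and fifth lines).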

Now I want to create a third file that contains every line of File1 plus two additional columns with the following counts.

Count the pairs from File2 whose lines match the File1 line in column 1 (chromosome) and column 7 (strand), split into two numbers:
- first number: pairs in which at least one of the two lines has columns 4 and 5 (start and end) within the range given by columns 4 and 5 of the File1 line
- second number: pairs in which one line lies entirely before that range and the other lies entirely after it
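The two conditions can be written as a small classification function. This is a sketch under stated assumptions: "in the range" is taken to mean a read lies completely within the exon's start and end (inclusive), so partial overlaps count as neither; "before" means the read ends before the exon starts; "after" means it starts after the exon ends; and both reads of a pair must match the exon's chromosome and strand. The name `classify_pair` is hypothetical.

```python
def classify_pair(exon, pair):
    """Classify a read pair against one File1 exon.

    exon: (chrom, start, end, strand) taken from File1 columns 1, 4, 5, 7.
    pair: two (chrom, start, end, strand) tuples from File2.
    Returns "inside" (first count), "spanning" (second count), or None.
    """
    chrom, ex_start, ex_end, strand = exon
    # Both reads must match the exon in column 1 (chromosome) and 7 (strand).
    if any(r[0] != chrom or r[3] != strand for r in pair):
        return None
    # Condition 1: at least one read lies completely within the exon range.
    if any(ex_start <= r[1] and r[2] <= ex_end for r in pair):
        return "inside"
    # Condition 2: one read ends before the exon, the other starts after it.
    a, b = pair
    a_before, a_after = a[2] < ex_start, a[1] > ex_end
    b_before, b_after = b[2] < ex_start, b[1] > ex_end
    if (a_before and b_after) or (b_before and a_after):
        return "spanning"
    return None
```

The two outcomes are mutually exclusive: a pair with one read inside the exon cannot also have both reads outside it, so each pair adds to at most one of the two counts.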

For the example files above, the desired output (with the two counts inserted after column 8) would be:
Code:
chr1    protein_coding    exon 500 600    . +    . 2 1     gene_id "20532"; transcript_id "278"; exon_number "1"; gene_name "K";  transcript_name "K"; exon_cluster "9";
chr1    protein_coding    exon    203426072    203426162    .    +    . 0 0 gene_id "20532"; gene_id "20532"; transcript_id "278"; exon_number "2";  gene_name "K";  transcript_name "K"; exon_cluster "9";

Usually I can solve these things with awk, but I'm not sure it is possible in this case.
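The whole task, reading both files and reproducing the desired output shown above, can be sketched in Python instead of awk. Assumptions: tab-separated input, an `ID=` attribute in File2's ninth column, exactly two lines per ID (incomplete pairs are ignored), "inside" meaning fully contained, and both reads matching the exon's chromosome and strand. The counts go between column 8 and the attribute column, as in the desired output. The function names and the script name `annotate.py` are hypothetical.

```python
import sys
from collections import defaultdict


def parse_pairs(lines):
    """Group File2 reads by their ID= attribute (column 9) into pairs."""
    pairs = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip blank or malformed lines
        read_id = None
        for field in cols[8].split(";"):
            key, _, value = field.strip().partition("=")
            if key == "ID":
                read_id = value
        if read_id is not None:
            pairs[read_id].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    # Keep only complete pairs (exactly two lines per ID).
    return [p for p in pairs.values() if len(p) == 2]


def annotate(file1_lines, file2_lines):
    """Yield File1 lines with 'inside' and 'spanning' counts after column 8."""
    pairs = parse_pairs(file2_lines)
    for line in file1_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip blank or malformed lines
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        inside = spanning = 0
        for a, b in pairs:
            # Both reads must match the exon's chromosome (col 1) and strand (col 7).
            if not all(r[0] == chrom and r[3] == strand for r in (a, b)):
                continue
            if any(start <= r[1] and r[2] <= end for r in (a, b)):
                inside += 1
            elif (a[2] < start and b[1] > end) or (b[2] < start and a[1] > end):
                spanning += 1
        yield "\t".join(cols[:8] + [str(inside), str(spanning)] + cols[8:])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python annotate.py File1 File2 > File3
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for out in annotate(f1, f2):
            print(out)
```

This compares every exon with every pair, which is simple but quadratic; for genome-wide files a sort-based or indexed interval lookup would be needed. The same logic can also be written in awk by storing File2 pairs in arrays keyed by ID while reading the first file.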

Thanks for your help!

Last edited by Franklin52; 04-20-2010 at 07:24 AM. Reason: Replaced TABLE tags with CODE tags

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Count the number of files to delete doesnt match

Good evening, need your help please Need to delete certain files before octobre 1 2016, so need to know how many files im going to delete, for instance ls -lrt file_20160*.lis!wc -l but using grep -c to another file called bplist which contains the list of all files backed up doesn match... (7 Replies)
Discussion started by: alexcol

2. Shell Programming and Scripting

Compare two files and count number of matching lines

Dear All, I would like to compare two files and return the number of matches found. Example File A Lx2 L1_Mus1 L1Md_T Lx5 L1M2 L1_Mus3 Lx3_Mus Lx9 Lx2A L1Md_A L1Md_F2 File B L1_Mus3 L1_Mus3 (3 Replies)
Discussion started by: paolo.kunder

3. Shell Programming and Scripting

How to count number of files in directory and write to new file with number of files and their name?

Hi! I just want to count number of files in a directory, and write to new text file, with number of files and their name output should look like this,, assume that below one is a new file created by script Number of files in directory = 25 1. a.txt 2. abc.txt 3. asd.dat... (20 Replies)
Discussion started by: Akshay Hegde

4. Shell Programming and Scripting

Count number of match words

Input: some random text SELECT TABLE1 some more random text some random text SELECT TABLE2 some more random text some random text SELECT TABLE3 some more random text some random text SELECT TABLE1 some more random text Output: 'SELECT TABLE1' 2 'SELECT TABLE2' 1 'SELECT TABLE3' 1 I... (5 Replies)
Discussion started by: chitech

5. UNIX for Dummies Questions & Answers

Count Number Of lines in text files and append values to beginning of file

Hello, I have 50 text files in a directory called "AllFiles" I want to make a program that will go inside of the "AllFiles" Directory and count the number of lines in each individual text file. Then, the program will calculate how many more lines there are over 400 in each text file and... (7 Replies)
Discussion started by: motoxeryz125

6. Shell Programming and Scripting

How to find lines that match exact input and count?

I am writing a package manager in BASH and I would like a small snippet of code that finds lines that match exact input and count them. For example, my file contains: xyz xyz-lib2.0+ xyz-lib2.0 xyz-lib1.5 and "grep -c xyz" returns 4. The current function is: # $1 is the package name.... (3 Replies)
Discussion started by: cooprocks123e

7. Shell Programming and Scripting

perl script on how to count the total number of lines of all the files under a directory

how to count the total number of lines of all the files under a directory using perl script.. I mean if I have 10 files under a directory then I want to count the total number of lines of all the 10 files contain. Please help me in writing a perl script on this. (5 Replies)
Discussion started by: adityam

8. Shell Programming and Scripting

Match and count the number of times

ile1 Beckham Ronaldo file2 Beckham Beckham_human Ronaldo Ronaldo_spain Ronaldo Ronaldo_brazil Beckham Beckham_manch Zidane Zidane_Fran Rooney Rooney_Eng Output shud be (1 Reply)
Discussion started by: cdfd123

9. UNIX for Dummies Questions & Answers

Read directory files and count number of lines

Hello, I'm trying to create a BASH file that can read all the files in my working directory and tell me how many words and lines are in that file. I wrote the following code: FILES="*" for f in "$FILES" do echo -e `wc -l -w $f` done My issue is that my file is outputting in one... (4 Replies)
Discussion started by: jl487

10. Shell Programming and Scripting

count the number of lines that start with the number

I have a file with contents similar to this. abcd 1234 4567 7666 jdjdjd 89289 9382 92 jksdj 9823 298 I want to write a shell script which count the number of lines that start with the number (disregard the lines starting with alphabets) (1 Reply)
Discussion started by: grajp002