Post 302414425 by DerSeb on Tuesday 20th of April 2010 06:04:41 AM
Comparing two files and counting the number of lines that match

Hello all,

I have always found help for my problems using the search option, but this time my request is too specific. I have two files that I want to compare: File1 is the index and File2 contains the data.

File1:

Code:
chr1    protein_coding    exon    500    600    .    +    .    gene_id "20532"; transcript_id "278"; exon_number "1"; gene_name "K"; transcript_name "K"; exon_cluster "9";
chr1    protein_coding    exon    203426072    203426162    .    +    .     gene_id "20532"; gene_id "20532"; transcript_id "278"; exon_number "2"; gene_name "K";  transcript_name "K"; exon_cluster "9";

File2:
Code:
chr1    wtp    read    100    125    35    +    .    ID=read18_1254_1296
chr1    wtp    read    150    175    43    +    .    ID=read199_1254_1252
chr1    wtp    read    580    600    43    +    .    ID=read200_1234_444
chr1    wtp    read    900    915    43    +    .    ID=read200_1234_444
chr1    wtp    read    500    525    35    +    .    ID=read18_1254_1296
chr1    wtp    read    700    725    43    +    .    ID=read199_1254_1252

In File2, the two lines that share the same ID in the last column always belong together as a pair. Both files are tab-separated; the info section in the last column of File1 is semicolon-separated.

Now I want to create a third file that consists of all lines of File1 with two additional columns. Each column counts the pairs from File2 that match the File1 line in columns 1 and 7 and meet one of the following conditions:

- First new column: the range given by columns 4 and 5 of either line of the pair lies within the range given by columns 4 and 5 of the File1 line.
- Second new column: one line of the pair lies before and the other lies after the range given by columns 4 and 5 of the File1 line.

The desired output of the above files would be:
Code:
chr1    protein_coding    exon    500    600    .    +    .    2    1    gene_id "20532"; transcript_id "278"; exon_number "1"; gene_name "K"; transcript_name "K"; exon_cluster "9";
chr1    protein_coding    exon    203426072    203426162    .    +    .    0    0    gene_id "20532"; gene_id "20532"; transcript_id "278"; exon_number "2"; gene_name "K"; transcript_name "K"; exon_cluster "9";

Usually I am able to solve these things with awk, but I'm not sure that is still possible here.

Thanks for your help!

Last edited by Franklin52; 04-20-2010 at 07:24 AM. Reason: Replaced TABLE tags with CODE tags
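
The matching rules described in the post can still be expressed in awk. What follows is only a minimal sketch under stated assumptions, not a definitive solution. Both files are assumed to be truly tab-separated with nine columns. "In the range" is read as the read lying completely inside the File1 range (start >= range start and end <= range end). "Before" and "after" are read as ending before the range start and starting after the range end. The function name count_pairs is made up for illustration. File2 is read first and its reads are grouped by the ID in column 9; each File1 line is then tested against every complete pair, and the two counts are inserted in front of the info column.

```shell
#!/bin/sh
# count_pairs FILE1 FILE2   (hypothetical helper name, not from the original post)
# Prints every FILE1 line with two extra columns inserted before column 9:
#   1st = pairs with at least one read completely inside the FILE1 range
#   2nd = pairs with one read before and the other after the FILE1 range
# Assumption: both files are tab-separated and have nine columns.
count_pairs() {
    awk -F'\t' -v OFS='\t' '
        # First file on the command line (FILE2): store each read under its ID.
        FILENAME == ARGV[1] {
            id = $9
            k = ++n[id]
            chr[id, k] = $1
            str[id, k] = $7
            beg[id, k] = $4 + 0      # + 0 forces a numeric comparison later
            fin[id, k] = $5 + 0
            next
        }
        # Second file (FILE1): test every complete pair against this line.
        {
            lo = $4 + 0; hi = $5 + 0
            inside = 0; around = 0
            for (id in n) {
                if (n[id] != 2) continue                           # not a pair
                if (chr[id, 1] != $1 || chr[id, 2] != $1) continue # column 1
                if (str[id, 1] != $7 || str[id, 2] != $7) continue # column 7
                hit1 = (beg[id, 1] >= lo && fin[id, 1] <= hi)
                hit2 = (beg[id, 2] >= lo && fin[id, 2] <= hi)
                if (hit1 || hit2) {
                    inside++
                } else if ((fin[id, 1] < lo && beg[id, 2] > hi) ||
                           (fin[id, 2] < lo && beg[id, 1] > hi)) {
                    around++
                }
            }
            # Put the two counts in front of the info column and print the line.
            $9 = inside OFS around OFS $9
            print
        }
    ' "$2" "$1"
}
```

It could be called as `count_pairs File1 File2 > File3`. Every File1 line is compared with every pair, so the run time grows with the product of the two file sizes; for very large inputs, sorting by position first would probably be needed.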
 
