Find common lines between multiple files


 
# 1  
Old 01-08-2013
Hello everyone

A few years ago the user radoulov posted a neat solution to a problem about finding common lines (gene variation names) between multiple samples (files). The code was:

Code:
awk 'END {
  for (R in rec) {
    n = split(rec[R], t, "/")
    if (n > 1) 
      dup[n] = dup[n] ? dup[n] RS sprintf("\t%-20s -->\t%s", rec[R], R) : \
        sprintf("\t%-20s -->\t%s", rec[R], R)
    }
  for (D in dup) {
    printf "records found in %d files:\n\n", D
    printf "%s\n\n", dup[D]
    }  
  }
{  
  rec[$0] = rec[$0] ? rec[$0] "/" FILENAME : FILENAME
  }' f10.lista f12.lista f13.lista f14.lista fs6.lista

The problem now is that I want to find the intersections of lines between 3, 4 and 5 files, but the program only shows the results for 3 files.
I'm very new to AWK, so please help me modify this code to get what I need.
Thank you in advance.

Last edited by joeyg; 01-08-2013 at 01:44 PM.. Reason: Corrected title spelling
# 2  
Old 01-08-2013
Sort each file with duplicates removed, merge the sorted results while keeping duplicates, and count how many times each line occurs:
Code:
sort -m <( sort -u file1 ) <( sort -u file2 ) ... | uniq -c | sort -nr | pg
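
For an arbitrary list of files, the same counting idea can be sketched as a small function. This is only an illustration: the name count_files_per_line is hypothetical, and it swaps the process substitutions and "sort -m" for a plain loop followed by one full sort, so it should also run in a POSIX shell without bash.

```shell
# count_files_per_line FILE...
# Prints each distinct line prefixed by the number of files that contain it,
# most widely shared lines first. Hypothetical helper, not from the thread.
count_files_per_line() {
  for f in "$@"; do
    sort -u "$f"          # drop duplicates within one file
  done |
    sort |                # bring identical lines from different files together
    uniq -c |             # count how many files each line came from
    sort -nr              # highest counts first
}
```

It could be called as "count_files_per_line f10.lista f12.lista f13.lista f14.lista fs6.lista" and piped to a pager.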

# 3  
Old 01-08-2013
Thank you, DGPickett, for your answer, but what I need is to modify the given code so that it also reports the intersections for 4, 5 or more files, not just 3.

Actually, I want this kind of result:

records found in 3 files:
.
.
.
.
records found in 4 files:
.
.
.
.
.
records found in 5 files:
.
.
.
records found in 'n' files:

but the program currently shows only this:

records found in 3 files:

I hope this clarifies any doubts.

Last edited by bibb; 01-08-2013 at 02:44 PM..
# 4  
Old 01-08-2013
try:
Code:
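awk '
# Count the distinct input files as they are first seen.
!f[FILENAME]++ { fc++ }
# Count each line at most once per file (also safe for empty or "0" lines).
!b[$0, FILENAME]++ { a[$0]++ }
END {
  # Report the lines found in exactly 3, 4, ..., fc files.
  for (j = 3; j <= fc; j++) {
    print "records found in " j " files:"
    for (i in a) if (a[i] == j) print i
  }
}
' file*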
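
To see what this approach prints, here is a small worked example on made-up data. The file names s1.lista to s4.lista, the rs1 to rs4 lines and the use of mktemp -d are all assumptions for illustration. The awk program uses the same counting idea with longer variable names.

```shell
# Hypothetical sample data: rs1 is in 4 files, rs2 in 3, rs3 in 2, rs4 in 1.
dir=$(mktemp -d)
printf 'rs1\nrs2\nrs3\nrs4\n' > "$dir/s1.lista"
printf 'rs1\nrs2\nrs3\n'      > "$dir/s2.lista"
printf 'rs1\nrs2\nrs2\n'      > "$dir/s3.lista"   # a repeated line counts once
printf 'rs1\n'                > "$dir/s4.lista"

result=$(awk '
  !seen_file[FILENAME]++ { nfiles++ }        # number of distinct files read
  !seen[$0, FILENAME]++  { count[$0]++ }     # number of files holding each line
  END {
    for (j = 3; j <= nfiles; j++) {
      print "records found in " j " files:"
      for (line in count) if (count[line] == j) print line
    }
  }
' "$dir"/s*.lista)
printf '%s\n' "$result"
rm -rf "$dir"
```

With this data the output has two groups: rs2 under "records found in 3 files:" and rs1 under "records found in 4 files:". When a group holds several lines, their order is whatever awk's "for (line in count)" happens to produce, so pipe each group through sort if a stable order matters.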

# 5  
Old 01-08-2013
Thank you so much, rdrtx1. It works as I wanted!
# 6  
Old 01-08-2013
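With the sort pipeline, a line that is in 5 files comes out prefixed with the count 5. You can add "grep -v '^ *1 ' |" before the final sort to drop the lines found in only 1 file (the ' *' allows for the padding that uniq -c may put before the count).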
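
If you want a threshold other than 1, one option is to test the count field numerically instead of matching it as text. A minimal sketch, assuming the input is "uniq -c" output. The function name keep_min_count is hypothetical.

```shell
# keep_min_count MIN
# Reads "uniq -c" output on stdin and keeps only the lines whose leading
# count is at least MIN. Hypothetical helper, not from the thread.
keep_min_count() {
  awk -v min="$1" '$1 + 0 >= min + 0'
}
```

It would sit between the count and the final sort, for example "... | uniq -c | keep_min_count 3 | sort -nr | pg", to keep only the lines present in 3 or more files.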