Find common lines between all of the files in one folder


 
# 8  
Old 03-06-2018
Hi! I only need to compare file contents to find common lines between all of the files in one folder. Could it be done with comm, since all of the files are sorted? I just need every case where two files have common lines to be written to a different output file.
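
For reference, a pairwise comm loop could look roughly like the sketch below. It assumes a POSIX sh (not csh) and files sorted in plain byte order; the demo files, the `in`/`out` directories and the `__` separator in the output names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: demo data and names are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/in" "$demo/out"
printf 'a\nb\nc\n' > "$demo/in/f1"
printf 'b\nc\nd\n' > "$demo/in/f2"
printf 'x\ny\n'    > "$demo/in/f3"

cd "$demo/in" || exit 1
set -- *                      # all input files as positional parameters
while [ "$#" -gt 1 ]; do
    a=$1
    shift                     # "$@" now holds only the files after $a
    for b in "$@"; do
        # comm -12 prints only the lines present in both sorted inputs;
        # LC_ALL=C makes comm expect plain byte order.
        LC_ALL=C comm -12 "$a" "$b" > "../out/${a}__${b}"
        # Drop the output file if this pair had nothing in common.
        [ -s "../out/${a}__${b}" ] || rm -f "../out/${a}__${b}"
    done
done
ls ../out
```

Each unordered pair is visited once, so n files need n*(n-1)/2 comm runs; with a couple of hundred large files a single awk pass may be the more practical route.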

---------- Post updated at 06:57 AM ---------- Previous update was at 06:52 AM ----------

A question for jim mcnamara: where is the edited code with the correction?
# 9  
Old 03-06-2018
Note that based on her other thread, Eve uses csh on a Windows 7 system and is unwilling to use bash, ksh, or any other POSIX-conforming shell to run any script we might propose to help solve her problems.
# 10  
Old 03-06-2018
Maybe something like this?
It simply displays duplicates as soon as they occur (==2).
Code:
awk '++cnt[$0]==2' * > outfile

The * matches all files in the current directory; adapt to your need.
Each file must be strictly unique, i.e. no line may occur more than once within the same file (the files do not need to be sorted).
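
A small illustration of how the counter behaves, using three made-up files in a temporary directory (the file names and contents are assumptions for the demo only):

```shell
demo=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$demo/f1"
printf 'beta\ndelta\n'        > "$demo/f2"
printf 'gamma\nbeta\n'        > "$demo/f3"

# cnt[$0] counts how often each whole line has been seen so far, across all
# files; a line is printed only at the moment its count reaches 2.
dups=$(awk '++cnt[$0]==2' "$demo/f1" "$demo/f2" "$demo/f3")
printf '%s\n' "$dups"
# prints:
# beta
# gamma
```

Note that "beta" is printed once even though it occurs in all three files, and that the output does not say which files shared a line.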

Last edited by MadeInGermany; 03-06-2018 at 04:51 PM.. Reason: strictly unique (sorted not required)
# 11  
Old 03-08-2018
Hi!

Thank you for your help!

All of the files in my folder are unique.

In case of this code -

Code:
 awk '++cnt[$0]==2' * > outfile

I ran a test with four files, and with this code the outfile contains only three of the six possible outcomes (four files taken two at a time always give six pairs). The three outcomes it did produce were all correct, but the other three were missing.

Last edited by MadeInGermany; 03-08-2018 at 12:02 PM.. Reason: added code tags
# 12  
Old 03-08-2018
Hmm, I doubt that.
Maybe you have some trailing spaces or even trailing ^M characters (MS-DOS line ends)?
You can strip them off with
Code:
awk '{ sub(/[[:space:]]+$/, "") } ++cnt[$0]==2' * > /tmp/outfile

Or I have misunderstood your requirement. In that case, please post an example, e.g. four 10-line files, and the expected outcome.
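
To see what the stripping changes, here is a small sketch with two made-up files, one with MS-DOS line ends and one with trailing blanks. It assumes an awk that understands POSIX character classes such as [[:space:]] (very old mawk versions do not).

```shell
demo=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo/dos"     # CRLF line ends
printf 'one\ntwo  \n'   > "$demo/unix"    # "two" has trailing blanks

# Without stripping, "one\r" and "one" are different strings: no match.
plain=$(awk '++cnt[$0]==2' "$demo/dos" "$demo/unix")

# With stripping, trailing blanks and \r are removed before counting.
stripped=$(awk '{ sub(/[[:space:]]+$/, "") } ++cnt[$0]==2' "$demo/dos" "$demo/unix")

printf 'without stripping: [%s]\n' "$plain"
printf 'with stripping:    [%s]\n' "$stripped"
```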
# 13  
Old 03-09-2018
Hi!

Here are three examples of the contents of the files:

file1
Code:
   2   78   99  129  665   765
   3   88   99  543  876   988
   7   45   54   99  120   987
  13   23  167  334 2378  8765
  15   17   18 1125 2356  6765
  54   78   79   90  344  3399
 111  233  788  999 3421  7654
 223  299  388  455  477   566

file2
Code:
   3   22   78   87  773   876
   4    9   77  890  977  7655
   7    8   23  854 1276  3343
  33  122  665  888  997   999
  54   78   79   90  344  3399
 223  299  388  455  477   566
 228  332  339  453  988  1299

file3
Code:
   1  112  134  235  734  1123
   5   35   84   98 1889  2300
   7    8   23  854 1276  3343
  15   17   18 1125 2356  6765
  45  443  556  887  889   987
 111  233  788  999 3421  7654


And the desired outcome would be three files, with the first file containing these two lines
Code:
  54   78   79   90  344  3399
 223  299  388  455  477   566

and the second file containing these two lines
Code:
  15   17   18 1125 2356  6765
 111  233  788  999 3421  7654

and the third file containing this line
Code:
   7    8   23  854 1276  3343

The file names look like this
Code:
AC-FOUR-136-ZEL2-ZECO-111
AC-SEVEN-56-ZEL4-ZECO-68
AC-NINE-994-ZEL3-ZECO-811
AC-ONE-4-ZEL1-ZECO-544
AC-NINE-4-53-ZEL3-ZECO-811
AC-ELEVEN-66-788-ZEL4-ZECO-87
AC-TWO-32-7788-ZEL4-ZECO-95
AC-SIX-56-111-ZEL4-ZECO-87
AC-FOURTEEN-59-1561-ZEL2-ZECO-5

I have to work with 100-200 new files every week to find common lines two at a time. The files contain anywhere from 1,000 to 100,000 lines each. The examples above are of course a lot shorter.

This code unfortunately didn't help

Code:
awk '{ sub(/[[:space:]]+$/, "") } ++cnt[$0]==2' * > /tmp/outfile

I hope this post is helpful!

Last edited by vgersh99; 03-09-2018 at 10:43 AM.. Reason: code tags, please!
# 14  
Old 03-09-2018
It's good to finally have some decent samples at hand for testing. This is a "proof of concept" for your problem and your data given, adapted from the (working!) solution to your previous problem. Not sure if it will work on the larger datasets mentioned.

Code:
awk '{CNT[$0]++; FN[$0] = FN[$0] FILENAME "-"} END {for (c in CNT) if (CNT[c]>1) {print c >> FN[c]; close (FN[c])}} ' file[123]
cf file?-*

---------- file1-file2-: ----------

  54   78   79   90  344  3399
 223  299  388  455  477   566

---------- file1-file3-: ----------

  15   17   18 1125 2356  6765
 111  233  788  999 3421  7654

---------- file2-file3-: ----------

   7    8   23  854 1276  3343
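
The same idea can be spelled out as a commented multi-line sketch. The demo files, the `outdir` variable and the separate `out` directory are assumptions added here so that result files do not land among the inputs; this is not a tested solution for the large datasets mentioned.

```shell
demo=$(mktemp -d)
mkdir "$demo/in" "$demo/out"
printf '   7   45\n  54   78\n 223  299\n' > "$demo/in/file1"
printf '   7    8\n  54   78\n 223  299\n' > "$demo/in/file2"
printf '   7    8\n 111  233\n'            > "$demo/in/file3"

cd "$demo/in" || exit 1
awk -v outdir="../out" '
    # For every distinct line, count it and remember which files had it.
    { cnt[$0]++; seen[$0] = seen[$0] FILENAME "-" }
    END {
        for (line in cnt)
            if (cnt[line] > 1) {          # line occurs in more than one file
                out = outdir "/" seen[line]
                print line >> out         # output name = names of those files
                close(out)                # avoid running out of open files
            }
    }
' file1 file2 file3
ls ../out
```

A line found in three files ends up in a file named after all three, not in three pairwise files. The order of lines inside each output file follows awk's array traversal and is not guaranteed, so sort the results afterwards if order matters. All distinct lines are held in memory, which may become a limit with hundreds of large files.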


Last edited by RudiC; 03-10-2018 at 01:21 PM..