Find common patterns in multiple file


 
# 1  
Old 09-08-2013

Hi,

I need help to find patterns that are common or matched in a specified column in multiple files.


File1.txt
Code:
ID1     555
ID23    8857
ID4      4454
ID05    555

File2.txt
Code:
ID74   4454
ID96   555
ID322 4454

File3.txt
Code:
ID03   1245
ID885  4454

File4.txt
Code:
ID120  555
ID047  4454

The output file should be like this:-

Code:
ID4      4454
ID74    4454
ID322  4454
ID885  4454
ID047  4454

If one of the files contains the same pattern more than once, such as ID74 and ID322 in File2.txt, and that pattern matches the rest of the files, then the output should include both lines. I don't know how to do this. I have been searching for a script that handles patterns in multiple files, but to no avail. I have thousands of records in multiple files that I need to work on. I hope you can help me with this. Thanks.
# 2  
Old 09-08-2013
I do not understand your requirements.

Are you saying:
  1. You want to create a script with a synopsis like:
    Code:
    matcher field_number file...

  2. and, for this example you want to invoke your script as:
    Code:
    matcher 2 File[1-4].txt

  3. and, if a value in column field_number appears more than once in a file named "File2.txt" (or do you mean the 2nd file given as a file operand to your script) AND also appears at least once in every file named as a file operand to your script, THEN you want to print all lines from all files that contain that value in column field_number?
If so, is the file determining the patterns to match named "File2.txt" or is it the 2nd file operand? Otherwise, please clearly define what you are trying to do.
# 3  
Old 09-08-2013
Hi Don Cragun,

I am not sure how best to explain this. What I am actually trying to do is:

1) Print lines whenever the same pattern in $2 appears in "all" files.
2) If one of the files has the same pattern as the rest of the files and has duplicates of it in $2, then the duplicates should be printed too, because the value in $1 is unique across all the files with the same pattern in $2.

I hope this clarifies things. Thanks.
# 4  
Old 09-08-2013
As long as the total size of your files is small enough to fit into awk's address space, the following script does what I think you want:
Code:
awk '
# Variable dictionary:
# fc                    # of input files.
# i, j                  loop control variables
# l[line#]=$0           Array of input lines (from all input files).
# ln[$2,pc[$2]]=x       Array of line numbers containing $2; (x = NR)
# p[$2]=x               Array of # of files containing pattern ($2 input
#                       values).
# pf[$2,fc]=x           Array of # of occurrences of $2 in file fc.
# pc[$2]=x              Array of pattern counts ($2 input values); x = # of
#                       lines containing $2.
FNR == 1 {
        fc++
}
{       l[NR] = $0
        ln[$2,++pc[$2]] = NR
        if(pf[$2,fc]++ == 0) p[$2]++
}
END {   for(i in p)     # Loop through pattern values
                if(p[i] == fc)  # If pattern appears in every file...
                        for(j = 1; j <= pc[i]; j++)     # print all lines
                                                        # containing pattern i.
                                printf("%s\n", l[ln[i,j]])
}' File[1-4].txt

but the ID74 and ID322 lines in the output from this script have a different number of spaces between fields than were present in what you said should be the output:
Code:
ID4      4454
ID74   4454
ID322 4454
ID885  4454
ID047  4454

The number of spaces in the output here matches the number of spaces in your sample input files; the output you said you wanted had more spaces between fields on those lines.
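
If uniform spacing matters more than preserving the input lines exactly, one option is to reformat the fields on output. This is only a minimal sketch: it assumes two whitespace-separated fields per line, the helper name align_cols is hypothetical, and the 8-character column width is an arbitrary choice, not something from the thread.

```shell
# align_cols: hypothetical helper, not part of the thread's script.
# Reads lines on stdin and rewrites each as a left-justified,
# 8-character-wide first field followed directly by the second field.
align_cols() {
    awk '{ printf("%-8s%s\n", $1, $2) }'
}
```

The output of the script above could then be piped through it, for example by appending "| align_cols" after File[1-4].txt. Any fields beyond the second would be dropped, so this only suits two-column data like the samples here.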

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of awk.

If you don't have enough memory to hold all of your input data at once, the script could be rewritten to read the input files twice. The 1st pass through the input would determine which patterns appear in every file, and the 2nd pass would print the lines with $2 matching the selected patterns.
# 5  
Old 09-08-2013
Another approach, reading the input files twice:

Code:
awk '!s{if(FNR==1) c++; if(!A[c,$2]++) B[$2]++; next} B[$2]==c' File[1-4].txt s=1 File[1-4].txt
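
For readers who find the one-liner dense, the same two-pass idea can be unpacked roughly as follows. This is a sketch, not a replacement: the function name matcher and the field-number argument are borrowed from the synopsis suggested in post #2 and are assumptions, and the array names seen and nfiles are illustrative.

```shell
# matcher FIELD FILE...
# Print every line whose FIELD value occurs in all of the named files.
# Hypothetical wrapper; name and argument order follow post #2's synopsis.
matcher() {
    field=$1
    shift
    # The file list is given twice: pass 1 (s unset) only counts,
    # pass 2 (after the s=1 assignment operand) only prints.
    awk -v f="$field" '
    !s {
        if (FNR == 1) c++                  # c = number of files started so far
        if (!seen[c, $f]++) nfiles[$f]++   # count each value once per file
        next
    }
    nfiles[$f] == c                        # pass 2: value was seen in every file
    ' "$@" s=1 "$@"
}
```

Invoked as matcher 2 File[1-4].txt, it should print the five 4454 lines in input order. Only the per-value counts are held in memory, not the lines themselves. One caveat: an empty input file never triggers FNR == 1, so it is not counted as a file.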


Last edited by Scrutinizer; 09-08-2013 at 05:52 AM..