Searching for similar row(s) across multiple files


 
# 1  
Old 07-18-2012
Hello Esteemed Members,

I need to write a script that finds files sharing one or more identical rows.
Please note that I am not searching for a specific pattern. The rows can be anything; I just need to find records that appear in two or more files.
There are around 5000 such files to search.

The files are under the same parent directory but in different sub-directories.

$/abc/xyz/ap/*.prm
$/abc/xyz/dd/*.prm
$/abc/xyz/rt/*.prm

That is, the path up to xyz is the same, and I need to check all the prm files in the sub-folders ap, dd, rt and so on.

My basic criterion is to find all the prm files that contain exactly the same rows.

In the example below, two prm files sit under different sub-directories, but both contain the same row:

Code:
none >> host_buf.tpl      >> $CCWSCA_sbcdl/shr/dl_fnaccv_host_buf.raw


Code:
prompt>~/sbc_generated/v760 [49]> grep dl_fnaccv_host_buf */*.prm

cdw/fn_account_ca_view.prm:none >> host_buf.tpl      >> $CCWSCA_sbcdl/shr/dl_fnaccv_host_buf.raw

fn/fn_account_audit_view.prm:none >> host_buf.tpl    >> $CCWSCA_sbcdl/shr/dl_fnaccv_host_buf.raw

How do I find all such files?

I searched for answers on this kind of script and came up with the following:


Code:
filecnt=$( find /sbc_generated/v77_0/*/*.prm -type f )
  awk -v 5000=$filecnt  ' 
          {arr[$0]++; next} 
          END{for (i in arr) { 
            if(arr[i]==n) {
                     print arr[i]
            }
          } '  $( find /sbc_generated/v77_0/*/*.prm -type f) > common_rec

but it is giving me the following error

Code:
exec(2): Could not load a.out due to swap reservation failure
or due to insufficient user stack size

Please help me fix the above error and write a working script in Korn shell (ksh).

Thank you!

Last edited by Yoodit; 07-18-2012 at 01:15 AM..
# 2  
Old 07-24-2012
Hi.

What are you expecting for the output format? ... cheers, drl
# 3  
Old 07-25-2012
Hello drl,

Basically, I need to compare a large number of files and find all the records that exist in more than one file.

Is it possible to do this with sort, uniq and awk, or do I need to write a script? Any help would be highly appreciated.
# 4  
Old 07-25-2012
Hi

If there are around 5000 files, something like
Code:
filecnt=$( find /sbc_generated/v77_0/*/*.prm -type f )

can crash because the resulting list of file names is huge.
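
A possible workaround, sketched under assumptions rather than tested on the original system: let find pass the file names to awk in batches via `-exec ... {} +`, so the whole list never has to fit into one variable or one command line. Each row is tagged with its file name, and a second awk counts in how many distinct files each row occurs. The function names `tag_rows` and `shared_rows` are made up for this example, and it assumes that file names contain no tab characters and that the local find supports the `+` terminator.

```shell
# Hypothetical helper: print "file<TAB>row" for every row of every .prm file.
# find batches the file names itself, so no giant argument list is built.
tag_rows() {
    find "$1" -type f -name '*.prm' -exec awk '{ print FILENAME "\t" $0 }' {} +
}

# Hypothetical helper: list the rows that occur in more than one file.
shared_rows() {
    tag_rows "$1" | awk -F '\t' '
        {
            file = $1
            row = substr($0, length(file) + 2)   # everything after the first tab
            if (!((file, row) in seen)) {        # count each file once per row
                seen[file, row] = 1
                nfiles[row]++
                where[row] = where[row] " " file
            }
        }
        END {
            for (row in nfiles)
                if (nfiles[row] > 1)
                    print row " ::" where[row]
        }'
}
```

Calling `shared_rows /sbc_generated/v77_0 > common_rec` would then write one line per shared row, followed by the files that contain it. The second awk keeps every distinct file/row pair in memory, so for very large inputs a sort-based pipeline may be the safer choice.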
# 5  
Old 07-25-2012
Yes, I understand, Chirel. So what is the workaround?

What if I append the contents of all the files to one file, sort the records, and drop the records that occur only once?
Then I would need to backtrack every row that has multiple instances and find out which original files it belongs to.

Will this work? If yes, how do I write it?
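
The idea above could be sketched roughly like this. It is only an illustration: the function name `rows_and_owners` and the scratch file passed as the second argument are hypothetical, and the sketch assumes a find that supports `-exec ... {} +`. One caveat: a row repeated inside a single file also counts as a duplicate here.

```shell
# Hypothetical helper: $1 is the top directory, $2 a scratch file for the repeated rows.
rows_and_owners() {
    # Step 1: pool every row, sort, and keep one copy of each row that repeats.
    find "$1" -type f -name '*.prm' -exec cat {} + | sort | uniq -d > "$2"
    # Step 2: backtrack. -F = fixed strings, -x = whole-line match,
    # /dev/null forces grep to print the file name even for a single file.
    find "$1" -type f -name '*.prm' -exec grep -F -x -f "$2" /dev/null {} +
}
```

The output has the same `file:row` shape as the grep example in the first post, so it can be sorted or filtered further.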
# 6  
Old 07-25-2012
That's the idea I'm trying out.

Prefix each line of every file with "filename : ", sort from field 3 to the end of the line,
then run uniq -d on the result, ignoring the first x characters of each line.
The output will be something like

filename1 : pattern
filename2 : pattern
etc

But I'm having an issue with the uniq -d part.

Actually, I have
Code:
find . -name '*.prm' -exec  awk '{printf("%-50.50s : %s\n",FILENAME,$0);}' {} \;|sort -k3|uniq -s53 -D

Edit: well, no issue after all, it's working like a charm.

Last edited by Chirel; 07-25-2012 at 06:51 AM.. Reason: Working
# 7  
Old 07-25-2012
Thanks for helping me out, Chirel. Can you please explain what the awk part does?

Code:
awk '{printf("%-50.50s : %s\n",FILENAME,$0);}'
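
As a hedged illustration of that awk part: `%-50.50s` left-justifies the file name, pads it to 50 characters and cuts it at 50, so every output line starts with a prefix of exactly 53 characters (50 for the name plus 3 for " : "). The small demo below shows this; `demo_prefix` is a made-up name, and it passes the file name in through a variable instead of FILENAME because it reads the row from standard input.

```shell
# Hypothetical demo: build the same fixed-width prefix as the awk printf above.
# $1 stands in for FILENAME, $2 is one row of the file.
demo_prefix() {
    printf '%s\n' "$2" | awk -v name="$1" '{ printf("%-50.50s : %s\n", name, $0) }'
}

demo_prefix cdw/fn_account_ca_view.prm 'none >> host_buf.tpl'
```

Because the prefix always has the same width, `uniq -s53` can skip it and compare only the rows, and `sort -k3` groups identical rows together as long as the file names contain no blanks. Note that `-D` is a GNU extension of uniq, and that file names longer than 50 characters are truncated in the output.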
