find duplicate string in many different files


 
# 1  
Old 10-10-2012

I have more than 100 files like this:

HTML Code:
SVEAVLTGPYGYT	2
SVEGNFEETQY	10
SVELGQGYEQY	28
SVERTGTGYT	6
SVGLADYNEQF	21
SVGQGYEQY	32
SVKTVLGYEQF	2
SVNNEQF	12
SVRDGLTNSPLH	3
SVRRDREGLEQF	11
SVRTSGSYEQY	17
SVSVSGSPLQETQY	78
SVVHSTSPEAF	59
SVVPGNGYT	75
Each line has a string in $1 and its frequency in $2.
I have two questions. First, how can I merge these files into one file that lists every string from $1, with its frequency from each file in a separate field?

Second, how can I find the strings that are shared across the 100 files and print their frequency in each file?

I can do this with awk for two files, but I could not extend it to this many.

Thank you!
# 2  
Old 10-10-2012
What is your desired output and what have you tried so far? Why did it work with two files, but not with 100 files?
# 3  
Old 10-10-2012
Something along these lines will look up a given string across all the files. It is not tested, but it should get you started.
Code:
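awk -v str2find='SVELGQGYEQY' '
{
  # fileA[string]: list of files containing the string, joined with SUBSEP
  # (SUBSEP rather than FS, so file names with spaces still split correctly)
  fileA[$1] = ($1 in fileA) ? fileA[$1] SUBSEP FILENAME : FILENAME
  # freq[string, file]: frequency of the string in that file
  freq[$1, FILENAME] = $2
}
END {
  if (str2find in fileA) {
    print "string", "file", "frequency"
    n = split(fileA[str2find], tmp, SUBSEP)
    for (i = 1; i <= n; i++)
      print str2find, tmp[i], freq[str2find, tmp[i]]
  }
}' my100filesGohere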
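The same bookkeeping can be extended to report every string that occurs in more than one file, instead of one string chosen in advance. This is only a sketch and not part of the script above: the function name `list_shared` is made up for illustration, and it assumes whitespace-separated fields and at most one line per string in each file.

```shell
# list_shared FILE...   (hypothetical helper name, not from the thread)
# Prints "string<TAB>file<TAB>frequency" for every string found in 2+ files.
list_shared() {
  awk '
    NF >= 2 {
      # Count each file only once per string.
      if (!(($1, FILENAME) in freq)) nfiles[$1]++
      # freq[string, file]: frequency of the string in that file
      freq[$1, FILENAME] = $2
    }
    END {
      for (key in freq) {
        split(key, part, SUBSEP)          # part[1] = string, part[2] = file
        if (nfiles[part[1]] > 1)
          print part[1] "\t" part[2] "\t" freq[key]
      }
    }
  ' "$@"
}
```

The order of a `for (key in array)` loop is unspecified in awk, so pipe the result through `sort` if grouped output is wanted, for example `list_shared file* | sort`.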


Last edited by vgersh99; 10-10-2012 at 04:28 PM..
# 4  
Old 10-10-2012
I use the following code when dealing with two files.

Code:
awk 'NR==FNR{A[$1]=$0; next} $1=A[$1]' file1 FS=, OFS='\t' file2

This is the kind of output I want, assuming the strings shown here occur in more than one of the files:

HTML Code:
string        file1   file2   file3   file4   file5   ...............
SVERTGTGYT    6       4       5
SVGLADYNEQF   21      3       7
SVGQGYEQY     32      5       6
SVKTVLGYEQF   2       4       9
SVNNEQF       12      4       6
# 5  
Old 10-10-2012
Code:
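awk '
{
  # Record each input file once, in command-line order.
  if (!(FILENAME in fns)) {
    fn[fc++] = FILENAME
    fns[FILENAME] = FILENAME
  }
  wd[$1] = $1                # every distinct string
  ws[$1, FILENAME] = $2      # frequency of the string in this file
}
END {
  # Header row: "string" followed by one column per file.
  printf("%-20s", "string")
  for (i = 0; i < fc; i++)
    printf("%-20s", fn[i])
  print ""                   # a bare print here would emit the last input record
  # One row per string; the cell stays blank if a file lacks the string.
  for (i in wd) {
    printf("%-20s", i)
    for (j = 0; j < fc; j++)
      printf("%-20s", ws[i, fn[j]])
    print ""
  }
}' file1 file2 file3 ...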
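The script above prints every string, leaving blank cells where a file does not contain it. If only the strings present in all of the input files are wanted, one possible variation is sketched below. It is not rdrtx1's code: the function name `common_table` is made up for illustration, and it assumes whitespace-separated fields, at most one line per string in each file, no empty input files, and no file named twice on the command line.

```shell
# common_table FILE...   (hypothetical helper name, not from the thread)
# Prints a tab-separated table with one row per string that is present in
# every input file and one column per file holding its frequency.
common_table() {
  awk '
    FNR == 1 { files[++nf] = FILENAME }   # remember files in command-line order
    NF >= 2 {
      # seen[string]: number of distinct files containing the string
      if (!(($1, FILENAME) in freq)) seen[$1]++
      freq[$1, FILENAME] = $2
    }
    END {
      printf "string"
      for (j = 1; j <= nf; j++) printf "\t%s", files[j]
      print ""
      for (s in seen) {
        if (seen[s] != nf) continue       # skip strings missing from some file
        printf "%s", s
        for (j = 1; j <= nf; j++) printf "\t%s", freq[s, files[j]]
        print ""
      }
    }
  ' "$@"
}
```

Call it as `common_table file1 file2 file3 ...`. Rows come out in no particular order, so the data rows may need sorting afterwards.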


Last edited by rdrtx1; 10-10-2012 at 06:02 PM..