Extract common data out of multiple files


 
# 1  

I am trying to extract the list of organisms that is common to several files.
As an example, I show 3 files and the expected result below. In reality I have more than 1000 files. I know that awk and grep are useful for this kind of task, but I do not know them in depth, so I need some guidance.

I would like to use awk, grep, cut, perl, or python to get the required result.
File A:
Pseudomonas stutzeri A1501
Pseudomonas fragi A22
Pseudomonas fluorescens A506
Aeromonas caviae Ae398
Rickettsiella grylli
Aeromonas veronii AMC34
File B:
Rickettsiella grylli
Pseudomonas fulva 12-X
Pseudomonas extremaustralis 14-3 substr. 14-3b
Aeromonas caviae Ae398
Gallaecimonas xiamenensis 3-C-1
Pseudomonas stutzeri A1501
File C:
Pseudomonas extremaustralis
Pseudomonas fulva 12-X
Pseudomonas extremaustralis 14-3 substr. 14-3b
Aeromonas caviae Ae398
Rickettsiella grylli
Pseudomonas stutzeri A1501
Expected result file (common organisms):
Aeromonas caviae Ae398
Pseudomonas stutzeri A1501
Rickettsiella grylli
Hoping for your suggestions and support.
Thank you in advance
# 2  
If each entry appears at most once per file:
Code:
awk '++A[$0]>=ARGC-1' file*

This would then be a bit more robust:
Code:
awk '{$1=$1} ++A[$0]>=ARGC-1' file*

But 1000 files is probably going to be too many for the command line length.

Otherwise try:
Code:
( 
  set -- file*
  for f
  do
    cat "$f"
  done | awk '{$1=$1} ++A[$0]>=c' c=$# 
)
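
The variants above count every occurrence of a line, so a line repeated inside a single file could reach the threshold without being present in every file. A minimal sketch that drops repeats within each file before counting, assuming a POSIX shell and awk; the function name `common_lines` is made up for illustration:

```shell
# common_lines FILE...
# Print each line that occurs in every FILE, even if a line is
# repeated inside one of the files. (Hypothetical helper name.)
common_lines() {
  for f in "$@"; do
    # Normalise whitespace, then keep only the first copy of each line
    # within this one file.
    awk '{ $1 = $1 } !seen[$0]++' "$f"
  done |
    # Each file now contributes a line at most once, so a count equal
    # to the number of files means the line is in all of them.
    awk -v n="$#" '++count[$0] == n'
}
```

Called as `common_lines file*`, it prints the common lines in the order they appear in the last file. Because the file names are passed to a shell function and not to an external command, the command-line length limit should not apply.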


Last edited by Scrutinizer; 12-24-2012 at 08:14 AM..
# 3  
Try this:
Code:
$ cat file?|sort|uniq -c|sort -rnb|awk '$1 == 3 { sub(/^ *[0-9]+ /, ""); print }'
Rickettsiella grylli
Pseudomonas stutzeri A1501
Aeromonas caviae Ae398
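
The pipeline above hard-codes the number of files (3) and would miscount a line that is repeated inside one file. A rough sketch of the same sort/uniq idea with the count taken from the argument list, assuming a POSIX shell; `common_by_count` is an illustrative name, not an existing tool:

```shell
# common_by_count FILE...
# Print, in sorted order, each line that occurs in every FILE.
# (Hypothetical helper name.)
common_by_count() {
  n=$#                      # number of files to compare
  for f in "$@"; do
    sort -u "$f"            # one copy of each line per file
  done |
    sort | uniq -c |        # count how many files contain each line
    # Keep lines whose count equals the number of files and strip the
    # count column that uniq -c adds in front.
    awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

Usage would be `common_by_count file*`. Unlike the awk-only variants, this compares lines byte for byte, so trailing blanks make two lines different.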

# 4  
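A simple while loop will do the work for you:

Code:
file_count=$(ls file? | wc -l)
# De-duplicate file1 so that each candidate line is tested only once
sort -u file1 > temp;cat temp > file1;rm temp
while IFS= read -r i
do
  # -F: literal string, -x: match the whole line, -l: list each matching file once
  result_count=$(grep -lxF -- "$i" file? | wc -l)
  if [ "$result_count" -eq "$file_count" ]; then
    printf '%s\n' "$i"
  fi
done < file1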


Last edited by sathyaonnuix; 12-31-2012 at 02:00 AM..
# 5  
Hi Sathyaonnuix,

Your solution is very impressive, but if file1 contains the same line several times (and that line is also common to the other files), the line will appear several times in the output.
Maybe we can use sort and uniq to overcome this little problem, something like:
Code:
sort file1 | uniq > tmp.tmp

and then run the while loop on tmp.tmp
# 6  
Hello Mukul,
Thanks for your feedback. When grep is used with the -l option, it suppresses the repetition: each matching file name is listed only once.

Code:
# cat file
repeat
repeat
repeat
123
456
567

Code:
# grep -lw repeat file
file

# 7  
Hi sathyaonnuix,
Consider the following scenario:
Code:
cat filea
line1
line2
repeat1
line3
repeat2
line4
repeat1

Code:
cat fileb
1234567
repeat1
repeat2
bbbbbb

Code:
cat filec
line1
repeat1
repeat2
line2
repeat1
line3

Running the script on these three files would then produce the following output:
Code:
repeat1
repeat2
repeat1

repeat1 is printed twice, which I think is not wanted.

P.S. I am running the while loop on filea.
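
One way to sidestep the repeated output is to intersect the files one after another instead of counting matches per line. This is only a sketch under assumptions: `intersect_files` is a made-up name, and it relies on `mktemp` and on grep's standard `-F`, `-x` and `-f` options.

```shell
# intersect_files FILE...
# Print each line that occurs in every FILE exactly once.
# (Hypothetical helper name.)
intersect_files() {
  tmp=$(mktemp) || return 1
  next=$(mktemp) || return 1
  sort -u "$1" > "$tmp"     # candidates: unique lines of the first file
  shift
  for f in "$@"; do
    # Keep only the candidates that also occur in $f as a whole line,
    # compared as literal strings. grep returns non-zero when nothing
    # matches, which is not an error here.
    grep -Fxf "$f" "$tmp" > "$next" || :
    mv "$next" "$tmp"
  done
  cat "$tmp"
  rm -f "$tmp" "$next"
}
```

With the three files from the scenario above, `intersect_files filea fileb filec` should print repeat1 and repeat2 once each. The candidate list can only shrink with every file, so the same loop should also cope with a large number of files.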
 
