#1
Extract common data out of multiple files
I am trying to extract the list of organisms common to several files.
For example, I took 3 files and show the expected result below. In reality I have more than 1000 files. I know awk and grep are useful for this, but I don't know them in depth, so I need some guidance. I would like to use awk, grep, cut, perl or python to get the result.
File A:
Pseudomonas stutzeri A1501
File B:
Rickettsiella grylli
File C:
Pseudomonas extremaustralis
Expected result file (common organism):
Aeromonas caviae Ae398
Hoping for your suggestions and support. Thank you in advance.
#2
If there is a maximum of 1 entry per file:
Code:
awk '++A[$0]>=ARGC-1' file*
This would then be a bit more robust:
Code:
awk '{$1=$1} ++A[$0]>=ARGC-1' file*
But 1000 files is probably going to be too many for the command line length. Otherwise try:
Code:
(
  set -- file*
  for f
  do
    cat "$f"
  done | awk '{$1=$1} ++A[$0]>=c' c=$#
)
Last edited by Scrutinizer; 12-24-2012 at 07:14 AM.
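The counting idea above can be extended so that a line repeated inside one file is counted only once for that file. The following is a minimal sketch, not part of the original post: the function name common_lines is hypothetical, and it assumes the file names fit on a single command line, so the argument-length limit mentioned above still applies for very large file sets.

```shell
#!/bin/sh
# common_lines FILE...
# Hypothetical helper: print the lines that occur in every FILE.
# A line is counted at most once per file, so duplicates inside a
# single file cannot inflate the count.
common_lines() {
    awk '
        NF {
            $1 = $1                              # normalise whitespace
            if (!seen[FILENAME, $0]++)           # first time in this file
                count[$0]++
        }
        END {
            # ARGC - 1 is the number of file operands passed to awk
            for (line in count)
                if (count[line] == ARGC - 1)
                    print line
        }
    ' "$@" | sort                                # awk array order is unspecified
}
```

It would be called as common_lines file*, and blank lines are ignored.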
#3
Try this:
Code:
$ cat file?|sort|uniq -c|sort -rnb|grep "^ *3 "| cut -d" " -f8-30
Rickettsiella grylli
Pseudomonas stutzeri A1501
Aeromonas caviae Ae398
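A variant of this pipeline that does not depend on the column width of uniq -c or on a hard-coded file count could look like the sketch below. The helper name common_by_count is made up for illustration. It assumes lines are compared exactly as written, and it de-duplicates each file first so that lines repeated within one file are not over-counted.

```shell
#!/bin/sh
# common_by_count FILE...
# Hypothetical helper: print lines whose occurrence count, after
# per-file de-duplication, equals the number of files given.
common_by_count() {
    n=$#                                  # number of input files
    for f in "$@"; do
        sort -u "$f"                      # each line at most once per file
    done | sort | uniq -c |
    awk -v n="$n" '
        $1 == n {
            sub(/^ *[0-9]+ /, "")         # strip the count added by uniq -c
            print
        }
    '
}
```

Because the files are read one at a time in the loop, only the function call itself is limited by the number of file names, not the sort or awk commands.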
#4
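A simple while loop will do the work for you:
Code:
# number of input files
file_count=$(ls file? | wc -l)
# de-duplicate file1 in place so each line is tested only once
sort -u file1 > temp; cat temp > file1; rm temp
while IFS= read -r i
do
    # count the files in which this line matches as whole words
    result_count=$(grep -lw "$i" file? | wc -l)
    if [ "$result_count" -eq "$file_count" ]; then
        echo "$i"
    fi
done < file1
Last edited by sathyaonnuix; 12-31-2012 at 01:00 AM.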
#5
Hi Sathyaonnuix,
The solution you shared is very impressive, but if file1 contains the same line several times (and that line is common to the other files as well), that line will appear multiple times in the output. Maybe we can use sort and uniq to overcome this little problem, something like:
Code:
cat file1|sort|uniq > tmp.tmp
and then apply the while loop to this tmp.tmp file.
#6
Hello Mukul,
Thanks for your feedback. When the grep -l command is used, it suppresses the repetition.
Code:
# cat file
repeat
repeat
repeat
123
456
567
Code:
# grep -lw repeat file
file
#7
Hi sathyaonnuix,
Consider the scenario below:
Code:
cat filea
line1
line2
repeat1
line3
repeat2
line4
repeat1
Code:
cat fileb
1234567
repeat1
repeat2
bbbbbb
Code:
cat filec
line1
repeat1
repeat2
line2
repeat1
line3
Now executing the script for these three files would result in the output below:
Code:
repeat1
repeat2
repeat1
The repetition of repeat1 is, I think, not wanted.
P.S. I am running the while loop on filea.
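The scenario above can be reproduced, and the duplicate avoided, by iterating over a de-duplicated copy of the first file rather than the file itself. This is only a sketch: common_via_grep is a hypothetical name, and it swaps grep -w for grep -xF (whole-line, fixed-string matching), which is an assumption about what a "common line" should mean here.

```shell
#!/bin/sh
# common_via_grep FILE...
# Hypothetical helper: for each distinct line of the first file, print
# it if grep finds it as a complete line in every FILE.
common_via_grep() {
    first=$1
    n=$#                                      # number of input files
    # sort -u feeds each distinct line once and leaves the file untouched
    sort -u "$first" | while IFS= read -r line; do
        [ -n "$line" ] || continue            # skip empty lines
        # -l: list matching files, -x: whole line, -F: literal string
        hits=$(grep -lxF -e "$line" -- "$@" | wc -l)
        if [ "$hits" -eq "$n" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

With filea, fileb and filec as shown, common_via_grep filea fileb filec prints repeat1 and repeat2 once each. It runs grep once per distinct line, so it will be slow for large inputs.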