How to find similar values in different files


 
# 1  
Old 09-26-2012
Hello,

I have 4 files like this:

file1:
Code:
cg24163616    15    297
cg09335911    123    297
cg13515808    565    776
cg12242345    499    705
cg22905282    225    427
cg16674860    286    779
cg14251734    303    724
cg19316579    211    717
cg00612625    422    643

file2:
Code:
cg02792168    230    498
cg00272971    223    330
cg26439963    252    532
cg23206032    660    861
cg19507206    118    134
cg20641465    233    507
cg02631092    459    772
cg19018709    390    481
cg14302224    106    125
cg12421087    442    479

file3:
Code:
cg22905282    385    475
cg08927006    93    257
cg13737332    107    257
cg17863743    34    257
cg16401360    62    257
cg09511126    100    257
cg16825290    44    503
cg13690864    13    213
cg18577511    62    213
cg25651562    193    475

file4:
Code:
cg27449572    486    873
cg14244636    518    719
cg23078268    155    585
cg05732883    395    763
cg26712743    478    789
cg16674860    89    329
cg15996984    448    809
cg26329178    39    357
cg00612625    265    476
cg02800607    88    366

I wrote a Perl script to find the IDs (first column) that occur in all 4 files (about 400,000 lines in total) and to create an output file containing each such ID plus its values from every file, but it takes forever!
Is there an easier way, for example using grep?

My output would be:
Code:
id    valuefile1    valuefile2    valuefile3    valuefile4

Thanks!

Moderator's Comments:
Please use code tags next time for your code and data.

Last edited by radoulov; 09-26-2012 at 04:28 PM..
# 2  
Old 09-26-2012
Code:
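awk 'END {
  # print only the ids that were seen in every input file
  for (ID in id)
    if ( count[ID] == ARGC - 1 )
      print ID, id[ID]
  }
{
  v = $2 OFS $3
  # append the values from this line to those already collected for the id
  id[$1] = $1 in id ? id[$1] OFS v : v
  # count each id at most once per file
  tmp[$1, FILENAME]++ || count[$1]++
  }' file1 file2 [...]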

# 3  
Old 09-26-2012
Code:
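#!/usr/bin/perl
use strict;
use warnings;

my (%count, %hash);

for my $i (0 .. $#ARGV) {
    open(my $fh, '<', $ARGV[$i]) || die "couldn't open $ARGV[$i]: $!";
    while (<$fh>) {
        chomp;
        my @field = split ' ';    # split on any run of spaces or tabs
        next unless @field;       # skip blank lines
        $count{$field[0]}++;
        # Keep an id only while it has been seen once in every file so far
        # (assumes an id appears at most once per file).
        if ($count{$field[0]} == $i + 1) {
            $hash{$field[0]} .= " $field[1] $field[2]";
        }
        else {
            delete $hash{$field[0]};
        }
    }
    close $fh;
}
# Print only the ids that occurred in all files.
foreach my $key (keys %hash) {
    if ($count{$key} == @ARGV) {
        print "$key$hash{$key}\n";
    }
}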

Pass the files as arguments, as shown below:
Code:
 perl file.pl file1 file2 file3 file4
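
Another option that avoids holding everything in memory is the standard `sort` and `join` utilities. The sketch below is only an illustration under stated assumptions: the file names and contents are made up, each id is assumed to appear at most once per file, and the columns are assumed to be separated by blanks.

```shell
#!/bin/sh
# Hypothetical sample data, for illustration only.
LC_ALL=C; export LC_ALL    # make sort and join agree on collation order
tmpdir=$(mktemp -d)
printf 'id1 1 2\nid2 3 4\n' > "$tmpdir/a.txt"
printf 'id2 5 6\nid1 7 8\n' > "$tmpdir/b.txt"
printf 'id1 9 10\nid3 11 12\n' > "$tmpdir/c.txt"

# join needs its inputs sorted on the join field (column 1).
for f in a b c; do
  sort -k1,1 "$tmpdir/$f.txt" > "$tmpdir/$f.sorted"
done

# Each join keeps only the ids present on both sides,
# so chaining them leaves the ids common to all files.
result=$(join "$tmpdir/a.sorted" "$tmpdir/b.sorted" | join - "$tmpdir/c.sorted")

echo "$result"    # prints: id1 1 2 7 8 9 10
rm -rf "$tmpdir"
```

For four files, add one more `| join - file4.sorted` stage to the pipeline.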


Last edited by msabhi; 09-26-2012 at 06:21 PM..