Howdy!
I have multiple files with tab-separated data:
HTML Code:
File1_filtered.txt
gnl|Amel_4.0|Group3.29 1 G R 42 42 60 15 ,.AAA.aa,aa.A.. hh00/f//hD/h/hh
gnl|Amel_4.0|Group3.29 2 C Y 36 36 60 5 T.,T, LggJh
gnl|Amel_4.0|Group3.29 3 A R 27 27 60 9 Gg,,.gg., B6hcc22_c
HTML Code:
File2_filtered.txt
gnl|Amel_4.0|Group3.29 1 C K 12 56 60 3 TGT L6L
gnl|Amel_4.0|Group3.29 2 C Y 63 63 60 5 ,$,$tt, EEZZe
HTML Code:
File3_filtered.txt
gnl|Amel_4.0|Group3.29 2 C Y 36 36 60 5 T.,T, LggJh
gnl|Amel_4.0|Group3.29 4 A R 27 27 60 9 Gg,,.gg., B6hcc22_c
I created a master list containing all the different rows based on the first two columns (without duplicates)
HTML Code:
masterList.txt
gnl|Amel_4.0|Group3.29 1
gnl|Amel_4.0|Group3.29 2
gnl|Amel_4.0|Group3.29 3
gnl|Amel_4.0|Group3.29 4
I need to go through each file once, and extract the data on the column 4, and match it to its corresponding line in the master list based on columns 1 and 2 (they need to match exactly).
If there is no entry for a particular line in a data file that matches the masterlist, add and asterisk.
HTML Code:
Like this:
pos1 pos2 pos3 File1 File2 File3
gnl|Amel_4.0|Group3.29 1 R K *
gnl|Amel_4.0|Group3.29 2 Y Y Y
gnl|Amel_4.0|Group3.29 3 Y * R
gnl|Amel_4.0|Group3.29 4 * * *
In the code I have so far, I loaded the master list into a hash. Then each data file is loaded in an array of arrays (split by columns).
Everything works except the matching of the hash and the arrays for each file.
As usual, many thanks in advance for any help you may provide.
Cheers!
HTML Code:
#!/usr/bin/perl
use strict;
use warnings;
##dump the results in this file
my $outfile = ">> matrix.txt";
open (MATRIX,$outfile);
#open the master list
open(MASTER,"folder/MasterList.txt") || die "open MASTER failed";
#load MASTER list into hash of arrays
my %m_hash=();
while(<MASTER>){
chomp;
my @fieldsM = split (/\s|\t/, $_);
my $scaff = $fieldsM[0];
my $pos = $fieldsM[1];
my $key = $scaff.",".$pos;
my $value= $fieldsM[2];
$m_hash{$key} = $value;
#print "$key\t$value\n";
}
close MASTER;
#Load files into an array
my @itemsToUse;
my $directory= "folder";
opendir (DIR, $directory) or die "cant OPEN directory with files!\n";
my @allitems = readdir(DIR);
foreach my $fs (@allitems) {
if ($fs =~ /filtered.txt/) {
my $files = $fs;
push (@itemsToUse, $files);
}
}
#open the data files
foreach my $fs (@itemsToUse){
while(<>){ # sequentially read files and do the comparison on the fly
chomp;
my @fieldsSNP=split/\s|\t/; # split by space or tab
#print "$fields[1]\n";
foreach my $i ( 0 .. $#{ $m_hash{$fieldsSNP[0]} } ) {
if (($fieldsSNP[0] == $m_hash{$fieldsSNP[0]}) && ($fieldsSNP[1] == $m_hash{$fieldsSNP[1]})){
print MATRIX "$m_hash{$fieldsSNP[0]}[$i][0] $m_hash{$fieldsSNP[0]}[$i][1] $fieldsSNP[4]\n";
}
}#close if
}#close foreach
}#close foreachs
close MASTER;
close MATRIX;
exit 0;