Hello,
This problem has bugged me for some time. I need to merge several files of a hundred or so rows each into a union keyed on the ID column (similar to a join), with 0 filled in wherever an ID is absent from a file.
Code:
ID File1
A 1
C 3
D 4
M 6
ID File2
A 5
B 10
C 15
Z 26
ID File3
A 2
B 6
O 20
X 9
I want the output to be:
Code:
ID File1 File2 File3
A 1 5 2
B 0 10 6
C 3 15 0
D 4 0 0
M 6 0 0
O 0 0 20
X 0 0 9
Z 0 26 0
I searched the site and found some posts about merging two files by a common column, but my case is different. My code below runs, but the output loses some of the information:
Code:
#!/usr/bin/perl -w
use strict;
my $Fname1="./path/file1.txt"; #tab delimited format
my $Fname2="./path/file2.txt";
my $Fname3="./path/file3.txt";
my %combinedfile;
my $key;
open(F1, "<$Fname1") or die "Can't open the input file $Fname1 because of $!";
while (my $line1 = <F1>) {
    chomp ($line1);
    my ($ID1, $count)=split("\t", $line1);
    $key=$ID1;
    $combinedfile{$key}=$count;
}
close (F1);
open(F2, "<$Fname2") or die "Can't open the input file $Fname2 because of $!";
while (my $line2 = <F2>) {
    chomp ($line2);
    my ($ID2, $count2)=split("\t", $line2);
    $key=$ID2;
    if (exists($combinedfile{$key}))
    { $combinedfile{$key}.="\n$count2";}
    else {
        $combinedfile{$key}="0\n$count2";
    }
}
close (F2);
open(F3, "<$Fname3") or die "Can't open the input file $Fname3 because of $!";
while (my $line3 = <F3>) {
    chomp ($line3);
    my ($ID3, $count3)=split("\t", $line3);
    $key=$ID3;
    if (exists($combinedfile{$key}))
    { $combinedfile{$key}.="\n$count3";}
    else {
        $combinedfile{$key}="0\n0\n$count3";
    }
}
close (F3);
foreach my $member (keys %combinedfile) {
    print $member, "\t", join("\t", split("\n", $combinedfile{$member})), "\n";
}
The output is:
Code:
ID File1 File2 File3
A 1 5 2
B 0 10 6
C 3 15
D 4
M 6
O 0 0 20
X 0 0 9
Z 0 26
I know there is a bug in the algorithm. For example, D is in File1, so after reading File2, D should be stored as:
Code:
D 4\n0
and when reading File3, it should be saved as:
Code:
D 4\n0\n0
But D was skipped because it is not in File2 or File3. It seems that only a new key is added to the hash properly; an existing key that is not listed in a later file (File2 or File3) is never padded.
How can I fix this bug? I run into this occasionally at work, and it seems like a common task, similar to a join but not quite the same. I was hoping there is a command like "union" for this job (ideally with the option of filling the gaps with NA instead of 0, though that is just a wish).
Thanks a lot in advance!
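The padding problem described above can be sketched as follows. This is only a minimal illustration, not a drop-in replacement: instead of appending to a string, each ID gets an array with one slot per file, and the gaps are filled at the end. The sub name merge_columns, the $fill parameter, and the in-memory sample data (standing in for the three files, header lines left out) are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: $files is a list of array refs, each holding
# [ID, count] pairs for one file. $fill is the placeholder for a missing ID.
sub merge_columns {
    my ($files, $fill) = @_;
    my %row;                                   # ID => [count per file]
    for my $col (0 .. $#$files) {
        for my $pair (@{ $files->[$col] }) {
            my ($id, $count) = @$pair;
            $row{$id}[$col] = $count;          # slot is fixed by the file index
        }
    }
    my @lines;
    for my $id (sort keys %row) {
        # Pad every slot that no file filled in, including trailing ones.
        my @counts = map { defined $row{$id}[$_] ? $row{$id}[$_] : $fill }
                     0 .. $#$files;
        push @lines, join("\t", $id, @counts);
    }
    return @lines;
}

# Sample data from the question (assumed in-memory stand-in for the files).
my @files = (
    [ [A => 1], [C => 3],  [D => 4],  [M => 6]  ],
    [ [A => 5], [B => 10], [C => 15], [Z => 26] ],
    [ [A => 2], [B => 6],  [O => 20], [X => 9]  ],
);
print "$_\n" for merge_columns(\@files, 0);
```

Because the column position comes from the file index rather than from how many values have been appended so far, an ID that is missing from a later file still gets its placeholder. Passing 'NA' instead of 0 as the second argument gives the NA variant.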
Using an associative array (hash) makes it easy to check whether a given key exists.
Since your files all have the same format, duplicating the code for each file is not a good idea. I prefer to pass the filenames as command-line arguments and loop through them. That is, run the command like this:
Code:
./yourscript.pl file1.txt file2.txt file3.txt
Code:
#!/usr/bin/perl -w
use strict;
my %combinedfile = ();
my @file_list = ();
foreach my $file (@ARGV)
{
    my $basename = substr($file, rindex($file, '/') + 1);
    my $name = uc(substr($basename, 0, rindex($basename, '.')));
    push(@file_list, $name);
    if (open(F, $file))
    {
        while (my $line = <F>)
        {
            chomp($line);
            my @item = split("\t", $line);
            my $id = defined($item[0]) ? $item[0] : '';
            my $count = defined($item[1]) ? $item[1] : 0;
            next if ($id eq '');
            $combinedfile{$id} = () if (!defined($combinedfile{$id}));
            $combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));
            $combinedfile{$id}->{$name} = $count;
        }
        close(F);
    }
}
print "ID";
foreach my $name (@file_list)
{
    print "\t$name";
}
print "\n";
foreach my $id (sort keys %combinedfile)
{
    print "$id";
    foreach my $name (@file_list)
    {
        my $count = defined($combinedfile{$id}->{$name}) ? $combinedfile{$id}->{$name} : 0;
        print "\t$count";
    }
    print "\n";
}
exit(0);
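To make the data structure in the script above more concrete, here is a small sketch of what %combinedfile roughly looks like after the three sample files are loaded. The in-memory @sample data and the helper build_combined are hypothetical stand-ins for reading the files; the column names FILE1, FILE2, FILE3 assume the files are called file1.txt and so on, which the script's uc() call would turn into upper case.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the three files (column name => rows).
my @sample = (
    [ FILE1 => { A => 1, C => 3,  D => 4,  M => 6  } ],
    [ FILE2 => { A => 5, B => 10, C => 15, Z => 26 } ],
    [ FILE3 => { A => 2, B => 6,  O => 20, X => 9  } ],
);

# Build the same shape as %combinedfile: ID => { column name => count }.
sub build_combined {
    my (@files) = @_;
    my %combined;
    for my $file (@files) {
        my ($name, $rows) = @$file;
        $combined{$_}{$name} = $rows->{$_} for keys %$rows;
    }
    return \%combined;
}

my $combined = build_combined(@sample);

# Each ID only stores the columns it was actually seen in ...
print join(",", sort keys %{ $combined->{D} }), "\n";   # FILE1

# ... so the zero is supplied at print time, not at load time.
my @cols = map { $_->[0] } @sample;
print join("\t", 'D',
           map { defined $combined->{D}{$_} ? $combined->{D}{$_} : 0 } @cols), "\n";
```

The point of the design is that nothing needs to be padded while reading: the list of column names is walked once per ID when printing, and any column without a stored value prints as 0.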
This is a little too advanced for me; I can't follow your algorithm, although I think I understand each line.
Code:
while (my $line = <F>){
    chomp($line);
    my @item = split("\t", $line); # Understood
    my $id = defined($item[0]) ? $item[0] : ''; # Starting to get lost: what is the purpose of the empty string?
    my $count = defined($item[1]) ? $item[1] : 0; # ??? If there is no count, how can $item[1] be assigned 0? Biggest trick
    next if ($id eq '');
    $combinedfile{$id} = () if (!defined($combinedfile{$id}));
    $combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));
    $combinedfile{$id}->{$name} = $count;
}
This part seems to be where the trick happens. Can you explain it a little more, even in pseudocode? Thanks a lot!
Code:
# Add "id" to "$combinedfile" and initialize the element as a hash
$combinedfile{$id} = () if (!defined($combinedfile{$id}));
# Add "name" to "$combinedfile{$id}" and initialize the element as an integer zero
$combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));
# Assign the value to the element
$combinedfile{$id}->{$name} = $count;
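The "defined" function checks whether the element already has a value. If it does not, the element is initialized first, which avoids using an undefined value. Strictly speaking, assigning () to a scalar leaves it undefined; the inner hash is created automatically by the next assignment (autovivification), so the first two statements are defensive rather than required.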
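The two ternary lines asked about can be tried in isolation. The sketch below is only an illustration with a hypothetical helper, parse_line, and made-up input lines. Note that the ternary never modifies $item[1]; it only chooses which value goes into $count.

```perl
use strict;
use warnings;

# Split one tab-delimited line and fall back to defaults for missing fields.
# parse_line is a hypothetical helper mirroring the two ternary lines.
sub parse_line {
    my ($line) = @_;
    my @item  = split("\t", $line);
    my $id    = defined($item[0]) ? $item[0] : '';   # empty line => no ID
    my $count = defined($item[1]) ? $item[1] : 0;    # ID without a count => 0
    return ($id, $count);
}

my @full   = parse_line("A\t1");   # both fields present
my @no_cnt = parse_line("B");      # $item[1] is undef, so $count becomes 0
my @blank  = parse_line("");       # $item[0] is undef, so $id becomes ''

# Assigning through a nested key creates the inner hash automatically.
my %combined;
$combined{A}{FILE1} = 1;
print ref($combined{A}), "\n";                          # HASH
print exists $combined{A}{FILE2} ? "yes" : "no", "\n";  # no
```

The empty-string default for $id is what lets the following "next if ($id eq '')" skip blank lines without an undefined-value warning.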