Algorithm bug in my code


 
# 1  
10-01-2011

Hello,
This problem has bugged me for some time. I want to merge several files, each with hundreds of rows, into a union keyed on the ID column (kind of similar to join!), filling in 0 wherever an ID is absent from a file.
Code:
ID File1
A 1
C 3
D 4
M 6

ID File2
A 5
B 10
C 15
Z 26

ID File3
A 2
B 6
O 20
X 9

I want the output to be:
Code:
ID File1 File2 File3
A 1 5 2 
B 0 10 6
C 3 15 0
D 4 0 0
M 6 0 0 
O 0 0 20
X 0 0 9
Z 0 26 0

I searched the site and found some posts about merging two files on a common column, but my case is different. I tried the code below; it runs, but the output loses some of the information:
Code:
#!/usr/bin/perl -w

use strict;

my $Fname1 = "./path/file1.txt";    # tab-delimited format
my $Fname2 = "./path/file2.txt";
my $Fname3 = "./path/file3.txt";

my %combinedfile;
my $key;

# File1: every ID starts a new entry holding its count
open(F1, "<$Fname1") or die "Can't open the input file $Fname1 because of $!";
while (my $line1 = <F1>) {
    chomp($line1);
    my ($ID1, $count) = split("\t", $line1);
    $key = $ID1;
    $combinedfile{$key} = $count;
}
close(F1);

# File2: append the count to known IDs; new IDs get a leading 0 for File1.
# Bug: IDs already in the hash but absent from File2 get nothing appended.
open(F2, "<$Fname2") or die "Can't open the input file $Fname2 because of $!";
while (my $line2 = <F2>) {
    chomp($line2);
    my ($ID2, $count2) = split("\t", $line2);
    $key = $ID2;
    if (exists($combinedfile{$key})) {
        $combinedfile{$key} .= "\n$count2";
    }
    else {
        $combinedfile{$key} = "0\n$count2";
    }
}
close(F2);

# File3: same logic (and the same bug) as File2
open(F3, "<$Fname3") or die "Can't open the input file $Fname3 because of $!";
while (my $line3 = <F3>) {
    chomp($line3);
    my ($ID3, $count3) = split("\t", $line3);
    $key = $ID3;
    if (exists($combinedfile{$key})) {
        $combinedfile{$key} .= "\n$count3";
    }
    else {
        $combinedfile{$key} = "0\n0\n$count3";
    }
}
close(F3);

# Print each ID followed by its stored counts, tab-separated
foreach my $member (keys %combinedfile) {
    print $member, "\t", join("\t", split("\n", $combinedfile{$member})), "\n";
}

The output is:
Code:
ID File1 File2 File3
A 1 5 2 
B 0 10 6
C 3 15 
D 4
M 6
O 0 0 20
X 0 0 9
Z 0 26

I know there is a bug in the algorithm. For example, D is in File1, so after reading File2 it should be stored as:
Code:
D 4\n0

and after reading File3 it should be stored as:
Code:
D 4\n0\n0

But D is skipped because it appears in neither File2 nor File3. It seems that only new hash keys are handled properly; existing keys that are missing from a later file (File2 or File3) are skipped.

How can I fix this bug? I run into this task occasionally at work, and it seems like a common job, similar to join but different. I hope there is a command like "union" for it (ideally filling the gaps with NA instead of 0, but that's just my wish!)
Thanks a lot in advance!

Yifang
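One way to fix the original algorithm, sketched under the assumption of tab-delimited "ID<TAB>count" files without header lines (a header line would simply become an "ID" row): keep each ID's counts in an array indexed by file position instead of a newline-joined string, and fill the slots a file never touched only at the very end. The helper names read_pairs and merge_rows are hypothetical. The filler value is a parameter, so passing 'NA' instead of 0 covers the wish above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one tab-delimited file into [ID, count] pairs.
sub read_pairs {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $count) = split /\t/, $line;
        next unless defined $id && $id ne '';    # skip blank lines
        push @pairs, [ $id, defined $count ? $count : 0 ];
    }
    close $fh;
    return \@pairs;
}

# Hypothetical helper: merge one list of pairs per file into a hash of
# arrays. Slot $i of an ID's array holds its count from file $i; slots
# for files that lack the ID are filled with $filler at the end.
sub merge_rows {
    my ($filler, @tables) = @_;
    my %merged;
    for my $i (0 .. $#tables) {
        for my $pair (@{ $tables[$i] }) {
            my ($id, $count) = @$pair;
            $merged{$id}[$i] = $count;    # creates the array on first use
        }
    }
    for my $id (keys %merged) {
        my $row = $merged{$id};
        $merged{$id} = [ map { defined $row->[$_] ? $row->[$_] : $filler } 0 .. $#tables ];
    }
    return \%merged;
}

# Usage: ./merge.pl file1.txt file2.txt file3.txt
if (@ARGV) {
    my $merged = merge_rows(0, map { read_pairs($_) } @ARGV);
    for my $id (sort keys %$merged) {
        print join("\t", $id, @{ $merged->{$id} }), "\n";
    }
}
```

Because the padding happens once, after every file has been read, an ID that appears only in File1 (like D) still ends up with one value per file.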
# 2  
10-01-2011
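Code:
join -a 1 -a 2 -e 0 -o0,1.2,2.2 f1 f2 |join -a 1 -a 2 -e 0 -o0,1.2,1.3,2.2 - f3

Note that join expects each input to be sorted on the join field (here the ID column), so sort the files first if they aren't already. If the files keep their header lines, GNU join's --header option treats the first line of each input as a header rather than data.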

# 3  
10-02-2011
Using an associative array (a hash) makes it easy to check whether a given key exists.

Since all your files have the same format, duplicating the code for each file is not a good idea. I prefer passing the filenames as command-line arguments and looping over them, so you run the script like this:

Code:
./yourscript.pl file1.txt file2.txt file3.txt

Code:
#!/usr/bin/perl -w

use strict;
use File::Basename qw(fileparse);

my %combinedfile = ();
my @file_list = ();

foreach my $file (@ARGV)
{
    # Column name = file name without directory and extension, upper-cased
    # (e.g. ./path/file1.txt -> FILE1)
    my ($basename) = fileparse($file, qr/\.[^.]*/);
    my $name = uc($basename);
    push(@file_list, $name);

    if (open(F, '<', $file))
    {
        while (my $line = <F>)
        {
            chomp($line);

            my @item = split("\t", $line);
            my $id = defined($item[0]) ? $item[0] : '';
            my $count = defined($item[1]) ? $item[1] : 0;
            next if ($id eq '');

            $combinedfile{$id} = {} if (!defined($combinedfile{$id}));
            $combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));
            $combinedfile{$id}->{$name} = $count;
        }

        close(F);
    }
    else
    {
        warn "Cannot open $file: $!\n";
    }
}

print "ID";

foreach my $name (@file_list)
{
    print "\t$name";
}

print "\n";

foreach my $id (sort keys %combinedfile)
{
    print "$id";

    foreach my $name (@file_list)
    {
        my $count = defined($combinedfile{$id}->{$name}) ? $combinedfile{$id}->{$name} : 0;
        print "\t$count";
    }

    print "\n";
}

exit(0);

# 4  
10-02-2011
This is a little too advanced for me: I can't follow your algorithm, although I seem to understand each line.

Code:
while (my $line = <F>) {
    chomp($line);
    my @item = split("\t", $line);                  # Understood
    my $id = defined($item[0]) ? $item[0] : '';     # Starting to get lost: what is the empty string for?
    my $count = defined($item[1]) ? $item[1] : 0;   # ??? If there is no count, how can $item[1] be assigned 0? The biggest trick
    next if ($id eq '');
    $combinedfile{$id} = {} if (!defined($combinedfile{$id}));
    $combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));
    $combinedfile{$id}->{$name} = $count;
}

This part seems to be where the trick is. Could you explain it a little more, even with pseudocode? Thanks a lot!
# 5  
10-02-2011
Those annoying lines are there to avoid warnings such as:

Code:
Use of uninitialized value in string eq at ...

They initialize the values in case a line has fewer than two columns.

If you don't use "-w", the lines can simply be rewritten as:

Code:
my ($id, $count) = split("\t", $line);
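If you want to keep "-w" (or use warnings) on, a middle ground is the list split from the line above plus Perl's defined-or assignment operator //=, which needs Perl 5.10 or later. A minimal sketch; the parse_line name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split one "ID<TAB>count" line; a missing count becomes 0.
# Requires Perl 5.10+ for the //= (defined-or assignment) operator.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($id, $count) = split /\t/, $line;
    $id    //= '';    # empty line: split returns an empty list
    $count //= 0;     # line with only an ID: there is no second field
    return ($id, $count);
}

for my $line ("A\t5\n", "B\n", "\n") {
    my ($id, $count) = parse_line($line);
    next if $id eq '';    # skip blank lines
    print "$id => $count\n";
}
```

This prints "A => 5" and "B => 0" without any uninitialized-value warnings.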

# 6  
10-05-2011
Thanks MacMonster!
I tried to understand this part of your script, which to me is the trick of the whole thing.

Code:
$combinedfile{$id} = {} if (!defined($combinedfile{$id}));
$combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));
$combinedfile{$id}->{$name} = $count;

Could you explain it a little more so that I can fully understand it? Thanks a lot!
Yifang
# 7  
10-05-2011
Code:
 

# Add "id" to "%combinedfile" and initialize the element as a reference to an empty hash
$combinedfile{$id} = {} if (!defined($combinedfile{$id}));

# Add "name" to "$combinedfile{$id}" and initialize the element as an integer zero
$combinedfile{$id}->{$name} = 0 if (!defined($combinedfile{$id}->{$name}));

# Assign the value to the element
$combinedfile{$id}->{$name} = $count;

The "defined" function checks whether the element already holds a value (that is, whether it is not undef). If it doesn't, the line initializes it. This avoids using an undefined value.
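To make the explanation concrete, here is a small sketch (with made-up FILE1/FILE2 entries) of the nested hash this script builds. It also shows that Perl creates the inner hash reference automatically when you assign through it (autovivification), and that exists and defined answer different questions. One more observation: in the three lines above, the 0 assigned by the second line is overwritten right away by the third, so that line has no lasting effect.

```perl
use strict;
use warnings;

my %combinedfile;

# Autovivification: assigning through $combinedfile{A}->{FILE1} creates the
# inner hash reference on the fly, so "= {}" beforehand is optional.
$combinedfile{A}->{FILE1} = 1;
$combinedfile{A}->{FILE2} = 5;
$combinedfile{D}->{FILE1} = 4;

# exists asks "is the key there?"; defined asks "is the value not undef?"
print((exists $combinedfile{D}->{FILE2}) ? "present\n" : "absent\n");    # prints "absent"

# A missing file entry reads as undef, which the loop below turns into 0.
for my $id (sort keys %combinedfile) {
    my @row = map { defined $combinedfile{$id}->{$_} ? $combinedfile{$id}->{$_} : 0 } qw(FILE1 FILE2);
    print join("\t", $id, @row), "\n";
}
```

The loop prints one row per ID ("A 1 5" and "D 4 0", tab-separated), which is the same substitution of 0 for absent entries that the printing loop in the script performs.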