Perl - multiple keys and merging two files


 
# 1  
Old 09-17-2013

Hi,

I'm not a regular coder, but I sometimes write basic Perl scripts, so Perl is still a bit difficult for me.

I'm merging two files a.txt and b.txt into c.txt:

Code:
a.txt
------
x001;frtb70;xyz;109
x001;frvt65;sec;239
x003;wqax34;jul;659
x004;yhud43;yhn;760

b.txt
------
x001;abcd80;xyz;193
x001;crrp28;xse;456
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x002;juop88;jup;879
x003;yulo90;rem;542
x003;kihl98;dnt;312
x004;urel25;ewb;342


c.txt [output]
------
x001;frtb70;xyz;109
x001;frvt65;sec;239
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x003;wqax34;jul;659
x004;yhud43;yhn;760




The only condition is that every line from a.txt must go into c.txt.
A line from b.txt should be written to c.txt only if its key is not already present in a.txt; otherwise that b.txt line must be skipped. In all the files the first column can be treated as the key, but it may contain duplicates, and that is what makes this challenging for me.

Below is the script I've written. The problem is that, because I'm using a hash for each input file, lines that share the same key value overwrite one another, so only one line per key survives. I need to keep all lines of a.txt even when their keys are the same. The same is true for b.txt, except that its lines should be skipped if the key is already present in a.txt.

Code:
#!/usr/bin/env perl

sub prepareHash {
	#my ($in_file, $primary_Key, $delimiter) = @_;
	my $in_file   = shift;
	my $key       = shift;
	my $delimiter = shift;
	
  my @line_tokens;
  my %FILE_Hash;
  open( IN_FILE, "< $in_file" ) or die "Can't open $in_file : $!";
	  
  while (<IN_FILE>) {
     my $in_line = $_;
     chomp($in_line);
     @line_tokens = split(/$delimiter/, $in_line);
	   $FILE_Hash{$line_tokens[$key]} = $in_line; 
  }
  
  close IN_FILE;

  return %FILE_Hash;
}

my $input1 = "/export/home/a.txt";
my $input2 = "/export/home/b.txt";
my $output = "/export/home/c.txt";

my %A_Hash  = prepareHash($input1, 0 , ";" );
my %B_Hash  = prepareHash($input2, 0 , ";" );

open( OUT_FILE, "> $output" ) or die "Can't open $output : $!";

for my $a_key ( sort keys %A_Hash ) {
   $a_key =~ s/\s+$//;
   my $a_line = $A_Hash{$a_key};
   print OUT_FILE $a_line . "\n";
}

# Compare the keys of a.txt and b.txt. Only write the extra b.txt lines whose key is not in a.txt into c.txt
for my $b_key ( sort keys %B_Hash ) {
   $b_key =~ s/\s+$//;

   if ( ! exists $A_Hash{$b_key} ) {
      my $b_line = $B_Hash{$b_key};
      print OUT_FILE $b_line . "\n";
   } else {
      print "$B_Hash{$b_key} was already written into c.txt using a.txt, hence skipping\n";
   }
}

close OUT_FILE;

Can any of you help me please?
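
To illustrate the difficulty described in post #1, here is a minimal sketch (not the original poster's script; the name group_by_key is made up for illustration). It contrasts assigning one line per key, which keeps only the last line seen, with pushing each line onto an array stored under the key, which keeps them all.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines copied from a.txt above; two of them share the key x001.
my @lines = (
    'x001;frtb70;xyz;109',
    'x001;frvt65;sec;239',
    'x003;wqax34;jul;659',
);

# Hypothetical helper: group lines by their first field,
# keeping every line that belongs to a repeated key.
sub group_by_key {
    my ($delimiter, @input) = @_;
    my %grouped;
    for my $line (@input) {
        my ($key) = split /\Q$delimiter\E/, $line;
        push @{ $grouped{$key} }, $line;    # append instead of overwrite
    }
    return \%grouped;
}

# Plain hash: the second x001 line replaces the first one.
my %last_only;
for my $line (@lines) {
    my ($key) = split /;/, $line;
    $last_only{$key} = $line;
}

my $grouped = group_by_key(';', @lines);

print "plain hash keeps:     $last_only{x001}\n";
print "hash of arrays keeps: @{ $grouped->{x001} }\n";
```

With a hash of arrays, each key maps to a list of lines, so duplicate keys no longer cause lines to be lost.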
# 2  
Old 09-18-2013
Code:
$ 
$ cat a.txt
x001;frtb70;xyz;109
x001;frvt65;sec;239
x003;wqax34;jul;659
x004;yhud43;yhn;760
$ 
$ cat b.txt
x001;abcd80;xyz;193
x001;crrp28;xse;456
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x002;juop88;jup;879
x003;yulo90;rem;542
x003;kihl98;dnt;312
x004;urel25;ewb;342
$ 
$ 
$ perl -F";" -lane 'if ($ARGV eq "a.txt") { push @{$x{$F[0]}},$_ }
                    else { push @{$y{$F[0]}},$_ if not defined $x{$F[0]} }
                    END {
                      @x {keys %y} = values %y;
                      foreach $k (sort keys %x) { print foreach (@{$x{$k}}) }
                    }' a.txt b.txt
x001;frtb70;xyz;109
x001;frvt65;sec;239
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x002;juop88;jup;879
x003;wqax34;jul;659
x004;yhud43;yhn;760
$ 
$

# 3  
Old 09-18-2013
Hi durden_tyler, thank you very much for your reply.

However, that code is too advanced for me. I need it in a script rather than as a command run on the command line. How can I convert your one-liner into a script?

Thanks & Regards,
Lokesha
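
One possible script form of the one-liner in post #2, offered as a sketch under assumptions rather than as durden_tyler's own code: the sub name merge_by_key, its parameters, and the three-argument command line are invented here. It follows the same idea: collect the lines of a.txt per key, collect the lines of b.txt only for keys that a.txt does not have, then print everything grouped by sorted key.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return every line of $file_a, plus the lines of
# $file_b whose key (field number $key_index, split on $delimiter) never
# occurs in $file_a. Lines are grouped by key, keys in sorted order.
sub merge_by_key {
    my ($file_a, $file_b, $key_index, $delimiter) = @_;
    my (%from_a, %from_b);

    open my $in_a, '<', $file_a or die "Can't open $file_a: $!";
    while (my $line = <$in_a>) {
        chomp $line;
        next unless length $line;           # ignore empty lines
        my $key = (split /\Q$delimiter\E/, $line)[$key_index];
        push @{ $from_a{$key} }, $line;
    }
    close $in_a;

    open my $in_b, '<', $file_b or die "Can't open $file_b: $!";
    while (my $line = <$in_b>) {
        chomp $line;
        next unless length $line;
        my $key = (split /\Q$delimiter\E/, $line)[$key_index];
        next if exists $from_a{$key};       # key already supplied by a.txt
        push @{ $from_b{$key} }, $line;
    }
    close $in_b;

    my %merged = (%from_a, %from_b);        # the two key sets are disjoint
    return map { @{ $merged{$_} } } sort keys %merged;
}

# Assumed usage (script name is hypothetical): perl merge.pl a.txt b.txt c.txt
if (@ARGV == 3) {
    my ($file_a, $file_b, $file_c) = @ARGV;
    open my $out, '>', $file_c or die "Can't open $file_c: $!";
    print {$out} "$_\n" for merge_by_key($file_a, $file_b, 0, ';');
    close $out;
}
```

Like the one-liner, this keeps all three x002 lines from b.txt, because the stated rule only excludes keys that already occur in a.txt.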
# 4  
Old 09-18-2013
Somehow this code gives the expected output, but I am still figuring out how it does so even though I am not specifying the delimiter ';'.
Code:
perl -lane '$hash{$F[0]} = $_; END { foreach (sort keys %hash) {print $hash{$_}}}' b.txt a.txt > c.txt


Last edited by royalibrahim; 09-18-2013 at 09:36 AM..
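
On the delimiter question in post #4, a minimal sketch, assuming Perl's documented behaviour that -a without -F splits each line on whitespace. The sample lines contain no whitespace, so the first autosplit field would be the whole line rather than the first column, and the hash would end up keyed on complete lines. The variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = 'x001;frtb70;xyz;109';

# Roughly what -a does when no -F is given: split on whitespace.
my @whitespace_fields = split ' ', $line;

# What -F";" asks for: split on the semicolon.
my @semicolon_fields = split /;/, $line;

print "whitespace split: ", scalar(@whitespace_fields),
      " field(s), first = $whitespace_fields[0]\n";
print "semicolon split:  ", scalar(@semicolon_fields),
      " field(s), first = $semicolon_fields[0]\n";
```

If that holds, -F";" (as used in post #2) would be needed to key on the first column, and a plain hash assignment would then still keep only one line per key.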
# 5  
Old 09-18-2013
Thanks royalibrahim,

But I need it in a perl script instead of running on command line.
Can you help me?

Regards.
# 6  
Old 09-18-2013
Here's a basic variation in script form. Hope it helps.

Code:
#!/usr/bin/perl
#

use strict;

# vars we need
my $file_a = "a.txt";
my $file_b = "b.txt";
my $file_c = "c.txt";
my %HASH;
my @FILEA;
my @FILEB;
my @UNIQUE;

# open a.txt and b.txt in read mode and c.txt in append mode
open(FILEA, "<$file_a") or die "Unable to open $file_a.\n";
open(FILEB, "<$file_b") or die "Unable to open $file_b.\n";
open(FILEC, ">>$file_c") or die "Unable to write to $file_c.\n";

# store a.txt and b.txt into arrays
@FILEA = <FILEA>;
@FILEB = <FILEB>;

# write the contents of a.txt to c.txt
foreach(@FILEA) {
    print FILEC $_;
}

# map the contents of a.txt to a hash
%HASH = map{$_ => 1} @FILEA;

# use grep to keep only the lines of b.txt that do not
# appear (as whole lines) in a.txt
@UNIQUE = grep(! defined $HASH{$_}, @FILEB);

# write the results to c.txt
foreach(@UNIQUE) {
    print FILEC $_;
}

# close files
close(FILEA);
close(FILEB);
close(FILEC);

# done
exit(0);

# 7  
Old 09-18-2013
Thanks in2nix4life.

The problem with your script is the following line of code:

Code:
@UNIQUE = grep(! defined $HASH{$_}, @FILEB);

I think the above code matches on the entire line. Since every line in a.txt differs from every line in b.txt, no whole line of b.txt ever matches a line of a.txt, so the script simply combines the contents of both files into the output file 'c.txt'.

We need to match only the first field of b.txt against a.txt. If the first field of a b.txt line does not occur in a.txt, that line has to be written to the output file c.txt.


Code:
input file: a.txt
------------------
x001;frtb70;xyz;109
x001;frvt65;sec;239
m003;wqax34;jul;659
y004;yhud43;yhn;760


input file: b.txt
------------------
x001;abcd80;xyz;193
x001;crrp28;xse;456
p002;lmno10;xyz;784
p002;jfds65;jfd;739
p002;juop88;jup;879
m003;yulo90;rem;542
m003;kihl98;dnt;312
y004;urel25;ewb;342


expected output file: c.txt
---------------------------
x001;frtb70;xyz;109
x001;frvt65;sec;239
p002;lmno10;xyz;784
p002;jfds65;jfd;739
p002;juop88;jup;879
m003;wqax34;jul;659
y004;yhud43;yhn;760

Selecting output lines based on the first field of the input files is what matters here, and that is where I'm failing. Any ideas would be very helpful.

Thanks.

Last edited by Scrutinizer; 09-18-2013 at 07:12 PM.. Reason: code tags
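
The point raised in post #7 can be sketched as a small change to the grep in post #6: build the lookup hash from the first field of each a.txt line instead of from the whole line, and test the first field of each b.txt line against it. This is only an illustration; the helper names first_field and lines_with_new_keys are invented, and the sample data is a shortened copy of the listing in post #7.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: first ';'-separated field of a line.
sub first_field {
    my ($line) = @_;
    my ($key) = split /;/, $line;
    return defined $key ? $key : '';
}

# Hypothetical helper: the lines of @$b_lines whose first field
# does not occur as a first field anywhere in @$a_lines.
sub lines_with_new_keys {
    my ($a_lines, $b_lines) = @_;
    my %seen_key = map { (first_field($_) => 1) } @$a_lines;
    return grep { !exists $seen_key{ first_field($_) } } @$b_lines;
}

my @file_a = ("x001;frtb70;xyz;109\n", "m003;wqax34;jul;659\n");
my @file_b = (
    "x001;abcd80;xyz;193\n",
    "p002;lmno10;xyz;784\n",
    "p002;jfds65;jfd;739\n",
    "m003;yulo90;rem;542\n",
);

# Only the two p002 lines have a key that a.txt lacks.
my @unique = lines_with_new_keys(\@file_a, \@file_b);
print @unique;
```

Used in place of the grep in post #6, this would write the a.txt lines first and the remaining b.txt lines after them, so the line order would differ from the expected c.txt listing in post #7 even though the same lines are selected.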