Perl - multiple keys and merging two files Post: 302854625

Posted by Lokesha on Tuesday 17th of September 2013 07:18:18 PM

Hi,

I'm not a regular coder, but I sometimes write basic Perl scripts, so Perl is a bit difficult for me.

I'm merging two files a.txt and b.txt into c.txt:

Code:

a.txt
------
x001;frtb70;xyz;109
x001;frvt65;sec;239
x003;wqax34;jul;659
x004;yhud43;yhn;760

b.txt
------
x001;abcd80;xyz;193
x001;crrp28;xse;456
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x002;juop88;jup;879
x003;yulo90;rem;542
x003;kihl98;dnt;312
x004;urel25;ewb;342


c.txt [output]
------
x001;frtb70;xyz;109
x001;frvt65;sec;239
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x003;wqax34;jul;659
x004;yhud43;yhn;760

The only condition is that I need all the lines from a.txt in c.txt.
When selecting lines from b.txt for c.txt, I first need to look in a.txt. If the key of a b.txt line is already present in a.txt, that line must not be written to c.txt [output]. In all the files the first column can be treated as the key, but it may contain duplicates, and that is what I'm finding challenging.

Below is the script I've written. The problem is that, because I'm using a hash for both input files, lines that share a key value overwrite each other and only one of them survives. But I need every line from a.txt even when the keys are the same. The same is true for b.txt, except that its lines should be skipped if the key is already present in a.txt.

Code:

#!/usr/bin/env perl
use strict;
use warnings;

# Read a delimited file into a hash keyed on the given column.
# Note: a later line with the same key overwrites the earlier one,
# so only one line per key survives (the duplicate-key problem).
sub prepareHash {
    my ($in_file, $key, $delimiter) = @_;

    my %FILE_Hash;
    open( my $in_fh, '<', $in_file ) or die "Can't open $in_file : $!";

    while ( my $in_line = <$in_fh> ) {
        chomp($in_line);
        my @line_tokens = split( /\Q$delimiter\E/, $in_line );
        $FILE_Hash{ $line_tokens[$key] } = $in_line;
    }

    close $in_fh;

    return %FILE_Hash;
}

my $input1 = "/export/home/a.txt";
my $input2 = "/export/home/b.txt";
my $output = "/export/home/c.txt";

my %A_Hash = prepareHash( $input1, 0, ";" );
my %B_Hash = prepareHash( $input2, 0, ";" );

# The original opened "> $c.txt", but $c is never defined; use $output.
open( my $out_fh, '>', $output ) or die "Can't open $output : $!";

# Write the a.txt lines (one per key, because of the hash).
for my $a_key ( sort keys %A_Hash ) {
    $a_key =~ s/\s+$//;
    my $a_line = $A_Hash{$a_key};
    print {$out_fh} $a_line . "\n";
}

# Compare a.txt and b.txt. Only write b.txt lines whose key is not in a.txt.
for my $b_key ( sort keys %B_Hash ) {
    $b_key =~ s/\s+$//;

    if ( !exists $A_Hash{$b_key} ) {
        my $b_line = $B_Hash{$b_key};
        print {$out_fh} $b_line . "\n";
    }
    else {
        print "$B_Hash{$b_key} was already written to c.txt from a.txt, hence skipping\n";
    }
}

close $out_fh;

Can any of you help me please?

Lokesha
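
One possible way around the duplicate-key loss, offered as a minimal sketch rather than a definitive fix: keep the lines themselves in an array so nothing is overwritten, and use the hash only to remember which keys occur in a.txt. The helper name `merge_by_key` is hypothetical, and the in-memory arrays below stand in for the contents of a.txt and b.txt. The result is sorted by key with the original order kept inside each key, which is an assumption based on the layout of the sample c.txt. Note that under the rule as stated, `x002;juop88;jup;879` is also written, although the sample c.txt does not list it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns every line
# from @$a_lines plus those lines from @$b_lines whose key does not occur in
# @$a_lines, ordered by key and, within a key, by original position.
sub merge_by_key {
    my ($a_lines, $b_lines, $key_idx, $delimiter) = @_;
    my (%in_a, @kept);

    for my $line (@$a_lines) {
        my $key = (split /\Q$delimiter\E/, $line)[$key_idx];
        $in_a{$key} = 1;                              # membership only
        push @kept, [ $key, scalar @kept, $line ];    # duplicates are all kept
    }

    for my $line (@$b_lines) {
        my $key = (split /\Q$delimiter\E/, $line)[$key_idx];
        next if exists $in_a{$key};                   # key already covered by a.txt
        push @kept, [ $key, scalar @kept, $line ];
    }

    # Sort by key; the stored position breaks ties so input order is preserved.
    return map  { $_->[2] }
           sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] } @kept;
}

# Sample data copied from a.txt and b.txt above.
my @a_txt = (
    'x001;frtb70;xyz;109', 'x001;frvt65;sec;239',
    'x003;wqax34;jul;659', 'x004;yhud43;yhn;760',
);
my @b_txt = (
    'x001;abcd80;xyz;193', 'x001;crrp28;xse;456',
    'x002;lmno10;xyz;784', 'x002;jfds65;jfd;739',
    'x002;juop88;jup;879', 'x003;yulo90;rem;542',
    'x003;kihl98;dnt;312', 'x004;urel25;ewb;342',
);

my @merged = merge_by_key( \@a_txt, \@b_txt, 0, ';' );
print "$_\n" for @merged;
```

To run this against the real files, each file could be read into an array of chomped lines first and the returned list printed to a filehandle opened on c.txt.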
