Merging data from one file into another

10-01-2011

I have a master dictionary database with the following structure:
a=b (b is a Unicode string)
where a is the English part and b is the equivalent in a foreign language.
I also have another file in which the b part of each entry has been corrected by an expert. Let us say the entry becomes a=c: the left-hand side of the delimiter always stays the same (a is never touched), but b has been corrected to c. I want to merge these corrections into the master.
I have solved the problem with a program in C, but would an awk or a Perl script do the same job?
Any help will be gratefully acknowledged. Since I am a newbie to Perl, my programming skills in it are not yet up to this. Perl is a Swiss Army knife, and a short script would do the job more efficiently than, say, a program in C.
Many thanks
10-01-2011
If I understand your question correctly, you have a large dictionary and a smaller dictionary of corrected entries, and you want to change a=b to a=c wherever an a=c record exists in the corrected dictionary?

use feature 'unicode_strings';

add_entries_to_hash(\%dictionary, "large_dictionary.txt");
add_entries_to_hash(\%dictionary, , "corrected_entries.txt");
open (my $new_dic, "<", "merged_directory.txt");
for $english_word (sort keys %dictionary){
   print $new_dic "$english_word=$dictionary{$english_word}\n"; 
}
close $new_dic;
sub add_entries_to_hash{
   my ($dictionary, $filename)=@_;
   open(my $file_handle, '<', "$filename");
   while (<$file_handle>){
      $dictionary->{$1}=$2 if /^(.+)=(.+)$/;
   }
}
exit 0;
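The whole approach rests on one property of Perl hashes: assigning to a key that already exists replaces its value, so whichever file is loaded last wins. A minimal sketch of just that idea on in-memory data (the `merge_lines` sub and the sample entries are hypothetical, not from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: merge "key=value" lines from several sources.
# Sources are applied in order, so a key that appears in a later source
# (e.g. the corrected entries) overwrites the value from an earlier one.
sub merge_lines {
    my @sources = @_;    # each element is a reference to an array of lines
    my %merged;
    for my $lines (@sources) {
        for my $line (@$lines) {
            $merged{$1} = $2 if $line =~ /^(.+)=(.+)$/;
        }
    }
    return \%merged;
}

# Hypothetical sample entries standing in for the master and the corrections
my @master     = ("cat=b1", "dog=b2", "house=b3");
my @correction = ("dog=c2");
my $merged     = merge_lines(\@master, \@correction);
print "$_=$merged->{$_}\n" for sort keys %$merged;    # cat=b1, dog=c2, house=b3
```

Swapping the order of the two arguments would let the old master values win instead, which is why the corrections file has to be loaded second.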

10-01-2011
I tried what you gave, but there is no output. I even removed the double comma between \%dictionary and "correct.uni" and tried again. Here is the code:


use feature 'unicode_strings';

add_entries_to_hash(\%dictionary, "master.uni");
add_entries_to_hash(\%dictionary, "correct.uni");
open (my $new_dic, "<", "merged_directory.txt");
for $english_word (sort keys %dictionary){
   print $new_dic "$english_word=$dictionary{$english_word}\n"; 
}
close $new_dic;
sub add_entries_to_hash{
   my ($dictionary, $filename)=@_;
   open(my $file_handle, '<', "$filename");
   while (<$file_handle>){
      $dictionary->{$1}=$2 if /^(.+)=(.+)$/;
   }
}
exit 0;

I have renamed the two dictionaries master.uni and correct.uni and am attaching them in a zip file. When the script runs (under Windows), I don't get any output. The differences are on lines 4 and 5 of the master file. This was a test; normally the file would be much bigger.
Many thanks for all the trouble
10-01-2011
Oops, my fault: merged_directory.txt should be opened for output (">") rather than input ("<").

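use strict;
use warnings;
use feature 'unicode_strings';

my %dictionary;
# The corrections are loaded second, so a=c overwrites a=b for the same key
add_entries_to_hash(\%dictionary, "master.uni");
add_entries_to_hash(\%dictionary, "correct.uni");
open(my $new_dic, ">", "merged_directory.txt") or die "Cannot write merged_directory.txt: $!";
for my $english_word (sort keys %dictionary){
   print $new_dic "$english_word=$dictionary{$english_word}\n";
}
close $new_dic;
sub add_entries_to_hash{
   my ($dictionary, $filename)=@_;
   open(my $file_handle, '<', $filename) or die "Cannot read $filename: $!";
   while (<$file_handle>){
      $dictionary->{$1}=$2 if /^(.+)=(.+)$/;
   }
   close $file_handle;
}
exit 0;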

Works for me
10-01-2011
Many thanks. It works beautifully for me also.
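Since the real master file will be much bigger than the test, one possible variation (a hedged sketch, not the script above) keeps only the corrections in a hash and streams the master line by line. This also preserves the master's original line order instead of sorting it. The `apply_corrections` sub is hypothetical; the demo uses in-memory filehandles, and for real use you would open master.uni and correct.uni instead.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the master dictionary to $out_fh, replacing the
# value of any key that also appears in the corrections. Only the
# corrections are held in a hash; the master is streamed line by line,
# so its original order is kept and no sorting is done.
sub apply_corrections {
    my ($corrections_fh, $master_fh, $out_fh) = @_;
    my %fix;
    while (my $line = <$corrections_fh>) {
        chomp $line;
        $fix{$1} = $2 if $line =~ /^(.+)=(.+)$/;
    }
    my $changed = 0;
    while (my $line = <$master_fh>) {
        chomp $line;
        if ($line =~ /^(.+)=(.+)$/ && exists $fix{$1} && $fix{$1} ne $2) {
            $line = "$1=$fix{$1}";
            $changed++;
        }
        print {$out_fh} "$line\n";
    }
    return $changed;    # number of master entries whose value was replaced
}

# Demo on in-memory "files" (hypothetical sample entries)
my $master_text     = "house=b3\ncat=b1\ndog=b2\n";
my $correction_text = "dog=c2\n";
open(my $master_fh, '<', \$master_text)     or die "Cannot open master: $!";
open(my $corr_fh,   '<', \$correction_text) or die "Cannot open corrections: $!";
my $n = apply_corrections($corr_fh, $master_fh, \*STDOUT);    # house=b3, cat=b1, dog=c2
print STDERR "corrected entries: $n\n";
```

One design difference from the hash merge: a key that exists only in the corrections file is ignored here, whereas the hash version adds it to the output.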
10-12-2011
Hi Skrynesaver, since your post was so helpful to gimley, I was wondering if you would be kind enough to help me out too. I have three files, each with the same six columns (chromo pos ref refFreq altAllele altFreq). I want to merge them on chromo and pos and produce an output file like the one shown below, with NA wherever a position is missing from a file. I would really appreciate any help you could provide. Thanks so much.

File 1:
chromo pos ref refFreq altAllele altFreq
chr1 55 T 0.2 C 0.8
chr1 57 C 0.8 A 0.2
chr1 60 C 0.8 A 0.2
chr2 62 T 0.2 C 0.8
chr2 67 C 0.8 A 0.2
chr2 96 T 0.2 C 0.8
chr2 100 C 0.8 A 0.2
chr3 32 T 0.2 C 0.8

File 2:
chromo pos ref refFreq altAllele altFreq
chr1 55 T 0.4 C 0.6
chr1 57 C 0.7 A 0.3
chr1 96 G 0.5 A 0.5
chr2 62 T 0.15 C 0.85
chr2 67 C 0.5 A 0.5
chr2 100 C 0.8 A 0.2
chr4 32 G 0.2 C 0.8

File 3:
chromo pos ref refFreq altAllele altFreq
chr1 27 C 0.7 A 0.3
chr1 55 T 0.4 C 0.6
chr1 57 C 0.7 A 0.3
chr2 62 T 0.15 C 0.85
chr2 67 C 0.5 A 0.5
chr2 100 C 0.8 A 0.2
chr4 32 G 0.2 C 0.8

Desired output:
chromo pos ref F1.refFreq F1.altAllele F1.altFreq F2.refFreq F2.altAllele F2.altFreq F3.refFreq F3.altAllele F3.altFreq
chr1 27 C NA NA NA NA NA NA 0.7 A 0.3
chr1 55 T 0.2 C 0.8 0.4 C 0.6 0.4 C 0.6
chr1 57 C 0.8 A 0.2 0.7 A 0.3 0.7 A 0.3
chr1 60 C 0.8 A 0.2 NA NA NA NA NA NA
chr1 96 G NA NA NA 0.5 A 0.5 NA NA NA
chr2 62 T 0.2 C 0.8 0.15 C 0.85 0.15 C 0.85
chr2 67 C 0.8 A 0.2 0.5 A 0.5 0.5 A 0.5
chr2 96 T 0.2 C 0.8 NA NA NA NA NA NA
chr2 100 C 0.8 A 0.2 0.8 A 0.2 0.8 A 0.2
chr3 32 T 0.2 C 0.8 NA NA NA NA NA NA
chr4 32 G NA NA NA 0.2 C 0.8 0.2 C 0.8
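The same hash technique extends to this case if the key is the pair (chromo, pos) instead of a single word. A minimal sketch, assuming whitespace-separated columns, a header line starting with "chromo" in every file, and that the ref base can be taken from the first file containing a position. Chromosome names are compared as plain strings, so chr10 would sort before chr2. The name `merge_snp_tables` and the file names in the usage line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: merge files that share the columns
#   chromo pos ref refFreq altAllele altFreq
# on (chromo, pos). Each argument is an open filehandle; a position that is
# missing from a file gets NA in that file's three columns.
sub merge_snp_tables {
    my @handles = @_;
    my (%ref, %freqs);    # both keyed by "chromo\tpos"
    for my $i (0 .. $#handles) {
        my $fh = $handles[$i];
        while (my $line = <$fh>) {
            my @f = split ' ', $line;               # split on whitespace, drops the newline
            next if @f < 6 || $f[0] eq 'chromo';    # skip header and short/blank lines
            my $key = "$f[0]\t$f[1]";
            $ref{$key} //= $f[2];                   # ref base from the first file that has it
            $freqs{$key}[$i] = [ @f[3 .. 5] ];      # refFreq altAllele altFreq of file $i
        }
    }

    my @header = ('chromo', 'pos', 'ref');
    for my $n (1 .. @handles) {
        push @header, "F$n.$_" for qw(refFreq altAllele altFreq);
    }
    my @rows = (join ' ', @header);

    # Chromosome names compare as strings, positions numerically
    my @keys = sort {
        my ($ca, $pa) = split /\t/, $a;
        my ($cb, $pb) = split /\t/, $b;
        $ca cmp $cb || $pa <=> $pb;
    } keys %ref;

    for my $key (@keys) {
        my ($chromo, $pos) = split /\t/, $key;
        my @out = ($chromo, $pos, $ref{$key});
        for my $i (0 .. $#handles) {
            my $cols = $freqs{$key}[$i];
            push @out, $cols ? @$cols : ('NA') x 3;
        }
        push @rows, join ' ', @out;
    }
    return @rows;
}

# Usage (file names are placeholders): perl merge_snps.pl file1.txt file2.txt file3.txt > merged.txt
if (@ARGV) {
    my @fhs;
    for my $file (@ARGV) {
        open(my $fh, '<', $file) or die "Cannot read $file: $!";
        push @fhs, $fh;
    }
    print "$_\n" for merge_snp_tables(@fhs);
}
```

Each file gets its own slot in the per-position array, so the number of input files is not hard-coded and the F1/F2/F3 headers are generated from it.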
