Merging data from one file into another


 
# 1  
10-01-2011

Hello,
I have a master dictionary database in which every entry has the structure:
a=b (b is a Unicode string)
where a is the English part and b is its equivalent in a foreign language.
I also have a second file in which the b part of each entry has been corrected by an expert; call the corrected entry a=c. The left-hand side of the delimiter always stays the same (a is never touched); only b has been corrected to c.
I have solved the problem with a program in C, but would an awk or Perl script do the same job?
Any help will be gratefully acknowledged. I am a newbie to Perl, and my skills in it are not yet up to this. Perl is a Swiss Army knife, and a short script can do the job with far less effort than, say, a program in C.
Many thanks
# 2  
10-01-2011
If I understand your question correctly, you have a large dictionary and a smaller dictionary of corrected entries, and you want to change a=b to a=c whenever the record a=c exists in the corrected dictionary?
Code:
#!/usr/bin/perl

#WARNING: THIS CODE IS COMPLETELY UNTESTED
use feature 'unicode_strings';

add_entries_to_hash(\%dictionary, "large_dictionary.txt");
add_entries_to_hash(\%dictionary, , "corrected_entries.txt");
open (my $new_dic, "<", "merged_directory.txt");
for $english_word (sort keys %dictionary){
   print $new_dic "$english_word=$dictionary{$english_word}\n"; 
}
close $new_dic;
sub add_entries_to_hash{
   my ($dictionary, $filename)=@_;
   open(my $file_handle, '<', "$filename");
   while(<$file_handle>){
      chomp;
      $dictionary->{$1}=$2 if /^(.+)=(.+)$/;
   }
}
exit 0;

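One detail of the parsing line is worth knowing: `.+` is greedy, so `/^(.+)=(.+)$/` puts everything up to the LAST `=` into the key. That is harmless as long as no translation contains `=`. A minimal sketch comparing it with a pattern that splits at the FIRST `=`, assuming the English part never contains `=`; `parse_entry` and `parse_entry_greedy` are hypothetical helper names, not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: split one "english=translation" line at the FIRST "=".
# Returns (key, value), or an empty list when the line does not match.
sub parse_entry {
    my ($line) = @_;
    chomp $line;
    return $line =~ /^([^=]+)=(.+)$/ ? ($1, $2) : ();
}

# The greedy pattern from the script above, for comparison.
sub parse_entry_greedy {
    my ($line) = @_;
    chomp $line;
    return $line =~ /^(.+)=(.+)$/ ? ($1, $2) : ();
}

my @first  = parse_entry("x=y=z");          # ("x", "y=z")
my @greedy = parse_entry_greedy("x=y=z");   # ("x=y", "z")
print "first '=': @first\ngreedy:    @greedy\n";
```

Using `[^=]+` for the key matches the problem statement, where the left-hand side of the delimiter is the fixed English part.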

Last edited by Skrynesaver; 10-01-2011 at 04:34 AM.
# 3  
10-01-2011
Hello,
I tried what you gave, but there is no output. I even removed the doubled comma between \%dictionary and "correct.uni" and tried again. Here is the code:

Code:
#!/usr/bin/perl

#WARNING: THIS CODE IS COMPLETELY UNTESTED
use feature 'unicode_strings';

add_entries_to_hash(\%dictionary, "master.uni");
add_entries_to_hash(\%dictionary, "correct.uni");
open (my $new_dic, "<", "merged_directory.txt");
for $english_word (sort keys %dictionary){
   print $new_dic "$english_word=$dictionary{$english_word}\n"; 
}
close $new_dic;
sub add_entries_to_hash{
   my ($dictionary, $filename)=@_;
   open(my $file_handle, '<', "$filename");
   while(<$file_handle>){
      chomp;
      $dictionary->{$1}=$2 if /^(.+)=(.+)$/;
   }
}
exit 0;

I have renamed the two dictionaries master.uni and correct.uni and am attaching them in a zip file. When the script runs (under Windows), I don't get any output. The entries that differ are lines 4 and 5 of the master file. This was only a test; normally the file would be much bigger.
Many thanks for all the trouble
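Why no output and no error message? The script never checks the return value of `open`, and without `use warnings` Perl stays quiet when a print goes to a handle that was not opened for writing. Opening merged_directory.txt with "<" asks to read it: if the file does not exist yet, the open simply fails; if it does exist, the handle is read-only. A minimal sketch, assuming Perl 5.10.1 or later for the core `autodie` pragma, which turns a failed `open` or `close` into an error naming the file; the temporary directory is only there to guarantee a missing file:

```perl
use strict;
use warnings;
use autodie;                     # failed open/close now die instead of returning false
use File::Temp qw(tempdir);
use File::Spec;

# A path that is guaranteed not to exist yet (illustration only).
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'merged_directory.txt');

# Same mistake as in the script: "<" opens for READING, and the file is missing.
my $ok = eval {
    open(my $new_dic, '<', $path);
    1;
};
print "open failed loudly: $@\n" unless $ok;

# The fix: ">" opens for WRITING and creates the file.
open(my $out, '>', $path);
print {$out} "a=c\n";
close $out;
print "created: ", (-e $path ? 'yes' : 'no'), "\n";
```

Adding `or die "...: $!"` after each `open`, as many Perl scripts do, gives the same early warning without the pragma.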
# 4  
10-01-2011
Oops, my fault: the file merged_directory.txt should be opened for output (">") rather than input ("<").

Code:
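#!/usr/bin/perl
use strict;
use warnings;

my %dictionary;
# Load the master first, then the corrections: an entry with the same
# English key in correct.uni overwrites the one from master.uni.
add_entries_to_hash(\%dictionary, "master.uni");
add_entries_to_hash(\%dictionary, "correct.uni");

# ">" opens the merged file for writing (the earlier "<" opened it for reading).
open(my $new_dic, ">", "merged_directory.txt")
    or die "Cannot write merged_directory.txt: $!";
for my $english_word (sort keys %dictionary) {
    print $new_dic "$english_word=$dictionary{$english_word}\n";
}
close $new_dic or die "Cannot close merged_directory.txt: $!";

sub add_entries_to_hash {
    my ($dictionary, $filename) = @_;
    open(my $file_handle, '<', $filename) or die "Cannot read $filename: $!";
    while (<$file_handle>) {
        chomp;
        # Split at the first "=" so a "=" inside the translation stays in the value.
        $dictionary->{$1} = $2 if /^([^=]+)=(.+)$/;
    }
    close $file_handle;
}
exit 0;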

Works for me
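Two properties of this script may matter for the full-size file: it holds the whole master dictionary in memory, and `sort keys` writes the entries in sorted order rather than in the master's original order. Since the English side never changes, an alternative is to load only the (smaller) correction file into a hash and stream the master through it line by line. A minimal sketch under those assumptions; `merge_corrections` is a hypothetical name, and the files are handled as raw bytes just like the script above, which is safe for UTF-8 (or any ASCII-compatible encoding) because the "=" byte never occurs inside a multi-byte character:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $master to $out, replacing the right-hand side of any
# entry whose English key appears in $corrections. Returns the number of lines changed.
sub merge_corrections {
    my ($master, $corrections, $out) = @_;

    # Load only the correction file into memory.
    my %fix;
    open(my $cfh, '<', $corrections) or die "Cannot read $corrections: $!";
    while (my $line = <$cfh>) {
        chomp $line;
        $fix{$1} = $2 if $line =~ /^([^=]+)=(.+)$/;
    }
    close $cfh;

    # Stream the master file, preserving its original line order.
    open(my $mfh, '<', $master) or die "Cannot read $master: $!";
    open(my $ofh, '>', $out)    or die "Cannot write $out: $!";
    my $changed = 0;
    while (my $line = <$mfh>) {
        chomp $line;
        if ($line =~ /^([^=]+)=(.+)$/ && exists $fix{$1} && $fix{$1} ne $2) {
            $line = "$1=$fix{$1}";    # keep the English key, take the expert's value
            $changed++;
        }
        print {$ofh} "$line\n";       # lines without "=" are copied unchanged
    }
    close $mfh;
    close $ofh or die "Cannot close $out: $!";
    return $changed;
}
```

Called as merge_corrections('master.uni', 'correct.uni', 'merged_directory.txt'), it returns the number of entries it changed. Unlike the hash version, duplicate English keys in the master are kept as they are rather than collapsed into one.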
# 5  
10-01-2011
Many thanks. It works beautifully for me also.
# 6  
10-12-2011
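Hi Skrynesaver. Since your post was so helpful to gimley, I was wondering whether you would be kind enough to help me out as well. I have three files, each with six columns (chromosome, position, reference allele, its frequency, alternate allele, its frequency). I want to merge them on chromosome and position and produce the output file shown below, with NA wherever a file has no entry for a position. I would really appreciate any help you could provide. Thanks so much.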


File1
chromo pos ref refFreq altAllele altFreq
chr1 55 T 0.2 C 0.8
chr1 57 C 0.8 A 0.2
chr1 60 C 0.8 A 0.2
chr2 62 T 0.2 C 0.8
chr2 67 C 0.8 A 0.2
chr2 96 T 0.2 C 0.8
chr2 100 C 0.8 A 0.2
chr3 32 T 0.2 C 0.8

File2
chromo pos ref refFreq altAllele altFreq
chr1 55 T 0.4 C 0.6
chr1 57 C 0.7 A 0.3
chr1 96 G 0.5 A 0.5
chr2 62 T 0.15 C 0.85
chr2 67 C 0.5 A 0.5
chr2 100 C 0.8 A 0.2
chr4 32 G 0.2 C 0.8

File3
chromo pos ref refFreq altAllele altFreq
chr1 27 C 0.7 A 0.3
chr1 55 T 0.4 C 0.6
chr1 57 C 0.7 A 0.3
chr2 62 T 0.15 C 0.85
chr2 67 C 0.5 A 0.5
chr2 100 C 0.8 A 0.2
chr4 32 G 0.2 C 0.8

OutputFile
chromo pos ref F1.refFreq F1.altAllele F1.altFreq F2.refFreq F2.altAllele F2.altFreq F3.refFreq F3.altAllele F3.altFreq
chr1 27 C NA NA NA NA NA NA 0.7 A 0.3
chr1 55 T 0.2 C 0.8 0.4 C 0.6 0.4 C 0.6
chr1 57 C 0.8 A 0.2 0.7 A 0.3 0.7 A 0.3
chr1 60 C 0.8 A 0.2 NA NA NA NA NA NA
chr1 96 G NA NA NA 0.5 A 0.5 NA NA NA
chr2 62 T 0.2 C 0.8 0.15 C 0.85 0.15 C 0.85
chr2 67 C 0.8 A 0.2 0.5 A 0.5 0.5 A 0.5
chr2 96 T 0.2 C 0.8 NA NA NA NA NA NA
chr2 100 C 0.8 A 0.2 0.8 A 0.2 0.8 A 0.2
chr3 32 T 0.2 C 0.8 NA NA NA NA NA NA
chr4 32 G NA NA NA 0.2 C 0.8 0.2 C 0.8
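This is a keyed outer join on (chromo, pos). A minimal sketch in Perl, assuming whitespace-separated columns, one header line per file, that ref agrees across files for the same position (it is taken from the first file that has it), and that rows are sorted by chromosome name and then numerically by position, which reproduces the OutputFile above for these three inputs. `merge_snp_files` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: outer-join any number of 6-column files on (chromo, pos).
# Returns the merged lines (header first), with "NA NA NA" where a file lacks a position.
sub merge_snp_files {
    my @files = @_;
    my (%ref, %data);   # $ref{$key} = ref allele; $data{$key}[$i] = [refFreq, altAllele, altFreq]

    for my $i (0 .. $#files) {
        open(my $fh, '<', $files[$i]) or die "Cannot read $files[$i]: $!";
        my $header_line = <$fh>;                 # skip the header line (not used)
        while (my $line = <$fh>) {
            my ($chr, $pos, $r, @rest) = split ' ', $line;
            next unless @rest == 3;              # ignore blank or malformed lines
            my $key = "$chr\t$pos";
            $ref{$key} //= $r;                   # keep ref from the first file that has it
            $data{$key}[$i] = \@rest;
        }
        close $fh;
    }

    my @header = ('chromo', 'pos', 'ref');
    for my $n (1 .. @files) {
        push @header, map { "F$n.$_" } qw(refFreq altAllele altFreq);
    }

    # Sort by chromosome name (string order), then numerically by position.
    my @keys = sort {
        my ($ca, $pa) = split /\t/, $a;
        my ($cb, $pb) = split /\t/, $b;
        $ca cmp $cb || $pa <=> $pb;
    } keys %ref;

    my @out = (join ' ', @header);
    for my $key (@keys) {
        my ($chr, $pos) = split /\t/, $key;
        my @row = ($chr, $pos, $ref{$key});
        for my $i (0 .. $#files) {
            my $cols = $data{$key}[$i];
            push @row, $cols ? @$cols : ('NA') x 3;
        }
        push @out, join ' ', @row;
    }
    return @out;
}

print "$_\n" for merge_snp_files(@ARGV);
```

Saved under a name of your choice, it would run as perl script.pl File1 File2 File3 > OutputFile. Because chromosome names are compared as strings, chr10 would sort before chr2; a natural sort would need an extra rule.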