Compare and Merge files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compare and Merge files
# 1  
Old 01-07-2013
Compare and Merge files

Hi All,
I have two different files as shown below separated by a "|". I need to compare the first column from both the files and if they match merge both the columns.

File 1
Code:
"S00172012"|"CHRONIC RENAL FAILURE"|""|"I"
"S00159962"|"SUBENDO INFRC-INIT EPISD"|""|"I"
"S00255303"|"BENIGN NEOPLASM LG BOWEL"|""|"I"
"S00012361"|"DIABETES MELLITUS"|""|"I"
"S00052174"|"ALZHEIMER'S DISEASE"|""|"I"
"S00035358"|"SPRAIN OF KNEE & LEG NEC"|""|"I"
"S00058846"|"DIAB EYE MANIF TYPE I"|""|"I"
"S00050977"|"ALCOHOLIC GASTRITIS"|""|"I"
"S00005471"|"FX OLECRAN PROC ULNA-CL"|""|"I"
"S00005471"|"FX OLECRAN PROC ULNA-CL"|""|"I"
"S00026563"|"CONVULSIONS"|""|"I"
"S00008617"|"MALIGN NEOPL BREAST NOS"|""|"I"
"S00005471"|"FX OLECRAN PROC ULNA-CL"|""|"I"

File 2
Code:
"S00172012"|"00000274AWJO"|"904395"|"Y"|"20101001"|"420.90"|"262"|"584.9"|"576.1"|"428.0"|"287.5"|"427.31"|"562.11"|"574.51"|"2200"|"20110318"|""|"20110318"|"DRG"|"2200"|"Acute Care Inpatient R&B"
"S00159962"|"00000264B0NN"|"900003"|"Y"|"20100921"|"441.4"|"V45.89"|""|""|""|""|""|""|""|"6120"|"20110401"|""|"20110401"|"APC"|"6120"|"Radiology-MF POS"
"S00255303"|"00000264B0NN"|"900003"|"Y"|"20100921"|"441.4"|"V45.89"|""|""|""|""|""|""|""|"6160"|"20110401"|""|"20110401"|"APC"|"6160"|"Hospital Pharmacy-Drug-MF POS"
"S00012361"|"00000253AOUF"|"900003"|"Y"|"20100910"|"786.50"|"496"|"562.10"|"V12.72"|"600.00"|"285.9"|"564.00"|""|""|"1520"|"20110401"|""|"20110401"|"APC"|"1520"|"Observation Room-Fac"
"S00052174"|"00000253AOUF"|"900003"|"Y"|"20100910"|"786.50"|"496"|"562.10"|"V12.72"|"600.00"|"285.9"|"564.00"|""|""|"5135"|"20110401"|""|"20110401"|"APC"|"5135"|"Prev: Smoking Cessation"
"S00002340"|"00000253AOUF"|"900003"|"Y"|"20100910"|"786.50"|"496"|"562.10"|"V12.72"|"600.00"|"285.9"|"564.00"|""|""|"5840"|"20110401"|""|"20110401"|"APC"|"5840"|"Laboratory & Pathology-MF POS"
"S00002340"|"00000253AOUF"|"900003"|"Y"|"20100910"|"786.50"|"496"|"562.10"|"V12.72"|"600.00"|"285.9"|"564.00"|""|""|"6600"|"20110401"|""|"20110401"|"APC"|"6600"|"Venipuncture-Fac-MF POS"
"S00002364"|"00000035ARXB"|"903700"|"Y"|"20100204"|"802.4"|"812.01"|"294.8"|"427.31"|"802.0"|"412"|"250.00"|"401.9"|"272.4"|"1920"|"20110416"|""|"20110416"|""|"1920"|"Laboratory & Pathology-HF POS"
"S00002364"|"00000035ARXB"|"903700"|"Y"|"20100204"|"802.4"|"812.01"|"294.8"|"427.31"|"802.0"|"412"|"250.00"|"401.9"|"272.4"|"2120"|"20110416"|""|"20110416"|""|"2120"|"Physical Therapy Oth Prov"
"S00002364"|"00000035ARXB"|"903700"|"Y"|"20100204"|"802.4"|"812.01"|"294.8"|"427.31"|"802.0"|"412"|"250.00"|"401.9"|"272.4"|"2200"|"20110415"|""|"20110415"|"DRG"|"2200"|"Acute Care Inpatient R&B"
"S00002364"|"00000035ARXB"|"903700"|"Y"|"20100204"|"802.4"|"812.01"|"294.8"|"427.31"|"802.0"|"412"|"250.00"|"401.9"|"272.4"|"2360"|"20110416"|""|"20110416"|""|"2360"|"Radiology-HF POS"
"S00002364"|"00000035ARXB"|"903700"|"Y"|"20100204"|"802.4"|"812.01"|"294.8"|"427.31"|"802.0"|"412"|"250.00"|"401.9"|"272.4"|"2400"|"20110416"|""|"20110416"|""|"2400"|"Hospital Pharmacy-Drug-Inpt"
"S00002364"|"00000035ARXB"|"903700"|"Y"|"20100204"|"802.4"|"812.01"|"294.8"|"427.31"|"802.0"|"412"|"250.00"|"401.9"|"272.4"|"3999"|"20110416"|""|"20110416"|""|"3999"|"Default Med Def-HF Risk"
"S00003163"|"00000133ARSG"|"900008"|"Y"|"20100513"|"780.4"|"530.81"|"723.1"|""|""|""|""|""|""|"1840"|"20110311"|""|"20110311"|"APC"|"1840"|"Emergency Room-Fac"
"S00003163"|"00000133ARSG"|"900008"|"Y"|"20100513"|"780.4"|"530.81"|"723.1"|""|""|""|""|""|""|"1920"|"20110311"|""|"20110311"|"APC"|"1920"|"Laboratory & Pathology-HF POS"
"S00003163"|"00000133ARSG"|"900008"|"Y"|"20100513"|"780.4"|"530.81"|"723.1"|""|""|""|""|""|""|"1980"|"20110311"|""|"20110311"|"APC"|"1980"|"EKG - ER"
"S00003163"|"00000133ARSG"|"900008"|"Y"|"20100513"|"780.4"|"530.81"|"723.1"|""|""|""|""|""|""|"2360"|"20110311"|""|"20110311"|"APC"|"2360"|"Radiology-HF POS"
"S00003163"|"00000133ARSG"|"900008"|"Y"|"20100513"|"780.4"|"530.81"|"723.1"|""|""|""|""|""|""|"2920"|"20110311"|""|"20110311"|"APC"|"2920"|"Venipuncture-Fac-HF POS"
"S00003552"|"00000281ASID"|"900003"|"Y"|"20101008"|"275.42"|"252.00"|""|""|""|""|""|""|""|"6120"|"20110415"|""|"20110415"|"APC"|"6120"|"Radiology-MF POS"
"S00003842"|"00000068B9QM"|"904395"|"Y"|"20100309"|"820.21"|"584.9"|"276.2"|"263.9"|"428.0"|"416.9"|"427.32"|"276.51"|"427.31"|"2200"|"20110422"|""|"20110422"|"DRG"|"2200"|"Acute Care Inpatient R&B"
"S00003968"|"00000188B0TP"|"900008"|"Y"|"20100707"|"787.20"|"427.31"|"401.9"|"414.01"|""|""|""|""|""|"1840"|"20110325"|""|"20110325"|"APC"|"1840"|"Emergency Room-Fac"
"S00003968"|"00000188B0TP"|"900008"|"Y"|"20100707"|"787.20"|"427.31"|"401.9"|"414.01"|""|""|""|""|""|"1840"|"20110415"|""|"20110415"|"APC"|"1840"|"Emergency Room-Fac"
"S00003968"|"00000188B0TP"|"900008"|"Y"|"20100707"|"787.20"|"427.31"|"401.9"|"414.01"|""|""|""|""|""|"1920"|"20110325"|""|"20110325"|"APC"|"1920"|"Laboratory & Pathology-HF POS"

Desired output :

Code:
"S00172012"|"CHRONIC RENAL FAILURE"|""|"I"|"00000274AWJO"|"904395"|"Y"|"20101001"|"420.90"|"262"|"584.9"|"576.1"|"428.0"|"287.5"|"427.31"|"562.11"|"574.51"|"2200"|"20110318"|""|"20110318"|"DRG"|"2200"|"Acute Care Inpatient R&B" 
"S00159962"|"SUBENDO INFRC-INIT EPISD"|""|"I"|"00000264B0NN"|"900003"|"Y"|"20100921"|"441.4"|"V45.89"|""|""|""|""|""|""|""|"6120"|"20110401"|""|"20110401"|"APC"|"6120"|"Radiology-MF POS"
"S00255303"|"BENIGN NEOPLASM LG BOWEL"|""|"I"|"00000264B0NN"|"900003"|"Y"|"20100921"|"441.4"|"V45.89"|""|""|""|""|""|""|""|"6160"|"20110401"|""|"20110401"|"APC"|"6160"|"Hospital Pharmacy-Drug-MF POS"
"S00012361"|"DIABETES MELLITUS"|""|"I"|"00000253AOUF"|"900003"|"Y"|"20100910"|"786.50"|"496"|"562.10"|"V12.72"|"600.00"|"285.9"|"564.00"|""|""|"1520"|"20110401"|""|"20110401"|"APC"|"1520"|"Observation Room-Fac"
"S00052174"|"ALZHEIMER'S DISEASE"|""|"I"|"00000253AOUF"|"900003"|"Y"|"20100910"|"786.50"|"496"|"562.10"|"V12.72"|"600.00"|"285.9"|"564.00"|""|""|"5135"|"20110401"|""|"20110401"|"APC"|"5135"|"Prev: Smoking Cessation"

I tried this using the following awk code but no luck.

Tried code :
Code:
awk -F, 'NR==FNR { a[$2]=$0; next  }
           a[$2] { print $0, a[$2] }
        ' file2 file1 > newfile

Any help will be appreciated!!

Last edited by Scott; 01-07-2013 at 01:56 PM.. Reason: Code tags, please... not ICODE tags
# 2  
Old 01-07-2013
In what way did it not work?
# 3  
Old 01-07-2013
It created an empty output file

Code:
(rk3388) => awk -F, 'NR==FNR { a[$2]=$0; next  }
>            a[$2] { print $0, a[$2] }
>         ' file2 file1 > newfile
(rk3388) => more newfile
(rk3388) =>

# 4  
Old 01-07-2013
On third look, I see a bug in that:

Code:
-F,

That tells it to use , as the separator.

Your file needs -F'|' instead.
This User Gave Thanks to Corona688 For This Post:
# 5  
Old 01-07-2013
I have code in c, cpp, and python that does this, but I have never tried it in something like awk. Of course, my cpp code is like 300 lines and the interperter version is probably one line. I would be happy to post the src code if you want to compile it and give it a try. You specify the two files and the field to merge on (index). It first counts the number of lines in both files to make sure they are the same (throws exception if they are not), then it compares line by line to make sure the the merge field is the same for both files. If the fields match, the lines are merged for output. The index col is only included once in the output, meaning it's skipped in the output of the second file.

LMHmedchem
# 6  
Old 01-07-2013
Thanks LMHmedchem But I need to do this in awk or sed.

Also found another way of doing it , incase it is helpful for others.


Code:
awk -v input_file=file1 'BEGIN { FS="|"; while((getline<input_file)>0){sub1=substr($0,1,11); subArr[sub1]=""}}{ if ($1 in subArr) print $0;}' file2  > file3


.

Last edited by Scrutinizer; 01-07-2013 at 02:28 PM.. Reason: icode tags to code tags
This User Gave Thanks to nua7 For This Post:
# 7  
Old 01-07-2013
Quote:
Originally Posted by nua7
Thanks LMHmedchem But I need to do this in awk or sed.

Also found another way of doing it , incase it is helpful for others.


Code:
awk -v input_file=file1 'BEGIN { FS="|"; while((getline<input_file)>0){sub1=substr($0,1,11); subArr[sub1]=""}}{ if ($1 in subArr) print $0;}' file2  > file3


.
I used a hard language because I needed some specific exception handling and a variety of checks and such. I also have a version of this where one file is always bigger than the other and the code searches the bigger file for the matching row. I am sure there are ways to do this kind of thing with stream interpreters, but that's well past what I know how to do.

Thanks for the post, it is always nice to see a different way. It's terribly funny that these three lines do, in essence, what it takes me 300 lines to do in cpp, plus your version will work on most any os without needing to be recompiled.

LMHmedchem
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare and merge two big CSV files

Hi all, i need help. I have two csv files with a huge amount of data. I need the first column of the first file, to be compared with the data of the second, to have at the end a file with the data not present in the second file. Example File1: (only one column) profile_id 57036226... (11 Replies)
Discussion started by: SirMannu
11 Replies

2. Shell Programming and Scripting

Compare two files and merge into third

Hello: Newbie with Awk. Trying to compare two files and merge data based on CID. Please see the input file format and desired output. Any help is appreciated. TIA Input File1 CID1 --- TYP1 --- DCN1 --- INDATE1 --- IN-DATA1 CID2 --- TYP2 --- DCN2 --- INDATE2 --- IN-DATA2 CID3 ---... (6 Replies)
Discussion started by: wincrazy
6 Replies

3. Shell Programming and Scripting

Checking in a directory how many files are present and basing on that merge all the files

Hi, My requirement is,there is a directory location like: :camp/current/ In this location there can be different flat files that are generated in a single day with same header and the data will be different, differentiated by timestamp, so i need to verify how many files are generated... (10 Replies)
Discussion started by: srikanth_sagi
10 Replies

4. Shell Programming and Scripting

AWK compare/merge

File1 2917,`0722,RDF1+TDEV,90(6),33,03E:0_12E:0,10000000c96c4af3_10000000c96c6e88,BL_db00p01e 2917,`0781,RDF1+TDEV,100(5),33,03E:0_12E:0,10000000c96c4af3_10000000c96c6e88,BL_db00p01e File2 2917,`0722,RDF1+TDEV,90(6),03E:0_12E:0,10000000c96c4af3_10000000c96c6e88,BL_db00p01e... (4 Replies)
Discussion started by: greycells
4 Replies

5. Shell Programming and Scripting

Shell Scripting: Compare pattern in two files and merge the o/p in one.

one.txt ONS.1287677000.820.log 20Oct2010 ONS.1287677000.123.log 21Oct2010 ONS.1287677000.456.log 22Oct2010 two.txt ONS.1287677000.820.log:V AC CC EN ONS.1287677000.123.log:V AC CC EN ONS.1287677000.820.log:V AC CC EN In file two.txt i have to look for pattern which column one... (17 Replies)
Discussion started by: saluja.deepak
17 Replies

6. Shell Programming and Scripting

Require compare command to compare 4 files

I have four files, I need to compare these files together. As such i know "sdiff and comm" commands but these commands compare 2 files together. If I use sdiff command then i have to compare each file with other which will increase the codes. Please suggest if you know some commands whcih can... (6 Replies)
Discussion started by: nehashine
6 Replies

7. UNIX for Dummies Questions & Answers

compare columns from 2 files and merge

Dear all, Being new to Unix i have a problem. I have 2 files: File 1: 118,1,0,2,3,0,5,0.3,0,0.3,0.6,1 118,2,1,2,2,0,5,0.4,0,0.4,0.4,1 118,4,2,0,3,0,5,0.7,0,0.3,0.6,1 118,6,4,1,0,0,5,0.8,0,0.2,0,1 File 2: 118,1,BFGL-NGS-109695,3610326,0,18,1,0.556,0.389,0.056,0.25,0.8183... (2 Replies)
Discussion started by: samwilkinson
2 Replies

8. Shell Programming and Scripting

Merge files of differrent size with one field common in both files using awk

hi, i am facing a problem in merging two files using awk, the problem is as stated below, file1: A|B|C|D|E|F|G|H|I|1 M|N|O|P|Q|R|S|T|U|2 AA|BB|CC|DD|EE|FF|GG|HH|II|1 .... .... .... file2 : 1|Mn|op|qr (2 Replies)
Discussion started by: shashi1982
2 Replies

9. Shell Programming and Scripting

compare the column from 3 files and merge that line

I have 3 file, each of has got 80000 records. file1.txt ----------------------- ABC001;active;modify;accept; ABC002;notactive;modify;accept; ABC003;notactive;no-modify;accept; ABC004;active;modify;accept; ABC005;active;no-modify;accept; file2.txt ---------------------------... (8 Replies)
Discussion started by: ganesh_mak
8 Replies

10. Shell Programming and Scripting

Compare two files and merge columns in a third

Hi, I'm working with snmp, with a little script I'm able to obtain from a switch a list with a couple of values with this format Port Mac 1 00:0A:0B:0C:0D:0E .... (hundred of entries) Now with a simple arp on a router I am able to obtain another list 00:0A:0B:0C:0D:0E... (20 Replies)
Discussion started by: CM64
20 Replies
Login or Register to Ask a Question