How to merge two files with a slight twist


Posted 10-28-2009

Hi. First, a brief introduction to the soundex Python module (English sound comparison):
Code:
import soundex  # import the module by name, without the .py extension
a = "neu yorkk"
b = "new york city"
print(soundex.sound_similar(a, b))  # 1 means the two strings sound alike

output:
Code:
1


Suppose I want to merge two files, mergeleft.csv and mergeright.csv.

mergeleft.csv:
Code:
gamewin,zip,name
90,10007,neu york yanke
20,10007,new york met
44,10007,manhatten policemens
24,10007,manhatten policemen
64,20005,dc metros
34,20005,dc eagles


mergeright.csv:
Code:
color,zip,name
blue,10007,new york yankee
yellow,10007,new yorkk mets
red,10007,manhattan's policeman
white,10007,manhattan's policeman
red,20003,philly dog
blue,20005,dc metro
green,20005,dc eagle

Both files are sorted first by zip, then by name.
I want to merge the two files by comparing zip codes first. When the zip codes match (for example, 10007), each name in mergeleft.csv should be compared with each name in mergeright.csv within the same zip code, using soundex.sound_similar().

For example, in mergeleft.csv the zip code of "neu york yanke" is 10007.

Therefore, "neu york yanke" would be compared with each name in mergeright.csv that has zip code 10007, namely:
Code:
blue,10007,new york yankee
yellow,10007,new yorkk mets
red,10007,manhattan's policeman
white,10007,manhattan's policeman
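
To make the candidate lookup concrete, here is a minimal sketch in Python of grouping the right-hand rows by zip and listing the candidates for one left-hand row. The helper name group_by_zip is only an illustration, and the CSV text is pasted inline instead of being read from mergeright.csv.

```python
import csv
import io
from collections import defaultdict

# Contents of mergeright.csv from the post, inlined so the sketch is self-contained.
RIGHT_CSV = """color,zip,name
blue,10007,new york yankee
yellow,10007,new yorkk mets
red,10007,manhattan's policeman
white,10007,manhattan's policeman
red,20003,philly dog
blue,20005,dc metro
green,20005,dc eagle
"""

def group_by_zip(rows):
    """Hypothetical helper: map each zip code to the list of rows that carry it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["zip"]].append(row)
    return groups

right_groups = group_by_zip(csv.DictReader(io.StringIO(RIGHT_CSV)))

# One row from mergeleft.csv; only rows with the same zip are candidates.
left_row = {"gamewin": "90", "zip": "10007", "name": "neu york yanke"}
for candidate in right_groups.get(left_row["zip"], []):
    print(left_row["name"], "<->", candidate["name"])
```

Each printed pair is one call that would go to soundex.sound_similar().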

If soundex.sound_similar() returns 1, merge the two lines. If there is more than one possible match within the same zip code group, set duplicate_flag to 1 and keep all of the rows involved unmerged, so that no data is lost.

Desired output.csv:
Code:
gamewin,color,zip,name,duplicate_flag
90,blue,10007,new york yankee,0
20,yellow,10007,new yorkk mets,0
44,,10007,manhatten policemens,1
24,,10007,manhatten policemen,1
,red,10007,manhattan's policeman,1
,white,10007,manhattan's policeman,1
,red,20003,philly dog,0
64,blue,20005,dc metro,0
34,green,20005,dc eagle,0
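
One possible way to get rows in the shape above, as a rough sketch and not a tested solution. The function name merge_by_zip and the rule for "ambiguous" are my own reading of the desired output: a pair is merged only when the left row has exactly one match and that right row is matched by no other left row; everything else is kept unmerged. The similarity test is passed in as a parameter, so soundex.sound_similar can be plugged in without this sketch assuming anything about how that module works.

```python
from collections import defaultdict

def merge_by_zip(left_rows, right_rows, similar):
    """left_rows / right_rows: lists of dicts (e.g. from csv.DictReader).
    similar(a, b): returns a truthy value when two names sound alike.
    Returns rows as [gamewin, color, zip, name, duplicate_flag]."""
    left_by_zip = defaultdict(list)
    right_by_zip = defaultdict(list)
    for row in left_rows:
        left_by_zip[row["zip"]].append(row)
    for row in right_rows:
        right_by_zip[row["zip"]].append(row)

    out = []
    # Zip codes are compared as strings, which is fine for 5-digit codes.
    for zip_code in sorted(set(left_by_zip) | set(right_by_zip)):
        lefts, rights = left_by_zip[zip_code], right_by_zip[zip_code]
        # hits[i]: indexes of right rows that sound like left row i
        hits = [[j for j, r in enumerate(rights) if similar(l["name"], r["name"])]
                for l in lefts]
        # back[j]: how many left rows sound like right row j
        back = [sum(j in h for h in hits) for j in range(len(rights))]
        used = set()
        for l, h in zip(lefts, hits):
            if len(h) == 1 and back[h[0]] == 1:
                # Unambiguous pair: merge, keeping the right-hand name.
                r = rights[h[0]]
                used.add(h[0])
                out.append([l["gamewin"], r["color"], zip_code, r["name"], 0])
            else:
                # No match (flag 0) or several possible matches (flag 1).
                out.append([l["gamewin"], "", zip_code, l["name"], 1 if h else 0])
        # Right rows that were not merged are kept as they are.
        for j, r in enumerate(rights):
            if j not in used:
                out.append(["", r["color"], zip_code, r["name"], 1 if back[j] else 0])
    return out
```

To use it, read both files with csv.DictReader, call merge_by_zip(left, right, soundex.sound_similar), and write the header line plus the returned rows with csv.writer. Whether the result matches the desired output exactly depends on what sound_similar returns for each pair.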


Last edited by zaxxon; 10-28-2009 at 08:27 AM.. Reason: use code tags