Comparing Strings in 2 .csv/txt files?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Comparing Strings in 2 .csv/txt files?
# 8  
Old 02-17-2011
Code:
awk -F"," 'NR==FNR{a[$1]++;next}a[$2]' 1940births.csv ethnicity.csv

NR The total number of input records seen so far.
FNR The input record number in the current input file,FNR restarts the counting at the begining of each input file.

Code:
if NR==FNR then {a[$1]++;next}

# Process the first file(1940births.csv), fill the array 'a' with index as $1 of 1940births.csv
Code:
else a[$2]

' # process the second file(ethnicity.csv),if $2 of second file is present in array 'a' then print the current record for second file
Code:
1940births.csv ethnicity.csv

# input files
This User Gave Thanks to pravin27 For This Post:
# 9  
Old 02-17-2011
Quote:
Originally Posted by bartus11
Code:
join -t, -11 -22 -o1.1,1.2,2.3 1940births.csv ethnicity.csv

"join" is a standard Unix/Linux utility to join files based on common field.
  • -t option specifies separator of the fields (comma in this case).
  • -1n selects "nth" field from first file as field to join on (in this case n=1, so first field from first file is selected).
  • -2n selects "nth" field from second file as field to join on (in this case n=2, so second field from second file is selected).
  • -o specifies desired output (in this case 1.1 - first field from first file, 1.2 - second field from first file and 2.3 - third field from second file).
  • 1940births.csv - first file, ethnicity.csv - second file
Thanks, this was a very concise and easy-to-understand explanation. I will try using this implementation after memorizing how it works.

Quote:
Originally Posted by pravin27
Code:
awk -F"," 'NR==FNR{a[$1]++;next}a[$2]' 1940births.csv ethnicity.csv

NR The total number of input records seen so far.
FNR The input record number in the current input file,FNR restarts the counting at the begining of each input file.

Code:
if NR==FNR then {a[$1]++;next}

# Process the first file(1940births.csv), fill the array 'a' with index as $1 of 1940births.csv
Code:
else a[$2]

' # process the second file(ethnicity.csv),if $2 of second file is present in array 'a' then print the current record for second file
Code:
1940births.csv ethnicity.csv

# input files
Thanks to you as well. You've provided me with code that is easy to understand and gets the job done. I will also try using this implementation and learning it so I can help people like me Smilie

---------- Post updated 02-17-11 at 03:51 PM ---------- Previous update was 02-16-11 at 11:32 PM ----------

@bartus11 - Is there a method that I can use for ALL lines in a file instead of just one?
# 10  
Old 02-18-2011
Post sample input files consisting of multiple lines please.
# 11  
Old 02-18-2011
Say I have
Code:
#names.csv
Jackie Chan,1954,O+
Chuck Norris,1930,A-
Bruce Lee,1940,O+

Where the 1st column is the name, 2nd is DOB, and 3rd is blood type.

And I had a second file
Code:
#o_pos_ssn.csv
Jackie Chan,O+,123-45-6789
Bruce Lee,O+,095-34-1647
Jet Li,O+,067-127-5791

Where the 1st column is name, 2nd is blood type, and 3rd is SSN.

If I use this:
Code:
join -t, -13 -22 -o1.1,1.2,2.2,2.3 names.csv o_pos_ssn.csv > result.csv

The output will be:
Code:
#result.csv
Jackie Chan,1954,O+,123-45-6789

I want it to output ALL lines that fit the requirements and get the code below:
Code:
#result.csv
Jackie Chan,1954,O+,123-45-6789
Bruce Lee,1940,O+,095-34-1647

How do I make the join command go through all lines?
# 12  
Old 02-18-2011
I forgot that files should be sorted. Try:
Code:
join -t, -11 -21 -o1.1,1.2,2.2,2.3 <(sort names.csv) <(sort o_pos_ssn.csv)

This User Gave Thanks to bartus11 For This Post:
# 13  
Old 02-18-2011
Quote:
Originally Posted by bartus11
I forgot that files should be sorted. Try:
Code:
join -t, -11 -21 -o1.1,1.2,2.2,2.3 <(sort names.csv) <(sort o_pos_ssn.csv)

Thanks, you're amazing!

And can you explain to me why you matched " -11 -21 " and still got it to work rather than the blood types? Confused.

Last edited by chickeneaterguy; 02-18-2011 at 04:01 AM..
# 14  
Old 02-18-2011
Well, I just used unique key to join those files Smilie Joining on blood type would create some garbage output IMO (and is a bit of nonsense too). Out of pure curiosity:
Code:
% join -t, -13 -22 -o1.1,1.2,2.2,2.3 <(sort -t, -k3 names.csv) <(sort -t, -k2 o_pos_ssn.csv)
Bruce Lee,1940,O+,067-127-5791
Bruce Lee,1940,O+,095-34-1647
Bruce Lee,1940,O+,123-45-6789
Jackie Chan,1954,O+,067-127-5791
Jackie Chan,1954,O+,095-34-1647
Jackie Chan,1954,O+,123-45-6789

This User Gave Thanks to bartus11 For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing two CSV files

I have two csv files and im trying to compare them. e.g. SAMPLE DATA: file one: ZipCode Name 20878 Washington 10023 Missouri 20304 Maryland file two: ID Name City ZipCode 11654 ... (11 Replies)
Discussion started by: dan139
11 Replies

2. Shell Programming and Scripting

Comparing 2 CSV files and sending the difference to a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 ... (1 Reply)
Discussion started by: Naresh101
1 Replies

3. Shell Programming and Scripting

Comparing two .txt files

i am working on a shell script and need help in the comparing part of it. for e.g. there two text files like this: file1.txt Code: name1 name2 name3 file1 has to be comared with file2 defaultfile.txt Code: name1 name2 (16 Replies)
Discussion started by: draghun9
16 Replies

4. Shell Programming and Scripting

Comparing 2 difference csv files

Hello, I have about 10 csv files which range from csv1 - csv10. Each csv file has same type/set of tabs and we have around 5-6 tabs for each of the csv file which have slightly different content(data). A sample of CSV1 is shown below: Joins: Data related to Joins, it can be any number of... (2 Replies)
Discussion started by: bobby1015
2 Replies

5. UNIX for Dummies Questions & Answers

Comparing two txt files with AWK

Hi, I need to compare two text files with awk. File1: ------- chr1 43815007 43815009 COSM19193 REF=TG;OBS=AA;ANCHOR=G AMPL495041 chr1 43815008 43815009 COSM18918 REF=G;OBS=T;ANCHOR=T AMPL495041 chr1 115256527 115256528 ... (6 Replies)
Discussion started by: RushiK
6 Replies

6. Shell Programming and Scripting

comparing csv files

Hi! I'm just new to shell scripting n simple tasks looks so tough in initial stage. i need to write a script which will read a property file, property file will be containing count of the csv files, and in a folder(same folder) there will be respective csv files. like Property file data1=100... (3 Replies)
Discussion started by: sukhdip
3 Replies

7. Shell Programming and Scripting

Comparing two .txt files in shell scripting...

Hi, I have two big .txt files.and i need to compare those two files and redirect it into some other file. If any body wants to resolve this issue then i can send the two text files. Need some quick responce. Thanks, prakash (10 Replies)
Discussion started by: prakash123
10 Replies

8. UNIX for Dummies Questions & Answers

comparing strings in seperate files

Hello, I am comparing files with for mismatches using fgrep but I've run into a problem. fgrep -vf $file1 $file2 > mismatches.dat file1 and file2 both contain file names on each line file1 has filenames which are up to 92 characters long and contain the "$" char. example file name:... (2 Replies)
Discussion started by: orahi001
2 Replies

9. Shell Programming and Scripting

Comparing two txt files - pls help

Hi, I have some text files. I need a separate shell say 1.sh in which i can open a particular text file and compare with another txt file. For example: 1.log.txt contains apple ball cat goat 2.log.txt contains goat cat lion apple fox In my i.sh i need to write script to... (5 Replies)
Discussion started by: jisha
5 Replies

10. UNIX for Dummies Questions & Answers

text drivers for .txt and .csv files ??

As a newcomer to UNIX I need to know if there are .txt and .csx ODBC drivers available. I want to import some data and the UNIX tech which I am in contact with does not know much about UNIX! He asked me to find out if these drivers existed or use a 'LOAD DATA INFILE' command, which is not too... (4 Replies)
Discussion started by: noodles
4 Replies
Login or Register to Ask a Question