The UNIX and Linux Forums  

#1
01-19-2009
grossgermany
Registered User
Join Date: Jul 2007
Posts: 34
Use Python or awk to match names 'with error tolerance'

I am facing what I think is a very challenging problem, and I have no idea how to deal with it.
Suppose I have two CSV files:

A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue

B.csv
Toyota Inc. Camry, 2000km
Honda Corp Civic,1500km

I want to generate C.csv
Toyota Camry,1998,blue,2000km
Honda Civic,1999,blue,1500km

The hardest part of the task is that the match has to tolerate variations in the company name:
1. extra spaces
2. extra dots
3. phrases such as Inc. or Corp.

Is this mission impossible?
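
The three kinds of variation listed above can usually be handled by reducing each name to a comparison key before matching. Here is a minimal sketch in Python, assuming the only suffixes to ignore are the two mentioned in the post; the names `SUFFIXES` and `normalize` are made up for illustration.

```python
import re

# Assumption: only the suffixes mentioned in the post; extend for real data.
SUFFIXES = {"inc", "corp"}

def normalize(name):
    """Return a comparison key: no dots, no company suffixes, single spaces."""
    # Turn dots into spaces, then split on any run of whitespace.
    words = re.sub(r"\.", " ", name).split()
    return " ".join(w for w in words if w.lower() not in SUFFIXES)

print(normalize("Toyota  Inc. Camry"))  # Toyota Camry
print(normalize("Honda Corp Civic"))    # Honda Civic
```

Two names then count as the same record when their keys are equal, so the merge itself can be a plain dictionary lookup.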
#2
01-19-2009
summer_cherry (Forum Advisor)
Registered User
Join Date: Jun 2007
Location: Beijing China
Posts: 1,099

Code:
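#!/usr/bin/perl
use strict;
use warnings;

my %hash;

# Index each A.csv record by its name (first field).
open my $fa, '<', 'A.csv' or die "Cannot open A.csv: $!";
while (<$fa>) {
	chomp;
	my ($name) = split /,/, $_;
	$hash{$name} = $_;
}
close $fa;

# Normalize each B.csv name, then append the rest of the line to the A record.
open my $fb, '<', 'B.csv' or die "Cannot open B.csv: $!";
while (<$fb>) {
	chomp;
	my ($name, $rest) = split /,/, $_, 2;
	$name =~ s/\b(?:Inc|Corp)\b\.?//g;   # drop company suffixes and their dot
	$name =~ s/\s+/ /g;                  # collapse runs of spaces
	$name =~ s/^ | $//g;                 # trim both ends
	$rest =~ s/^\s+//;
	next unless exists $hash{$name};     # skip names with no match in A.csv
	$hash{$name} .= ",$rest";
}
close $fb;

# Hash order is arbitrary, so the output order may differ from A.csv.
print "$_\n" for values %hash;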

#3
01-19-2009
angheloko
Registered User
Join Date: Jul 2008
Location: Philippines
Posts: 125
Lemme give it a try:


Code:
# Matches on the first word of each A.csv name only, so two rows sharing a make will collide.
while IFS= read -r x; do
  make=${x%%[ ,]*}
  printf '%s,' "$x"
  grep "^$make" B.csv | awk -F, '{print $NF}' | sed 's/^ *//;s/ *$//'
done < A.csv

#4
02-26-2009
grossgermany
Registered User
Join Date: Jul 2007
Posts: 34
I don't know Perl. Would you please do it in Python or SAS?
#5
02-27-2009
rikxik
Registered User
Join Date: Dec 2007
Posts: 250

Code:
import re

f1, f2 = 'A.csv', 'B.csv'
sep = ','
excl = {'Inc', 'Corp'}  # company suffixes to ignore when comparing names

# Index A.csv by name (first field); the value is the rest of the record.
ah = {}
with open(f1) as a:
    for line in a:
        name, rest = line.strip().split(sep, 1)
        ah[name] = rest

with open(f2) as b:
    for line in b:
        name, rest = line.strip().split(sep, 1)
        # Drop dots, then rejoin the remaining words with single spaces.
        n = re.sub(r"[.,]", "", name)
        s = " ".join(w for w in n.split() if w not in excl)
        if s in ah:
            print(sep.join([s, ah[s], rest]))
        else:
            print("Could not match", s, "with", f1)

Output:

Code:
C:\Projects\Python>type A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue

C:\Projects\Python>type B.csv
Toyota  Inc. Camry, 2000km
Honda Corp.     Civic,1500km

C:\Projects\Python>match.py
Toyota Camry,1998,blue, 2000km
Honda Civic,1999,blue,1500km

#6
02-27-2009
summer_cherry (Forum Advisor)
Registered User
Join Date: Jun 2007
Location: Beijing China
Posts: 1,099

Code:
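nawk 'BEGIN{FS=","}
NR==FNR{                       # first file: index each record by name
  _[$1]=$0
  next
}
{
  gsub(/(Inc|Corp)\.?/,"",$1)  # drop company suffixes and their dot
  gsub(/  +/," ",$1)           # collapse repeated spaces
  sub(/^ +/,"",$2)
  if($1 in _)                  # ignore names with no match in the first file
    _[$1]=_[$1] "," $2
}
END{
  for(i in _)
    print _[i]
}' A.csv B.csv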

#7
03-01-2009
grossgermany
Registered User
Join Date: Jul 2007
Posts: 34
Thanks a lot for the replies. Is it also possible to use a manually created translation table?

Suppose the files are now:
A.csv
Toyota Camry,1998,blue
Honda Civic,1999,blue
Acura Inf,2000,yellow

B.csv
Toyota Inc. Camry, 2000km
Honda Corp Civic,1500km
HondaUSA Inf, 2000, 2300km

I want to generate C.csv
Toyota Camry,1998,blue,2000km
Honda Civic,1999,blue,1500km
HondaUSA Inf,2000,yellow,2300km

How can I set up a translation table that would say, for example, that Acura translates to HondaUSA?
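
One possible way to approach the translation-table question is to keep the table as a plain dictionary and apply it to the names from A.csv before matching. The sketch below is hedged rather than definitive: `SUFFIXES`, `TRANSLATE`, `normalize`, `translate` and `merge` are hypothetical names, the table is assumed to be maintained by hand, and only the last field of each B.csv row (the mileage) is assumed to be wanted in the result. The sample data is inlined so the example runs without any files.

```python
import csv
import io
import re

SUFFIXES = {"inc", "corp"}          # suffixes named in the thread
TRANSLATE = {"Acura": "HondaUSA"}   # hand-maintained: name in A.csv -> name in B.csv

def normalize(name):
    """Drop dots and company suffixes, and collapse runs of spaces."""
    words = re.sub(r"\.", " ", name).split()
    return " ".join(w for w in words if w.lower() not in SUFFIXES)

def translate(name):
    """Apply the translation table word by word."""
    return " ".join(TRANSLATE.get(w, w) for w in name.split())

def merge(a_lines, b_lines):
    # Index A by its translated, normalized name; keep the remaining fields.
    a_rows = {}
    for row in csv.reader(a_lines, skipinitialspace=True):
        if row:
            a_rows[translate(normalize(row[0]))] = row[1:]
    merged = []
    for row in csv.reader(b_lines, skipinitialspace=True):
        if not row:
            continue
        name = normalize(row[0])
        if name in a_rows:
            # Assumption: only the last field of B (the mileage) is wanted.
            merged.append([name] + a_rows[name] + [row[-1]])
    return merged

a = io.StringIO("Toyota Camry,1998,blue\nHonda Civic,1999,blue\nAcura Inf,2000,yellow\n")
b = io.StringIO("Toyota Inc. Camry, 2000km\nHonda Corp Civic,1500km\nHondaUSA Inf, 2000, 2300km\n")
for row in merge(a, b):
    print(",".join(row))
```

With the sample data from this post, the script prints the three lines of the requested C.csv. Rows from B.csv that have no counterpart in A.csv are skipped silently; collecting them in a separate list would make it easier to see which entries the table still needs.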
Closed Thread
