Easy unix/sed question that I could have done 10 years ago!


 
# 1  
09-06-2008

Hi all and greetings from Ireland!

I haven't used much Unix or awk/sed in years and have forgotten a lot.
It's an easy enough query, though.

I am cleansing/fixing 10,000 postal addresses using global replacements.
I have 2 pipe-delimited files: one is basically a spell checker for geographical areas, and the second holds the actual addresses.

Sample file 1 - 100+ lines (basically a spell checker):

|Irlllland|Ireland|
|Dubblin|Dublin|
|Corrk|Cork|
etc.

Sample file 2 - 10,000+ lines (Addresses to be cleansed):

|10 Main Street Irlllland|
|11 High Road Irlllland|
|1 High Road, Corrk|

The required output is:

|10 Main Street Ireland|
|11 High Road Ireland|
|1 High Road, Cork|


I am very rusty but reckon I need a loop with a global substitution in it.
I used to know Unix, awk and sed reasonably well but have forgotten the basic syntax.

Any helpers out there?
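For anyone wanting the loop-with-a-global-substitution approach the question describes, a minimal shell sketch follows. It assumes the files are named file1 and file2, that each spell-checker line has exactly the form |wrong|right|, and that neither column contains /, & or regex metacharacters (sed would interpret them). It does one full pass over the addresses per spell-checker entry, which is simple but slower than a single-pass approach. The sample data is written with printf so the snippet runs on its own, and the output name file2.clean is only an illustration.

```shell
# Sample data from the question (file names assumed).
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > file1
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > file2

cp file2 file2.clean
# IFS='|' splits "|wrong|right|" into: (empty) wrong right (rest)
while IFS='|' read -r _ wrong right _; do
    [ -n "$wrong" ] || continue    # skip blank or malformed lines
    # One global substitution per spell-checker entry.
    sed "s/$wrong/$right/g" file2.clean > file2.tmp && mv file2.tmp file2.clean
done < file1

cat file2.clean
```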
# 2  
09-06-2008
What about this approach in sed?

1. Make a pattern file, turning each |wrong|right| line into a sed command s/wrong/right/g.

Code:
sed -e 's!|!/!g' -e 's/^/s&/' -e 's/$/g/' file1 > sed_pattern_file

2. Use the pattern file to do the replacements in file2.

Code:
sed -f sed_pattern_file file2

Output:

|10 Main Street Ireland|
|11 High Road Ireland|
|1 High Road, Cork|
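To make the two sed steps concrete, here is a self-contained sketch using the sample data from the question (file names assumed). It ends every generated command with a g flag, so a misspelling that appears twice on one line is fixed both times; without g, sed replaces only the first match on each line. Entries containing /, & or regex metacharacters would need escaping before this works, which is not handled here.

```shell
# Sample data from the question (file names assumed).
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > file1
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > file2

# Turn "|wrong|right|" into "s/wrong/right/g":
#   1) every | becomes /   ->  /wrong/right/
#   2) prefix an s         ->  s/wrong/right/
#   3) append g            ->  s/wrong/right/g
sed -e 's!|!/!g' -e 's/^/s/' -e 's/$/g/' file1 > sed_pattern_file

cat sed_pattern_file
# s/Irlllland/Ireland/g
# s/Dubblin/Dublin/g
# s/Corrk/Cork/g

# Apply all corrections to file2 in a single pass.
sed -f sed_pattern_file file2 > file2.clean
cat file2.clean
```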
# 3  
09-06-2008
And the one in awk:

Code:
awk 'BEGIN{ FS="|"; i=1; while((getline < "file1") > 0) { arr[i]=$2; arr_val[i++]=$3; } } { for (j=1;j<i;j++) { gsub(arr[j],arr_val[j],$0); } print; }' file2
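Spread over several lines with comments, the getline/gsub idea looks roughly like this. It is a sketch assuming the sample file names from the question; note that gsub() treats each misspelling as a regular expression, so entries containing characters such as . or * would match more than intended. The split on "[|]" is used so the pipe is always taken literally.

```shell
# Sample data from the question (file names assumed).
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > file1
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > file2

awk '
BEGIN {
    # Load the corrections from file1: f[2] = misspelling, f[3] = correction.
    n = 0
    while ((getline line < "file1") > 0) {
        split(line, f, "[|]")
        if (f[2] != "") {
            wrong[++n] = f[2]
            right[n] = f[3]
        }
    }
    close("file1")
}
{
    # Apply every correction, globally, to the current address line ($0).
    for (i = 1; i <= n; i++)
        gsub(wrong[i], right[i])
    print
}' file2 > file2.clean

cat file2.clean
```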

# 4  
09-06-2008
Another approach with awk:

Code:
awk 'BEGIN{FS="[ |]"} 
NR==FNR{a[$2]=$3;next}
$5 in a {sub($5,a[$5])}
{print}' file1 file2

If you get errors, use nawk, gawk or /usr/xpg4/bin/awk on Solaris.

Last edited by Franklin52; 09-06-2008 at 09:19 AM. Reason: fix FS
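One awk detail worth knowing when using this field-based style: assigning to any field, as in $5=a[$5], makes awk rebuild $0 from the fields joined with OFS, which is a single space by default. Because FS here also splits on |, the pipes around each address are not put back, so |10 Main Street Irlllland| comes out with spaces where the pipes were. The sketch below, using the sample file names from the question, contrasts that with calling sub() on the whole line, which leaves the pipes untouched. Both versions still only look at the fifth field, i.e. the last word of the sample addresses.

```shell
# Sample data from the question (file names assumed).
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > file1
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > file2

# Field assignment: $0 is rebuilt with OFS=" ", so the pipes are lost.
awk 'BEGIN{FS="[ |]"}
NR==FNR{a[$2]=$3; next}
$5 in a {$5=a[$5]}
{print}' file1 file2 > assign.out

# sub() edits $0 in place, so the surrounding text and pipes survive.
awk 'BEGIN{FS="[ |]"}
NR==FNR{a[$2]=$3; next}
$5 in a {sub($5, a[$5])}
{print}' file1 file2 > sub.out

cat assign.out sub.out
```

Both commands read the same files; only the action on matching lines differs.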
# 5  
09-06-2008
Thank you all but...

I think I may have confused the issue for the last poster (Franklin52).

The $5 was confusing me!

I deliberately spelt Ireland incorrectly to demonstrate the requirement.

Unfortunately I chose the letter "L" (in lowercase) to demonstrate the misspelling, and a lowercase "L" looks the same as the pipe symbol.

Presumably the elegant last post should be adjusted to reflect the letter "L" issue.

Incidentally, I will study the solutions provided in more detail.
The code provided made me realise how much I used to love playing with "awk" and also how much a few lines of code can achieve.

Last edited by dewsbury; 09-06-2008 at 10:25 AM.
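On the $5 question: the fifth field only happens to be the place name in the sample rows. A sketch that does not depend on word position could take the address text between the pipes, split it into words, and look each word up in the correction table. It assumes the file names from the question, one address per line in the form |text|, and single-word misspellings; it also collapses runs of spaces into one, and a word with punctuation attached (for example "Corrk,") would not be matched.

```shell
# Sample data from the question (file names assumed).
printf '%s\n' '|Irlllland|Ireland|' '|Dubblin|Dublin|' '|Corrk|Cork|' > file1
printf '%s\n' '|10 Main Street Irlllland|' '|11 High Road Irlllland|' '|1 High Road, Corrk|' > file2

awk -F'[|]' '
# First file: remember each correction, misspelling -> correct spelling.
NR == FNR { if ($2 != "") fix[$2] = $3; next }
# Second file: $2 is the address text between the pipes.
{
    n = split($2, w, " ")
    out = ""
    for (i = 1; i <= n; i++) {
        if (w[i] in fix) w[i] = fix[w[i]]
        out = out (i > 1 ? " " : "") w[i]
    }
    print "|" out "|"
}' file1 file2 > file2.clean

cat file2.clean
```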