The UNIX and Linux Forums  
#1  05-08-2009
grossgermany
Extract zip code information from address, put into new column

Hi, suppose I have a colon-delimited file with an address field like this:

blue:john's hospital new haven CT 92881-2322
yellow:La times copr red road los angeles CA90381 1302
red:las vegas hotel sand drive Las vegas NV,21221

How do I create a new field that contains only the zip code information, extracted from the address field?
The problem is error tolerance: in the survey data file, people wrote zip codes in many different ways:


CT 92881-2322 (subzip code with hyphen)
CT 92881 2322 (subzip code with space)
CT 92881 (no subzip code)
CT,92881 (extra comma)
CT92881 (no space between state and zip code)



Do you think this can be done in Python or awk? Thanks.
The new zip code field should be in a standardized format:
CT 92881-2322 if there is a sub zip code
CT 92881 if the sub zip code is empty
#2  05-08-2009
ghostdog74 (Forum Advisor)
Python
Code:
#!/usr/bin/env python3

states = ["CA", "CT", "NV"]  # list every state code here
with open("file") as f:
    for line in f:
        line = line.strip()
        for s in states:
            if s in line:
                # print from the first occurrence of the state code to the end of the line
                print(line[line.index(s):])
output:
Code:
 # ./test.py
CT 92881-2322
CA90381 1302
NV,21221
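The output above still carries the original spellings (missing space, comma, space instead of hyphen). A minimal sketch of one way to bring such a tail into the standardized format from the first post, assuming the tail always starts with the state code and that the zip is 5 digits with an optional 4-digit part; the name normalize_state_zip and the pattern are illustrative assumptions, not part of the script above:

```python
import re

# Assumed shape: two-letter state, optional commas/spaces, 5-digit zip,
# then an optional 4-digit part separated by a hyphen or a space.
STATE_ZIP = re.compile(r"^([A-Z]{2})[ ,]*(\d{5})(?:[- ](\d{4}))?$")

def normalize_state_zip(tail):
    """Return 'ST 12345' or 'ST 12345-6789', or None if the tail does not fit."""
    m = STATE_ZIP.match(tail.strip())
    if m is None:
        return None
    state, zip5, plus4 = m.groups()
    return f"{state} {zip5}-{plus4}" if plus4 else f"{state} {zip5}"

for tail in ["CT 92881-2322", "CA90381 1302", "NV,21221"]:
    print(normalize_state_zip(tail))
# CT 92881-2322
# CA 90381-1302
# NV 21221
```

Returning None for anything that does not fit makes it easy to collect the odd lines for manual review instead of guessing.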
#3  05-08-2009
cfajohnson (Forum Advisor)
Code:
awk '
{
 left = statezip = $0
 sub( / [A-Z][A-Z].*/,"",left)
 sub( left " ","", statezip)
 state=substr(statezip,1,2)
 sub( /[A-Z][A-Z][, ]*/,"", statezip)
 sub( " ", "-", statezip)
 printf "%s:%s %s\n", left, state, statezip
}' "$FILE"
In the shell, if the file is not too big:

Code:
while IFS= read -r line
do
  left=${line% [A-Z][A-Z]*}
  statezip=${line#"$left "}
  zip=${statezip#??}
  state=${statezip%"$zip"}
  state=${state%[, ]}
  zip=${zip#[ ,]}
  case $zip in
    *\ *) lzip=${zip% *}
          rzip=${zip#* }
          zip=$lzip-$rzip
          ;;
  esac
  printf "%s:%s %s\n" "$left" "$state" "$zip"
done < "$FILE"
#4  05-10-2009
grossgermany
Thanks a lot, cfajohnson.

But do you think sub( / [A-Z][A-Z].*/,"",left) is prone to unintentionally picking up patterns elsewhere, for example in street names, rather than in the zip code?

Is there a way to turn it into something like sub(/ [two-letter pattern from the list].[0-9][0-9][0-9][0-9][0-9]/,"",left)?

list="AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA,
HI, ID, IL, IN, IA, KS, KY, LA, ME, MD,
MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ,
NM, NY, NC, ND, OH, OK, OR, PA, RI, SC,
SD, TN, TX, UT, VT, VA, WA, WV, WI, WY"

Thanks.
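One possible way to express that idea, sketched in Python rather than awk: build the alternation from the state list, require five digits after it, and anchor the whole thing to the end of the line so that a two-letter word in a street name cannot match. This is only a sketch under assumptions: the function name add_zip_field, the choice to append an empty field when nothing matches, and the "green" sample line are all made up for illustration.

```python
import re

# State codes taken from the list in the post above.
STATES = """AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY""".split()

# Whitespace, a listed state, optional commas/spaces, 5 digits,
# an optional 4-digit part after a hyphen or space, then end of line.
PATTERN = re.compile(
    r"\s(" + "|".join(STATES) + r")[ ,]*(\d{5})(?:[- ](\d{4}))?\s*$"
)

def add_zip_field(line):
    """Move a trailing state/zip into a new colon-separated field."""
    m = PATTERN.search(line)
    if m is None:
        return line + ":"  # assumption: keep the column count, leave the field empty
    state, zip5, plus4 = m.groups()
    zipcode = f"{state} {zip5}" + (f"-{plus4}" if plus4 else "")
    return f"{line[:m.start()]}:{zipcode}"

samples = [
    "blue:john's hospital new haven CT 92881-2322",
    "yellow:La times copr red road los angeles CA90381 1302",
    "red:las vegas hotel sand drive Las vegas NV,21221",
    "green:12 NE Main St Portland OR 97201",  # hypothetical line with a street-name false positive
]
for s in samples:
    print(add_zip_field(s))
```

In the last sample, "NE" is in the state list but is not followed by five digits, so only "OR 97201" at the end of the line is picked up.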