Extract zip code information from address, put into new column
Hi, suppose I have a colon-delimited file with an address field like this:

blue:john's hospital new haven CT 92881-2322
yellow:La times copr red road los angeles CA90381 1302
red:las vegas hotel sand drive Las vegas NV,21221

How do I create a new field that contains only the zip code information, extracted from the address field?

The problem is error tolerance. In the survey data file, people wrote zip codes in many ways:

CT 92881-2322 (sub zip code with hyphen)
CT 92881 2322 (sub zip code with space)
CT 92881 (no sub zip code)
CT,92881 (extra comma)
CT92881 (no space between state and zip code)

Do you think this can be done in Python or awk? Thanks.

The new zip code field should be in a standardized format:

CT 92881-2322 if there is a sub zip code
CT 92881 if the sub zip code is empty
Python
Code:
#!/usr/bin/env python
# Print the tail of each line, starting at the first state code found.
states = ["CA", "CT", "NV"]  # list all the state codes here
with open("file") as f:
    for line in f:
        line = line.strip()
        for s in states:
            if s in line:
                print(line[line.index(s):])
Code:
# ./test.py
CT 92881-2322
CA90381 1302
NV,21221
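The tails printed above are still in their original, inconsistent formats. A minimal sketch of how they might be normalized to the requested "ST 12345" / "ST 12345-6789" form follows. The names ZIP_RE and normalize_zip are hypothetical, and the pattern assumes the zip code sits at the end of the text, preceded by a two-letter uppercase state code.

```python
import re

# Assumed layout: two-letter state code, optional spaces/commas,
# a 5-digit zip, then an optional 4-digit sub zip code separated
# by a hyphen or whitespace, all at the end of the string.
ZIP_RE = re.compile(r"([A-Z]{2})[\s,]*(\d{5})(?:[\s-]+(\d{4}))?\s*$")

def normalize_zip(text):
    """Return 'ST 12345' or 'ST 12345-6789', or None if nothing matches."""
    m = ZIP_RE.search(text)
    if m is None:
        return None
    state, zip5, plus4 = m.groups()
    if plus4:
        return f"{state} {zip5}-{plus4}"
    return f"{state} {zip5}"

if __name__ == "__main__":
    for tail in ["CT 92881-2322", "CA90381 1302", "NV,21221", "CT 92881"]:
        print(normalize_zip(tail))
```

For the four sample tails this prints CT 92881-2322, CA 90381-1302, NV 21221 and CT 92881. Any two uppercase letters are accepted as a state code here, so it is only as strict as the data allows.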
Thanks a lot, cfajohnson.
But do you think sub( / [A-Z][A-Z].*/,"",left) is prone to unintentionally picking up patterns in, for example, street names rather than in the zip code? Is there a way to change it to something like

sub(/ [two letter pattern in the list.[0-9][0-9][0-9][0-9][0-9]/,"",left)

list="AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME, MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY"

Thanks.
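One possible way to restrict the match to the listed state codes, sketched in Python rather than awk: build a regex alternation from the list, require five digits after the state code, and anchor the whole thing at the end of the address. The names STATES, TAIL_RE and split_address are hypothetical, and the "key:address" line layout is taken from the sample data in the question.

```python
import re

# State codes copied from the list in the post above.
STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()

# A listed state code at a word boundary, optional spaces/commas,
# 5 digits, optional 4-digit sub zip code, then end of string.
TAIL_RE = re.compile(
    r"[\s,]*\b(" + "|".join(STATES) + r")[\s,]*(\d{5})(?:[\s-]+(\d{4}))?\s*$"
)

def split_address(line):
    """Split 'key:address' into (key, address_without_zip, zip_field)."""
    key, _, address = line.rstrip("\n").partition(":")
    m = TAIL_RE.search(address)
    if m is None:
        return key, address, ""  # no recognizable zip: leave the field empty
    state, zip5, plus4 = m.groups()
    zip_field = f"{state} {zip5}" + (f"-{plus4}" if plus4 else "")
    return key, address[:m.start()], zip_field

if __name__ == "__main__":
    samples = [
        "blue:john's hospital new haven CT 92881-2322",
        "yellow:La times copr red road los angeles CA90381 1302",
        "red:las vegas hotel sand drive Las vegas NV,21221",
    ]
    for line in samples:
        print(":".join(split_address(line)))
```

Because the pattern needs five digits right after the state code and must reach the end of the line, a street name such as "IN ROAD" earlier in the address is not picked up. The match is case-sensitive, so "La" in "La times" is not treated as LA.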