#1
Splitting Chunked-FullNames Nightmare
I've got a problem that I'm hoping more experienced programmers have dealt with at some point in their careers and can help me with: how to split full names that were chunked together into one field of an old database into separate, more meaningful fields.
I'd like to convert the records that fit the pattern firstname middleinitial lastname into three fields separated by colons, and skip the names that don't fit that pattern until I figure out what to do with them (any suggestions welcome).
GIVEN INPUT:
Code:
DONNIE BERG
JERRY M MAGUIRE
D A BROWN
RICHARD N STYLES & FRANK A PERRY
MITCH GARBO & BOBBI MILLS
JUDY & STONE RUFFEY
MRS K H SCHULTZ
JASPER O & SUZI M THOMPSON
DAY FRANKLIN-MIZER
BO & TYRA J SLACK
JERRY B DE TUNA
CHARLES C VICTOR III
DARREN E MC FANN
TOM E VARBLE JR
MARY W & CAROLYN SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN N GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW
DESIRED OUTPUT:
Code:
DONNIE: :BERG
JERRY:M:MAGUIRE
D:A:BROWN
RICHARD:N:STYLES:FRANK:A:PERRY
MITCH: :GARBO:BOBBI: :MILLS
JUDY: :RUFFEY:STONE: :RUFFEY
K:H:SCHULTZ
JASPER:O:THOMPSON:SUZI:M:THOMPSON
DAY: :FRANKLIN-MIZER
BO: :SLACK:TYRA:J:SLACK
JERRY:B:DE TUNA
CHARLES:C:VICTOR III
DARREN:E:MC FANN
TOM:E:VARBLE JR
MARY:W:SMILEY:CAROLYN: :SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN:N:GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW
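The simple case described above (exactly three words, with a one-letter middle word) can be sketched on its own as a starting point. This is only a minimal sketch: `split_simple` is a made-up helper name, and it deliberately leaves two-word names, "&" pairs, titles and suffixes untouched, so those lines pass through unchanged.

```shell
# split_simple: hypothetical helper for illustration only.
# Reads one record per line on stdin. A line made of exactly three words
# whose middle word is a single character is printed as FIRST:M:LAST;
# every other line is printed unchanged so it can be handled later.
split_simple() {
    awk '
        NF == 3 && length($2) == 1 { print $1 ":" $2 ":" $3; next }
        { print }
    '
}

printf '%s\n' 'JERRY M MAGUIRE' 'DONNIE BERG' 'DRIPP CREEK FARM' | split_simple
# prints:
# JERRY:M:MAGUIRE
# DONNIE BERG
# DRIPP CREEK FARM
```

Passing unmatched lines through, rather than dropping them, keeps the record count the same as in the input, which makes it easier to check the result against the original file.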
#2
Try adapting the following awk program:
Code:
# Global Array Description
#
# Names["cnt"]        = Number of names in the names list (input line)
# Names["invalid"]    = 0 if all names are valid, otherwise the number of invalid names
# Names["list"]       = Formatted names list
#
# Names[n, p]         = Word p of name n in list (p = 1 .. parts)
# Names[n, "parts"]   = Number of words in name n in list
# Names[n, "first"]   = First name for name n in list
# Names[n, "middle"]  = Middle initial for name n in list (" " if none)
# Names[n, "last"]    = Last name for name n in list
# Names[n, "name"]    = Formatted name n in list
# Names[n, "invalid"] = 0 if name n is valid, 1 otherwise
#
#=======================================================================
# F U N C T I O N S . . .
#=======================================================================
#
# set_name(name) - Set name information
#
function set_name(name,    parts, p) {
    #
    # Set name parts
    # (only while more than one word is left, so a lone title or suffix
    #  cannot push parts below 1 or loop forever)
    #
    parts = Names[name, "parts"]
    while (parts > 1) {
        # Join a suffix (JR, SR, roman numeral) or a DE/MC prefix to its neighbor
        if (Names[name, parts]   ~ /^(JR|SR)$/ ||
            Names[name, parts]   ~ /^[IVX]+$/  ||
            Names[name, parts-1] ~ /^(DE|MC)$/) {
            Names[name, parts-1] = Names[name, parts-1] " " Names[name, parts];
            parts--;
            continue;
        }
        # Drop a leading title by shifting the remaining words down
        if (Names[name, 1] ~ /^(MR|MRS|MS)$/) {
            for (p=2; p<=parts; p++)
                Names[name, p-1] = Names[name, p];
            parts--;
            continue;
        }
        break;
    }
    Names[name, "parts"]   = parts;
    Names[name, "invalid"] = 0;
    #
    # Set name components
    #
    if (parts == 3) {
        if (length(Names[name, 2]) > 1) {
            Names[name, "invalid"] = 1;
        } else {
            Names[name, "first"]  = Names[name, 1];
            Names[name, "middle"] = Names[name, 2];
            Names[name, "last"]   = Names[name, 3];
        }
    } else if (parts == 2) {
        Names[name, "first"] = Names[name, 1];
        if (length(Names[name, 2]) == 1 && name < Names["cnt"]) {
            # "FIRST M & ..." : borrow the last name of the next name
            Names[name, "middle"] = Names[name, 2];
            Names[name, "last"]   = Names[name+1, "last"];
        } else {
            Names[name, "middle"] = " ";
            Names[name, "last"]   = Names[name, 2];
        }
    } else if (parts == 1) {
        if (name < Names["cnt"]) {
            # "FIRST & ..." : borrow the last name of the next name
            Names[name, "first"]  = Names[name, 1];
            Names[name, "middle"] = " ";
            Names[name, "last"]   = Names[name+1, "last"];
        } else
            Names[name, "invalid"] = 1;
    } else
        Names[name, "invalid"] = 1;
    # A last name borrowed from an invalid neighbor is empty: reject the name
    if (!Names[name, "invalid"] && Names[name, "last"] == "")
        Names[name, "invalid"] = 1;
    Names["invalid"] += Names[name, "invalid"];
    #
    # Format name
    #
    if (Names[name, "invalid"]) {
        Names[name, "name"] = "";
        for (p=1; p<=parts; p++)
            Names[name, "name"] = Names[name, "name"] (p>1 ? " " : "") Names[name, p];
    } else {
        Names[name, "name"] = Names[name, "first"] ":" Names[name, "middle"] ":" Names[name, "last"];
    }
}
#
# split_list() - Split input names list
#
function split_list(    f, cnt, parts) {
    split("", Names);    # clear entries left over from the previous input line
    cnt   = 1;
    parts = 0;
    for (f=1; f<=NF; f++) {
        if ($f != "&") {
            Names[cnt, ++parts] = $f
        } else {
            Names[cnt, "parts"] = parts;
            parts = 0;
            cnt++;
        }
    }
    Names[cnt, "parts"] = parts;
    Names["cnt"]        = cnt;
    Names["invalid"]    = 0;
    Names["list"]       = "";
}
#
# format_list() - Format names list
#
function format_list(    name, list, sep) {
    list = "";
    sep  = (Names["invalid"] ? " & " : ":");
    for (name=1; name<=Names["cnt"]; name++) {
        list = list (name>1 ? sep : "") Names[name, "name"];
    }
    Names["list"] = list;
}
#
# analyze_list() - Analyze input names list
# (names are processed last to first, so a name can borrow the last name
#  already set for the name that follows it)
#
function analyze_list(    n) {
    split_list();
    for (n=Names["cnt"]; n>0; --n) {
        set_name(n);
    }
    format_list();
}
#=======================================================================
# M A I N . . .
#=======================================================================
NF {
    analyze_list();
    print "Input =" $0
    print "Output=" Names["list"];
    print ""
}
Code:
Input =DONNIE BERG
Output=DONNIE: :BERG

Input =JERRY M MAGUIRE
Output=JERRY:M:MAGUIRE

Input =D A BROWN
Output=D:A:BROWN

Input =RICHARD N STYLES & FRANK A PERRY
Output=RICHARD:N:STYLES:FRANK:A:PERRY

Input =MITCH GARBO & BOBBI MILLS
Output=MITCH: :GARBO:BOBBI: :MILLS

Input =JUDY & STONE RUFFEY
Output=JUDY: :RUFFEY:STONE: :RUFFEY

Input =MRS K H SCHULTZ
Output=K:H:SCHULTZ

Input =JASPER O & SUZI M THOMPSON
Output=JASPER:O:THOMPSON:SUZI:M:THOMPSON

Input =DAY FRANKLIN-MIZER
Output=DAY: :FRANKLIN-MIZER

Input =BO & TYRA J SLACK
Output=BO: :SLACK:TYRA:J:SLACK

Input =JERRY B DE TUNA
Output=JERRY:B:DE TUNA

Input =CHARLES C VICTOR III
Output=CHARLES:C:VICTOR III

Input =DARREN E MC FANN
Output=DARREN:E:MC FANN

Input =TOM E VARBLE JR
Output=TOM:E:VARBLE JR

Input =MARY W & CAROLYN SMILEY
Output=MARY:W:SMILEY:CAROLYN: :SMILEY

Input =BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
Output=BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC

Input =STAN N GAIL & HIDDEN VALLEY FARMS
Output=STAN:N:GAIL & HIDDEN VALLEY FARMS

Input =DRIPP CREEK FARM
Output=DRIPP CREEK FARM

Input =Y Z & S T OUTTRIM
Output=Y:Z:OUTTRIM:S:T:OUTTRIM

Input =WM AND VI JOYNER SALES
Output=WM AND VI JOYNER SALES

Input =A C A SALES
Output=A C A SALES

Input =ADDONIS SYNDICATE & LOWLAND MEADOW
Output=ADDONIS: :SYNDICATE:LOWLAND: :MEADOW
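Since some records come back only partly converted or not converted at all, it may help to tag each result line so the leftovers can be reviewed separately. The following is a rough sketch under two assumptions: the main rule has been changed to print only `Names["list"]` (the program as posted also prints the "Input =" and "Output=" labels), and the original data contains no ":" characters. `classify` and its DONE/PARTIAL/TODO tags are made-up names for illustration.

```shell
# classify: hypothetical helper for illustration only.
# Reads formatted result lines on stdin and prefixes each with a tag and a tab:
#   DONE    - contains ":" and no "&"   (every name on the line was split)
#   PARTIAL - contains ":" and "&"      (only some names were split)
#   TODO    - contains no ":"           (nothing was split)
classify() {
    awk '
        /:/ && !/&/ { print "DONE\t" $0; next }
        /:/         { print "PARTIAL\t" $0; next }
                    { print "TODO\t" $0 }
    '
}

printf '%s\n' 'JERRY:M:MAGUIRE' 'STAN:N:GAIL & HIDDEN VALLEY FARMS' 'DRIPP CREEK FARM' | classify
```

The PARTIAL tag relies on the program joining names with " & " whenever at least one name on the line is invalid, so a line that still contains "&" next to a ":" has something left to fix by hand.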
#3
Jean-Pierre, thank you so much! Your program successfully splits the bulk of the 38,000 chunked names I have to change. I can't thank you enough for this code-gift! You've turned my nightmare into nothing more than a bad dream...