Splitting Chunked-FullNames Nightmare


 
# 1  
11-03-2007

I've got a problem that I'm hoping more experienced programmers have dealt with at some point in their careers and can help me with: how to split full names that were chunked together into a single field in an old database into separate, more meaningful fields.

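I'd like to split the records that fit the pattern firstname middleinitial lastname into three fields separated by colons. Names that don't fit that pattern should be left unchanged until I figure out what to do with them (any suggestions welcome). When there is no middle initial, the middle field should be a single space. When two people are joined by "&" on one line, each person gets three fields, and the shared last name is repeated.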

GIVEN INPUT:
Code:
DONNIE BERG
JERRY M MAGUIRE
D A BROWN
RICHARD N STYLES & FRANK A PERRY
MITCH GARBO & BOBBI MILLS
JUDY & STONE RUFFEY
MRS K H SCHULTZ
JASPER O & SUZI M THOMPSON
DAY FRANKLIN-MIZER
BO & TYRA J SLACK
JERRY B DE TUNA
CHARLES C VICTOR III
DARREN E MC FANN
TOM E VARBLE JR
MARY W & CAROLYN SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN N GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW

DESIRED OUTPUT:
Code:
DONNIE: :BERG
JERRY:M:MAGUIRE
D:A:BROWN
RICHARD:N:STYLES:FRANK:A:PERRY
MITCH: :GARBO:BOBBI: :MILLS
JUDY: :RUFFEY:STONE: :RUFFEY
K:H:SCHULTZ
JASPER:O:THOMPSON:SUZI:M:THOMPSON
DAY: :FRANKLIN-MIZER
BO: :SLACK:TYRA:J:SLACK
JERRY:B:DE TUNA
CHARLES:C:VICTOR III
DARREN:E:MC FANN
TOM:E:VARBLE JR
MARY:W:SMILEY:CAROLYN: :SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN:N:GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW
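The basic one-person pattern above fits in a few lines of awk. What follows is a minimal sketch and not the solution from this thread. It assumes one person per line, with no "&", no titles such as MRS, and no suffixes such as JR. The shell function name split_simple is hypothetical. A line with three words and a one-letter middle word becomes FIRST:M:LAST, a two-word line becomes FIRST: :LAST, and every other line passes through unchanged.

```shell
#!/bin/sh
# split_simple (hypothetical name): split one-person names read on stdin.
#   FIRST M LAST  ->  FIRST:M:LAST
#   FIRST LAST    ->  FIRST: :LAST   (a space stands in for the missing initial)
#   anything else ->  printed unchanged, to be dealt with later
split_simple() {
  awk '
    NF == 3 && length($2) == 1 { print $1 ":" $2 ":" $3; next }
    NF == 2                    { print $1 ": :" $2;       next }
                               { print }
  '
}

# Example run on three of the sample records. Prints:
#   DONNIE: :BERG
#   JERRY:M:MAGUIRE
#   DRIPP CREEK FARM
printf '%s\n' 'DONNIE BERG' 'JERRY M MAGUIRE' 'DRIPP CREEK FARM' | split_simple
```

A word count alone cannot tell people from companies. A two-word name such as ADDONIS SYNDICATE would still be split as if it were a person.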

# 2  
11-04-2007
Try adapting the following awk program:
Code:
# Global Array Description
#
# Names["cnt"     ] = Names count in names list (input line)
# Names["invalid" ] = 0 if all valid names, 1 otherwise
# Names["list"    ] = Formatted names list
#
# Names[n,   "parts"] = Parts       of name n in list
# Names[n,   "first"] = Firstname  for name n in list
# Names[n,  "middle"] = Middlename for name n in list
# Names[n,    "last"] = Lastname   for name n in list
# Names[n,    "name"] = Formatted      name n in list
# Names[n, "invalid"] = 0 if name n is valid, 1 otherwise
#

#=======================================================================
# F U N C T I O N S . . .
#=======================================================================

#
# set_name(name) - Set name information
#

function set_name(name    ,parts, p) {

   #
   # Set name parts
   #

   parts = Names[name, "parts"]
   while (1) {

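      # Glue a suffix (JR, SR, III) or a particle (DE, MC) onto the
      # preceding part; "parts > 1" keeps the index from dropping to 0
      if (parts > 1 &&
          (Names[name, parts]   ~ /^(JR|SR)$/     ||
           Names[name, parts]   ~ /^[IVX]+$/      ||
           Names[name, parts-1] ~ /^(DE|MC)$/)) {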
         Names[name, parts-1] = Names[name, parts-1] " " Names[name, parts];
         parts--;
         continue;
      }

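      # Drop a leading title; without "parts > 1" a name that is only
      # "MR" would be shifted forever (parts goes 1, 0, -1, ...)
      if (parts > 1 && Names[name, 1] ~ /^(MR|MRS|MS)$/) {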
         for (p=2; p<=parts; p++)
            Names[name, p-1] = Names[name, p];
         parts--;
         continue;
      }

      break;
   }
   Names[name,   "parts"] = parts;
   Names[name, "invalid"] = 0;


   #
   # Set name components
   #

   if (parts == 3) {
      if (length(Names[name, 2]) > 1) {
         Names[name, "invalid"] = 1;
      } else {
         Names[name,  "first"] = Names[name, 1];
         Names[name, "middle"] = Names[name, 2];
         Names[name,   "last"] = Names[name, 3];
      }
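   } else if (parts == 2) {
      Names[name,  "first"] = Names[name, 1];
      if (length(Names[name, 2]) == 1 && name < Names["cnt"]) {
         # "FIRST M & ...": the last name comes from the next name,
         # so this one is only valid if the next one is valid too
         if (Names[name+1, "invalid"])
            Names[name, "invalid"] = 1;
         Names[name, "middle"] = Names[name, 2];
         Names[name,   "last"] = Names[name+1, "last"];
      } else {
         Names[name, "middle"] = " ";
         Names[name,   "last"] = Names[name, 2];
      }
   } else if (parts == 1) {
      # "FIRST & ...": same borrowing, same condition
      if (name < Names["cnt"] && !Names[name+1, "invalid"]) {
         Names[name,  "first"] = Names[name, 1];
         Names[name, "middle"] = " ";
         Names[name,   "last"] = Names[name+1, "last"];
      } else
         Names[name, "invalid"] = 1;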
   } else
      Names[name, "invalid"] = 1;

   Names["invalid"] += Names[name, "invalid"];

   #
   # Format name
   #

   if (Names[name, "invalid"]) {
      Names[name, "name"] = "";
      for (p=1; p<=parts; p++)
         Names[name, "name"] = Names[name, "name"] (p>1 ? " " : "") Names[name, p];
   } else {
      Names[name, "name"] = Names[name, "first"] ":" Names[name, "middle"] ":" Names[name, "last"];
   }

}

#
# split_list() - Split input names list
#

function split_list(    f ,cnt ,parts) {

   split("", Names);   # empty Names so values from the previous line cannot leak in
   cnt   = 1;
   parts = 0;

   for (f=1; f<=NF; f++) {
      if ($f != "&") {
         Names[cnt, ++parts] = $f
      } else {
         Names[cnt, "parts"] = parts;
         parts = 0;
         cnt++;
      }
   }
   Names[cnt, "parts"] = parts;

   Names[    "cnt"] = cnt;
   Names["invalid"] = 0;
   Names[   "list"] = "";

}

#
# format_list() - Format names list
#

function format_list(    name ,list ,sep) {

   list = "";
   sep = (Names["invalid"] ? " & " : ":");
   for (name=1; name<=Names["cnt"]; name++) {
      list = list (name>1 ? sep : "") Names[name, "name"];
   }
   Names["list"] = list;

}

#
# analyze_list() - Analyze input names list
#

function analyze_list(    n) {
   split_list();
   for (n=Names["cnt"]; n>0; --n) {
      set_name(n);
   }
   format_list();
}

#=======================================================================
# M A I N . . .
#=======================================================================

NF {

   analyze_list();

   print "Input =" $0
   print "Output=" Names["list"];
   print ""

}

Output (with your sample input file):
Code:
Input =DONNIE BERG
Output=DONNIE: :BERG

Input =JERRY M MAGUIRE
Output=JERRY:M:MAGUIRE

Input =D A BROWN
Output=D:A:BROWN

Input =RICHARD N STYLES & FRANK A PERRY
Output=RICHARD:N:STYLES:FRANK:A:PERRY

Input =MITCH GARBO & BOBBI MILLS
Output=MITCH: :GARBO:BOBBI: :MILLS

Input =JUDY & STONE RUFFEY
Output=JUDY: :RUFFEY:STONE: :RUFFEY

Input =MRS K H SCHULTZ
Output=K:H:SCHULTZ

Input =JASPER O & SUZI M THOMPSON
Output=JASPER:O:THOMPSON:SUZI:M:THOMPSON

Input =DAY FRANKLIN-MIZER
Output=DAY: :FRANKLIN-MIZER

Input =BO & TYRA J SLACK
Output=BO: :SLACK:TYRA:J:SLACK

Input =JERRY B DE TUNA
Output=JERRY:B:DE TUNA

Input =CHARLES C VICTOR III
Output=CHARLES:C:VICTOR III

Input =DARREN E MC FANN
Output=DARREN:E:MC FANN

Input =TOM E VARBLE JR
Output=TOM:E:VARBLE JR

Input =MARY W & CAROLYN SMILEY
Output=MARY:W:SMILEY:CAROLYN: :SMILEY

Input =BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
Output=BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC

Input =STAN N GAIL & HIDDEN VALLEY FARMS
Output=STAN:N:GAIL & HIDDEN VALLEY FARMS

Input =DRIPP CREEK FARM
Output=DRIPP CREEK FARM

Input =Y Z & S T OUTTRIM
Output=Y:Z:OUTTRIM:S:T:OUTTRIM

Input =WM AND VI JOYNER SALES
Output=WM AND VI JOYNER SALES

Input =A C A SALES
Output=A C A SALES

Input =ADDONIS SYNDICATE & LOWLAND MEADOW
Output=ADDONIS: :SYNDICATE:LOWLAND: :MEADOW

Jean-Pierre.
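Lines like "JUDY & STONE RUFFEY" work because of the loop in analyze_list(). It calls set_name() from the last name in the list back to the first. That way, a person written without a last name can copy the one already found for the person after them.

Below is a stripped-down sketch of only that idea. The function name share_last_name is hypothetical. Unlike the full program, it ignores titles (MRS), suffixes (JR, III) and particles (DE, MC). Any line it cannot handle is printed unchanged.

```shell
#!/bin/sh
# share_last_name (hypothetical name): split "&"-joined people, letting a
# person without a last name borrow it from the person that follows.
share_last_name() {
  awk '
  {
    n  = split($0, person, / *& */)   # one array element per person
    ok = 1
    # Walk backwards: last[i+1] is already known when person i needs it.
    for (i = n; i >= 1 && ok; i--) {
      k = split(person[i], w, " ")    # words of this person
      if (k == 3 && length(w[2]) == 1) {
        first[i] = w[1]; mid[i] = w[2]; last[i] = w[3]
      } else if (k == 2 && length(w[2]) == 1 && i < n) {
        first[i] = w[1]; mid[i] = w[2]; last[i] = last[i+1]
      } else if (k == 2) {
        first[i] = w[1]; mid[i] = " ";  last[i] = w[2]
      } else if (k == 1 && i < n) {
        first[i] = w[1]; mid[i] = " ";  last[i] = last[i+1]
      } else {
        ok = 0                        # pattern not recognized
      }
    }
    if (!ok) { print; next }          # leave the line untouched
    out = ""
    for (i = 1; i <= n; i++)
      out = out (i > 1 ? ":" : "") first[i] ":" mid[i] ":" last[i]
    print out
  }'
}

# Example run. Prints:
#   JUDY: :RUFFEY:STONE: :RUFFEY
#   JASPER:O:THOMPSON:SUZI:M:THOMPSON
printf '%s\n' 'JUDY & STONE RUFFEY' 'JASPER O & SUZI M THOMPSON' | share_last_name
```

Resolving the people in reverse keeps the output loop a plain left-to-right join. By the time a person is formatted, every last name it depends on is already filled in.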
# 3  
11-05-2007
Jean-Pierre, thank you so much! Your program successfully splits the bulk of the 38,000 chunked names I have to change. I can't thank you enough for this gift of code! You've turned my nightmare into nothing more than a bad dream...
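The program handles the bulk of the records but not all of them, so the leftovers have to be pulled out for manual review. Here is a minimal sketch of one way to do that. It assumes the program above has been saved in a file (hypothetically split_names.awk) and that its Input/Output listing is piped into the two hypothetical shell functions below:

- keep_output keeps only the text after "Output=".
- sort_results files each converted line by how much was split. No colon means nothing was split. A colon plus " & " means only some names were split. Anything else was split completely.

```shell
#!/bin/sh
# keep_output (hypothetical name): from the program's listing
#   Input =...
#   Output=...
# keep only the converted records, one per line, in input order.
keep_output() {
  sed -n 's/^Output=//p'
}

# sort_results (hypothetical name): read converted records on stdin and
# write each one to one of three files in directory $1 (default: .):
#   done.txt     every name was split     (has ":" and no " & ")
#   partial.txt  some names were split    (has ":" and " & ")
#   review.txt   nothing was split        (no ":" at all)
sort_results() {
  awk -v dir="${1:-.}" '
    !/:/  { print > (dir "/review.txt");  next }
    / & / { print > (dir "/partial.txt"); next }
          { print > (dir "/done.txt") }
  '
}

# Hypothetical usage (file and directory names are placeholders):
#   awk -f split_names.awk names.txt | keep_output | sort_results results
```

Keeping the three groups apart lets the fully split records be loaded right away. The partial and untouched ones (farms, companies, lines like WM AND VI JOYNER SALES) can then be checked by hand or with further rules.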