The UNIX and Linux Forums  

  #1  
Old 11-03-2007
RacerX's Avatar
Registered User
 

Join Date: Oct 2007
Posts: 34
Splitting Chunked-FullNames Nightmare

I've got a problem that I'm hoping more experienced programmers have had to deal with at some point in their careers and can help me with: how to split full names that were chunked together into one field of an old database into separate, more meaningful fields.

I'd like to convert the records that fit the pattern firstname middleinitial lastname into three fields separated by colons, and skip the names that don't fit that pattern until I figure out what to do with them (any suggestions welcome).

GIVEN INPUT:
Code:
DONNIE BERG
JERRY M MAGUIRE
D A BROWN
RICHARD N STYLES & FRANK A PERRY
MITCH GARBO & BOBBI MILLS
JUDY & STONE RUFFEY
MRS K H SCHULTZ
JASPER O & SUZI M THOMPSON
DAY FRANKLIN-MIZER
BO & TYRA J SLACK
JERRY B DE TUNA
CHARLES C VICTOR III
DARREN E MC FANN
TOM E VARBLE JR
MARY W & CAROLYN SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN N GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW
DESIRED OUTPUT:
Code:
DONNIE: :BERG
JERRY:M:MAGUIRE
D:A:BROWN
RICHARD:N:STYLES:FRANK:A:PERRY
MITCH: :GARBO:BOBBI: :MILLS
JUDY: :RUFFEY:STONE: :RUFFEY
K:H:SCHULTZ
JASPER:O:THOMPSON:SUZI:M:THOMPSON
DAY: :FRANKLIN-MIZER
BO: :SLACK:TYRA:J:SLACK
JERRY:B:DE TUNA
CHARLES:C:VICTOR III
DARREN:E:MC FANN
TOM:E:VARBLE JR
MARY:W:SMILEY:CAROLYN: :SMILEY
BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
STAN:N:GAIL & HIDDEN VALLEY FARMS
DRIPP CREEK FARM
Y Z & S T OUTTRIM
WM AND VI JOYNER SALES
A C A SALES
ADDONIS SYNDICATE & LOWLAND MEADOW
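The simplest case in the request above, a lone FIRST M LAST or FIRST LAST record, can be sketched in a few lines of awk wrapped in a shell function. This is only an illustration of the three-field rule, not a full solution: the function name `split_simple_names` is made up for this example, and anything that does not fit (two names joined by `&`, suffixes such as JR, business names) is passed through unchanged.

```sh
# Hypothetical helper: converts "FIRST M LAST" and "FIRST LAST" records
# and passes everything else through untouched.
split_simple_names() {
    awk '
    # Three words with a one-letter middle word: FIRST:M:LAST
    NF == 3 && length($2) == 1 { print $1 ":" $2 ":" $3; next }
    # Two words: no middle initial, so a single space fills the middle field
    NF == 2                    { print $1 ": :" $2; next }
    # Anything else is left unchanged for later review
                               { print }
    '
}

printf '%s\n' 'JERRY M MAGUIRE' 'DONNIE BERG' 'DRIPP CREEK FARM' | split_simple_names
```

Records with `&`, prefixes like MRS, or multi-word last names need more logic than this, which is why the sketch leaves them alone rather than guessing.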
  #2  
Old 11-04-2007
aigles's Avatar
Registered User
 

Join Date: Apr 2004
Location: Bordeaux, France
Posts: 1,212
Try and adapt the following awk program:
Code:
# Global Array Description
#
# Names["cnt"     ] = Names count in names list (input line)
# Names["invalid" ] = 0 if all valid names, 1 otherwise
# Names["list"    ] = Formatted names list
#
# Names[n,   "parts"] = Parts       of name n in list
# Names[n,   "first"] = Firstname  for name n in list
# Names[n,  "middle"] = Middlename for name n in list
# Names[n,    "last"] = Lastname   for name n in list
# Names[n,    "name"] = Formatted      name n in list
# Names[n, "invalid"] = 0 if name n is valid, 1 otherwise
#

#=======================================================================
# F U N C T I O N S . . .
#=======================================================================

#
# set_name(name) - Set name information
#

function set_name(name    ,parts, p) {

   #
   # Set name parts
   #

   parts = Names[name, "parts"]
   while (1) {

      if (Names[name, parts]   ~ /^(JR|SR)$/     ||
          Names[name, parts]   ~ /^[IVX]+$/      ||
          Names[name, parts-1] ~ /^(DE|MC)$/     ) {
         Names[name, parts-1] = Names[name, parts-1] " " Names[name, parts];
         parts--;
         continue;
      }

      if (Names[name, 1] ~ /^(MR|MRS|MS)$/) {
         for (p=2; p<=parts; p++)
            Names[name, p-1] = Names[name, p];
         parts--;
         continue;
      }

      break;
   }
   Names[name,   "parts"] = parts;
   Names[name, "invalid"] = 0;


   #
   # Set name components
   #

   if (parts == 3) {
      if (length(Names[name, 2]) > 1) {
         Names[name, "invalid"] = 1;
      } else {
         Names[name,  "first"] = Names[name, 1];
         Names[name, "middle"] = Names[name, 2];
         Names[name,   "last"] = Names[name, 3];
      }
   } else if (parts == 2) {
      Names[name,  "first"] = Names[name, 1];
      if (length(Names[name, 2]) == 1 && name < Names["cnt"]) {
         Names[name, "middle"] = Names[name, 2];
         Names[name,   "last"] = Names[name+1, "last"];
      } else {
         Names[name, "middle"] = " ";
         Names[name,   "last"] = Names[name, 2];
      }
   } else if (parts == 1) {
      if (name < Names["cnt"]) {
         Names[name,  "first"] = Names[name, 1];
         Names[name, "middle"] = " ";
         Names[name,   "last"] = Names[name+1, "last"];
      } else
         Names[name, "invalid"] = 1;
   } else
      Names[name, "invalid"] = 1;

   Names["invalid"] += Names[name, "invalid"];

   #
   # Format name
   #

   if (Names[name, "invalid"]) {
      Names[name, "name"] = "";
      for (p=1; p<=parts; p++)
         Names[name, "name"] = Names[name, "name"] (p>1 ? " " : "") Names[name, p];
   } else {
      Names[name, "name"] = Names[name, "first"] ":" Names[name, "middle"] ":" Names[name, "last"];
   }

}

#
# split_list() - Split input names list
#

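function split_list(    f ,cnt ,parts) {

   # Clear data left over from the previous input line, so that a stale
   # Names[n, "last"] is never borrowed by a name in the current line
   split("", Names);

   cnt   = 1;
   parts = 0;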

   for (f=1; f<=NF; f++) {
      if ($f != "&") {
         Names[cnt, ++parts] = $f
      } else {
         Names[cnt, "parts"] = parts;
         parts = 0;
         cnt++;
      }
   }
   Names[cnt, "parts"] = parts;

   Names[    "cnt"] = cnt;
   Names["invalid"] = 0;
   Names[   "list"] = "";

}

#
# format_list() - Format names list
#

function format_list(    name ,list ,sep) {

   list = "";
   sep = (Names["invalid"] ? " & " : ":");
   for (name=1; name<=Names["cnt"]; name++) {
      list = list (name>1 ? sep : "") Names[name, "name"];
   }
   Names["list"] = list;

}

#
# analyze_list() - Analyze input names list
#

function analyze_list(    n) {
   split_list();
   for (n=Names["cnt"]; n>0; --n) {
      set_name(n);
   }
   format_list();
}

#=======================================================================
# M A I N . . .
#=======================================================================

NF {

   analyze_list();

   print "Input =" $0
   print "Output=" Names["list"];
   print ""

}
Output (with your input sample file):
Code:
Input =DONNIE BERG
Output=DONNIE: :BERG

Input =JERRY M MAGUIRE
Output=JERRY:M:MAGUIRE

Input =D A BROWN
Output=D:A:BROWN

Input =RICHARD N STYLES & FRANK A PERRY
Output=RICHARD:N:STYLES:FRANK:A:PERRY

Input =MITCH GARBO & BOBBI MILLS
Output=MITCH: :GARBO:BOBBI: :MILLS

Input =JUDY & STONE RUFFEY
Output=JUDY: :RUFFEY:STONE: :RUFFEY

Input =MRS K H SCHULTZ
Output=K:H:SCHULTZ

Input =JASPER O & SUZI M THOMPSON
Output=JASPER:O:THOMPSON:SUZI:M:THOMPSON

Input =DAY FRANKLIN-MIZER
Output=DAY: :FRANKLIN-MIZER

Input =BO & TYRA J SLACK
Output=BO: :SLACK:TYRA:J:SLACK

Input =JERRY B DE TUNA
Output=JERRY:B:DE TUNA

Input =CHARLES C VICTOR III
Output=CHARLES:C:VICTOR III

Input =DARREN E MC FANN
Output=DARREN:E:MC FANN

Input =TOM E VARBLE JR
Output=TOM:E:VARBLE JR

Input =MARY W & CAROLYN SMILEY
Output=MARY:W:SMILEY:CAROLYN: :SMILEY

Input =BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC
Output=BAHAMA FARMS TOWNSHIP INC & HERME BLUE INC

Input =STAN N GAIL & HIDDEN VALLEY FARMS
Output=STAN:N:GAIL & HIDDEN VALLEY FARMS

Input =DRIPP CREEK FARM
Output=DRIPP CREEK FARM

Input =Y Z & S T OUTTRIM
Output=Y:Z:OUTTRIM:S:T:OUTTRIM

Input =WM AND VI JOYNER SALES
Output=WM AND VI JOYNER SALES

Input =A C A SALES
Output=A C A SALES

Input =ADDONIS SYNDICATE & LOWLAND MEADOW
Output=ADDONIS: :SYNDICATE:LOWLAND: :MEADOW
Jean-Pierre.
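
One difference from the desired output is the last record: ADDONIS SYNDICATE & LOWLAND MEADOW is split as if it held two people. A possible pre-check, sketched below, tags records that contain a business-style word so they can be set aside before the name splitter sees them. The keyword list is an assumption drawn only from the sample data (INC, FARM, FARMS, SALES, SYNDICATE), and the function name `flag_business` is made up for this example; real data would likely need a longer list.

```sh
# Hypothetical pre-check: prefixes each record with BUSINESS or PERSON
# (tab-separated) depending on whether any word is in the keyword list.
flag_business() {
    awk '
    {
        tag = "PERSON"
        for (f = 1; f <= NF; f++) {
            # Keyword list is an assumption based on the sample input only
            if ($f ~ /^(INC|FARM|FARMS|SALES|SYNDICATE)$/) {
                tag = "BUSINESS"
                break
            }
        }
        print tag "\t" $0
    }
    '
}

printf '%s\n' 'ADDONIS SYNDICATE & LOWLAND MEADOW' 'JERRY M MAGUIRE' | flag_business
```

Note that this tags the whole record, so a mixed line such as STAN N GAIL & HIDDEN VALLEY FARMS would be set aside entirely instead of having its personal name split out.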
  #3  
Old 11-05-2007
RacerX's Avatar
Registered User
 

Join Date: Oct 2007
Posts: 34
Jean-Pierre, thank you so much! Your program successfully splits the bulk of the 38,000 chunked names I have to change. I can't thank you enough for this code-gift! You've turned my nightmare into nothing more than a bad dream...