Reformatting non-uniform strings


 
# 1  
Old 07-20-2010

I have a set of free-form phone numbers that are not uniform, and I want to reformat them into a standard uniform string. They sit in the last field of a colon-separated file built by a large nawk + tr pipeline, like this:

Code:
XXXXX:YYYYY:ZZZZZ:(333)333-3333x33333
XXXXX:YYYYY:ZZZZZ:x44444
XXXXX:YYYYY:ZZZZZ:(555)-555-5555
XXXXX:YYYYY:ZZZZZ:66666
XXXXX:YYYYY:ZZZZZ:777 - 777 - 7777
XXXXX:YYYYY:ZZZZZ:(888)888-8888
XXXXX:YYYYY:ZZZZZ:999.999.9999

Note: the rows containing x44444 and 66666 (rows 2 and 4) are extensions that would come out to 444-444-4444 and 666-666-6666 respectively when dialed from an outside line. So the output would need to be:

Code:
XXXXX:YYYYY:ZZZZZ:333-333-3333
XXXXX:YYYYY:ZZZZZ:444-444-4444
XXXXX:YYYYY:ZZZZZ:555-555-5555
XXXXX:YYYYY:ZZZZZ:666-666-6666
XXXXX:YYYYY:ZZZZZ:777-777-7777
XXXXX:YYYYY:ZZZZZ:888-888-8888
XXXXX:YYYYY:ZZZZZ:999-999-9999

Is there a way to build a template with arrays or something similar? I have tried several things with awk and IFS, but can't seem to break these strings up into arrays character by character. Solving this is step 1.

Step 2: how do I incorporate this solution into a three-stage pipeline so that the script creates only one file? No temporary files allowed. It would need to work like this:

Code:
nawk 'large string of operations to join 2 files' File1 File2 | tr -d ' ' | "this phone number solution" > output.txt

Is this the best way to approach this? I don't want to run nawk + tr to remove whitespace and create output.txt, then come back through and apply the phone number solution to output.txt to create new_output.txt. It all needs to be done in one pass without the temp file, unless you can rewrite the new phone number into output.txt after it's generated.
# 2  
Old 07-20-2010
trying to understand this, little by little

Does this help to just read numbers?
Code:
$ echo "XXXXX:YYYYY:ZZZZZ:(333)333-3333x33333" | tr -cd '[:digit:]'
333333333333333

If so, then the next step is to format the data.
Digits beyond first ten are dropped, right?
Fewer than ten digits, how to format?

---------- Post updated at 04:39 PM ---------- Previous update was at 04:29 PM ----------

(Heading home shortly, but wanted to provide more for you to ponder.)

Code:
$ echo "XXXXX:YYYYY:ZZZZZ:222 333 4444 x5555" | tr -cd '[:digit:]' | gawk '{print substr($1,1,3)"-"substr($1,4,3)"-"substr($1,7,4)}'
222-333-4444
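
One caveat with the one-liner above: tr -cd '[:digit:]' also deletes newlines and the first three fields, so on a multi-line file every record runs together. A minimal per-line sketch, assuming the phone number is always field 4 and using only functions that nawk has (the function name digits_only_format is made up for illustration):

```shell
# Hypothetical helper: keep fields 1-3 untouched, reduce field 4 to its
# digits, and print the first ten digits as NNN-NNN-NNNN.
digits_only_format() {
    awk -F: -v OFS=: '{
        gsub(/[^0-9]/, "", $4)      # strip everything but digits from field 4
        $4 = substr($4, 1, 3) "-" substr($4, 4, 3) "-" substr($4, 7, 4)
        print
    }'
}

printf '%s\n' 'XXXXX:YYYYY:ZZZZZ:222 333 4444 x5555' | digits_only_format
# prints XXXXX:YYYYY:ZZZZZ:222-333-4444
```

This still does nothing sensible for bare extensions; it only covers the "full number in any format" case.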

# 3  
Old 07-21-2010
getting closer...

I think that would work for any of the crap encountered when a person has the full number and area code formatted in 20 different ways. The remaining issue is when they just have their extension there. It needs to be exploded based off of the first digit of the extension to include the full number + area code. For example, if the extension is 66666 then the first 6 would translate to adding XXX-XX6-6666 to make the full number. Likewise for 33333 it would morph to XXX-XX3-3333.

Would this be possible with a larger awk script and conditional statements? Oh, and this is AIX, so no gawk, only nawk and awk. Your gawk command only uses substr, though, which should work fine in nawk.
# 4  
Old 07-21-2010
I don't understand the full logic of the reformatting process.
Can you show us the required output for this input file:
Code:
XXXXX:YYYYY:ZZZZZ:(123)456-7890x84848
XXXXX:YYYYY:ZZZZZ:x12345
XXXXX:YYYYY:ZZZZZ:(987)-654-3210
XXXXX:YYYYY:ZZZZZ:73849
XXXXX:YYYYY:ZZZZZ:543 - 987 - 2106
XXXXX:YYYYY:ZZZZZ:(123)987-0456
XXXXX:YYYYY:ZZZZZ:098.765.4321

Jean-Pierre.
# 5  
Old 07-21-2010
To do that with extensions...

I am thinking about
sprintf = formatted printing
gsub = global substitution
as useful functions within awk to help you.

(Sorry, kinda busy right now to think thru, but wanted to provide some thoughts)
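
A rough sketch of how those two functions might fit together, assuming a single number passed as an argument and at least ten digits in it (format_number is a hypothetical name, not something from this thread):

```shell
# Hypothetical helper: format one phone number given as $1.
format_number() {
    awk -v raw="$1" 'BEGIN {
        gsub(/[^0-9]/, "", raw)     # gsub: remove every non-digit character
        # sprintf: build the NNN-NNN-NNNN string from the first ten digits
        formatted = sprintf("%s-%s-%s", substr(raw, 1, 3), substr(raw, 4, 3), substr(raw, 7, 4))
        print formatted
    }'
}

format_number '(987)-654-3210'
# prints 987-654-3210
```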
# 6  
Old 07-21-2010
like this

Your input:
Code:
XXXXX:YYYYY:ZZZZZ:(123)456-7890x84848
XXXXX:YYYYY:ZZZZZ:x12345
XXXXX:YYYYY:ZZZZZ:(987)-654-3210
XXXXX:YYYYY:ZZZZZ:73849
XXXXX:YYYYY:ZZZZZ:543 - 987 - 2106
XXXXX:YYYYY:ZZZZZ:(123)987-0456
XXXXX:YYYYY:ZZZZZ:098.765.4321

Required output:
Code:
XXXXX:YYYYY:ZZZZZ:123-456-7890
XXXXX:YYYYY:ZZZZZ:736-251-2345
XXXXX:YYYYY:ZZZZZ:987-654-3210
XXXXX:YYYYY:ZZZZZ:655-627-3849
XXXXX:YYYYY:ZZZZZ:543-987-2106
XXXXX:YYYYY:ZZZZZ:123-987-0456
XXXXX:YYYYY:ZZZZZ:098-765-4321

As you can see, the numbers where the area code was provided are easier to figure out. The lines that have only an extension need an additional set of digits chosen by the first digit of the extension. So the 12345 extension becomes 736-251-2345, because the leading 1 selects a fixed 5-digit prefix that goes in front of the extension. There is no calculation, just a constant value determined by the first digit of the extension. Something like this:

if extension starts with "1", prepend 736-25 to the extension and add a dash after the 1.
if extension starts with "2", prepend 854-32 to the extension and add a dash after the 2.
...
if extension starts with "7", prepend 655-62 to the extension and add a dash after the 7.
...
etc..
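
The rule above amounts to a lookup table keyed on the first digit. A minimal sketch, assuming 5-digit extensions and filling in only the three prefixes listed (the others are not given here, so they are left out; expand_extension is a made-up name):

```shell
# Hypothetical helper: expand one 5-digit extension given as $1.
expand_extension() {
    awk -v ext="$1" 'BEGIN {
        # Only the prefixes stated above; the remaining digits are unknown.
        prefix["1"] = "73625"
        prefix["2"] = "85432"
        prefix["7"] = "65562"
        first = substr(ext, 1, 1)
        if (!(first in prefix)) exit 1      # no prefix known for this digit
        full = prefix[first] ext            # ten digits: prefix + extension
        print substr(full, 1, 3) "-" substr(full, 4, 3) "-" substr(full, 7, 4)
    }'
}

expand_extension 12345   # prints 736-251-2345
expand_extension 73849   # prints 655-627-3849
```

An extension whose first digit has no entry prints nothing and returns a non-zero status, so missing prefixes are noticed rather than silently mangled.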
# 7  
Old 07-21-2010
Try and adapt the following script:
Code:
awk '
BEGIN {
   FS = OFS = ":"
   # Prefix table: ac[d+1] holds the 5-digit prefix for extensions dxxxx.
   #       0xxxx 1xxxx 2xxxx 3xxxx 4xxxx 5xxxx 6xxxx 7xxxx
   split("?????,73625,85432,33333,44444,55555,66666,65562", ac, ",")
}
{
   gsub(/[^0-9]/, "", $4)                 # keep only the digits
   tel = $4
   if (length(tel) == 5) {                # bare extension: prepend its prefix
      idx = substr(tel, 1, 1) + 1         # test the shifted index, not the raw digit
      tel = ((idx in ac) ? ac[idx] : "?????") tel
   }
   tel = tel "??????????"                 # pad so missing digits show up as ?
   $4 = substr(tel, 1, 3) "-" substr(tel, 4, 3) "-" substr(tel, 7, 4)
   print
}
' lordmiter.txt

Input file (lordmiter.txt):
Code:
XXXXX:YYYYY:ZZZZZ:(123)456-7890x84848
XXXXX:YYYYY:ZZZZZ:x12345
XXXXX:YYYYY:ZZZZZ:(987)-654-3210
XXXXX:YYYYY:ZZZZZ:73849
XXXXX:YYYYY:ZZZZZ:543 - 987 - 2106
XXXXX:YYYYY:ZZZZZ:(123)987-0456
XXXXX:YYYYY:ZZZZZ:098.765.4321
XXXXX:YYYYY:ZZZZZ:777.654
XXXXX:YYYYY:ZZZZZ:98765

Output:
Code:
XXXXX:YYYYY:ZZZZZ:123-456-7890
XXXXX:YYYYY:ZZZZZ:736-251-2345
XXXXX:YYYYY:ZZZZZ:987-654-3210
XXXXX:YYYYY:ZZZZZ:655-627-3849
XXXXX:YYYYY:ZZZZZ:543-987-2106
XXXXX:YYYYY:ZZZZZ:123-987-0456
XXXXX:YYYYY:ZZZZZ:098-765-4321
XXXXX:YYYYY:ZZZZZ:777-654-????
XXXXX:YYYYY:ZZZZZ:???-??9-8765

Jean-Pierre.
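
On step 2 of the original question: an awk program that is given no file name reads standard input, so the same logic can run as the last stage of the pipeline and no temporary file is needed. A sketch under those assumptions, with the awk wrapped in a shell function (format_phones is a hypothetical name) and only the three prefixes stated in the thread filled in:

```shell
# Hypothetical filter: reads colon-separated records on stdin,
# writes them to stdout with field 4 normalized to NNN-NNN-NNNN.
format_phones() {
    awk '
    BEGIN {
        FS = OFS = ":"
        # Only the prefixes given in the thread; other digits are unknown.
        prefix["1"] = "73625"; prefix["2"] = "85432"; prefix["7"] = "65562"
    }
    {
        gsub(/[^0-9]/, "", $4)              # keep only the digits
        tel = $4
        if (length(tel) == 5) {             # bare extension: prepend prefix
            first = substr(tel, 1, 1)
            tel = ((first in prefix) ? prefix[first] : "?????") tel
        }
        tel = tel "??????????"              # pad short numbers with ?
        $4 = substr(tel, 1, 3) "-" substr(tel, 4, 3) "-" substr(tel, 7, 4)
        print
    }'
}

# Stand-in for: nawk '...' File1 File2 | tr -d ' ' | format_phones > output.txt
printf '%s\n' 'A:B:C:x12345' 'A:B:C:543 - 987 - 2106' | tr -d ' ' | format_phones
# prints A:B:C:736-251-2345
#        A:B:C:543-987-2106
```

On AIX, replace awk with nawk if the default awk lacks gsub.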