help with data formatting


 
# 1  
Old 01-05-2012

Hi,

I have data coming in like below. Not all of the data is like this; these are the problem records that are causing the ETL load to fail. Can you please help me combine these broken records?

Code:
001800018000000guyMMAAY~acct name~acct type~~"address part 1
address part2"~city~STATE~ZIP~COUNTRY~(123) 123-1234~~~~~~~~~~~~~~~~~^M
0018000000gwQ63AAE~acct name~acct type~~"address part 1
address part2"~city~state~zip~country~(123) 123-1234~(123) 123-1234~~~~~~~~~~~~~~~~^M

Appreciate your time and effort!

Thanks,

Last edited by radoulov; 01-05-2012 at 07:00 PM.. Reason: Code tags!
# 2  
Old 01-05-2012
What do you mean by broken records? Can you please clarify the problem by giving the desired output?
# 3  
Old 01-05-2012
The desired output is:
Code:
001800018000000guyMMAAY~acct name~acct type~~"address part 1 address part2"~city~STATE~ZIP~COUNTRY~(123) 123-1234~~~~~~~~~~~~~~~~~^M
0018000000gwQ63AAE~acct name~acct type~~"address part 1 address part2"~city~state~zip~country~(123) 123-1234~(123) 123-1234~~~~~~~~~~~~~~~~^M


Last edited by radoulov; 01-05-2012 at 07:41 PM..
# 4  
Old 01-05-2012
Assuming that the area code (xxx) always comes after the 'break point' and is always present, this might work:

Code:
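awk '
    # A line containing an area code such as (123) is taken to end a record.
    /\([0-9][0-9][0-9]\)/ {
        if( buffer )
            printf( "%s%s\n", buffer, $0 );   # join the held first half with this line
        else
            print;                            # already complete, pass through
        buffer = "";
        next;
    }
    # No area code: hold the line until the rest of the record arrives.
    { buffer = $0; }
' input-file >fixed-file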

# 5  
Old 01-05-2012
Hi,

Appreciate the quick response.

It mostly worked on the problem records, except that it also combined multiple non-problem records into one, for example:

459484~~~~~~~~~~~~~~^M001C000000wO3Y8IAK~

Thanks,
varman
# 6  
Old 01-05-2012
The problem is that my script assumed there'd be an area code in each complete record.

Assuming that the tilde characters are field separators, is there a fixed number of fields per good record? That'd be the best way to determine if a record is broken.
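One rough way to check this is to tally how many tilde-separated fields each physical line has. Complete records should cluster at one count, and the broken halves should show up as smaller counts. This is only a sketch: the function name `field_counts` is made up for illustration, and it assumes that `~` never appears inside a field value.

```shell
# Hypothetical helper: print "<field count> <number of lines>" pairs,
# sorted by field count. Assumes "~" is the only field separator.
field_counts() {
    awk -F'~' '
        { count[NF]++ }                              # tally lines by field count
        END { for (n in count) print n, count[n] }
    ' "$1" | sort -n
}
```

Run it as `field_counts input-file`. The count shared by most lines is probably the size of a good record.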
# 7  
Old 01-06-2012
The number of fields are 12.

Thanks,
Varman
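
With a fixed field count, something along these lines might work. It is an untested-against-real-data sketch, not a definitive fix: it keeps appending physical lines to a pending record until the record has at least the expected number of `~`-separated fields, joining the pieces with a space as in the desired output from post #3. The function name `join_broken_records` and the variable `want` are made up for illustration. The default of 12 comes from the post above, but the sample lines appear to contain more than 12 tildes, so pass whatever count your good records really have. It also assumes `~` never occurs inside a field and that a line break never falls after the last separator of a record.

```shell
# Hypothetical helper: join_broken_records FILE [NFIELDS]
# Accumulate lines until the pending record has at least NFIELDS fields.
join_broken_records() {
    awk -v want="${2:-12}" '
        {
            # Append this line to the pending record, space-joined
            # to match the desired output shown in post #3.
            rec = (rec == "") ? $0 : rec " " $0

            # split() returns the number of "~"-separated pieces.
            if (split(rec, parts, "~") >= want) {
                print rec
                rec = ""
            }
        }
        # Flush whatever is left if the file ends on an incomplete record.
        END { if (rec != "") print rec }
    ' "$1"
}
```

Usage would be `join_broken_records input-file 12 >fixed-file`. Complete lines already meet the count and pass through unchanged, so good records are not merged the way the area-code test merged them. The trailing ^M (carriage return) is left as it is.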