Unix/Linux Go Back    


Shell Programming and Scripting BSD, Linux, and UNIX shell scripting — Post awk, bash, csh, ksh, perl, php, python, sed, sh, shell scripts, and other shell scripting languages questions here.

CSV formatting with prefixing, appending and padding field

Shell Programming and Scripting


Closed    
 
Thread Tools Search this Thread Display Modes
    #1  
Old Unix and Linux 01-07-2009
meself meself is offline
Registered User
 
Join Date: Jan 2009
Last Activity: 30 April 2009, 9:32 PM EDT
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
CSV formatting with prefixing, appending and padding field

Hi I have a very large csv file with some hundreds of thousands of rows of data.

The data is in the following format:
Up to four alpha numeric characters for the first word. This is either set as 2 characters followed by 2 spaces, or as a single 4character word. This is then followed by an 8 character alphanumeric word making a total of 12 characters in total. The number of characters and positioning is the same for every line. ie, 4 and 8.


Code:
"AB  324578DE"
"ABCDAAA87958"
"ZF  4R85T79B"

I need a way to reformat the code by taking the first 4 characters and then padding them to the end of the field as an 18 character field. So there will be 6 spaces in between the 8 character word and the original first 4 characters. ie. 8, 6, and 4.

For example the data above will become:


Code:
"324578DE      AB  "
"AAA87958      ABCD"
"4R85T79B      ZF  "


This will then be prefixed with a static text of "CODE," and then appended with 18 commas to make it a proper csv file ready for database import.

See final result below(minus the double quotes):


Code:
"CODE,324578DE      AB  ,,,,,,,,,,,,,,,,,,"
"CODE,AAA87958      ABCD,,,,,,,,,,,,,,,,,,"
"CODE,4R85T79B      ZF  ,,,,,,,,,,,,,,,,,,"

I can do the last step of prefixing and appending by using the following code where input.csv will be the results of the first text swap and padding:


Code:
sed 's/^.*/CODE,&/'  input.csv | sed -e 's/$/,,,,,,,,,,,,,,,,,,/' > output.csv

however I am stuggling to perform the first step of the text swapping and padding. I was thinking of the following logic to get this to work:

1. Append 6 blank spaces to end of field.
2. Copy first 4 characters and append to end of field.
3. Delete first 4 characters.

I can't seem to get step 2 though.

I would appreciate any hints on getting this whole thing working.
Sponsored Links
    #2  
Old Unix and Linux 01-08-2009
Annihilannic Annihilannic is offline Forum Advisor  
 
Join Date: May 2008
Last Activity: 28 October 2009, 7:03 PM EDT
Location: Sydney, Australia
Posts: 1,009
Thanks: 0
Thanked 2 Times in 2 Posts
No need to run sed multiple times; you can have multiple commands in a sed script. Try this:


Code:
sed '
  s/^\(....\)\(.*\)/\2      \1/
  s/^.*/CODE,&/
  s/$/,,,,,,,,,,,,,,,,,,/
' input.csv > output.csv

The second command there identifies a 4 character substring and a rest-of-string substring, then reinserts them into the replacement with a six-character separator (much like & does, but it inserts the entire match).

However you could simplify that further and do it in one step, e.g.:


Code:
sed 's/^\(....\)\(.*\)/CODE,\2      \1,,,,,,,,,,,,,,,,,,/' input.csv > output.csv

Sponsored Links
    #3  
Old Unix and Linux 01-08-2009
meself meself is offline
Registered User
 
Join Date: Jan 2009
Last Activity: 30 April 2009, 9:32 PM EDT
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
Thank you Annihilannic your solution worked really well.

Linux
    #4  
Old Unix and Linux 01-08-2009
summer_cherry summer_cherry is offline Forum Advisor  
Registered User
 
Join Date: Jun 2007
Last Activity: 11 November 2016, 3:44 AM EST
Location: Beijing China
Posts: 1,305
Thanks: 0
Thanked 26 Times in 26 Posts

Code:
sed 's/\(....\)\(.*\)/\2      \1/' a.txt

Sponsored Links
    #5  
Old Unix and Linux 01-12-2009
meself meself is offline
Registered User
 
Join Date: Jan 2009
Last Activity: 30 April 2009, 9:32 PM EDT
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
I didn't realise it before, but the requirement is not as simple as I had first thought.

The second field is not always 8 characters as I had assumed.

I've since found that some are 3, 4, 7 characters etc.


I guess what I should have said, is that the first four characters must be appended to the end, where this appending must start after 14 characters to make a total field length of 18 characters.

So given the following:


Code:
"AB  12345"
"ABCDAA123456"
"ZF  4R85T7"
"BFR ABC"

will become:


Code:
"12345         AB  "
"AA123456      ABCD"
"4R85T7        ZF  "
"ABC           BFR "

This sounds overly complicated to me. Any help would be very appreciated.
Sponsored Links
    #6  
Old Unix and Linux 01-12-2009
Annihilannic Annihilannic is offline Forum Advisor  
 
Join Date: May 2008
Last Activity: 28 October 2009, 7:03 PM EDT
Location: Sydney, Australia
Posts: 1,009
Thanks: 0
Thanked 2 Times in 2 Posts
In that case I would use awk rather than sed to benefit from availability of printf() with appropriate formatting specifiers:


Code:
awk '{ printf("CODE,%-14s%-4s,,,,,,,,,,,,,,,,,,\n",substr($0,5),substr($0,1,4)) }' input.csv > output.csv

For example %-14s means a left-justified 14 character field.
Sponsored Links
    #7  
Old Unix and Linux 01-13-2009
meself meself is offline
Registered User
 
Join Date: Jan 2009
Last Activity: 30 April 2009, 9:32 PM EDT
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
Once again Annihilannic, thank you very much.

Your awk solution worked brilliantly.

Is the following understanding of your correct?


Code:
%-14s%-4s

This means a 14character field followed by a 4 character field.




Code:
 
substr($0,5),substr($0,1,4))

This means, the 14 character field of the output file is made up of the substring from position 5 onwards from the input file and the 4 character field of the output file consists of the substring from position 1-4 from the original input file.


Sorry if it's a silly question but I'd much rather learn as I go instead of just blindly copying code without learning anything.
Sponsored Links
Closed

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Linux More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
prefixing filenames ravi raj kumar Shell Programming and Scripting 2 11-20-2007 06:17 AM
Byte Padding naan Programming 6 02-22-2007 05:09 AM
Addition with the prefixing zeros included sivasenthil_k UNIX for Dummies Questions & Answers 16 09-30-2006 11:20 AM
Padding vijaygopalsk UNIX for Dummies Questions & Answers 2 06-27-2003 11:51 AM
Padding issues informshilpa UNIX for Advanced & Expert Users 2 03-01-2002 01:51 PM



All times are GMT -4. The time now is 03:04 AM.