CSV formatting with prefixing, appending and padding field


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting CSV formatting with prefixing, appending and padding field
# 1  
Old 01-07-2009
CSV formatting with prefixing, appending and padding field

Hi I have a very large csv file with some hundreds of thousands of rows of data.

The data is in the following format:
Up to four alpha numeric characters for the first word. This is either set as 2 characters followed by 2 spaces, or as a single 4character word. This is then followed by an 8 character alphanumeric word making a total of 12 characters in total. The number of characters and positioning is the same for every line. ie, 4 and 8.

Code:
"AB  324578DE"
"ABCDAAA87958"
"ZF  4R85T79B"

I need a way to reformat the code by taking the first 4 characters and then padding them to the end of the field as an 18 character field. So there will be 6 spaces in between the 8 character word and the original first 4 characters. ie. 8, 6, and 4.

For example the data above will become:

Code:
"324578DE      AB  "
"AAA87958      ABCD"
"4R85T79B      ZF  "


This will then be prefixed with a static text of "CODE," and then appended with 18 commas to make it a proper csv file ready for database import.

See final result below(minus the double quotes):

Code:
"CODE,324578DE      AB  ,,,,,,,,,,,,,,,,,,"
"CODE,AAA87958      ABCD,,,,,,,,,,,,,,,,,,"
"CODE,4R85T79B      ZF  ,,,,,,,,,,,,,,,,,,"

I can do the last step of prefixing and appending by using the following code where input.csv will be the results of the first text swap and padding:

Code:
sed 's/^.*/CODE,&/'  input.csv | sed -e 's/$/,,,,,,,,,,,,,,,,,,/' > output.csv

however I am stuggling to perform the first step of the text swapping and padding. I was thinking of the following logic to get this to work:

1. Append 6 blank spaces to end of field.
2. Copy first 4 characters and append to end of field.
3. Delete first 4 characters.

I can't seem to get step 2 though.

I would appreciate any hints on getting this whole thing working.
# 2  
Old 01-08-2009
No need to run sed multiple times; you can have multiple commands in a sed script. Try this:

Code:
sed '
  s/^\(....\)\(.*\)/\2      \1/
  s/^.*/CODE,&/
  s/$/,,,,,,,,,,,,,,,,,,/
' input.csv > output.csv

The second command there identifies a 4 character substring and a rest-of-string substring, then reinserts them into the replacement with a six-character separator (much like & does, but it inserts the entire match).

However you could simplify that further and do it in one step, e.g.:

Code:
sed 's/^\(....\)\(.*\)/CODE,\2      \1,,,,,,,,,,,,,,,,,,/' input.csv > output.csv

# 3  
Old 01-08-2009
Thank you Annihilannic your solution worked really well.

Smilie
# 4  
Old 01-08-2009
Code:
sed 's/\(....\)\(.*\)/\2      \1/' a.txt

# 5  
Old 01-12-2009
I didn't realise it before, but the requirement is not as simple as I had first thought.

The second field is not always 8 characters as I had assumed.

I've since found that some are 3, 4, 7 characters etc.


I guess what I should have said, is that the first four characters must be appended to the end, where this appending must start after 14 characters to make a total field length of 18 characters.

So given the following:

Code:
"AB  12345"
"ABCDAA123456"
"ZF  4R85T7"
"BFR ABC"

will become:

Code:
"12345         AB  "
"AA123456      ABCD"
"4R85T7        ZF  "
"ABC           BFR "

This sounds overly complicated to me. Any help would be very appreciated.
# 6  
Old 01-12-2009
In that case I would use awk rather than sed to benefit from availability of printf() with appropriate formatting specifiers:

Code:
awk '{ printf("CODE,%-14s%-4s,,,,,,,,,,,,,,,,,,\n",substr($0,5),substr($0,1,4)) }' input.csv > output.csv

For example %-14s means a left-justified 14 character field.
# 7  
Old 01-13-2009
Once again Annihilannic, thank you very much.

Your awk solution worked brilliantly.

Is the following understanding of your correct?

Code:
%-14s%-4s

This means a 14character field followed by a 4 character field.



Code:
 
substr($0,5),substr($0,1,4))

This means, the 14 character field of the output file is made up of the substring from position 5 onwards from the input file and the 4 character field of the output file consists of the substring from position 1-4 from the original input file.


Sorry if it's a silly question but I'd much rather learn as I go instead of just blindly copying code without learning anything.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Match columns from two csv files and update field in one of the csv file

Hi, I have a file of csv data, which looks like this: file1: 1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628 2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies

2. Linux

How do I format a Date field of a .CSV file with multiple commas in a string field?

I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below: column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10 "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in... (3 Replies)
Discussion started by: dhruuv369
3 Replies

3. Shell Programming and Scripting

Padding a csv value with 0's

I have this csv file that I would like to sort on the 20th and 21st field. They are high lighted below. My challenge is that when I sort on those fields they are not in order as I would have liked. It seems like I have to pad those fields to the longest value in that fields data. ... (6 Replies)
Discussion started by: GroveTuckey
6 Replies

4. Shell Programming and Scripting

Sort, sed, and zero padding date column csv bash scripting

Hello people, I am having problem to sort, sed and zero padding of column in csv file. 7th column only. Input of csv file: 1,2,3,4,5,6,4/1/2010 12:00 AM,8 1,2,3,4,5,6,3/11/2010 9:39 AM,8 1,2,3,4,5,6,5/12/2011 3:43 PM,8 1,2,3,4,5,6,12/20/2009 7:23 PM,8 Output:... (5 Replies)
Discussion started by: sean1357
5 Replies

5. Shell Programming and Scripting

Appending = in particular column in csv file

Hi, I have a requirement to append = in particular row in csv file. Data in csv is as follow: row1,a,a,a row2,b,b,b row3,c,c,c row4,d,d,d csv should be modified at row3 and no. of columns are not fixed but rows are. output should be as: row1,a,a,a row2,b,b,b row3,=c,=c,=c... (2 Replies)
Discussion started by: Divya1987
2 Replies

6. Shell Programming and Scripting

Appending a parameter value to a .csv file???

Hi I have a date which I get as parameter. I want it to be added as first column in all the rows in a csv file. I have tried the below code but no success. date=$1 awk -F"," '{print $date","$0}' z3.csv > z4.csv Could you tell the correct code for the above req.? How to use code... (1 Reply)
Discussion started by: msp2244
1 Replies

7. Shell Programming and Scripting

Help with appending to a csv file

I am working to import a file and have almost all the information accounted for except this last item. First of all the file name is the category and this is what I need to append to the file. The catch I believe is the fact the file name has spaces in it. Filename CPU Products.csv File... (3 Replies)
Discussion started by: nkr1ptd
3 Replies

8. Shell Programming and Scripting

Appending 1st field in a file into 2nd field in another file

Hi, I've internally searched through forums for about 2+ hours. Unfortunately, with no luck. Although I've found some cases close to mine below, but didn't help so much. Actually, I'm in short with time. So I had to post my case. Hoping that you can help. I have 2 files, FILE1 ... (0 Replies)
Discussion started by: amurib
0 Replies

9. Shell Programming and Scripting

Need help appending a string to a field

Hi, This is driving me nuts, can't think of any easy way to do it. I need to append a string ".00" only in the third field of a file, and only if it does NOT have a decimal point already Here is what the file looks like- 1400030846,2,17,POL GENERAL 1400030900,3,14.95,FIC GENERAL If... (7 Replies)
Discussion started by: sfisk
7 Replies

10. Shell Programming and Scripting

Keeping padding in a date field

Hi Guys, I'm having a bit of a problem with a script, i need to get the day, month and day of month into a string, so i'm using: CURRENT_DATE=`date +"%a %b %e"` It is getting the correct date out, however it is not keeping the padding on the day of month. The %e is supposed to pad the day... (5 Replies)
Discussion started by: seanbyrne
5 Replies
Login or Register to Ask a Question