Visit Our UNIX and Linux User Community


CSV formatting with prefixing, appending and padding field


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting CSV formatting with prefixing, appending and padding field
# 1  
Old 01-07-2009
CSV formatting with prefixing, appending and padding field

Hi I have a very large csv file with some hundreds of thousands of rows of data.

The data is in the following format:
Up to four alpha numeric characters for the first word. This is either set as 2 characters followed by 2 spaces, or as a single 4character word. This is then followed by an 8 character alphanumeric word making a total of 12 characters in total. The number of characters and positioning is the same for every line. ie, 4 and 8.

Code:
"AB  324578DE"
"ABCDAAA87958"
"ZF  4R85T79B"

I need a way to reformat the code by taking the first 4 characters and then padding them to the end of the field as an 18 character field. So there will be 6 spaces in between the 8 character word and the original first 4 characters. ie. 8, 6, and 4.

For example the data above will become:

Code:
"324578DE      AB  "
"AAA87958      ABCD"
"4R85T79B      ZF  "


This will then be prefixed with a static text of "CODE," and then appended with 18 commas to make it a proper csv file ready for database import.

See final result below(minus the double quotes):

Code:
"CODE,324578DE      AB  ,,,,,,,,,,,,,,,,,,"
"CODE,AAA87958      ABCD,,,,,,,,,,,,,,,,,,"
"CODE,4R85T79B      ZF  ,,,,,,,,,,,,,,,,,,"

I can do the last step of prefixing and appending by using the following code where input.csv will be the results of the first text swap and padding:

Code:
sed 's/^.*/CODE,&/'  input.csv | sed -e 's/$/,,,,,,,,,,,,,,,,,,/' > output.csv

however I am stuggling to perform the first step of the text swapping and padding. I was thinking of the following logic to get this to work:

1. Append 6 blank spaces to end of field.
2. Copy first 4 characters and append to end of field.
3. Delete first 4 characters.

I can't seem to get step 2 though.

I would appreciate any hints on getting this whole thing working.
# 2  
Old 01-08-2009
No need to run sed multiple times; you can have multiple commands in a sed script. Try this:

Code:
sed '
  s/^\(....\)\(.*\)/\2      \1/
  s/^.*/CODE,&/
  s/$/,,,,,,,,,,,,,,,,,,/
' input.csv > output.csv

The second command there identifies a 4 character substring and a rest-of-string substring, then reinserts them into the replacement with a six-character separator (much like & does, but it inserts the entire match).

However you could simplify that further and do it in one step, e.g.:

Code:
sed 's/^\(....\)\(.*\)/CODE,\2      \1,,,,,,,,,,,,,,,,,,/' input.csv > output.csv

# 3  
Old 01-08-2009
Thank you Annihilannic your solution worked really well.

Smilie
# 4  
Old 01-08-2009
Code:
sed 's/\(....\)\(.*\)/\2      \1/' a.txt

# 5  
Old 01-12-2009
I didn't realise it before, but the requirement is not as simple as I had first thought.

The second field is not always 8 characters as I had assumed.

I've since found that some are 3, 4, 7 characters etc.


I guess what I should have said, is that the first four characters must be appended to the end, where this appending must start after 14 characters to make a total field length of 18 characters.

So given the following:

Code:
"AB  12345"
"ABCDAA123456"
"ZF  4R85T7"
"BFR ABC"

will become:

Code:
"12345         AB  "
"AA123456      ABCD"
"4R85T7        ZF  "
"ABC           BFR "

This sounds overly complicated to me. Any help would be very appreciated.
# 6  
Old 01-12-2009
In that case I would use awk rather than sed to benefit from availability of printf() with appropriate formatting specifiers:

Code:
awk '{ printf("CODE,%-14s%-4s,,,,,,,,,,,,,,,,,,\n",substr($0,5),substr($0,1,4)) }' input.csv > output.csv

For example %-14s means a left-justified 14 character field.
# 7  
Old 01-13-2009
Once again Annihilannic, thank you very much.

Your awk solution worked brilliantly.

Is the following understanding of your correct?

Code:
%-14s%-4s

This means a 14character field followed by a 4 character field.



Code:
 
substr($0,5),substr($0,1,4))

This means, the 14 character field of the output file is made up of the substring from position 5 onwards from the input file and the 4 character field of the output file consists of the substring from position 1-4 from the original input file.


Sorry if it's a silly question but I'd much rather learn as I go instead of just blindly copying code without learning anything.

Previous Thread | Next Thread
Test Your Knowledge in Computers #689
Difficulty: Medium
The HP-35, the world's first handheld scientific calculator, introduced the classical two-level RPN in 1972.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Match columns from two csv files and update field in one of the csv file

Hi, I have a file of csv data, which looks like this: file1: 1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628 2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies

2. Linux

How do I format a Date field of a .CSV file with multiple commas in a string field?

I have a .CSV file (file.csv) whose data are all enclosed in double quotes. Sample format of the file is as below: column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10 "12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in... (3 Replies)
Discussion started by: dhruuv369
3 Replies

3. Shell Programming and Scripting

Padding a csv value with 0's

I have this csv file that I would like to sort on the 20th and 21st field. They are high lighted below. My challenge is that when I sort on those fields they are not in order as I would have liked. It seems like I have to pad those fields to the longest value in that fields data. ... (6 Replies)
Discussion started by: GroveTuckey
6 Replies

4. Shell Programming and Scripting

Sort, sed, and zero padding date column csv bash scripting

Hello people, I am having problem to sort, sed and zero padding of column in csv file. 7th column only. Input of csv file: 1,2,3,4,5,6,4/1/2010 12:00 AM,8 1,2,3,4,5,6,3/11/2010 9:39 AM,8 1,2,3,4,5,6,5/12/2011 3:43 PM,8 1,2,3,4,5,6,12/20/2009 7:23 PM,8 Output:... (5 Replies)
Discussion started by: sean1357
5 Replies

5. Shell Programming and Scripting

Appending = in particular column in csv file

Hi, I have a requirement to append = in particular row in csv file. Data in csv is as follow: row1,a,a,a row2,b,b,b row3,c,c,c row4,d,d,d csv should be modified at row3 and no. of columns are not fixed but rows are. output should be as: row1,a,a,a row2,b,b,b row3,=c,=c,=c... (2 Replies)
Discussion started by: Divya1987
2 Replies

6. Shell Programming and Scripting

Appending a parameter value to a .csv file???

Hi I have a date which I get as parameter. I want it to be added as first column in all the rows in a csv file. I have tried the below code but no success. date=$1 awk -F"," '{print $date","$0}' z3.csv > z4.csv Could you tell the correct code for the above req.? How to use code... (1 Reply)
Discussion started by: msp2244
1 Replies

7. Shell Programming and Scripting

Help with appending to a csv file

I am working to import a file and have almost all the information accounted for except this last item. First of all the file name is the category and this is what I need to append to the file. The catch I believe is the fact the file name has spaces in it. Filename CPU Products.csv File... (3 Replies)
Discussion started by: nkr1ptd
3 Replies

8. Shell Programming and Scripting

Appending 1st field in a file into 2nd field in another file

Hi, I've internally searched through forums for about 2+ hours. Unfortunately, with no luck. Although I've found some cases close to mine below, but didn't help so much. Actually, I'm in short with time. So I had to post my case. Hoping that you can help. I have 2 files, FILE1 ... (0 Replies)
Discussion started by: amurib
0 Replies

9. Shell Programming and Scripting

Need help appending a string to a field

Hi, This is driving me nuts, can't think of any easy way to do it. I need to append a string ".00" only in the third field of a file, and only if it does NOT have a decimal point already Here is what the file looks like- 1400030846,2,17,POL GENERAL 1400030900,3,14.95,FIC GENERAL If... (7 Replies)
Discussion started by: sfisk
7 Replies

10. Shell Programming and Scripting

Keeping padding in a date field

Hi Guys, I'm having a bit of a problem with a script, i need to get the day, month and day of month into a string, so i'm using: CURRENT_DATE=`date +"%a %b %e"` It is getting the correct date out, however it is not keeping the padding on the day of month. The %e is supposed to pad the day... (5 Replies)
Discussion started by: seanbyrne
5 Replies

Featured Tech Videos