Visit Our UNIX and Linux User Community


Padding a csv value with 0's


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Padding a csv value with 0's
# 1  
Old 08-28-2013
Padding a csv value with 0's

I have this csv file that I would like to sort on the 20th and 21st field. They are high lighted below. My challenge is that when I sort on those fields they are not in order as I would have liked. It seems like I have to pad those fields to the longest value in that fields data.

Code:
OrderNum,MerrillRecipientID,CustomerClass,MerrillItemNum,PODTemplateID,GridCode,AetnaDocID,MemberID,FirstName,MI,LastName,Address1,Address2,Address3,City,State,Zip,Country,OEL,PalletNum,PckgNum,IMBCode,ProcDate
"M394993","M39499300010000001","0GH","3GH000503","PDP","BO","1011250","MEBB04CB","Name","","Name","address","","","City","SC","29170-2043","","*******AUTO**SCH 5-DIGIT 29033","P1","PK5","2031100094470495539729170204309","3GH000503","August 26, 2013"
"M394993","M39499300010000002","0GH","3GH000503","PDP","BO","1011572","MEBB07GB","Name","G","Name","address","","","City","SC","29020-2912","","*********AUTO**SCH 3-DIGIT 290","P1","PK1","3031100094470495580529020291210","3GH000503","August 26, 2013"
"M394993","M39499300010000003","0GH","3GH000503","PDP","BO","1011693","MEBB08MP","Name","B","Name","address","","","City","SC","29061-9447","","*********AUTO**SCH 3-DIGIT 290","P1","PK2","3031100094470495583729061944757","3GH000503","August 26, 2013"
"M394993","M39499300010000004","0GH","3GH000503","PDP","BO","1011751","MEBB097M","Name","A","Name","address","","","City","SC","29645-0433","","*************AUTO**3-DIGIT 296","P1","PK31","3031100094470495629629645043333","3GH000503","August 26, 2013"
"M394993","M39499300010000005","0GH","3GH000503","PDP","BO","1012075","MEBB0K4L","Name","E","Name","address","","","City","SC","29682-9634","","*************AUTO**3-DIGIT 296","P1","PK33","3031100094470495637929682963428","3GH000503","August 26, 2013"
"M394993","M39499300010000006","0GH","3GH000503","PDP","BO","1012437","MEBB0TWQ","Name","R","Name","address","","","City","SC","29505-3030","","*******AUTO**SCH 5-DIGIT 29501","P1","PK24","2031100094470495556429505303050","3GH000503","August 26, 2013"
"M394993","M39499300010000007","0GH","3GH000503","PDP","BO","1012750","MEBB0YJY","Name","L","Name","address","","","City","SC","29642-3006","","***********AUTO**5-DIGIT 29642","P1","PK38","2031100094470495567529642300601","3GH000503","August 26, 2013"

So from the above data I need to have the file look like so

Code:
OrderNum,MerrillRecipientID,CustomerClass,MerrillItemNum,PODTemplateID,GridCode,AetnaDocID,MemberID,FirstName,MI,LastName,Address1,Address2,Address3,City,State,Zip,Country,OEL,PalletNum,PckgNum,IMBCode,ProcDate
"M394993","M39499300010000001","0GH","3GH000503","PDP","BO","1011250","MEBB04CB","Name","","Name","address","","","City","SC","29170-2043","","*******AUTO**SCH 5-DIGIT 29033","P1","PK05","2031100094470495539729170204309","3GH000503","August 26, 2013"
"M394993","M39499300010000002","0GH","3GH000503","PDP","BO","1011572","MEBB07GB","Name","G","Name","address","","","City","SC","29020-2912","","*********AUTO**SCH 3-DIGIT 290","P1","PK01","3031100094470495580529020291210","3GH000503","August 26, 2013"
"M394993","M39499300010000003","0GH","3GH000503","PDP","BO","1011693","MEBB08MP","Name","B","Name","address","","","City","SC","29061-9447","","*********AUTO**SCH 3-DIGIT 290","P1","PK02","3031100094470495583729061944757","3GH000503","August 26, 2013"
"M394993","M39499300010000004","0GH","3GH000503","PDP","BO","1011751","MEBB097M","Name","A","Name","address","","","City","SC","29645-0433","","*************AUTO**3-DIGIT 296","P1","PK31","3031100094470495629629645043333","3GH000503","August 26, 2013"
"M394993","M39499300010000005","0GH","3GH000503","PDP","BO","1012075","MEBB0K4L","Name","E","Name","address","","","City","SC","29682-9634","","*************AUTO**3-DIGIT 296","P1","PK33","3031100094470495637929682963428","3GH000503","August 26, 2013"
"M394993","M39499300010000006","0GH","3GH000503","PDP","BO","1012437","MEBB0TWQ","Name","R","Name","address","","","City","SC","29505-3030","","*******AUTO**SCH 5-DIGIT 29501","P1","PK24","2031100094470495556429505303050","3GH000503","August 26, 2013"
"M394993","M39499300010000007","0GH","3GH000503","PDP","BO","1012750","MEBB0YJY","Name","L","Name","address","","","City","SC","29642-3006","","***********AUTO**5-DIGIT 29642","P1","PK38","2031100094470495567529642300601","3GH000503","August 26, 2013"

the P1 field could be P100 and so I would need to pad P1 to P001. But really it justs need to be whatever the max length is. I can sort the file on the two fields but not sure how to pad them.

Thanks in advance for your help.
# 2  
Old 08-28-2013
Code:
 
bash-3.2$ cat txt
OrderNum,MerrillRecipientID,CustomerClass,MerrillItemNum,PODTemplateID,GridCode,AetnaDocID,MemberID,FirstName,MI,LastName,Address1,Address2,Address3,City,State,Zip,Country,OEL,PalletNum,PckgNum,IMBCode,ProcDate
"M394993","M39499300010000001","0GH","3GH000503","PDP","BO","1011250","MEBB04CB","Name","","Name","address","","","City","SC","29170-2043","","*******AUTO**SCH 5-DIGIT 29033","P1","PK5","2031100094470495539729170204309","3GH000503","August 26, 2013"
"M394993","M39499300010000002","0GH","3GH000503","PDP","BO","1011572","MEBB07GB","Name","G","Name","address","","","City","SC","29020-2912","","*********AUTO**SCH 3-DIGIT 290","P1","PK1","3031100094470495580529020291210","3GH000503","August 26, 2013"
"M394993","M39499300010000003","0GH","3GH000503","PDP","BO","1011693","MEBB08MP","Name","B","Name","address","","","City","SC","29061-9447","","*********AUTO**SCH 3-DIGIT 290","P1","PK2","3031100094470495583729061944757","3GH000503","August 26, 2013"
"M394993","M39499300010000004","0GH","3GH000503","PDP","BO","1011751","MEBB097M","Name","A","Name","address","","","City","SC","29645-0433","","*************AUTO**3-DIGIT 296","P1","PK31","3031100094470495629629645043333","3GH000503","August 26, 2013"
"M394993","M39499300010000005","0GH","3GH000503","PDP","BO","1012075","MEBB0K4L","Name","E","Name","address","","","City","SC","29682-9634","","*************AUTO**3-DIGIT 296","P1","PK33","3031100094470495637929682963428","3GH000503","August 26, 2013"
"M394993","M39499300010000006","0GH","3GH000503","PDP","BO","1012437","MEBB0TWQ","Name","R","Name","address","","","City","SC","29505-3030","","*******AUTO**SCH 5-DIGIT 29501","P1","PK24","2031100094470495556429505303050","3GH000503","August 26, 2013"
"M394993","M39499300010000007","0GH","3GH000503","PDP","BO","1012750","MEBB0YJY","Name","L","Name","address","","","City","SC","29642-3006","","***********AUTO**5-DIGIT 29642","P1","PK38","2031100094470495567529642300601","3GH000503","August 26, 2013"
bash-3.2$ 
bash-3.2$ 
bash-3.2$ 
bash-3.2$ awk -F, '{print $20$21"="$0}' txt | perl -MSort::Naturally -lne ' my($key,$val)=split(/=/,$_); if($key eq 'PalletNumPckgNum'){print $val; next;} $x{$key}=$val; END { foreach (nsort keys %x ){ print $x{$_}; } }' 
OrderNum,MerrillRecipientID,CustomerClass,MerrillItemNum,PODTemplateID,GridCode,AetnaDocID,MemberID,FirstName,MI,LastName,Address1,Address2,Address3,City,State,Zip,Country,OEL,PalletNum,PckgNum,IMBCode,ProcDate
"M394993","M39499300010000002","0GH","3GH000503","PDP","BO","1011572","MEBB07GB","Name","G","Name","address","","","City","SC","29020-2912","","*********AUTO**SCH 3-DIGIT 290","P1","PK1","3031100094470495580529020291210","3GH000503","August 26, 2013"
"M394993","M39499300010000003","0GH","3GH000503","PDP","BO","1011693","MEBB08MP","Name","B","Name","address","","","City","SC","29061-9447","","*********AUTO**SCH 3-DIGIT 290","P1","PK2","3031100094470495583729061944757","3GH000503","August 26, 2013"
"M394993","M39499300010000001","0GH","3GH000503","PDP","BO","1011250","MEBB04CB","Name","","Name","address","","","City","SC","29170-2043","","*******AUTO**SCH 5-DIGIT 29033","P1","PK5","2031100094470495539729170204309","3GH000503","August 26, 2013"
"M394993","M39499300010000006","0GH","3GH000503","PDP","BO","1012437","MEBB0TWQ","Name","R","Name","address","","","City","SC","29505-3030","","*******AUTO**SCH 5-DIGIT 29501","P1","PK24","2031100094470495556429505303050","3GH000503","August 26, 2013"
"M394993","M39499300010000004","0GH","3GH000503","PDP","BO","1011751","MEBB097M","Name","A","Name","address","","","City","SC","29645-0433","","*************AUTO**3-DIGIT 296","P1","PK31","3031100094470495629629645043333","3GH000503","August 26, 2013"
"M394993","M39499300010000005","0GH","3GH000503","PDP","BO","1012075","MEBB0K4L","Name","E","Name","address","","","City","SC","29682-9634","","*************AUTO**3-DIGIT 296","P1","PK33","3031100094470495637929682963428","3GH000503","August 26, 2013"
"M394993","M39499300010000007","0GH","3GH000503","PDP","BO","1012750","MEBB0YJY","Name","L","Name","address","","","City","SC","29642-3006","","***********AUTO**5-DIGIT 29642","P1","PK38","2031100094470495567529642300601","3GH000503","August 26, 2013"


You need this perl module Sort::Naturally
Sort::Naturally - search.cpan.org

Just make a directory called Sort, and put Naturally.pm within the Sort directory.

Save the csv file up one level of the Sort directory and launch that command.
# 3  
Old 08-28-2013
You may try to develop on

Code:
sort -t, -nk 21.4 file

# 4  
Old 08-28-2013
Quote:
Originally Posted by krishmaths
You may try to develop on

Code:
sort -t, -nk 21.4 file

Not having tried this yet, But can I sort taking both 20 and 21 into consideration? I will be able to mess with this in about 1/2 hour or so.

Mr.Bean, I will try your method if I can't get anything else to work. I would prefer to no have to install anything to make this happen.

Thanks to both on the reply.
# 5  
Old 08-28-2013
How do you want field 21 sorted...lexically or numerically...and post a sample of how the output looks like because from your post there seems to be no difference in the input and output...and you can try the sort below which will divvy up field 21 into 2 parts i.e. the first 3 characters are sorted lexically and the rest are sorted numerically...is this something you are looking for...
Code:
sort -t, -k21.1,21.3 -k21.4,21n file

# 6  
Old 08-28-2013
This solution is more generic.
Create new keyfield(s) which are usable for sort.
Sort it and then remove those extra fields. This not works if fld value include comma (,).
Code:
awk 'BEGIN { FS=","
             OFS=","
           }
NR==1 { print $0,"A","0","AA","0" ; next } # sort key for headerline
{
    f20=$20
    f21=$21
    gsub(/"/,"",f20)
    gsub(/"/,"",f21)
    f20a=substr(f20,1,1) # 1st char
    f20b=substr(f20,2)  # rest is numeric
    f21a=substr(f21,1,2) # 2 char
    f21b=substr(f21,3)   # rest is numeric
    print $0,f20a,f20b,f21a,f21b
}

' inputfile | \
        sort -t, -k 26,26 -nk 27,27 -k 28,28 -nk 29,29 | \
        awk 'BEGIN { FS=","  }
                {
                  for (i=1;i<=24;i++)
                        printf("%s,",$i)
                  printf("%s\n",$25)
                } '

# 7  
Old 08-28-2013
Quote:
Originally Posted by kshji
This solution is more generic.
Create new keyfield(s) which are usable for sort.
Sort it and then remove those extra fields. This not works if fld value include comma (,).
Code:
awk 'BEGIN { FS=","
             OFS=","
           }
NR==1 { print $0,"A","0","AA","0" ; next } # sort key for headerline
{
    f20=$20
    f21=$21
    gsub(/"/,"",f20)
    gsub(/"/,"",f21)
    f20a=substr(f20,1,1) # 1st char
    f20b=substr(f20,2)  # rest is numeric
    f21a=substr(f21,1,2) # 2 char
    f21b=substr(f21,3)   # rest is numeric
    print $0,f20a,f20b,f21a,f21b
}

' inputfile | \
        sort -t, -k 26,26 -nk 27,27 -k 28,28 -nk 29,29 | \
        awk 'BEGIN { FS=","  }
                {
                  for (i=1;i<=24;i++)
                        printf("%s,",$i)
                  printf("%s\n",$25)
                } '

I ended up with a file that looked like so

Code:
OrderNum,MerrillRecipientID,CustomerClass,MerrillItemNum,PODTemplateID,GridCode,AetnaDocID,MemberID,FirstName,MI,LastName,Address1,Address2,Address3,City,State,Zip,Country,OEL,PalletNum,PckgNum,IMBCode,LocationID,ProcDate
,A
"V670443","V67044300010000320","0GH","3GH000501","HMO_MAPD","2E","1001086","ME091036","Name","G","Name","Address SPRUCE ST APT 1710","","","PHILADELPHIA","PA","19102-4554","","***********AUTO**5-DIGIT 19102","P1","PK1","2030110676000685569319102455435","3GH000501","August 28, 2013"
"V670443","V67044300010000698","0GH","3GH000501","HMO_MAPD","2E","1003663","ME321579","Name","N","Name","address SPRUCE ST APT 709","THE DRAKE","","PHILADELPHIA","PA","19102-4551","","***********AUTO**5-DIGIT 19102","P1","PK1","2030110676000685569419102455184","3GH000501","August 28, 2013"
"V670443","V67044300010001103","0GH","3GH000501","HMO_MAPD","2E","1038732","MEBBC41B","Name","J","Name","address LOCUST ST APT 20N","","","PHILADELPHIA","PA","19102-4212","","***********AUTO**5-DIGIT 19102","P1","PK1","2030110676000685569519102421224","3GH000501","August 28, 2013"
"V670443","V67044300010001179","0GH","3GH000501","HMO_MAPD","2E","1039945","MEBBFYZM","Name","J","Name","address S HICKS ST APT 3A","","","PHILA","PA","19102-4947","","***********AUTO**5-DIGIT 19102","P1","PK1","2030110676000685569619102494731","3GH000501","August 28, 2013"
"V670443","V67044300010001825","0GH","3GH000501","HMO_MAPD","2E","1053739","MEBBQNQL","Name","D","Name","address LOCUST ST APT 16N","","","PHILA","PA","19102-4210","","***********AUTO**5-DIGIT 19102","P1","PK1","2030110676000685569719102421084","3GH000501","August 28, 2013"
"V670443","V67044300010002280","0GH","3GH000501","HMO_MAPD","2E","1108001","MEBCHSQS","Name","H","Name","address LOCUST ST APT 8H","","","PHILADELPHIA","PA","19102-4206","","***********AUTO**5-DIGIT 19102","P1","PK1","2030110676000685569819102420688","3GH000501","August 28, 2013"

I will look to see why it kept that ,A line.

Thanks!

Last edited by GroveTuckey; 08-29-2013 at 12:10 AM..

Previous Thread | Next Thread
Test Your Knowledge in Computers #41
Difficulty: Easy
In computer lingo, FAT stands for 'File Access Table'.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sort, sed, and zero padding date column csv bash scripting

Hello people, I am having problem to sort, sed and zero padding of column in csv file. 7th column only. Input of csv file: 1,2,3,4,5,6,4/1/2010 12:00 AM,8 1,2,3,4,5,6,3/11/2010 9:39 AM,8 1,2,3,4,5,6,5/12/2011 3:43 PM,8 1,2,3,4,5,6,12/20/2009 7:23 PM,8 Output:... (5 Replies)
Discussion started by: sean1357
5 Replies

2. Shell Programming and Scripting

Padding leading zero

hi All i am new to linux... source txt .. 281-BUM-5M BUM-5M 0 0 282-BUM-5M BUM-5M 0 0 83-BUM-5M BUM-5M 0 0 is it possible to use bash script to convert to (remove the "-" and fill up to 4 digit" ? 0281 BUM-5M BUM-5M 0 0 0282 BUM-5M BUM-5M 0 0 0083 BUM-5M BUM-5M 0 0 thanks a ... (5 Replies)
Discussion started by: samoptimus
5 Replies

3. Shell Programming and Scripting

Padding with zeros.

Hi Friends, I would like to left pad with "0's" on first column say (width six) I have a large file with the format: FILE: 1: ALFRED 84378 NY 8385: JAMES 88385 FL 323: SMITH 00850 TX My output needs to be like: 000001: ALFRED 84378 NY 008385: JAMES 88385 FL 000323: SMITH... (10 Replies)
Discussion started by: sbasetty
10 Replies

4. UNIX for Dummies Questions & Answers

Zero padding dates

I have a file with records containing dates like: SMPBR|DUP-DO NOT USE|NEW YORK||16105|BA5270715|2007-6-6|MWERNER|109||||JOHN||SMITH|MD|72211118||||||74559|21 WILMINGTON RD||D|2003-11-6|SL# MD CONTACT-LIZ RICHARDS|||0|Y|N||1411458| How can I get the date fields in each of my records to be... (1 Reply)
Discussion started by: ChicagoBlues
1 Replies

5. Shell Programming and Scripting

CSV formatting with prefixing, appending and padding field

Hi I have a very large csv file with some hundreds of thousands of rows of data. The data is in the following format: Up to four alpha numeric characters for the first word. This is either set as 2 characters followed by 2 spaces, or as a single 4character word. This is then followed by an 8... (7 Replies)
Discussion started by: meself
7 Replies

6. Shell Programming and Scripting

Padding in Unix

I have a file with different character counts on each line how do i make it with unique character counts. example: 1st line : ABCD 011 XYZ 0000 YYYY BBB TEADINGDA 2nd line: ABCD 011 xys 0010 YYYY BBB TEAD 3rd line : ABCD 022 YXU 000 UUU BBB TE 1st line is 43... (3 Replies)
Discussion started by: rudoraj
3 Replies

7. Programming

Byte Padding

Hi, Can someone explain what is byte padding? For ex: struct emp{ char s; int b; char s1; int b1; long b3; char s3; } What will be the size of this structure? Thanks (6 Replies)
Discussion started by: naan
6 Replies

8. UNIX for Dummies Questions & Answers

Padding

Hi Can anyone tell me how to pad zeroes on the left side to a numeric string in unix shell scripting Your answer is very much appreciated Thanks Vijay (2 Replies)
Discussion started by: vijaygopalsk
2 Replies

9. Programming

Zero Padding to a string

I am writing a C program which a part of it needs to padding zero in front of a string. The program will get a sting from an ASCII file which the maxium length of this string is 5 char long. The string can sometimes less the 5 char long. In order to make it with the same length '0's are being... (3 Replies)
Discussion started by: Wing m. Cheng
3 Replies

10. UNIX for Advanced & Expert Users

Padding issues

Hello, Structure padding & structure size are different on Compaq & HP UNIX. When structures are transfered via netfork from Compaq to HP will this be a problem? If yes, what can be the solution? Thanks, shilpa (2 Replies)
Discussion started by: informshilpa
2 Replies

Featured Tech Videos