Remove leading zeroes in 2nd field using sed


 
# 8  
Old 10-04-2010
thank you guys for all your proposed solutions. Will test them out.
# 9  
Old 10-04-2010
hergp, alister, you are right. The cause is the *: since it also matches zero occurrences, there is always a match (and therefore a substitution) in the second column of every row.
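
To make the point about * concrete, here is a minimal sketch. The sample line is only an illustration, and alister's expression from later in this thread is used as the example of a 0* pattern. With -n and the p flag, sed prints a line only when a substitution actually happened on it.

```shell
# Illustrative input only: the second field has no leading zero.
line='2010-01-01|123|1|1000|'

# 0* can match the empty string, so this substitution still fires
# (it replaces "|" with "|") and the line is printed.
star=$(printf '%s\n' "$line" | sed -n 's/|0*/|/p')

# 00* needs at least one zero, so nothing matches and nothing is printed.
plus=$(printf '%s\n' "$line" | sed -n 's/|00*/|/p')

printf 'with 0*:  [%s]\n' "$star"
printf 'with 00*: [%s]\n' "$plus"
```

The first line of output shows the unchanged input between the brackets; the second shows empty brackets.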

Last edited by Scrutinizer; 10-04-2010 at 03:24 PM..
# 10  
Old 10-04-2010
A minor trick for more speed -- no sub if no zero:

Code:
sed 's/^\([^|]*|\)00*/\1/' infile

Do you want the units zero preserved?

Code:
sed 's/^\([^|]*|\)0\{1,99\}\([0-9]\{1,99\}|\)/\1\2/' infile

The first variable count metastring '\{1,99\}' takes precedence,
because sed matches greedily from left to right,
but it has to stop one digit short and leave the last digit for the second group.
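
A small sketch of how the two commands above differ on an all-zero field. The sample rows and the function names (sample, strip_all, strip_keep_units) are made up for illustration; the sed expressions are the ones from this post.

```shell
# Hypothetical sample rows: leading zeroes, an all-zero field, no zeroes.
sample() {
    printf '%s\n' 'a|00120|x' 'b|000|y' 'c|45|z'
}

# Strips every leading zero, so an all-zero field becomes empty.
strip_all() { sed 's/^\([^|]*|\)00*/\1/'; }

# Strips leading zeroes but must leave at least one digit before the next "|".
strip_keep_units() { sed 's/^\([^|]*|\)0\{1,99\}\([0-9]\{1,99\}|\)/\1\2/'; }

sample | strip_all
sample | strip_keep_units
```

On the row b|000|y, the first command produces b||y and the second produces b|0|y. The other two rows come out the same from both commands.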

Last edited by Scott; 10-04-2010 at 03:53 PM.. Reason: Please use code tags
# 11  
Old 10-04-2010
Good observation DGPickett, I guess to preserve the 0, this would work too:
Code:
sed 's/^\([^|]*|\)00*\([1-9]\)/\1\2/' infile
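
One thing worth checking with this variant: because it requires a non-zero digit after the zeroes, a field made up only of zeroes does not match at all and is left exactly as it was, rather than being reduced to a single 0. A quick sketch with made-up sample rows and a hypothetical function name:

```shell
# Same expression as above, wrapped in a function for the demonstration.
strip_before_nonzero() { sed 's/^\([^|]*|\)00*\([1-9]\)/\1\2/'; }

# Illustrative rows: one field with leading zeroes, one all-zero field.
printf '%s\n' 'a|00120|x' 'b|000|y' | strip_before_nonzero
```

The first row becomes a|120|x; the second stays b|000|y.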


Last edited by Scrutinizer; 10-04-2010 at 04:06 PM..
# 12  
Old 10-04-2010
Quote:
Originally Posted by DGPickett
A minor trick for more speed -- no sub if no zero:

Code:
sed 's/^\([^|]*|\)00*/\1/' infile

I'm curious. Did you actually benchmark that solution versus a simpler one like mine?

Quote:
Originally Posted by alister
Code:
sed 's/|0*/|/'

What little might be gained from avoiding some substitutions could be lost to the more complicated regular expression, which now must evaluate a character class, capture a group and, in the case of substitutions, refer to a backreference. Additionally, if most of the data does require substitution, your more complicated approach could be slowed further.

My instincts tell me there's not much to be gained, but, naturally, I could be wrong. Still, the improvement would have to be non-trivial for me to discard the simpler, more readable solution.

Regards,
Alister
# 13  
Old 10-04-2010
I also suspect that there is little to be gained speed-wise; however, I think the preservation of zero values is a good point.
# 14  
Old 10-04-2010
Hi, DGPickett:

My curiosity got the better of me. I created two files with a million lines each. One file consists of lines that never require any substitution. The other of lines that always require substitution. I then tested the solutions on each.

Code:
$ jot -w '2010-01-01|123|1|1000|2000|500|1500|600|' 1000000 > data-without-0
$ jot -w '2010-01-01|0123|1|1000|2000|500|1500|600|' 1000000 > data-with-0
$ wc -l data*; ls -lh data*
 1000000 data-with-0
 1000000 data-without-0
 2000000 total
-rw-r--r--   1 xxxxxx  xxxxxx       45M Oct  4 16:18 data-with-0
-rw-r--r--   1 xxxxxx  xxxxxx       44M Oct  4 16:17 data-without-0


No substitution necessary:
Code:
$ time sed 's/|0*/|/' data-without-0 > /dev/null

real    0m2.006s
user    0m1.898s
sys     0m0.072s
$ time sed 's/^\([^|]*|\)00*/\1/' data-without-0 > /dev/null

real    0m0.942s
user    0m0.863s
sys     0m0.066s


Substitution necessary:
Code:
$ time sed 's/|0*/|/' data-with-0 > /dev/null

real    0m2.136s
user    0m2.031s
sys     0m0.077s
$ time sed 's/^\([^|]*|\)00*/\1/' data-with-0 > /dev/null

real    0m12.654s
user    0m12.320s
sys     0m0.137s

While the more complicated solution shows some improvement when no substitution is required at all, about 1 second per million lines, it exhibits a much larger degradation (roughly 10 seconds per million lines) if substitution is required by all lines. Based on my brief testing (insert all the usual caveats about benchmarking here :) ), I would not choose your approach unless the data set is massive AND there are few lines within it requiring modification.

Regards,
Alister
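
For anyone who wants to repeat the test on a system without jot, here is a rough sketch that generates similar data with awk. It assumes that jot -w with a format lacking a conversion simply appends the counter to the given text; gen_data, time_both and workdir are hypothetical names, not part of the original test.

```shell
# gen_data PREFIX COUNT: print COUNT lines, each PREFIX followed by a counter.
gen_data() {
    awk -v prefix="$1" -v n="$2" 'BEGIN { for (i = 1; i <= n; i++) print prefix i }'
}

# time_both FILE: time the simple and the anchored expression on FILE.
time_both() {
    for expr in 's/|0*/|/' 's/^\([^|]*|\)00*/\1/'; do
        printf '== %s\n' "$expr"
        time sed "$expr" "$1" > /dev/null
    done
}

# Small count so the sketch runs quickly; the test above used 1000000 lines.
workdir=$(mktemp -d)
gen_data '2010-01-01|123|1|1000|2000|500|1500|600|' 1000 > "$workdir/data-without-0"
gen_data '2010-01-01|0123|1|1000|2000|500|1500|600|' 1000 > "$workdir/data-with-0"
head -n 2 "$workdir/data-with-0"
```

Running time_both "$workdir/data-with-0" and time_both "$workdir/data-without-0" then prints the timings. Absolute numbers will depend on the machine and the sed implementation, so the figures above should not be expected to reproduce exactly.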