Remove leading zeroes in 2nd field using sed


 
# 15  
Old 10-04-2010
Your solution is very simple and ingenious, and you get the terse and simple award.

My alternative was annotated to point out that, for big data sets, 0* may be slower because it always matches, and a match seems to carry the extra cost of copying or moving parts of the buffer. Some regex flavors have the + metacharacter just to make patterns like 00* easier to write. If you add a second zero so the substitution only triggers occasionally, you then have to anchor to ^ so the match cannot slide to a later field, and the anchor has a cost too. The costs definitely vary with the data, and probably with the sed version and system.

As a tutorial example, your solution is not as general or extensible, since it is only good for the second field.

It is good to be aware of the alternatives and tradeoffs.

It is wise to benchmark when the run time rises.
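
A rough sketch of such a benchmark, for anyone who wants to try it. The data set is made up (200000 records, one in ten with leading zeros in field 2), and it assumes a shell where time is available, such as bash or ksh. The numbers will differ between sed versions and systems, so treat it as a starting point only.

```shell
# Hypothetical test data: 200000 pipe-delimited records,
# every tenth one with leading zeros in the second field.
tmp=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 200000; i++) {
        fmt = (i % 10) ? "id%d|%d|%d\n" : "id%d|%06d|%d\n"
        printf fmt, i, i, i
    }
}' > "$tmp"

# Variant A: 0* matches at the first pipe on every line, zeros or not.
time sed 's/|0*/|/' "$tmp" > "$tmp.a"

# Variant B: 00* only matches when a zero is present, so it is anchored
# to ^ to keep the match inside the second field.
time sed 's/^\([^|]*|\)00*/\1/' "$tmp" > "$tmp.b"

# The two variants should agree on the output; only the timing differs.
cmp "$tmp.a" "$tmp.b" && echo "outputs match"
rm -f "$tmp" "$tmp.a" "$tmp.b"
```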
# 16  
Old 10-04-2010
I think alister's solution is extensible though. For example, this would take care of the 3rd column:
Code:
sed 's/|0*/|/2' infile

and this of the 4th
Code:
sed 's/|0*/|/3' infile

and so forth
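
A small demonstration of the occurrence flag on a made-up sample record. Note that counting pipes this way relies on 0* matching at every pipe, including the ones with no zeros after them; with 00* the count would skip fields that have no leading zero.

```shell
# Hypothetical sample record with leading zeros in fields 2, 3 and 4.
line='a|007|0042|0100|z'

# The trailing number selects which match of |0* gets rewritten:
# no number (or 1) is field 2, 2 is field 3, 3 is field 4.
printf '%s\n' "$line" | sed 's/|0*/|/'    # a|7|0042|0100|z
printf '%s\n' "$line" | sed 's/|0*/|/2'   # a|007|42|0100|z
printf '%s\n' "$line" | sed 's/|0*/|/3'   # a|007|0042|100|z
```

A field made up entirely of zeros is emptied by this pattern, since 0* consumes all of them.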
# 17  
Old 10-04-2010
Re: sed 's/^\([^|]*|\)00*\([1-9]\)/\1\2/' infile

You assume there is always a non-zero digit after the zeros. Preserving just the low order digit for |00000| but clearing just leading digits for |01020| and allowing variable field width as well takes a more complex regex.

Putting in commas takes looping or repetition.
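
For the comma case, a minimal sketch of the looping approach, assuming the input is plain whole numbers with no decimal part. The add_commas name is only for illustration.

```shell
# Insert thousands separators by looping: each pass places one comma
# before the last run of three digits that still has a digit in front
# of it, and ta repeats the substitution until nothing changes.
add_commas() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e ta
}

printf '%s\n' 1234567 | add_commas   # 1,234,567
printf '%s\n' 999 | add_commas       # 999
```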
# 18  
Old 10-04-2010
Fair enough, I did not think of 0000 for instance, but then this would do, wouldn't it?:
Code:
sed 's/^\([^|]*|\)00*\([0-9]\)/\1\2/' infile
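
A quick check of that command against the edge cases raised above, using made-up sample lines. The strip_field2 wrapper is just a convenience name.

```shell
# Wrap the command so it can be fed test lines on stdin.
strip_field2() {
    sed 's/^\([^|]*|\)00*\([0-9]\)/\1\2/'
}

printf '%s\n' 'a|00000|x' 'b|01020|x' 'c|120|x' 'd|0|x' | strip_field2
# a|0|x      all zeros: the greedy 00* backs off one digit, so a single 0 survives
# b|1020|x   only the leading zero goes; the inner zeros stay
# c|120|x    no leading zero, no match, line unchanged
# d|0|x      a lone zero has no digit after it, so it is left alone
```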

# 19  
Old 10-04-2010
I am overwhelmingly thrilled that you did the benchmark. So few read the manual or, better yet, write something up and try it. The sed man page for UNIX SVR3 lied about how a greedy wildcard worked. I saw that immediately, and my office mate was amazed that my anticipation turned out to be correct!

In the internet age, you cannot say 'whatever', you have to Google and quote something.

A real developer writes something and tries it, because man pages lie, or are vague, or are talking about something else. And sometimes we have people hip-shooting their wiki-ignorance.

Data can make a big difference. Trying to remove spaces leading '| *' is faster than '| *' because a delimited file usually has many columns. Removing trailing spaces ' *|' and even ' *|' is slower because of the huge number of spaces, so I always do leading first (to remove spaces in the empty fields), and sometimes pipe many sed together to lighten the load. Finally, I wrote a C utility, all state variables and getchar/putchar(), to do the really big sets really fast, because I was way past 2 megs. Even at 2 megs, the cache and VM hits make a big difference. And to think the first H200 came with 2 or 4K of RAM. How time flies.
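
A sketch of the leading-first, trailing-second idea as two piped sed commands. The exact patterns here are my assumption (a pipe followed by one or more spaces, then one or more spaces followed by a pipe), and trim_fields is only an illustrative name.

```shell
trim_fields() {
    # Pass 1: drop spaces that follow a delimiter; this also empties
    # fields that contain nothing but spaces.
    # Pass 2: drop the spaces that still precede a delimiter.
    sed 's/|  */|/g' | sed 's/  *|/|/g'
}

printf '%s\n' 'id |  name  |     | val' | trim_fields
# id|name||val
```

Spaces at the very start or end of the line are not touched by this sketch; those would need their own ^ and $ patterns.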

---------- Post updated at 05:07 PM ---------- Previous update was at 05:03 PM ----------

Quote:
Originally Posted by Scrutinizer
Fair enough, I did not think of 0000 for instance, but then this would do, wouldn't it?:
Code:
sed 's/^\([^|]*|\)00*\([0-9]\)/\1\2/' infile


Yes, I was scratching around for that one, the 'save any first digit after the zeros' thing. It is the greedy wildcard and left-to-right matching that make it work, so it feels too loose, but it goes!

---------- Post updated at 05:13 PM ---------- Previous update was at 05:07 PM ----------

Quote:
Originally Posted by Scrutinizer
I think alister's solution is extensible though. For example, this would take care of the 3rd column:
Code:
sed 's/|0*/|/2' infile

and this of the 4th
Code:
sed 's/|0*/|/3' infile

and so forth

Good point. I have neglected the trailing-number flag (a number in place of g), as it only gives you access to one column, so I have yet to use it in the real world. If sed had better delimited-field support, without becoming awk, it would be great! The limit of 99 in \{99\} is a pain, too.
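
For picking a field with an interval instead of the occurrence flag, something along these lines should work. It is a sketch only: strip_field and skip are hypothetical names, and the field number is passed as the first argument.

```shell
# strip_field N: remove leading zeros from field N of pipe-delimited input,
# keeping one digit so an all-zero field ends up as a single 0.
strip_field() {
    skip=$(( $1 - 1 ))   # number of "field|" groups to step over
    # \1 is the whole skipped prefix, \2 the last skipped field, \3 the kept digit.
    sed "s/^\(\([^|]*|\)\{$skip\}\)00*\([0-9]\)/\1\3/"
}

printf '%s\n' 'a|007|0042|0100|z' | strip_field 3   # a|007|42|0100|z
printf '%s\n' 'a|007|0042|0100|z' | strip_field 4   # a|007|0042|100|z
```

Whatever upper limit the local sed puts on the interval count also caps the field number here.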

Speed in sed is so good, it is really a tractor trailer of a tool!