Printf padded string


 
# 22  
Old 09-14-2015
Quote:
Originally Posted by yifangt
Thanks Corona688! This is what I tried by combining Don's reply.
Code:
awk '
NUMBER=substr($0, match($0, /[[:digit:]]*$/))
PREFIX=substr($0, 1, RSTART - 1)
LEN=8
PRELEN="${#PREFIX}"
DIGITS=$((LEN - PRELEN))
{printf "%s%0${DIGITS}d\n", $PREFIX, $NUMBER} ' < test.file

but did not work. What did I miss?
Following your attempt a little closer than the way Corona688 did it...

This is an awk script, not a shell script, so three things go wrong. First, a $ before a variable name references the field whose number is the contents of the variable, not the contents of the variable itself. Second, awk variables are not expanded inside quoted strings. Third, ${#var} and $((expr)) are shell expansions, not valid awk expressions.
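
A rough illustration of the first two points, offered as a sketch rather than as part of anyone's posted script. The function name field_vs_var and the sample line "alpha beta gamma" are made up for this example; -v is used to set the awk variable, as in the script further down.

```shell
# Hypothetical demo: what "$NAME" means inside an awk program.
field_vs_var() {
    printf '%s\n' 'alpha beta gamma' |
    awk -v LEN=2 '{
        # LEN is the value 2; $LEN is field number 2 of the current line.
        print "LEN=" LEN, "$LEN=" $LEN
        # Nothing is expanded inside a quoted awk string.
        print "in quotes: $LEN"
    }'
}
field_vs_var
```

The first output line should read "LEN=2 $LEN=beta" and the second should show the text $LEN unchanged.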

If what you are trying to do is produce 8 character output strings (assuming the length of the input string is never more than 8 characters) with varying length alphabetic and numeric parts in the input, try:
Code:
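awk -v LEN=8 '
{
	# Trailing run of digits; match() also sets RSTART to where it begins.
	NUMBER = substr($0, match($0, /[[:digit:]]*$/))
	# Everything before the digits; PRELEN is its length.
	PREFIX = substr($0, 1, PRELEN = (RSTART - 1))
	# Width left over for the zero-padded number.
	DIGITS = LEN - PRELEN
	printf "%s%0*d\n", PREFIX, DIGITS, NUMBER
}' test.file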

Assuming test.file contains:
Code:
S1
S2  
S12  
S21  
sk1  
sk12  
sk321  
sk1344

as shown in post #18 in this thread, it produces the output:
Code:
S0000001
S2  0000
S12  000
S21  000
sk1  000
sk12  00
sk321  0
sk001344

Remember the input file format I specified in post #10 in this thread. Your sample input file has trailing spaces (violating item #2: Each input string is an alphanumeric string ending in one or more decimal digits.)

If we remove all of the trailing spaces from test.file or change all occurrences of $0 in the above script to $1 (so we just look at the first field instead of the entire line), the output produced is:
Code:
S0000001
S0000002
S0000012
S0000021
sk000001
sk000012
sk000321
sk001344

which I assume is closer to what you were trying to do.
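
For reference, a sketch of the $1 variant described above. The wrapper name pad_first_field is made up for this example, DIGITS is folded into the printf call, and [0-9] is written in place of [[:digit:]] on the assumption that the two behave the same for this input.

```shell
# Hypothetical wrapper: the script above with $0 changed to $1.
pad_first_field() {
    awk -v LEN=8 '{
        # Work on the first field only, so trailing spaces are ignored.
        NUMBER = substr($1, match($1, /[0-9]*$/))
        PREFIX = substr($1, 1, PRELEN = (RSTART - 1))
        printf "%s%0*d\n", PREFIX, LEN - PRELEN, NUMBER
    }' "$@"
}
printf 'S2  \nsk321  \n' | pad_first_field
```

With those two input lines (both with trailing spaces) this should print S0000002 and sk000321, matching the second output listing above.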

As a learning exercise, can you explain why the awk script Corona688 suggested didn't have a problem with trailing spaces while my script above does have a problem with trailing spaces?

And, no, I'm not a lawyer. But I did like the Perry Mason, Matlock, and Boston Legal TV series. And, in my last job, my boss referred to me as his standards lawyer because he could get me to answer any questions about why we were failing POSIX/UNIX standards conformance tests and directions on how to fix our code (when we had a bug) or how to file a bug report against the test suite (when there was a bug in the test suite or it was assuming behavior above and beyond what the standards require).
# 23  
Old 09-14-2015
Thanks Don!
"Can you explain why the awk script Corona688 suggested didn't have a problem with trailing spaces while my script above does...?"
Not quite sure, and I never thought about it, but my guess is:
In Corona688's reply, sub(/^[^0-9]*/,"",A) removed all the trailing spaces. (No, this only removes the leading characters!)
Do the printf format modifiers make any difference?
Yours is
Code:
printf "%s%0*d\n", PREFIX, DIGITS, NUMBER

vs
Code:
printf("%s%0" 8-N "d\n", substr($0,1,N), A);

This is related to the original point I was unclear about and want to clarify.
By the way, thanks for your legal story. It is interesting.

Last edited by yifangt; 09-14-2015 at 05:06 PM..
# 24  
Old 09-14-2015
Quote:
Originally Posted by yifangt
Thanks Don!
"As a learning exercise, can you explain why the awk script Corona688 suggested didn't have a problem with trailing spaces while my script above does have a problem with trailing spaces?"
Not quite sure, but my guess is:
In Corona688's reply, sub(/^[^0-9]*/,"",A) removed all the trailing spaces, but yours, substr($0, match($0, /[[:digit:]]*$/)), did not. Correct?
Do the printf format modifiers make any difference?
Yours is
Code:
printf "%s%0*d\n", PREFIX, DIGITS, NUMBER

vs
Code:
printf("%s%0" 8-N "d\n", substr($0,1,N), A);

This is the original point I was unclear about and want to clarify.
No.
Code:
        sub(/^[^0-9]*/,"",A) # Delete prefix

deletes everything from the start of the string that is not a decimal digit.
Code:
        N=length($0)-length(A) # Measure length of prefix from this

then computes the length of the alphabetic part of your input: the original line length minus the length of what is left after the leading non-digit characters have been removed. With the input line "S2<space><space>", N is set to 1, i.e., 4 (the number of input characters) - 3 (the length of "2<space><space>" after removing the leading "S").
Code:
        printf("%s%0" 8-N "d\n", substr($0,1,N), A)

and this works because substr() extracts the 1st character from the original input, which is printed using the format %s, and "2<space><space>" is printed using the format %07d, which prints the number specified by A zero-filled to 7 (or more) digits. (When the %d format specifier evaluates a string, it ignores everything starting with the 1st character that is not part of a valid numeric value.)
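
The three steps above can be put back together as a sketch. The function name corona_pad is made up for this example, and the assignment A = $0 is an assumption, since the start of Corona688's script is not quoted in these posts. Parentheses are added around 8 - N only for readability.

```shell
# Hypothetical reassembly of the three quoted steps.
corona_pad() {
    awk '{
        A = $0                      # assumed: work on a copy of the line
        sub(/^[^0-9]*/, "", A)      # delete prefix
        N = length($0) - length(A)  # measure length of prefix from this
        # %d stops converting A at the first non-numeric character,
        # so trailing spaces in A do not matter.
        printf("%s%0" (8 - N) "d\n", substr($0, 1, N), A)
    }' "$@"
}
printf 'S2  \n' | corona_pad
```

For the line "S2<space><space>" this builds the format %s%07d and should print S0000002.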

My code looks for trailing digits to determine the numeric part of your input (allowing other digits to appear elsewhere in the prefix). When there aren't any trailing decimal digits:
Code:
	NUMBER = substr($0, match($0, /[[:digit:]]*$/))

saves the trailing decimal digits in the variable NUMBER and the call to match has the side effect of setting RSTART to the offset in $0 where the first decimal digit was found (zero if no match was found) and setting RLENGTH to the number of decimal digits found at the end of $0 (-1 if no match is found).
Code:
	PREFIX = substr($0, 1, PRELEN = (RSTART - 1))

sets PREFIX to the prefix (your alphabetic part, but this will use the longest string at the start of the line that does not end in a decimal digit). And it sets PRELEN to the length of that string.
Code:
	DIGITS = LEN - PRELEN

sets DIGITS to the length of the string you want (8) minus the length of PREFIX (PRELEN).
Code:
	printf "%s%0*d\n", PREFIX, DIGITS, NUMBER

and here we print the PREFIX saved above and use the same %0xd format to print the decimal digits found at the end of your input.

And, with the input "S2<space><space>", what we find if we look closely is that the awk match() function on Mac OS X (from BSD) does not conform to the standards. When no match is found RSTART should be set to zero, but instead it is being set to the length of the input string plus one. So, the almost reasonable output I showed you in post #22 for the input with trailing spaces is not what a standards-conforming awk should do. (I just love it when I find conformance bugs in UNIX implementations when I'm trying to explain how things should work!) So, now I need to check to see what other implementations do to determine if this is a bug in the standards or a bug in BSD/Apple awk. Are we having fun yet...

Update: Please ignore the paragraph above claiming that the awk match() function on Mac OS X does not conform to the standards... I obviously hadn't had enough sleep when I wrote it. I'll post an update later today explaining correctly how Apple/BSD awk is doing exactly what it is supposed to be doing with the input string "S2<space><space>"... I apologize for any confusion this may have caused.

Last edited by Don Cragun; 09-15-2015 at 03:09 PM..
# 25  
Old 09-14-2015
Thanks Don!
My understanding of the code by Corona688 is the same as your explanation. The part you called a bug is too complicated for me at the moment. I have met this problem before, but was never able to think about it your way. It is a really good point for dealing with lines with trailing space(s).
# 26  
Old 09-15-2015
Here is the correction to the last paragraph I wrote in post #24...

My code looks for trailing digits to determine the numeric part of your input (allowing other digits to appear elsewhere in the prefix). When there aren't any trailing decimal digits (as with an input line containing "S2<space><space>"), you don't get what you wanted because:
Code:
	NUMBER = substr($0, match($0, /[[:digit:]]*$/))

successfully matches zero decimal digits at the end of the string and saves an empty string in the variable NUMBER. The call to match() has the side effect of setting RSTART to the offset in $0 where the first of zero decimal digits was found (5, the position of the null byte terminating the string) and setting RLENGTH to the number of decimal digits found at the end of $0 (0 in this case). (If the ERE had been [[:digit:]]+$, which matches one or more decimal digits at the end of the string, instead of [[:digit:]]*$, which matches zero or more decimal digits at the end of the string, different problems would arise.) This code was designed to work under several assumptions, including:
Quote:
Each input string is an alphanumeric string ending in one or more decimal digits.
Code:
	PREFIX = substr($0, 1, PRELEN = (RSTART - 1))

sets PREFIX to the prefix (your alphabetic part, but this will use the longest string at the start of the line that does not end in a decimal digit). And it sets PRELEN to the length of that string (4 = 5 - 1 in this case).
Code:
	DIGITS = LEN - PRELEN

sets DIGITS to the length of the string you want (8) minus the length of PREFIX (PRELEN), which in this case is 4 (8 - 4).
Code:
	printf "%s%0*d\n", PREFIX, DIGITS, NUMBER

and here we print the PREFIX saved above and use the format %04d to print the decimal digits found at the end of your input. And, since there weren't any digits at the end of the string and the awk printf %d specifier treats an empty string as the numeric value zero, it prints the entire input string as the prefix followed by four zeros:
Code:
S2  0000
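
To see the values described above for yourself, here is a small sketch. The function name show_match is made up for this example, and [0-9] is written in place of [[:digit:]] on the assumption that the two behave the same for this input. It reports RSTART and RLENGTH for both the zero-or-more ERE and the one-or-more ERE.

```shell
# Hypothetical helper: report what match() sets for one input string.
show_match() {
    printf '%s\n' "$1" | awk '{
        # Zero or more trailing digits: an empty match at the end succeeds.
        match($0, /[0-9]*$/)
        printf "star: RSTART=%d RLENGTH=%d\n", RSTART, RLENGTH
        # One or more trailing digits: no match when the line ends in spaces.
        match($0, /[0-9]+$/)
        printf "plus: RSTART=%d RLENGTH=%d\n", RSTART, RLENGTH
    }'
}
show_match 'S2  '
```

For "S2<space><space>" the first line should report RSTART=5 and RLENGTH=0, as explained above, and the second should report the no-match values RSTART=0 and RLENGTH=-1.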

If I were writing production code, I would test that the input meets the stated assumptions and produce a diagnostic message for that input line instead of producing garbage output from garbage input. If anyone wants to turn sample code provided by the volunteers here at The UNIX & Linux Forums into production code, verifying that input meets the stated input requirements is part of that task. (Note that I stated the input and output assumptions used by my script in post #10 in this thread. When the input meets all of the input assumptions, it produces the desired output. If one or more of those assumptions are not met, I make no claims about the output produced by my sample code as a result.)
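
One way such a check might look, as a sketch only. The function name pad_checked, the wording of the diagnostic, and the use of "/dev/stderr" are choices made for this example. It tests only the two assumptions mentioned in these posts (an alphanumeric string ending in one or more decimal digits, and no more than LEN characters), not the full list from post #10.

```shell
# Hypothetical checked variant: diagnose bad lines instead of padding them.
pad_checked() {
    awk -v LEN=8 '
    # Reject lines that are not alphanumeric ending in a digit, or too long.
    $0 !~ /^[A-Za-z0-9]*[0-9]$/ || length($0) > LEN {
        printf "line %d: bad input \"%s\"\n", NR, $0 > "/dev/stderr"
        next
    }
    {
        match($0, /[0-9]+$/)     # guaranteed to match after the check
        fmt = "%s%0" (LEN - RSTART + 1) "d\n"
        printf fmt, substr($0, 1, RSTART - 1), substr($0, RSTART)
    }' "$@"
}
printf 'S2\nS2  \nsk1344\n' | pad_checked
```

With that input, S0000002 and sk001344 should appear on standard output, and the line with trailing spaces should produce a diagnostic on standard error.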
# 27  
Old 09-15-2015
Thanks a lot!