Align number to same length by adding "0"


 
# 1  
Old 06-25-2012
Align number to same length by adding "0"

Hello,
I want to pad the number part of each string with "0" so that all the strings have equal length for sorting. The difficulty is that the number sits in the middle of the string, so CP1_Items sorts after CP19_Items, because the underscore "_" compares greater than any digit. My strings follow a regular pattern: CP[0-9]{1,2}_Items.
Input file:
Code:
S00092F     CP10_Items 1 
S000936     CP11_Items 1 
S000935     CP12_Items 1 
S00092D     CP13_Items 2 
S00093A     CP14_Items 1 
S00093F     CP15_Items 1 
S000931     CP16_Items 1 
S000934     CP17_Items 1 
S000930     CP18_Items 1 
S000938     CP19_Items 1 
S000950     CP1_Items 2 
S000954     CP20_Items 3 
S000932     CP21_Items 1 
S00093D     CP22_Items 1 
S00095D     CP23_Items 3 
S000965     CP24_Items 3 
S00093C     CP2_Items 1 
S00092C     CP3_Items 1 
S000937     CP4_Items 1 
S000933     CP5_Items 1 
S00092E     CP6_Items 1 
S00093B     CP7_Items 1 
S00093E     CP8_Items 1 
S000939     CP9_Items 1

Output file:
Code:
S000950     CP01_Items 2 
S00093C     CP02_Items 1 
S00092C     CP03_Items 1 
S000937     CP04_Items 1 
S000933     CP05_Items 1 
S00092E     CP06_Items 1 
S00093B     CP07_Items 1 
S00093E     CP08_Items 1 
S000939     CP09_Items 1 
S00092F     CP10_Items 1 
S000936     CP11_Items 1 
S000935     CP12_Items 1 
S00092D     CP13_Items 2 
S00093A     CP14_Items 1 
S00093F     CP15_Items 1 
S000931     CP16_Items 1 
S000934     CP17_Items 1 
S000930     CP18_Items 1 
S000938     CP19_Items 1 
S000954     CP20_Items 3 
S000932     CP21_Items 1 
S00093D     CP22_Items 1 
S00095D     CP23_Items 3 
S000965     CP24_Items 3

This comes up often for me, and sometimes the numbers have three or four digits. Say I want to change CP1_Items to CP001_Items, CP10_Items to CP010_Items, etc., so that they align nicely and sort first by the prefix characters and then by the value of the number part, not by the number as a string.
I thought of using a back reference again, but could not figure it out by myself. What is the trick for this type of substitution? Thanks a lot! YT
# 2  
Old 06-25-2012
Code:
perl -pe 's/CP(\d)_/CP0$1_/' file

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 06-25-2012
You could sort straight away with:
Code:
sort -t_ -k1.15,1n

if the space between column 1 and column 2 consists of 5 spaces, or
Code:
sort -t_ -k1.11,1n

if the space between column 1 and column 2 consists of a single TAB.

Output:

Code:
S000950     CP1_Items 2 
S00093C     CP2_Items 1 
S00092C     CP3_Items 1 
S000937     CP4_Items 1 
S000933     CP5_Items 1 
S00092E     CP6_Items 1 
S00093B     CP7_Items 1 
S00093E     CP8_Items 1 
S000939     CP9_Items 1
S00092F     CP10_Items 1 
S000936     CP11_Items 1 
S000935     CP12_Items 1 
S00092D     CP13_Items 2 
S00093A     CP14_Items 1 
S00093F     CP15_Items 1 
S000931     CP16_Items 1 
S000934     CP17_Items 1 
S000930     CP18_Items 1 
S000938     CP19_Items 1 
S000954     CP20_Items 3 
S000932     CP21_Items 1 
S00093D     CP22_Items 1 
S00095D     CP23_Items 3 
S000965     CP24_Items 3
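
For anyone puzzled by the key syntax: with -t_ the first field is everything before the first underscore, and -k1.15,1n means "compare numerically from character 15 of field 1 to the end of field 1". A small sketch on three of the sample lines, assuming the 5-space layout; the wrapper name sort_by_cp is hypothetical:

```shell
# sort_by_cp: hypothetical wrapper around the sort command from this post.
# Characters 1-7 are the ID, 8-12 the five spaces, 13-14 "CP",
# so the number starts at character 15 of field 1.
sort_by_cp() {
  sort -t_ -k1.15,1n
}

printf '%s\n' \
  'S00092F     CP10_Items 1' \
  'S000950     CP1_Items 2' \
  'S00093C     CP2_Items 1' | sort_by_cp
# S000950     CP1_Items 2
# S00093C     CP2_Items 1
# S00092F     CP10_Items 1
```

This relies on the number starting at a fixed column, so it would not work if the IDs in the first column had varying lengths.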


---
To make it 4 digits, try this:
Code:
sed 's/[0-9]*_/00000&/; s/00*\(.\{4\}\)_/\1_/' infile

or
Code:
awk -F'CP|_' '{printf "%sCP%04d_%s\n",$1,$2,$3}' infile
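
If the width changes from job to job, it can be passed in as a variable instead of being edited into the command. A rough sketch along the same lines, assuming each line has at most one CPnn_ token; the function name pad_cp is invented here:

```shell
# pad_cp WIDTH: hypothetical helper, reads lines on stdin and zero-pads
# the number in the first CP<digits>_ token to WIDTH digits.
pad_cp() {
  awk -v w="$1" '
    match($0, /CP[0-9]+_/) {
      # digits sit between "CP" (2 chars) and the trailing "_"
      num = substr($0, RSTART + 2, RLENGTH - 3)
      $0 = substr($0, 1, RSTART - 1) sprintf("CP%0" w "d_", num) substr($0, RSTART + RLENGTH)
    }
    { print }
  '
}

printf 'S000950     CP1_Items 2\nS00092F     CP10_Items 1\n' | pad_cp 3
# S000950     CP001_Items 2
# S00092F     CP010_Items 1
```

Unlike the -F'CP|_' version, this leaves the rest of the line untouched, so extra underscores later in the line survive.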


Last edited by Scrutinizer; 06-25-2012 at 01:23 PM..
This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 06-25-2012
Increment by 1 while always keeping two digits

Thanks bartus, that's what I meant!
And thanks Scrutinizer! Your answer is very detailed, although more comprehensive than I needed!
Actually the purpose is to run my next loop in increments of 1 from 01 to 24, i.e. CP01..CP24, then PP01~PP19 and RP01~RP16. In total there are 7296 combinations (24 x 19 x 16). I did not anticipate this problem until I came across the differing file names.
Can I ask another question: how do I increment from 01 to 99 (i.e. 01, 02 ~ 99) by 1 each time in bash/awk?
Thanks a lot again!
# 5  
Old 06-25-2012
I made it comprehensive, because of this passage:
Code:
This is quite common for me, sometime there are three or four digits for the numbers. Say I want change CP1_Items to CP001_Items, and CP10_Items to CP010_Items, etc...

You can change the 4 in the two examples to 2 or 3 or 5 for example to get different 0-padded number widths...

--
Are you trying to enumerate files that are present in a directory?

--
To enumerate in bash
Code:
printf "CP%02d\n" {1..99}

Code:
for i in {1..99}; do
  printf "CP%02d\n" "$i"
done
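
The same printf format extends to the three nested ranges mentioned earlier in the thread (CP01..CP24, PP01..PP19, RP01..RP16). A sketch in bash; the CPnn_PPnn_RPnn layout and the function name combos are assumptions for illustration, not the poster's actual file names:

```shell
# combos: hypothetical helper that prints one name per combination.
# Brace expansion {1..24} needs bash; the name layout is an assumption.
combos() {
  local c p r
  for c in {1..24}; do
    for p in {1..19}; do
      for r in {1..16}; do
        printf 'CP%02d_PP%02d_RP%02d\n' "$c" "$p" "$r"
      done
    done
  done
}

combos | wc -l      # 24 * 19 * 16 = 7296 lines
combos | sed -n 1p  # CP01_PP01_RP01
```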

Last edited by Scrutinizer; 06-25-2012 at 04:25 PM..
# 6  
Old 06-25-2012
Increment by 1 while always keeping two digits

Yes, I want to loop through the directory, and the two-digit numbers are only part of each file name. I feel I need both a loop over the file names and a regular expression to do the job. That's why I want to sort out the two-digit problem first, then loop over the files.
Say I need to create files according to three dimensions: first the Table, then the column and row of each Table. I want
Code:
File_110101: the result of column 1 and row 1 of Table11.

If I had File_1111, I could not tell whether it meant
Code:
 column 11 and row 1 of Table 1; or, column 1, row 11 of table 1; or column 1, row 1 of Table 11

etc.
So if I always use two digits from the start of my bash script, this problem can be avoided.
I tried searching for something similar, but only found examples in hex format.
So how should I embed your
Code:
 printf "CP%02d"

in my bash script?
Thanks a lot again!
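
One way to embed printf in a script is to wrap it in a small function and capture the result with command substitution. A sketch in bash, assuming the File_ prefix and the table/column/row order from the example above; the function name make_name is hypothetical:

```shell
# make_name TABLE COLUMN ROW: hypothetical helper.
# 10#$n forces base 10, so already-padded input such as 08 is not
# rejected as an invalid octal number.
make_name() {
  printf 'File_%02d%02d%02d\n' "$((10#$1))" "$((10#$2))" "$((10#$3))"
}

name=$(make_name 11 1 1)
echo "$name"        # File_110101
make_name 1 11 1    # File_011101
make_name 1 1 11    # File_010111
```

With fixed two-digit fields the three cases that all collapsed to File_1111 now get distinct names.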

Last edited by yifangt; 06-25-2012 at 05:00 PM.. Reason: More explanation
# 7  
Old 06-25-2012
Couldn't you cd to the directory and use:

Code:
printf "%s\n" CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]

or

Code:
for f in CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]
do
  printf "%s\n" "$f"
done
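
One caveat with the glob loop: if no file matches a pattern, bash leaves the pattern itself in the list, so the loop would print a literal CP[0-9][0-9]. A sketch that guards against this with nullglob, assuming bash; the function name list_padded is made up:

```shell
# list_padded DIR: hypothetical helper that lists the two-digit
# CP/PP/RP names in DIR. The subshell keeps cd and shopt local.
list_padded() {
  (
    cd "$1" || exit 1
    shopt -s nullglob   # unmatched patterns expand to nothing
    for f in CP[0-9][0-9] PP[0-9][0-9] RP[0-9][0-9]; do
      printf '%s\n' "$f"
    done
  )
}

list_padded .
```

Names such as CP1 (one digit) are not matched by these patterns, so they would need renaming first.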

This User Gave Thanks to Scrutinizer For This Post: