suffix a sequence in awk


 
# 1  
Old 05-08-2010

Hi,

I have fixed-width records like these:

Code:
...
...
000446448742    00432265               040520100408 21974435      DEWSWATER GARRIER AAG IK4000            N 017500180000000000000000077000000000100
000446448742    00580937               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017500180000000000000000077000000000100
000446448742    00580937               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017509900000000000000000077000000000100
000446448742    00543376               040520100408 43194667      KEWSWATER FARRIER NAG HK4000            N 017500180000000000000000077000000000100
...
...

I am trying to write awk code that reads every line and, for each distinct value of
Code:
substr($0,17,8)

computes SUM as the sum of the corresponding
Code:
substr($0,114,6)

values.

If SUM exceeds one million, an alphabetical suffix should be added to that value, while keeping its overall length at no more than eight characters, so that the above data is transformed to

Code:
...
...
000446448742    00432265               040520100408 21974435      DEWSWATER GARRIER AAG IK4000            N 017500180000000000000000077000000000100
000446448742    0580937A               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017500180000000000000000077000000000100
000446448742    0580937B               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017509900000000000000000077000000000100
000446448742    00543376               040520100408 43194667      KEWSWATER FARRIER NAG HK4000            N 017500180000000000000000077000000000100
...
...

I have written the following awk code:

Code:
awk -v RECREATE_FILE=re.txt 'BEGIN {SUFFIX="A"};
{
SHIP_NUMBER=substr($0,17,8)
QTY_DELIVERED=substr($0,114,6)
TOT_QTY[SHIP_NUMBER]+=QTY_DELIVERED
DATA_VAL[SHIP_NUMBER]=$0"^"DATA_VAL[SHIP_NUMBER]
};
END {
for (SHIPMENT_NUMBER in DATA_VAL)
{
if(TOT_QTY[SHIPMENT_NUMBER]<1000000) {
#print DATA_VAL[SHIPMENT_NUMBER] > CLEAN_FILE
}
else{
i=split(DATA_VAL[SHIPMENT_NUMBER],GT_ONE_MIL,"^");
for (j=1;j<=i;j++)
{
TEMP_PO=int(SHIPMENT_NUMBER)SUFFIX++
gsub(SHIPMENT_NUMBER,"%08s"TEMP_PO,GT_ONE_MIL[j])
print GT_ONE_MIL[j] > RECREATE_FILE
}
}
}
}' <input_file_name>

But the output that I am getting is

Code:
cat re.txt
000446448742    %08s8590730               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017509900000000000000000077000000000100
000446448742    %08s8590731               040520100408 32083576      PEWSWATER BARRIER DAG GK4000            N 017500180000000000000000077000000000100

Can you please help me here?
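
A note on the output above, offered as a probable explanation rather than a verified diagnosis: awk's gsub() inserts its replacement text literally, so a printf-style format such as %08s is never interpreted there, and applying ++ to a variable holding "A" treats it as the number 0, which would account for the trailing 0 and 1. The sketch below reproduces both effects in isolation; show_pitfalls and the id= sample string are hypothetical names used only for illustration.

```shell
show_pitfalls() {   # hypothetical helper, only for demonstration
  awk 'BEGIN {
      s = "A"
      old = s++        # "A" has the numeric value 0, so old is 0 and s becomes 1
      print old, s
      line = "id=00580937"
      # The replacement text is taken literally; "%08s" is not a format here.
      gsub("00580937", "%08s" "5809370", line)
      print line
  }'
}
```

Calling show_pitfalls prints 0 1 followed by id=%08s5809370. Letters therefore need an explicit lookup (for example substr() on a string of letters), and any padding has to be done with sprintf() or substr() before the value is handed to gsub().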
# 2  
Old 05-09-2010
Code:
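awk -v RECREATE_FILE=re.txt '
BEGIN { split("A B C D E F G H I J K L M N O P Q R S T U V W X Y Z", SUFFIX, " "); a = 1 }
# First pass over urfile: total the quantity for each shipment number.
NR == FNR { TOT_QTY[$2] += substr($NF, 6, 6); next }
# Second pass: add a suffix where the total exceeds one million.
{ if (TOT_QTY[$2] > 1000000) $2 = substr($2, 2, 7) SUFFIX[a++]
  else $2 = $2    # rebuild the record so every line gets the same spacing
  print > RECREATE_FILE }' urfile urfile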

Code:
$ cat re.txt
000446448742 00432265 040520100408 21974435 DEWSWATER GARRIER AAG IK4000 N 017500180000000000000000077000000000100
000446448742 0580937A 040520100408 32083576 PEWSWATER BARRIER DAG GK4000 N 017500180000000000000000077000000000100
000446448742 0580937B 040520100408 32083576 PEWSWATER BARRIER DAG GK4000 N 017509900000000000000000077000000000100
000446448742 00543376 040520100408 43194667 KEWSWATER FARRIER NAG HK4000 N 017500180000000000000000077000000000100
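
The columns in re.txt are collapsed to single spaces because assigning to $2 makes awk rebuild the record with the default output separator. If the fixed-width layout has to survive, one option is to splice the new value into columns 17-24 with substr() instead of touching the fields. The following is only a sketch under some assumptions: suffix_fixed_width is a hypothetical wrapper name, it handles the single letters A to Z only, and it restarts the letters for each shipment number (the sample contains just one shipment over the limit, so the thread does not say which behaviour is wanted).

```shell
suffix_fixed_width() {   # hypothetical wrapper; usage: suffix_fixed_width FILE
  awk '
  # Pass 1: total columns 114-119 for each shipment number in columns 17-24.
  NR == FNR { tot[substr($0, 17, 8)] += substr($0, 114, 6); next }
  # Pass 2: splice the suffixed number back into the same columns.
  {
      key = substr($0, 17, 8)
      if (tot[key] > 1000000) {
          # Assumption: the letter sequence restarts for each shipment number.
          id = substr(key, 2) substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", ++n[key], 1)
          $0 = substr($0, 1, 16) id substr($0, 25)
      }
      print
  }' "$1" "$1"
}
```

It would be run as suffix_fixed_width urfile > re.txt; every character outside columns 17-24 is passed through unchanged.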

# 3  
Old 05-21-2010
Hi

I can see that the above solution works. I am now looking for a way to make the suffix continue with
Code:
AA, AB, AC, ...

once it reaches
Code:
Z

Is this feasible?
# 4  
Old 05-26-2010
Can anybody help with this?
# 5  
Old 05-27-2010
That is not clear.

So you need the suffixes to run as:

Code:
AA, AB, AC,.... BA, BB, BC, ..... ZZ ?

So the sample above would convert 00580937 to 580937AA?
# 6  
Old 05-27-2010
hey rdcwayx

The answer to your question is yes. To summarise,

initially the suffix needs to run through

Code:
A, B, C, ..., Z

so that the 'processed' output (or, as you call it, export) looks like

Code:
0580937A, 0580937B, 0580937C, ... 0580937Z

and once the suffix reaches Z, the 'processed' output needs to appear as

Code:
580937AA, 580937AB, 580937AC, ... , 580937AZ, 580937BA, ...  580937ZZ

i.e. in brief

Code:
0580937A, 0580937B, ..., 0580937Z, 580937AA, 580937AB, ...  580937ZZ

I hope my query makes sense.
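
The rule summarised above can be sketched as a small awk function that maps a counter to its suffix and trims as many leading characters from the number as the suffix adds. This is an illustration only: next_ids and suffix() are hypothetical names, and the thread does not say what should happen after ZZ (counter values above 702 are not handled).

```shell
next_ids() {   # hypothetical helper; usage: next_ids NUMBER COUNT
  awk -v key="$1" -v count="$2" '
  function suffix(n,    L) {      # 1..26 -> A..Z, 27..702 -> AA..ZZ
      L = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      if (n <= 26) return substr(L, n, 1)
      n -= 27
      return substr(L, int(n / 26) + 1, 1) substr(L, n % 26 + 1, 1)
  }
  BEGIN {
      for (i = 1; i <= count; i++) {
          s = suffix(i)
          print substr(key, length(s) + 1) s   # result stays 8 characters wide
      }
  }'
}
```

For example, next_ids 00580937 28 prints 0580937A through 0580937Z, then 580937AA and 580937AB.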
# 7  
Old 05-27-2010
A crude way to reuse my code without big changes is:

1. First generate the sequence

Code:
$ echo {A..Z}{A..Z}
AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ BA BB BC BD BE BF BG BH BI BJ BK BL BM BN BO BP BQ BR BS BT BU BV BW BX BY BZ CA CB CC CD CE CF CG CH CI CJ CK CL CM CN CO CP CQ CR CS CT CU CV CW CX CY CZ DA DB DC DD DE DF DG DH DI DJ DK DL DM DN DO DP DQ DR DS DT DU DV DW DX DY DZ EA EB EC ED EE EF EG EH EI EJ EK EL EM EN EO EP EQ ER ES ET EU EV EW EX EY EZ FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ GA GB GC GD GE GF GG GH GI GJ GK GL GM GN GO GP GQ GR GS GT GU GV GW GX GY GZ HA HB HC HD HE HF HG HH HI HJ HK HL HM HN HO HP HQ HR HS HT HU HV HW HX HY HZ IA IB IC ID IE IF IG IH II IJ IK IL IM IN IO IP IQ IR IS IT IU IV IW IX IY IZ JA JB JC JD JE JF JG JH JI JJ JK JL JM JN JO JP JQ JR JS JT JU JV JW JX JY JZ KA KB KC KD KE KF KG KH KI KJ KK KL KM KN KO KP KQ KR KS KT KU KV KW KX KY KZ LA LB LC LD LE LF LG LH LI LJ LK LL LM LN LO LP LQ LR LS LT LU LV LW LX LY LZ MA MB MC MD ME MF MG MH MI MJ MK ML MM MN MO MP MQ MR MS MT MU MV MW MX MY MZ NA NB NC ND NE NF NG NH NI NJ NK NL NM NN NO NP NQ NR NS NT NU NV NW NX NY NZ OA OB OC OD OE OF OG OH OI OJ OK OL OM ON OO OP OQ OR OS OT OU OV OW OX OY OZ PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ QA QB QC QD QE QF QG QH QI QJ QK QL QM QN QO QP QQ QR QS QT QU QV QW QX QY QZ RA RB RC RD RE RF RG RH RI RJ RK RL RM RN RO RP RQ RR RS RT RU RV RW RX RY RZ SA SB SC SD SE SF SG SH SI SJ SK SL SM SN SO SP SQ SR SS ST SU SV SW SX SY SZ TA TB TC TD TE TF TG TH TI TJ TK TL TM TN TO TP TQ TR TS TT TU TV TW TX TY TZ UA UB UC UD UE UF UG UH UI UJ UK UL UM UN UO UP UQ UR US UT UU UV UW UX UY UZ VA VB VC VD VE VF VG VH VI VJ VK VL VM VN VO VP VQ VR VS VT VU VV VW VX VY VZ WA WB WC WD WE WF WG WH WI WJ WK WL WM WN WO WP WQ WR WS WT WU WV WW WX WY WZ XA XB XC XD XE XF XG XH XI XJ XK XL XM XN XO XP XQ XR XS XT XU XV XW XX XY XZ YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ ZA ZB ZC ZD ZE ZF ZG ZH ZI ZJ ZK ZL ZM ZN ZO ZP ZQ ZR ZS ZT ZU ZV ZW ZX ZY ZZ

2. Update the code with that sequence to:

Code:
awk -v RECREATE_FILE=re.txt 'BEGIN {STR="A B C D E F G H I J K L M N O P Q R S T U V W X Y Z AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ BA BB BC BD BE BF BG BH BI BJ BK BL BM BN BO BP BQ BR BS BT BU BV BW BX BY BZ CA CB CC CD CE CF CG CH CI CJ CK CL CM CN CO CP CQ CR CS CT CU CV CW CX CY CZ DA DB DC DD DE DF DG DH DI DJ DK DL DM DN DO DP DQ DR DS DT DU DV DW DX DY DZ EA EB EC ED EE EF EG EH EI EJ EK EL EM EN EO EP EQ ER ES ET EU EV EW EX EY EZ FA FB FC FD FE FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ GA GB GC GD GE GF GG GH GI GJ GK GL GM GN GO GP GQ GR GS GT GU GV GW GX GY GZ HA HB HC HD HE HF HG HH HI HJ HK HL HM HN HO HP HQ HR HS HT HU HV HW HX HY HZ IA IB IC ID IE IF IG IH II IJ IK IL IM IN IO IP IQ IR IS IT IU IV IW IX IY IZ JA JB JC JD JE JF JG JH JI JJ JK JL JM JN JO JP JQ JR JS JT JU JV JW JX JY JZ KA KB KC KD KE KF KG KH KI KJ KK KL KM KN KO KP KQ KR KS KT KU KV KW KX KY KZ LA LB LC LD LE LF LG LH LI LJ LK LL LM LN LO LP LQ LR LS LT LU LV LW LX LY LZ MA MB MC MD ME MF MG MH MI MJ MK ML MM MN MO MP MQ MR MS MT MU MV MW MX MY MZ NA NB NC ND NE NF NG NH NI NJ NK NL NM NN NO NP NQ NR NS NT NU NV NW NX NY NZ OA OB OC OD OE OF OG OH OI OJ OK OL OM ON OO OP OQ OR OS OT OU OV OW OX OY OZ PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY PZ QA QB QC QD QE QF QG QH QI QJ QK QL QM QN QO QP QQ QR QS QT QU QV QW QX QY QZ RA RB RC RD RE RF RG RH RI RJ RK RL RM RN RO RP RQ RR RS RT RU RV RW RX RY RZ SA SB SC SD SE SF SG SH SI SJ SK SL SM SN SO SP SQ SR SS ST SU SV SW SX SY SZ TA TB TC TD TE TF TG TH TI TJ TK TL TM TN TO TP TQ TR TS TT TU TV TW TX TY TZ UA UB UC UD UE UF UG UH UI UJ UK UL UM UN UO UP UQ UR US UT UU UV UW UX UY UZ VA VB VC VD VE VF VG VH VI VJ VK VL VM VN VO VP VQ VR VS VT VU VV VW VX VY VZ WA WB WC WD WE WF WG WH WI WJ WK WL WM WN WO WP WQ WR WS WT WU WV WW WX WY WZ XA XB XC XD XE XF XG XH XI XJ XK XL XM XN XO XP XQ XR XS XT XU XV XW XX XY XZ YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX YY YZ ZA ZB ZC ZD ZE ZF ZG ZH ZI ZJ ZK ZL ZM ZN ZO ZP ZQ ZR ZS ZT ZU ZV ZW ZX ZY ZZ"; split(STR,SUFFIX, " "); a=1};
NR==FNR { TOT_QTY[$2]+=substr($NF,6,6) }
NR>FNR { for (i in TOT_QTY) { (($2==i)&&(TOT_QTY[$2]>1000000)&&a<=26)?$2=substr($2,2,7)  SUFFIX[a++]:$2=$2 ;
                             (($2==i)&&(TOT_QTY[$2]>1000000)&&a>26)?$2=substr($2,3,6)  SUFFIX[a++]:$2=$2}
         print $0 > RECREATE_FILE}' urfile urfile

(Code is not tested.)

Otherwise, you have to rewrite the code with your new rules.
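
As one possible shape for such a rewrite, the suffix table can be built with loops in BEGIN rather than pasted in, and the length of the chosen suffix can decide how many leading characters to drop, which replaces the two separate a<=26 and a>26 tests. This is an untested-in-production sketch under assumptions: suffix_two_pass is a hypothetical wrapper name, the counter a stays global as in the code above, and records are left unsuffixed once ZZ has been used.

```shell
suffix_two_pass() {   # hypothetical wrapper; usage: suffix_two_pass FILE
  awk '
  BEGIN {
      L = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      for (i = 1; i <= 26; i++) SUFFIX[++n] = substr(L, i, 1)         # A..Z
      for (i = 1; i <= 26; i++)
          for (j = 1; j <= 26; j++)
              SUFFIX[++n] = substr(L, i, 1) substr(L, j, 1)           # AA..ZZ
      a = 1
  }
  NR == FNR { TOT_QTY[$2] += substr($NF, 6, 6); next }                # pass 1
  {
      if (TOT_QTY[$2] > 1000000 && a <= n) {
          s = SUFFIX[a++]
          $2 = substr($2, length(s) + 1) s    # drop one character per suffix letter
      }
      $1 = $1     # rebuild every record so the output spacing is uniform
      print
  }' "$1" "$1"
}
```

It would be run as suffix_two_pass urfile > re.txt. Like the code above, it rewrites the record through field assignment, so the output is separated by single spaces.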

Last edited by rdcwayx; 05-27-2010 at 09:29 PM.. Reason: script adjusted.