Awk: subset of fields as variable with sprintf

# 1  
05-05-2016

Dear Unix Gurus,

input:
Code:
A|1|2|3|4|5
B|1|2|3|4|3
C|1|2|3|4|1
D|1|9|3|4|12

output:
Code:
A_(5);B(3);C(1)|1|2|3|4
D_(12)|1|9|3|4

Details:
If $2, $3, $4 and $5 are identical across records, concatenate $1 and the associated $NF of those records in the first field.
However, I am trying to do this by collecting the identical fields in a variable built with sprintf.

With my code below, the sprintf variable writes all the fields on a new line. I assume the problem comes from the for loop in the main block, but I don't see what is wrong:
Code:
gawk 'BEGIN{FS=OFS="|"; start=2; end=5}{for(i=start; i<=end; i++){var=sprintf("%s%s",$i, i<e ? OFS : "\n")}; common[var] = common[var] (common[var]? ";" : "") $1"_("$NF")"}END{for(j in common){print common[j] OFS j}}' input

# 2  
05-05-2016
Hello beca123456,

I'm not sure whether the _ in A_(5) and D_(12) is required in the output; the following solutions leave it out. If the order of the output does not matter, the following may help you.
Code:
awk -F"|" '{XYZ[$2 FS $3 FS $4 FS $5]=XYZ[$2 FS $3 FS $4 FS $5]?XYZ[$2 FS $3 FS $4 FS $5]";"$1"("$NF")":$1"("$NF")"} END{for(i in XYZ){print XYZ[i] FS i}}'  Input_file
OR
awk -F"|" '{Q=$2 FS $3 FS $4 FS $5;XYZ[Q]=XYZ[Q]?XYZ[Q]";"$1"("$NF")":$1"("$NF")"} END{for(i in XYZ){print XYZ[i] FS i}}'  Input_file

The output will then be as follows (the order of the lines may differ, because for (i in XYZ) visits the array indices in an unspecified order).
Code:
D(12)|1|9|3|4
A(5);B(3);C(1)|1|2|3|4

If the output also has to follow the order of the records in your Input_file, the following may help you (it reads Input_file twice).
Code:
awk -F"|" 'FNR==NR{XYZ[$2 FS $3 FS $4 FS $5]=XYZ[$2 FS $3 FS $4 FS $5]?XYZ[$2 FS $3 FS $4 FS $5]";"$1"("$NF")":$1"("$NF")";next} (($2 FS $3 FS $4 FS $5) in XYZ){print XYZ[$2 FS $3 FS $4 FS $5] OFS $2 FS $3 FS $4 FS $5;delete XYZ[$2 FS $3 FS $4 FS $5]}' Input_file OFS="|" Input_file
OR
awk -F"|" 'FNR==NR{Q=$2 FS $3 FS $4 FS $5;XYZ[Q]=XYZ[Q]?XYZ[Q]";"$1"("$NF")":$1"("$NF")";next} {P=$2 FS $3 FS $4 FS $5} (P in XYZ){print XYZ[P] OFS P;delete XYZ[P]}' Input_file OFS="|"  Input_file

The output will then be as follows.
Code:
A(5);B(3);C(1)|1|2|3|4
D(12)|1|9|3|4
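Both ordered commands above rely on two awk behaviours: FNR==NR is true only while the first file operand is being read, and OFS="|" placed between the two file names is a command-line variable assignment that awk performs when it reaches that operand, i.e. after the first pass and before the second. A small self-contained sketch of just that idiom; the function name two_pass_demo and its two-record temporary file are made up for the demonstration:

```shell
#!/bin/sh
# Sketch of the two-pass idiom: read the same file twice, switching OFS
# between the passes with a command-line assignment operand.
two_pass_demo() {
  tmp=$(mktemp) || return 1
  printf 'a b\nc d\n' > "$tmp"        # hypothetical two-record input
  awk '
    FNR == NR { count++; next }        # pass 1 (first operand): count records
    { $1 = $1; print count, $0 }       # pass 2: $1 = $1 rebuilds $0 using OFS
  ' "$tmp" OFS="|" "$tmp"
  rm -f "$tmp"
}

two_pass_demo    # prints 2|a|b and 2|c|d
```

One caveat of FNR==NR: if the first file is empty, NR only starts counting in the second file, so the condition is also true there and the second pass never runs its own block.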

If the above does not meet your requirements, please post more of your Input_file and the expected output, with more specific details. Hope this helps.
EDIT: Adding non-one-liner forms of the solutions.
Code:
awk -F"|" '{
                XYZ[$2 FS $3 FS $4 FS $5]=XYZ[$2 FS $3 FS $4 FS $5]?XYZ[$2 FS $3 FS $4 FS $5]";"$1"("$NF")":$1"("$NF")"
           }
           END{
                for(i in XYZ){
                                print XYZ[i] FS i
                             }
              }
          ' Input_file
  
 
awk -F"|" 'FNR==NR{
                        Q=$2 FS $3 FS $4 FS $5;
                        XYZ[Q]=XYZ[Q]?XYZ[Q]";"$1"("$NF")":$1"("$NF")";
                        next
                  }
                  {
                        P=$2 FS $3 FS $4 FS $5
                  }
                  (P in XYZ){
                                print XYZ[P] OFS P;
                                delete XYZ[P]
                            }
          ' Input_file OFS="|" Input_file

Thanks,
R. Singh

Last edited by RavinderSingh13; 05-05-2016 at 02:18 AM.. Reason: Added non-one liner form of solution now.
# 3  
05-05-2016
Thanks for your help!

However, the point of my post was to use sprintf to store the identical fields ($2, $3, $4 and $5 here) in a variable. I need sprintf because my real file has more than 30 fields that must be identical between records before the first and last fields are concatenated.
As I would like to avoid writing out all 30 fields, I was thinking of using a for loop and sprintf.

Sorry for the confusion
# 4  
05-05-2016
Without further commenting on your approach, I can see that the comparison i<e in the conditional expression will always be FALSE, since e is undefined and therefore zero.

Actually I can't see the advantages of using sprintf, so why do you insist on it?
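The i<e point is easy to check in isolation. The sketch below (the function names typo_version and fixed_version are made up for illustration) runs the ternary from the original one-liner once with the never-assigned e, and once with end as intended:

```shell
#!/bin/sh
# e is never assigned, so it compares as 0 and i < e is false for every i:
# the ternary always picks "\n".
typo_version() {
  awk 'BEGIN { OFS = "|"
    for (i = 2; i <= 5; i++) printf "%s", (i < e ? OFS : "\n") }'
}

# Same loop with the intended variable: "|" between values, "\n" at the end.
fixed_version() {
  awk 'BEGIN { OFS = "|"; end = 5
    for (i = 2; i <= end; i++) printf "%s%s", i, (i < end ? OFS : "\n") }'
}

typo_version | wc -l    # 4 lines, all empty
fixed_version           # 2|3|4|5
```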
# 5  
05-05-2016
Quote:
Without further commenting on your approach, I can see that in the conditional assignment i<e will always be FALSE as e is undefined thus zero.
Correct. This is a typo. My code should have been:
Code:
gawk 'BEGIN{FS=OFS="|"; start=2; end=5}{for(i=start; i<=end; i++){var=sprintf("%s%s",$i, i<end ? OFS : "\n")}; common[var] = common[var] (common[var]? ";" : "") $1"_("$NF")"}END{for(j in common){print common[j] OFS j}}' input
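Even with end in place, var is overwritten on every pass of the loop (var=sprintf(...)), so after the loop it holds only the last key field ($5) followed by "\n". Every record in the sample then gets the same key, and the trailing newline is what pushes the key onto a line of its own. A minimal sketch of one possible fix, assuming the same start/end convention: reset var for each record, append with var = var sprintf(...), and use "" rather than "\n" after the last field. It also remembers the first-seen order of the keys and writes the underscore only for the first entry, to match the expected output; the function name group_by_fields is hypothetical.

```shell
#!/bin/sh
# Group records whose fields $start..$end are identical (start/end as in
# the question). Key fields are appended to var, not assigned to it.
group_by_fields() {
  awk -v start="$1" -v end="$2" '
    BEGIN { FS = OFS = "|"; start += 0; end += 0 }   # force numeric bounds
    {
      var = ""                                  # fresh key for every record
      for (i = start; i <= end; i++)
        var = var sprintf("%s%s", $i, (i < end ? OFS : ""))
      if (var in common)
        common[var] = common[var] ";" $1 "(" $NF ")"
      else {
        common[var] = $1 "_(" $NF ")"
        order[++n] = var                        # first-seen order of keys
      }
    }
    END { for (k = 1; k <= n; k++) print common[order[k]] OFS order[k] }
  '
}

printf 'A|1|2|3|4|5\nB|1|2|3|4|3\nC|1|2|3|4|1\nD|1|9|3|4|12\n' |
  group_by_fields 2 5
# A_(5);B(3);C(1)|1|2|3|4
# D_(12)|1|9|3|4
```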

Quote:
Actually I can't see the advantages of using sprintf, so why do you insist on it?
Alright. Let me be more explicit.

input:
Code:
A|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H|5
B|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H|3
C|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H|1
D|1|2|3|4|S|D|E|Q|O|P|V|S|E|R|E|E|E|E|J|A|R|J|L|U|E|L|O|P|I|T|A|L|Y|D|T|B|12

output:
Code:
A_(5);B(3);C(1)|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H
D_(12)|1|2|3|4|S|D|E|Q|O|P|V|S|E|R|E|E|E|E|J|A|R|J|L|U|E|L|O|P|I|T|A|L|Y|D|T|B

The common fields between the records starting with A, B and C are fields $2 up to $37 (every field except the first and the last).
I don't want to write the following, because it is error-prone:
Code:
gawk 'BEGIN{FS=OFS="|"}{common[$2FS$3FS$4FS$5FS$6FS$7FS$8FS$9FS$10FS$11FS$12FS$13FS$14FS$15FS$16FS$17FS$18FS$19FS$20FS$21FS$22FS$23FS$24FS$25FS$26FS$27FS$28FS$29FS$30FS$31FS$32FS$33FS$34FS$35FS$36FS$37] = common[$2FS$3FS$4FS$5FS$6FS$7FS$8FS$9FS$10FS$11FS$12FS$13FS$14FS$15FS$16FS$17FS$18FS$19FS$20FS$21FS$22FS$23FS$24FS$25FS$26FS$27FS$28FS$29FS$30FS$31FS$32FS$33FS$34FS$35FS$36FS$37] (common[$2FS$3FS$4FS$5FS$6FS$7FS$8FS$9FS$10FS$11FS$12FS$13FS$14FS$15FS$16FS$17FS$18FS$19FS$20FS$21FS$22FS$23FS$24FS$25FS$26FS$27FS$28FS$29FS$30FS$31FS$32FS$33FS$34FS$35FS$36FS$37]? ";" : "") $1"_("$NF")"}END{for(j in common){print common[j] OFS j}}' input

I thought I could replace
Code:
common[$2FS$3FS$4FS$5FS$6FS$7FS$8FS$9FS$10FS$11FS$12FS$13FS$14FS$15FS$16FS$17FS$18FS$19FS$20FS$21FS$22FS$23FS$24FS$25FS$26FS$27FS$28FS$29FS$30FS$31FS$32FS$33FS$34FS$35FS$36FS$37]

by
Code:
for(i=2; i<=37; i++){var=sprintf("%s%s",$i, i<37 ? OFS : "\n"); common[var]}
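If the key is always "every field except the first and the last", the loop bound does not need to be hard-coded: it can be derived from NF for each record. A minimal sketch of the sprintf/for-loop idea along those lines (the function name middle_key is hypothetical) that accumulates the key instead of overwriting it and simply prints it, so the result can be inspected before it is used as a common[] subscript:

```shell
#!/bin/sh
# Print, for each record, the key built from fields $2 .. $(NF-1).
# The loop bound comes from NF, so no field number is hard-coded.
middle_key() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    key = ""                                    # reset for every record
    for (i = 2; i < NF; i++)                    # stop before the last field
      key = key sprintf("%s%s", $i, (i < NF - 1 ? OFS : ""))
    print key
  }'
}

echo 'A|1|2|3|4|5' | middle_key    # 1|2|3|4
```

Because the bound comes from NF, the same loop covers both the 6-field and the 38-field sample records.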

# 6  
05-06-2016
You might want to consider an alternative approach. No sprintf() calls, no for loops to gather your keys, and no long lists of explicitly copied arguments; just a single substr() call. And, it doesn't care how many input fields are in your keys.
Code:
awk -F'|' '
{	key = substr($0, length($1) + 1, length - length($1 FS $NF))
	if(key in list)
		list[key] = list[key] ";" $1 "(" $NF ")"
	else {	list[key] = $1 "_(" $NF ")"
		keylist[++keycount] = key
	}
}
END {	for(i = 1; i <= keycount; i++)
		print list[keylist[i]]  keylist[i]
}' file

If file contains slightly modified versions of your two input samples (the records of the second sample relabelled E, F, G and H):
Code:
A|1|2|3|4|5
B|1|2|3|4|3
C|1|2|3|4|1
D|1|9|3|4|12
E|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H|5
F|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H|3
G|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H|1
H|1|2|3|4|S|D|E|Q|O|P|V|S|E|R|E|E|E|E|J|A|R|J|L|U|E|L|O|P|I|T|A|L|Y|D|T|B|12

it produces the output:
Code:
A_(5);B(3);C(1)|1|2|3|4
D_(12)|1|9|3|4
E_(5);F(3);G(1)|1|2|3|4|A|Q|W|S|E|D|F|R|C|D|E|S|S|W|Q|D|C|E|F|E|V|F|R|R|E|W|Q|Z|V|L|L|H
H_(12)|1|2|3|4|S|D|E|Q|O|P|V|S|E|R|E|E|E|E|J|A|R|J|L|U|E|L|O|P|I|T|A|L|Y|D|T|B

If someone wants to try this on a Solaris/SunOS system, change awk in this script to /usr/xpg4/bin/awk or nawk.

Obviously, you can convert my suggestion to a 1-liner; but I'll take this more readable, more easily maintained version of the code over a 1-liner any day.
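To see what the substr() call extracts: for A|1|2|3|4|5, length($1) + 1 is 2 and length($0) - length($1 FS $NF) is 11 - 3 = 8, so the key is |1|2|3|4 with its leading separator included. That is why the print statement needs no OFS between list[...] and the key. The sketch below prints that key next to the one produced by a sub()-based variant. The sub() version is an alternative shown only for comparison, not part of the solution above; extract_keys is a made-up name, and length($0) is written out where the original uses bare length, which means the same thing.

```shell
#!/bin/sh
# Cut the key (everything between the first and the last field, keeping
# the leading "|") out of each record in two ways and print both.
extract_keys() {
  awk -F'|' '{
    # substr(): start right after $1; the length drops $1, "|" and $NF
    k1 = substr($0, length($1) + 1, length($0) - length($1 FS $NF))
    # sub(): strip $1 (but not its "|") from the front, "|" $NF from the end
    k2 = $0
    sub(/^[^|]*/, "", k2)
    sub(/\|[^|]*$/, "", k2)
    print k1 "\t" k2
  }'
}

printf 'A|1|2|3|4|5\nD|1|9|3|4|12\n' | extract_keys
```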
# 7  
05-06-2016
Thanks Don Cragun, this is brilliant!