Awk: group multiple fields from different records

 
# 1  
Old 07-06-2018

Hi,

My input looks like this:
Code:
A|123|qwer
A|456|tyui
A|456|wsxe
B|789|dfgh

Using awk, I am trying to get:
Code:
A|123;456|qwer;tyui;wsxe
B|789|dfgh

For records with the same $1, I want to collect all the $2 values into one field and all the $3 values into another, each without duplicates.

What I have tried:
Code:
echo -e "A|123|qwer\nA|456|tyui\nA|456|wsxe\nB|789|dfgh" | gawk 'BEGIN{FS=OFS="|"}{a[$1]=sprintf("%s%s", a[$1], a[$1] ~ /$2/ ? "":";"$2); b[$1]=sprintf("%s%s", b[$1], b[$1] ~ /$3/ ? "":";"$3)}END{for(i in a){print i FS a[i] FS b[i]}}'

(Wrong) output:
Code:
A|;123;456;456|;qwer;tyui;wsxe
B|;789|;dfgh

However, I still cannot manage to remove the duplicated strings inside fields $2 and $3.

Last edited by beca123456; 07-06-2018 at 09:12 AM..
# 2  
Old 07-06-2018
Do yourself a favour and start indenting and structuring your code so it is easier to read and understand. Try:

Code:
awk -F\| '
        {if (!(a[$1] ~ $2)) a[$1] = a[$1] DL[$1] $2
         if (!(b[$1] ~ $3)) b[$1] = b[$1] DL[$1] $3
         DL[$1] = ";"
        }
END     {for (i in a)   {print i, a[i], b[i]
                        }
        }
' OFS="|"  file

Output:
Code:
A|123;456|qwer;tyui;wsxe
B|789|dfgh
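
For what it's worth, the duplicates survive in the original attempt because /$2/ is a regex constant: the $2 between the slashes is not replaced by the field value, so the test never matches the stored values and ";"$2 is appended every time (hence the leading ";" as well). A bare $2 to the right of ~ is used as a dynamic regex built from the field. A minimal sketch of the difference (the function name match_demo and the variable names are made up for this demo):

```shell
# Compare a regex constant with a dynamic regex on field 2 (reads stdin).
match_demo() {
    awk -F'|' '{
        # /$2/ is a regex constant: the field value is NOT substituted here.
        constant = ($2 ~ /$2/)
        # A bare $2 after ~ is a dynamic regex taken from the field value.
        dynamic = ($2 ~ $2)
        print constant, dynamic
    }'
}

printf 'A|123|qwer\n' | match_demo
# prints: 0 1
```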

This User Gave Thanks to RudiC For This Post:
# 3  
Old 07-06-2018
The following variant does a precise lookup (to suppress duplicates)
and does not need an array of delimiters:
Code:
awk '
BEGIN {
  FS=OFS="|"
  dl=";"
}
function strjoin(i, j){
  if (i=="") return j  # first element
  if (index((dl i dl), (dl j dl))) return i # duplicate
  return (i dl j) # join element
} 
{
  s2[$1]=strjoin (s2[$1], $2)
  s3[$1]=strjoin (s3[$1], $3)
}
END {
  for (i in s2) print i, s2[i], s3[i]
}
' file

This is a good demonstration of a function :)
This User Gave Thanks to MadeInGermany For This Post:
# 4  
Old 07-06-2018
Quote:
The following variant does a precise lookup (to suppress duplicates)
I don't understand this statement. Both solutions seem to work just fine.
Is one more prone to errors than the other?

Last edited by beca123456; 07-06-2018 at 03:53 PM..
# 5  
Old 07-06-2018
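The regular expression match (~) is different from the exact string search via index(): an unanchored regex matches any substring (45 matches inside 456), and it interprets metacharacters (45* matches 455).
You'll see differences with, e.g., the following two input files: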
Code:
A|123|qwer
A|456|tyui
A|45|wsxe
B|789|dfgh

Code:
A|123|qwer
A|455|tyui
A|45*|wsxe
B|789|dfgh
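
To make the difference concrete, here is a minimal sketch that runs both variants from posts #2 and #3 on the two files above. The wrapper names group_regex and group_index and the shell variables are made up for this demo; the awk bodies follow the posted code:

```shell
# Regex-based membership test, as in post #2 (reads records from stdin).
group_regex() {
    awk -F'|' -v OFS='|' '
        {
            if (!(a[$1] ~ $2)) a[$1] = a[$1] DL[$1] $2
            if (!(b[$1] ~ $3)) b[$1] = b[$1] DL[$1] $3
            DL[$1] = ";"
        }
        END { for (i in a) print i, a[i], b[i] }
    '
}

# Exact string membership test via index(), as in post #3.
group_index() {
    awk '
        BEGIN { FS = OFS = "|"; dl = ";" }
        function strjoin(i, j) {
            if (i == "") return j                      # first element
            if (index((dl i dl), (dl j dl))) return i  # duplicate
            return (i dl j)                            # join element
        }
        { s2[$1] = strjoin(s2[$1], $2); s3[$1] = strjoin(s3[$1], $3) }
        END { for (i in s2) print i, s2[i], s3[i] }
    '
}

sample1='A|123|qwer
A|456|tyui
A|45|wsxe
B|789|dfgh'
sample2='A|123|qwer
A|455|tyui
A|45*|wsxe
B|789|dfgh'

# Only the A line is shown; output is sorted because for-in order is unspecified.
printf '%s\n' "$sample1" | group_regex | sort | head -n 1
# A|123;456|qwer;tyui;wsxe       (45 is lost: it matches inside 456)
printf '%s\n' "$sample1" | group_index | sort | head -n 1
# A|123;456;45|qwer;tyui;wsxe
printf '%s\n' "$sample2" | group_regex | sort | head -n 1
# A|123;455|qwer;tyui;wsxe       (45* is lost: as a regex it matches 455)
printf '%s\n' "$sample2" | group_index | sort | head -n 1
# A|123;455;45*|qwer;tyui;wsxe
```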

This User Gave Thanks to MadeInGermany For This Post:
# 6  
Old 07-06-2018
Very good point!
I got it now, thanks!
# 7  
Old 07-06-2018
You can "sharpen" or "narrow down" the regex so that it only matches whole elements, which avoids false positive matches, like this:
Code:
awk -F\| '
        {if (!(a[$1] ~ "(^|;)" $2 "(;|$)")) a[$1] = a[$1] DL[$1] $2
         if (!(b[$1] ~ "(^|;)" $3 "(;|$)")) b[$1] = b[$1] DL[$1] $3
         DL[$1] = ";"
        }
END     {for (i in a)   {print i, a[i], b[i]
                        }
        }
' OFS="|"  file
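
One caveat, as far as I can tell: anchoring fixes the substring false positives (45 vs. 456), but $2 is still interpreted as a regex, so a value containing metacharacters such as 45* can still match a different stored value. A minimal sketch using the two sample files from post #5 (the wrapper name group_anchored is made up for this demo):

```shell
# Anchored-regex membership test: the value must match a whole element.
group_anchored() {
    awk -F'|' -v OFS='|' '
        {
            if (!(a[$1] ~ "(^|;)" $2 "(;|$)")) a[$1] = a[$1] DL[$1] $2
            if (!(b[$1] ~ "(^|;)" $3 "(;|$)")) b[$1] = b[$1] DL[$1] $3
            DL[$1] = ";"
        }
        END { for (i in a) print i, a[i], b[i] }
    '
}

# Substring case: 45 is now kept next to 456.
printf 'A|123|qwer\nA|456|tyui\nA|45|wsxe\nB|789|dfgh\n' | group_anchored | sort
# A|123;456;45|qwer;tyui;wsxe
# B|789|dfgh

# Metacharacter case: 45* still matches the stored 455 and is dropped.
printf 'A|123|qwer\nA|455|tyui\nA|45*|wsxe\nB|789|dfgh\n' | group_anchored | sort
# A|123;455|qwer;tyui;wsxe
# B|789|dfgh
```

If the data can contain regex metacharacters, the index()-based lookup from post #3 sidesteps the problem because it compares plain strings.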
