keeping last record among group of records with common fields (awk)


 
# 1  
Old 10-07-2012

input:
Code:
ref.1;rack.1;1     #group1
ref.1;rack.1;2     #group1
ref.1;rack.2;1     #group2
ref.2;rack.3;1     #group3
ref.2;rack.3;2     #group3
ref.2;rack.3;3     #group3

Among records from same group (i.e. with same 1st and 2nd field - separated by ";"), I would need to keep the last record (or the record with the highest number in the last field, which is the same here).

in order to get:
Code:
ref.1;rack.1;2
ref.1;rack.2;1
ref.2;rack.3;3

I think I managed to isolate records in groups by doing
Code:
BEGIN{FS=OFS=";"}

{array[$1$2] = $0

for(a in array)
if(<$0 is last record>) print array[a]}

but I don't know how to say "the last record". I tried with NR but it didn't really help...
# 2  
Old 10-07-2012
Quote:
Originally Posted by beca123456
but I don't know how to say "the last record". I tried with NR but it didn't really help...
To find the last record, you have to read the full input stream and then print the records in the END block. Since you are using an associative array keyed on the group, each element will always hold the last record seen for its group.
Also, setting OFS is unnecessary in this case, because no field is modified and each record is printed unchanged.
Code:
BEGIN{FS=";"}
{array[$1,$2]=$0}
END{for(a in array) print array[a]}
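
A side note on the subscript: the original attempt used array[$1$2], while the code above uses array[$1,$2]. The comma form joins the fields with SUBSEP, so two different groups cannot collapse into the same key. Here is a minimal sketch of the difference, assuming a POSIX awk; count_groups is a hypothetical helper name used only for this illustration, and the two input lines are made up to force a collision:

```shell
#!/bin/sh
# count_groups is a hypothetical helper, not part of the thread's scripts.
# It counts how many distinct groups each key style produces.
count_groups() {
    awk -F';' '
        {
            concat[$1 $2] = $0    # plain concatenation: "a" "bc" and "ab" "c" both give "abc"
            joined[$1,$2] = $0    # SUBSEP-joined subscript keeps the two groups apart
        }
        END {
            for (k in concat) nc++
            for (k in joined) nj++
            print "concatenated:", nc
            print "SUBSEP-joined:", nj
        }
    ' "$@"
}

# Two records that belong to different groups but concatenate to the same string.
printf '%s\n' 'a;bc;1' 'ab;c;2' | count_groups
```

With the sample data from the question the two forms happen to behave the same; the comma form is simply the safer habit.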

# 3  
Old 10-07-2012
Thanks elixir_sinari. I understand this approach now!

And what if I wanted to keep the record with the highest value in the last field instead of the last line? (It is the same record here, but I would like to know.)
# 4  
Old 10-07-2012
Assuming there are no negative values in the third field, you may try:
Code:
BEGIN{FS=";"}
$3>=highest[$1,$2]{maxrec[$1,$2]=$0;highest[$1,$2]=$3}
END{
for(i in maxrec)
 print maxrec[i]
}
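
The restriction on the sign comes from the first comparison in each group: highest[$1,$2] is still unset at that point and compares as 0, so a group whose values are all negative never gets stored. One possible way around this, as a sketch only and assuming a POSIX awk, is to test explicitly whether the group has been seen before. keep_max is a hypothetical helper name and the sample records are made up:

```shell
#!/bin/sh
# keep_max is a hypothetical helper: for each (field 1, field 2) group it keeps
# the record with the numerically highest third field, negative values included.
keep_max() {
    awk -F';' '
        # Store the record if this is the first record of its group, or if its
        # third field is at least as large as the stored maximum.
        # $3+0 forces a numeric comparison; >= keeps the last of equal values.
        !(($1,$2) in highest) || $3+0 >= highest[$1,$2] {
            highest[$1,$2] = $3+0
            maxrec[$1,$2] = $0
        }
        END { for (k in maxrec) print maxrec[k] }
    ' "$@"
}

# The for-in loop visits groups in an unspecified order, so sort for stable output.
printf '%s\n' 'ref.1;rack.1;-5' 'ref.1;rack.1;-2' 'ref.1;rack.2;10' 'ref.1;rack.2;9' |
    keep_max | sort
# prints:
# ref.1;rack.1;-2
# ref.1;rack.2;10
```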


Last edited by elixir_sinari; 10-07-2012 at 04:11 AM..
# 5  
Old 10-07-2012
Here are some options for you. The script below processes the input file in several ways, and the results depend on three things: whether you want the highest value of $3 or the last value of $3; whether all entries with matching field 1 and field 2 values are adjacent or may be spread throughout the input file; and whether the output order has to match the input file order:
Code:
#!/bin/ksh
printf "Following assumes all entries with matching field 1 & field 2 are
adjacent and prints the last entry found.\n"
awk 'BEGIN {FS = OFS = ";"}
last != $1 FS $2 {
        if(last != "") print last, hi3
        last = $1 FS $2
        hi3 = $3
        next
}
        {hi3 = $3}
END {   if(last != "") print last, hi3}' input

printf "\nFollowing assumes all entries with matching field 1 & field 2 are
adjacent and prints the entry with highest value in field 3.\n"
awk 'BEGIN {FS = OFS = ";"}
last != $1 FS $2 {
        if(last != "") print last, hi3
        last = $1 FS $2
        hi3 = $3
        next
}
        {if($3 > hi3) hi3 = $3}
END {   if(last != "") print last, hi3}' input

printf "\nFollowing assumes entries with matching field 1 & field 2 might not be
adjacent and prints the last entry found.  Output order is not guaranteed to
match the order of first appearance in the input file.\n"
awk 'BEGIN {FS = OFS = ";"}
{       k3[$1 FS $2] = $3}
END {   for (k in k3) print k, k3[k]}' input

printf "\nFollowing assumes entries with matching field 1 & field 2 might not be
adjacent and prints the highest entry found.  Output order is not guaranteed to
match the order of first appearance in the input file.\n"
awk 'BEGIN {FS = OFS = ";"}
{       if(kc[$1 FS $2]++ == 0) k3[$1 FS $2] = $3
        else if($3 > k3[$1 FS $2]) k3[$1 FS $2] = $3
}
END {   for (k in kc) print k, k3[k]}' input

printf "\nFollowing assumes entries with matching field 1 & field 2 might not be
adjacent and prints the highest entry found.  Output order is guaranteed to
match the order of first appearance in the input file.\n"
awk 'BEGIN {FS = OFS = ";"}
{       if(kc[$1 FS $2]++ == 0) {
                k3[$1 FS $2] = $3
                order[++cnt] = $1 FS $2
        } else if($3 > k3[$1 FS $2]) k3[$1 FS $2] = $3
}
END {   for(i = 1; i <= cnt; i++) print order[i], k3[order[i]]}' input

When the file input contains:
Code:
split;test;2
ref.1;rack.1;2
ref.1;rack.1;1
ref.1;rack.2;1
split;test;3
ref.2;rack.3;1
ref.2;rack.3;2
ref.2;rack.3;3
split;test;1

the output from the above script is:
Code:
Following assumes all entries with matching field 1 & field 2 are
adjacent and prints the last entry found.
split;test;2
ref.1;rack.1;1
ref.1;rack.2;1
split;test;3
ref.2;rack.3;3
split;test;1

Following assumes all entries with matching field 1 & field 2 are
adjacent and prints the entry with highest value in field 3.
split;test;2
ref.1;rack.1;2
ref.1;rack.2;1
split;test;3
ref.2;rack.3;3
split;test;1

Following assumes entries with matching field 1 & field 2 might not be
adjacent and prints the last entry found.  Output order is not guaranteed to
match the order of first appearance in the input file.
split;test;1
ref.2;rack.3;3
ref.1;rack.1;1
ref.1;rack.2;1

Following assumes entries with matching field 1 & field 2 might not be
adjacent and prints the highest entry found.  Output order is not guaranteed to
match the order of first appearance in the input file.
split;test;3
ref.2;rack.3;3
ref.1;rack.1;2
ref.1;rack.2;1

Following assumes entries with matching field 1 & field 2 might not be
adjacent and prints the highest entry found.  Output order is guaranteed to
match the order of first appearance in the input file.
split;test;3
ref.1;rack.1;2
ref.1;rack.2;1
ref.2;rack.3;3
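
One combination is not covered by the five commands above: keeping the last entry of each group when matching entries may be scattered, with output in order of first appearance. A minimal sketch of that case follows, assuming a POSIX awk. keep_last_ordered is a hypothetical helper name, and unlike the commands above it stores the whole record ($0), which would matter only if records had more than three fields:

```shell
#!/bin/sh
# keep_last_ordered is a hypothetical helper: it keeps the last record of each
# (field 1, field 2) group and prints groups in order of first appearance.
keep_last_ordered() {
    awk -F';' '
        { key = $1 FS $2 }                       # same key style as the script above
        !(key in last) { order[++cnt] = key }    # remember when a group first appears
        { last[key] = $0 }                       # later records overwrite earlier ones
        END { for (i = 1; i <= cnt; i++) print last[order[i]] }
    ' "$@"
}

# Same test data as the file "input" shown above.
printf '%s\n' 'split;test;2' 'ref.1;rack.1;2' 'ref.1;rack.1;1' 'ref.1;rack.2;1' \
    'split;test;3' 'ref.2;rack.3;1' 'ref.2;rack.3;2' 'ref.2;rack.3;3' 'split;test;1' |
    keep_last_ordered
# prints:
# split;test;1
# ref.1;rack.1;1
# ref.1;rack.2;1
# ref.2;rack.3;3
```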

# 6  
Old 10-07-2012
Wow! This is a very complete answer!
Thanks a lot!
 