Any 'shortcut' to doing this search for duplicate and print max

02-09-2016


Hi,

I have a file that contains multiple records for the same database.
I need to find the maximum size recorded for each database. At the moment, I am doing it as shown below.

A sample of the generated file to parse is below. With the caret (^) as the delimiter, field 1 is the database name, field 2 is the database ID and field 4 is the database size.

Code:

[oraclescripts]$ cat /tmp/x.txt
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^157184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG^11422138368
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920
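As a quick check of the layout described above, here is a minimal sketch that splits the sample on the caret. Note that when splitting on ^ alone, field 1 still carries the "- [ test11 ] - " prefix, so the sketch strips everything up to the last space to get the bare name. The `sample_data` helper and the `fields` variable are made up for illustration; on Solaris 8/9 `awk` may need to be `nawk`.

```shell
# Hypothetical helper: emits the same rows as /tmp/x.txt above.
sample_data() {
    cat <<'EOF'
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^157184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG^11422138368
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920
EOF
}

# Split on the caret, drop the "- [ name ] - " prefix from field 1,
# then print name, database ID and size.
fields=$(sample_data | awk -F'^' '{
    name = $1
    sub(/^.* /, "", name)   # remove everything up to the last space
    print name, $2, $4
}')
printf '%s\n' "$fields"
```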

An excerpt of the script that I use to parse /tmp/x.txt is below:

Code:

 
[oraclescripts]$ cat /tmp/x.ksh
#!/bin/ksh
awk '{ print $6 }' /tmp/x.txt | awk -F"^" '{ print $1 }' | sort | uniq > /tmp/x.00
while read line
do
   grep "$line" /tmp/x.txt | sort -nt"^" -k 4 | tail -1
done < /tmp/x.00

A sample run of the script is below:

Code:

[oraclescripts]$ /tmp/x.ksh
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640

I have since been informed that, ideally, what constitutes a unique database is the concatenation of field 1 and field 2.

So suppose the file to parse, /tmp/x1.txt, is as below. There are actually two databases named TEST11, each with a different database ID, and two databases named QUAL11.

Code:

- [ test11 ] - TEST11^2609333750^ARCHIVELOG^157184688128
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^170184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG ^11422138368
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920

So the script that I am using now is the one below:

Code:

 
$ cat /tmp/x1.ksh
#!/bin/ksh
awk '{ print $6 }' /tmp/x1.txt | awk -F"^" '{ print $1"^"$2 }' | sort | uniq > /tmp/x1.00
while read line
do
   grep "$line" /tmp/x1.txt | sort -nt"^" -k 4 | tail -1
done < /tmp/x1.00

Sample run of x1.ksh looks good.

Code:

$ /tmp/x1.ksh
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG ^11422138368
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^170184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640

At the moment, I am getting the desired output, but I just want to know if there is a shorter way of doing it.

BTW, I need to run the script in ksh on both Solaris and Linux.

Thanks in advance.

newbie_01


02-09-2016


How about

Code:

sort -t^ -k2,2n -k4,4nr file | awk '!T[$2]++' FS=^
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG ^11422138368
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^170184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920
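Since the original post defines a unique database as field 1 plus field 2, a variant could key on both fields and skip the sort step by tracking the largest size per key in a single awk pass. This is only a sketch under assumptions: `sample_data` and `largest` are made-up names, the data is the /tmp/x1.txt sample, and `awk` would probably have to be `nawk` on Solaris 8/9. The order of `for (key in line)` is unspecified in awk, so the result is piped through `sort` only to get a stable order.

```shell
# Hypothetical helper: emits the same rows as /tmp/x1.txt.
sample_data() {
    cat <<'EOF'
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^157184688128
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^170184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG ^11422138368
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920
EOF
}

largest=$(sample_data | awk -F'^' '
    {
        key  = $1 FS $2      # database name (with prefix) plus database ID
        size = $4 + 0        # force a numeric comparison
        if (!(key in max) || size > max[key]) {
            max[key]  = size # largest size seen so far for this key
            line[key] = $0   # remember the whole record that carries it
        }
    }
    END { for (key in line) print line[key] }
' | sort)
printf '%s\n' "$largest"
```

The trade-off is memory for one record per database instead of an external sort, which should not matter for a file of this size.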

RudiC


02-26-2016


Quote:

Originally Posted by RudiC

How about

Code:

sort -t^ -k2,2n -k4,4nr file | awk '!T[$2]++' FS=^
- [ qual11 ] - QUAL11^901361709^ARCHIVELOG ^11422138368
- [ test11 ] - TEST11^2609333750^ARCHIVELOG^170184688128
- [ test11 ] - TEST11^3637562595^ARCHIVELOG^163462512640
- [ qual11 ] - QUAL11^4014910711^ARCHIVELOG^14071889920

Hi,

I tried your advice and it does work, although I need to use nawk instead of awk on Solaris 8/9.
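For a script that has to run on both Solaris and Linux, one way to handle the awk/nawk difference might be to pick the interpreter at run time. This is only a sketch: the `AWK` and `first_seen` variable names are made up, and it assumes `type` returns a non-zero status for a missing command, which holds for ksh and bash but may not for every old Bourne shell.

```shell
# Prefer nawk where it exists (e.g. older Solaris), otherwise fall back to awk.
if type nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Small self-check: keep only the first line seen for each value of field 1.
first_seen=$(printf 'a^1\na^2\nb^1\n' | "$AWK" -F'^' '!T[$1]++')
printf '%s\n' "$first_seen"
```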

If you don't mind, may I know what the

Code:

awk '!T[$2]++' FS=^

means? To be more specific the

Code:

'!T[$2]++'

part.

Thanks.

newbie_01


02-26-2016


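As you know, awk works on pattern {action} pairs, and the default action is {print}.
!T[$2]++ is a pattern with no action, so a line is printed whenever the pattern is TRUE. An unset array element has the value empty/zero, which is logically FALSE, and the logical negation makes it TRUE.
So it reads: if T[$2] is still empty, print the line; either way, increment T[$2] afterwards, so any further line with the same $2 will "fail" the test. The net effect is that only the first occurrence of each $2 is printed, and because the sort puts the largest size first within each ID, that first occurrence is the maximum.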
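To make that concrete, here is a small sketch that runs the idiom next to a spelled-out version on three made-up lines. The `input`, `short` and `long` names are only for illustration, and `awk` may need to be `nawk` on Solaris.

```shell
# Three made-up records; the first two share the same field 2.
input='a^1^x
b^1^y
c^2^z'

# The idiom: print a line only the first time its field 2 is seen.
short=$(printf '%s\n' "$input" | awk -F'^' '!T[$2]++')

# The same logic written out step by step.
long=$(printf '%s\n' "$input" | awk -F'^' '
    {
        if (T[$2] == 0)   # an unset element compares equal to 0: key not seen yet
            print
        T[$2]++           # count the key so later lines with it are skipped
    }')

printf '%s\n' "$short"
```

Both forms should keep the a and c lines and drop the b line, because b repeats the field 2 value already seen on the a line.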

RudiC
