

Questions on File filtering


 
# 1  
11-11-2010

I have a file, test.txt, with the following lines:

Code:
$ cat test.txt
AAA 1
AAA 2
BBB 5
BBB 7
BBB 9
CCC 3
CCC 4
DDD 6
EEE 5

I want to filter this file so that each value in the first column appears only once. When several rows share the same first-column value, only the row with the largest value in the second column should be kept.

For example, three rows have "BBB" in the first column, but I only want the row with the largest number in the second column, "BBB 9", to be displayed.

The desired result after filtering is:

Code:
AAA 2
BBB 9
CCC 4
DDD 6
EEE 5

I have written the ksh script below, which produces this result, but it takes a very long time when the file has many rows. I would like some help getting the same result faster.

Code:
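#!/bin/ksh

INPUT=$1

# Sort by key, then numerically by value, so the last row of each key
# group holds the largest value (note: this rewrites the input file)
sort -k1,1 -k2,2n "$INPUT" > $$ && mv $$ "$INPUT"

cut -d' ' -f1 "$INPUT" | sort | uniq -d > ROWS_WITH_MULTIPLE_VALUES
cut -d' ' -f1 "$INPUT" | sort | uniq -u > ROWS_WITH_SINGLE_VALUES

# Keys that occur more than once: print only the last (largest) row.
# Each grep rescans the whole file, which is why this is slow on big inputs.
if [[ $(wc -l < ROWS_WITH_MULTIPLE_VALUES) -gt 0 ]]
then
    while read -r VALUE_1
    do
        # Anchor the match so that e.g. "AAA" does not also match "AAAB"
        grep "^$VALUE_1 " "$INPUT" | tail -1
    done < ROWS_WITH_MULTIPLE_VALUES
fi

# Keys that occur exactly once: print the row as-is
if [[ $(wc -l < ROWS_WITH_SINGLE_VALUES) -gt 0 ]]
then
    while read -r VALUE_1
    do
        grep "^$VALUE_1 " "$INPUT"
    done < ROWS_WITH_SINGLE_VALUES
fi

rm ROWS_WITH_MULTIPLE_VALUES ROWS_WITH_SINGLE_VALUES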

Any help will be greatly appreciated.

Cheers
Steve

Last edited by stevefox; 11-11-2010 at 01:33 AM.
# 2  
11-11-2010
Code:
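#!/bin/ksh
INPUT=$1 OUTPUT=$2                 # assumed: file names passed as arguments
sort -rk2 "$INPUT" > temp          # descending on field 2 onward (character comparison)
awk '!x[$1]++' temp > temp1        # keep only the first row seen for each key
sort temp1 > "$OUTPUT"             # restore ascending key order
rm temp temp1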

# 3  
11-11-2010

anurag.singh,

Your script works great and runs much faster!!
I'll start learning awk from now on.
Thank you very much for your help!!!

Cheers
Steve
# 4  
11-11-2010
@anurag: That should be sort -rnk2, no? Otherwise values of 10 or more would not be ordered correctly, because "10" sorts before "9" when compared character by character.
Also, there is no need for temp files or an extra sort step if you sort like this:
Code:
sort -k1,1 -k2,2rn infile

So in short:
Code:
sort -k1,1 -k2,2rn infile | awk '!x[$1]++'
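The point about numeric sorting can be checked with a small sketch. The function names `lexical_max` and `numeric_max` and the file `demo.txt` are hypothetical, and `LC_ALL=C` is set only to make the character comparison predictable:

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.
# Reverse character-wise sort on field 2 onward (the approach in reply #2)
lexical_max() { LC_ALL=C sort -rk2 "$1" | awk '!x[$1]++'; }
# Key on field 1, then numeric descending on field 2 only (the approach in reply #4)
numeric_max() { LC_ALL=C sort -k1,1 -k2,2rn "$1" | awk '!x[$1]++'; }

# Hypothetical sample data: BBB has a two-digit value
printf 'BBB 9\nBBB 10\n' > demo.txt
lexical_max demo.txt    # prints "BBB 9": " 9" compares greater than " 10" character by character
numeric_max demo.txt    # prints "BBB 10"
rm demo.txt
```

With only single-digit values, as in test.txt, both orderings agree, which is why the first answer still produced the expected output.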

# 5  
11-11-2010
Thanks Scrutinizer, this is much, much better!!
# 6  
11-11-2010
Code:
awk '{a[$1]=(a[$1]>$2)?a[$1]:$2}END{for (i in a) print i,a[i]|"sort"}' test.txt
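This awk answer reads the file only once, keeping the largest second-column value per key in an associative array, so only the output (one line per key) needs sorting. Below is a commented sketch of the same idea wrapped in a hypothetical function, `max_per_key`. It differs slightly: it tests `!($1 in max)` so the first value seen for a key is always stored, even when it is negative, and it adds `+0` to force a numeric comparison:

```shell
#!/bin/sh
# max_per_key FILE (hypothetical name): prints "key max" for each distinct key
max_per_key() {
    awk '
        # Store the value the first time a key appears, or when it is larger
        # than the current maximum (+0 forces a numeric comparison)
        !($1 in max) || $2 + 0 > max[$1] + 0 { max[$1] = $2 }
        # After the last line, print one row per key
        END { for (k in max) print k, max[k] }
    ' "$1" | LC_ALL=C sort
}
```

Usage, assuming the test.txt from the first post: `max_per_key test.txt`. Like the one-liner, it prints only the key and its maximum, not any further columns a row might have.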

