Find lines with matching column 1 value, retain only the one with highest value in column 2


 
# 1  
Old 03-05-2013

I have a file like:
Quote:
s_48806 comp48806_c0_seq1 100.0 86 3285 0 2838 2838 2838 -1
s_48825 comp48825_c0_seq1 100.0 22 2793 0 626 626 626 -1
s_48825 comp48825_c1_seq1 100.0 60 2793 0 1683 1683 1683 -1
s_48827 comp48827_c0_seq2 75.8 71 5431 787 3045 4017 3872 -1
s_48827 comp48827_c0_seq5 100.0 40 5431 0 2147 2147 2147 -1
s_48831 comp48831_c0_seq1 73.1 50 2040 237 773 1058 1040 -1
I would like to find lines with duplicate values in column 1 and retain only one line per value, based on two conditions: 1) keep the line with the highest value in column 3; 2) if the column 3 values are equal, keep the line with the highest value in column 4.

Desired output:
Quote:
s_48806 comp48806_c0_seq1 100.0 86 3285 0 2838 2838 2838 -1
s_48825 comp48825_c1_seq1 100.0 60 2793 0 1683 1683 1683 -1
s_48827 comp48827_c0_seq5 100.0 40 5431 0 2147 2147 2147 -1
s_48831 comp48831_c0_seq1 73.1 50 2040 237 773 1058 1040 -1
I was able to find duplicate lines:
Code:
awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 1)print;}' file1 file1

But I can't figure out how to go about filtering based on the criteria I just described.
# 2  
Old 03-05-2013
try:
Code:
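awk '
# first line for a key: seed both maxima just below that line
!a[$1]++ {h3[$1]=$3-1; h4[$1]=$4-1}

{
  if ($3 > h3[$1]) {
     # higher column 3: also record column 4, so later ties compare against this line
     h3[$1]=$3; h4[$1]=$4; ol[$1]=$0
  } else if ($3 == h3[$1]) {
     if ($4 > h4[$1]) {
        h4[$1]=$4; ol[$1]=$0
     }
  }
}

# for (i in ol) visits keys in an unspecified order
END { for (i in ol) print ol[i]}
' infile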

# 3  
Old 03-05-2013
Assuming that the order of the output records matters:
Code:
awk 'FNR==NR{
 if($1 in a)
 {
  split(a[$1],preva)
  if(($3+0 > preva[3]+0) || (($3+0 == preva[3]+0) && ($4+0 > preva[4]+0)))
   a[$1]=$0
  next
 }
 a[$1]=$0
 next
} !b[$1]++{ print a[$1]}' file file
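
If reading the file twice is not wanted, the same order-preserving result can probably be had in one pass by remembering the order in which the keys first appear. This is only a sketch: the function name best_per_key, the array names best, b3, b4 and order, and the file name sample.txt are made up for the illustration; the data is the sample from the first post.

```shell
# Sample data from the first post (the file name sample.txt is arbitrary).
cat > sample.txt <<'EOF'
s_48806 comp48806_c0_seq1 100.0 86 3285 0 2838 2838 2838 -1
s_48825 comp48825_c0_seq1 100.0 22 2793 0 626 626 626 -1
s_48825 comp48825_c1_seq1 100.0 60 2793 0 1683 1683 1683 -1
s_48827 comp48827_c0_seq2 75.8 71 5431 787 3045 4017 3872 -1
s_48827 comp48827_c0_seq5 100.0 40 5431 0 2147 2147 2147 -1
s_48831 comp48831_c0_seq1 73.1 50 2040 237 773 1058 1040 -1
EOF

# Hypothetical helper: prints one line per column 1 value, in first-seen order.
best_per_key() {
  awk '
  # New key: remember its position and take this line as the current best.
  !($1 in best) { order[++n] = $1; best[$1] = $0; b3[$1] = $3; b4[$1] = $4; next }

  # Known key: replace the stored line if column 3 is higher,
  # or column 3 ties and column 4 is higher (+0 forces numeric comparison).
  ($3+0 > b3[$1]+0) || ($3+0 == b3[$1]+0 && $4+0 > b4[$1]+0) {
    best[$1] = $0; b3[$1] = $3; b4[$1] = $4
  }

  # Print the winners in the order their keys first appeared.
  END { for (i = 1; i <= n; i++) print best[order[i]] }
  ' "$@"
}

best_per_key sample.txt
```

On the sample this should print the four lines of the desired output. When two lines tie on both column 3 and column 4, the earlier one is kept.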

# 4  
Old 03-06-2013
Code:
$ sort -k1,1 -k3nr -k4nr file | awk ' !arr[$1]++ '
s_48806 comp48806_c0_seq1 100.0 86 3285 0 2838 2838 2838 -1
s_48825 comp48825_c1_seq1 100.0 60 2793 0 1683 1683 1683 -1
s_48827 comp48827_c0_seq5 100.0 40 5431 0 2147 2147 2147 -1
s_48831 comp48831_c0_seq1 73.1 50 2040 237 773 1058 1040 -1
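
The pipeline works because sort groups the lines by column 1 and, within each group, puts the line with the highest column 3 (then column 4) first; awk then prints only the first line it sees for each column 1 value. A slightly more explicit sketch of the same idea, with each sort key limited to a single field. The file name sample.txt and the variable name result are arbitrary; the four lines are taken from the sample in the first post.

```shell
# Four lines from the sample in the first post; sample.txt is an arbitrary name.
cat > sample.txt <<'EOF'
s_48825 comp48825_c0_seq1 100.0 22 2793 0 626 626 626 -1
s_48825 comp48825_c1_seq1 100.0 60 2793 0 1683 1683 1683 -1
s_48827 comp48827_c0_seq2 75.8 71 5431 787 3045 4017 3872 -1
s_48827 comp48827_c0_seq5 100.0 40 5431 0 2147 2147 2147 -1
EOF

# -k1,1    group by column 1
# -k3,3nr  within a group, column 3 numeric, highest first
# -k4,4nr  ties on column 3 broken by column 4, highest first
# awk then keeps only the first line seen for each column 1 value.
result=$(sort -k1,1 -k3,3nr -k4,4nr sample.txt | awk '!seen[$1]++')
printf '%s\n' "$result"
```

Note that the output comes out sorted by column 1 rather than in the original file order, which makes no difference for the sample because it is already sorted on that column.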
