How to delete duplicate rows based on the last column

08-25-2009

Hi, I have a huge amount of data stored in a file. I need to remove duplicate rows from this file, where rows count as duplicates when they differ only in the last column. For each set of duplicates I must find the greatest value in the last column and print it along with the other fields, so that only one of the duplicate rows is kept. For example, take a file that looks like this:
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.20
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 40.0000 77.0000 8.40
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 9 30 6 38.5000 67.0000 6.30
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90

I want the output for this file to be:
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90
------

Last edited by reva; 08-25-2009 at 06:17 AM.

reva

08-25-2009

Why does the output have

Code:

1902  8 22  3  40.0000  76.5000 8.20

instead of

Code:

1902  8 22  3  40.0000  76.5000 8.30

given your condition (last column value > previous value for the same data)?

panyam

08-25-2009

Oh, sorry, that was my mistake. I will correct my output in a minute...

---------- Post updated at 04:18 AM ---------- Previous update was at 04:16 AM ----------

OK, now can you tell me how to do it?

reva

08-25-2009

Something like this:

Code:

awk '{ va=$NF; $NF=""; if (!($0 in a) || va+0 > a[$0]+0) a[$0]=va } END { for (i in a) print i a[i] }' file_name.txt

This still needs checking, because the order of the elements in an awk associative array is not the same as the input order: `for (i in a)` visits the keys in an unspecified order.
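One way around the ordering problem, sketched here under assumptions rather than taken from the thread: record each key in a numerically indexed array the first time it is seen, and walk that array in the `END` block instead of using `for (i in a)`. The function name `dedup_keep_max_last` is hypothetical; it treats every field except the last as the key and compares the last field numerically.

```shell
#!/bin/sh
# Hypothetical helper name: dedup_keep_max_last (not from the thread).
# Rows are duplicates when every field except the last one matches.
# For each group, keep the row whose last field is numerically largest,
# and print the surviving rows in the order their keys first appeared.
dedup_keep_max_last() {
  awk '
  NF == 0 { next }                  # skip blank lines
  {
    key = $1                        # key = fields 1..NF-1, single-spaced
    for (i = 2; i < NF; i++) key = key " " $i
    if (!(key in best)) {
      order[++n] = key              # remember first-seen order
      best[key] = $NF + 0
      row[key] = $0
    } else if ($NF + 0 > best[key]) {
      best[key] = $NF + 0           # larger last column: replace the row
      row[key] = $0
    }
  }
  END { for (j = 1; j <= n; j++) print row[order[j]] }
  ' "$@"
}

# Usage with the sample rows from the question:
dedup_keep_max_last <<'EOF'
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.20
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 40.0000 77.0000 8.40
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 9 30 6 38.5000 67.0000 6.30
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90
EOF
```

Because the whole original line is stored, the output keeps the input's spacing, and memory use grows with the number of distinct keys rather than with the size of the file.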

panyam

08-25-2009

Thanks a lot, it's working. But the first few lines of my file are being deleted...

One more thing: for the same data, what if I need the output to be:

1902 8 22 3 40.0000 77.0000 8.60
1902 9 30 6 38.5000 67.0000 7.70
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90

That is, just check whether the first 4 columns are equal, and take the largest values from the other columns, as shown above.

reva

08-25-2009

Something like this:

Code:

awk '{ k=$1" "$2" "$3" "$4; v=$5" "$6" "$7; if (!(k in a) || v > a[k]) a[k]=v } END { for (i in a) print i" "a[i] }'  file_name.txt

As I said already:

This still needs checking, because the order of the elements in an awk associative array is not the same as the input order.
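For the second request, the same first-seen-order idea can be combined with a field-by-field numeric comparison. This is a sketch assuming that "largest" means comparing column 5 first, then column 6, then column 7 as numbers (a numeric version of what a concatenated-string comparison approximates); the helper name `dedup_by_first_cols` and its key-count argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper name: dedup_by_first_cols (not from the thread).
# Usage: dedup_by_first_cols NKEY [file...]
# Rows are duplicates when their first NKEY fields match. Among duplicates,
# keep the row whose remaining fields are largest, compared numerically
# left to right (field NKEY+1 first, then NKEY+2, ...). Groups are printed
# in the order in which each key first appeared.
dedup_by_first_cols() {
  nkey=$1
  shift
  awk -v nkey="$nkey" '
  BEGIN { nkey += 0 }               # force a numeric key count
  # Return 1 if the current record beats the stored line prev.
  function beats(prev,    old, m, i) {
    m = split(prev, old)
    for (i = nkey + 1; i <= NF && i <= m; i++) {
      if ($i + 0 > old[i] + 0) return 1
      if ($i + 0 < old[i] + 0) return 0
    }
    return 0                        # equal so far: keep the earlier row
  }
  NF == 0 { next }                  # skip blank lines
  {
    key = $1                        # key = first nkey fields, single-spaced
    for (i = 2; i <= nkey; i++) key = key " " $i
    if (!(key in row)) { order[++n] = key; row[key] = $0 }
    else if (beats(row[key])) row[key] = $0
  }
  END { for (j = 1; j <= n; j++) print row[order[j]] }
  ' "$@"
}

# Usage: key on the first 4 columns of the sample data.
dedup_by_first_cols 4 <<'EOF'
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.20
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 40.0000 77.0000 8.40
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 9 30 6 38.5000 67.0000 6.30
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90
EOF
```

Comparing numerically matters once values have different widths: as strings, "9.5" compares greater than "10.0", while this function keeps the 10.0 row.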

panyam

08-25-2009

Another way...

For the 1st one...

Code:

 
sort -n -k7 infile | awk '{t[$1" "$2" "$3" "$4" "$5" "$6]=$7}END{for (i in t){print i,t[i]}}'

For the 2nd one...

Code:

 
sort -n -k5 infile | awk '{t[$1" "$2" "$3" "$4]=$5" "$6" "$7}END{for (i in t){print i,t[i]}}'
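A variation on this sort-then-awk idea, sketched with POSIX `sort -k` syntax: sort so that the largest last column comes first, then let awk keep only the first row seen for each key. This avoids the `for (i in t)` loop, so the output order is deterministic (descending by column 7), though it is not the input order. The helper name is hypothetical, and the 7-column layout is assumed from the sample data.

```shell
#!/bin/sh
# Hypothetical helper name: dedup_sorted_by_last (not from the thread).
# Assumes 7 whitespace-separated columns, as in the sample data.
# 1) Sort rows numerically by column 7, largest first (C locale, so "." is
#    the decimal point).
# 2) awk prints a row only the first time its columns 1-6 are seen, so the
#    row kept for each key is the one with the largest column 7.
# The output is ordered by column 7 (descending), not by input position.
dedup_sorted_by_last() {
  LC_ALL=C sort -k7,7nr "$@" | awk '!seen[$1, $2, $3, $4, $5, $6]++'
}

# Usage with the sample rows from the question:
dedup_sorted_by_last <<'EOF'
1902 8 22 3 40.0000 77.0000 8.60
1902 8 22 3 40.0000 76.5000 8.20
1902 8 22 3 40.0000 76.5000 8.30
1902 8 22 3 40.0000 77.0000 8.40
1902 8 22 3 39.8000 76.2000 8.10
1902 9 30 6 38.5000 67.0000 7.70
1902 9 30 6 38.5000 67.0000 6.30
1902 10 6 9 36.5000 70.5000 7.20
1902 12 4 22 37.8000 65.5000 4.90
EOF
```

Here awk only remembers which keys it has already printed, instead of holding every surviving row until the end of input.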

malcomex999
