How to delete duplicate rows based on last column

08-25-2009


It's not exactly working.
Contrary to what I wrote in my question, the values in the first column of my data are not all the same. The data in my file looks something like this:

Code:

1900  2  7  0   9.5000  76.5000 4.30
1900  2  7  0   9.5000  76.5000 6.00
1901  2 15  0  26.0000 100.0000 6.00
1901  4 27  0  12.0000  75.0000 5.00
1901  4 17 21  40.0000  71.0000 5.90
1902  4 17 21  40.0000  71.0000 5.90
1902  8 12 17  39.5000  68.5000 6.20
1902  8 22  3  40.0000  77.0000 8.60
1902  8 22  3  40.0000  76.5000 8.20
1902  8 22  3  40.0000  76.5000 8.30
1902  8 22  3  40.0000  77.0000 8.20
1903  8 30 21  37.0000  71.0000 7.70
1904  9 20  6  38.5000  67.0000 6.30

The output I need is exactly this:

Code:

1900  2  7  0   9.5000  76.5000 6.00
1901  2 15  0  26.0000 100.0000 6.00
1901  4 27  0  12.0000  75.0000 5.00
1901  4 17 21  40.0000  71.0000 5.90
1902  8 12 17  39.5000  68.5000 6.20
1902  8 22  3  40.0000  77.0000 8.60
1902  8 22  3  40.0000  76.5000 8.30
1903  8 30 21  37.0000  71.0000 7.70
1904  9 20  6  38.5000  67.0000 6.30

Last edited by vgersh99; 08-25-2009 at 10:21 AM.. Reason: code tags, PLEASE!

reva


08-25-2009


To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut-and-paste: edit out any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

vgersh99


08-25-2009


Reva,

It's working properly for me. Of course, you can use sort to put the output back in order.

Something like this:

Code:

awk '{
  # Save the last three fields, then blank them so $0 becomes the key
  val = $(NF-2) " " $(NF-1) " " $NF
  $NF = ""; $(NF-1) = ""; $(NF-2) = ""
  # Keep the largest saved value per key (this is a string comparison)
  if (!($0 in a) || val > a[$0]) a[$0] = val
}
END { for (i in a) print i " " a[i] }' file_name.txt | sort -k2n

Last edited by panyam; 08-25-2009 at 11:10 AM..

panyam


08-26-2009


Yes, I will follow that from my next post onwards.

---------- Post updated 08-26-09 at 04:34 AM ---------- Previous update was 08-25-09 at 08:49 AM ----------

Thanks for the help, I got it.

---------- Post updated at 04:45 AM ---------- Previous update was at 04:34 AM ----------

Hi,
Now suppose I have data like that shown below. How do I delete duplicate entries so that the row kept has the largest value in the last column and the most sets of values in the row?
For example, the data in my file is:

HTML Code:

1900  2  7  0   9.5000  76.5000 0.00 4.30 0.00 0.00 0.00 4.30
1900  2  7  0  10.8000  76.8000 0.00 6.00 0.00 0.00 0.00 6.00
1901 12  1  0  37.8000  66.0000 0.00 5.00 0.00 0.00 0.00 5.00
1901 12  1  0  37.8000  66.0000 0.00 4.60 3.00 3.50 3.50 4.60
1902  4 17 21  40.0000  71.0000 0.00 5.80 0.00 5.90 5.70 5.90
1902  8 12 17  39.5000  68.5000 0.00 6.00 0.00 6.20 5.90 6.20
1902  8 22  3  40.0000  77.0000 0.00 0.00 0.00 8.00 8.60 8.60
1902  8 22  3  40.0000  76.5000 0.00 0.00 0.00 0.00 8.20 8.20
1902  8 22  3  40.0000  76.5000 0.00 0.00 0.00 0.00 8.30 8.30
1903  5 16  6   5.3600  80.0000 0.00 4.50 0.00 5.00 0.00 5.00
1903  5 16  6   5.3600  80.0000 0.00 4.30 0.00 3.00 0.00 4.30

The expected output is:

HTML Code:

1900  2  7  0  10.8000  76.8000 0.00 6.00 0.00 0.00 0.00 6.00
1901 12  1  0  37.8000  66.0000 0.00 4.60 3.00 0.00 3.50 4.60
1902  4 17 21  40.0000  71.0000 0.00 5.80 0.00 5.90 5.70 5.90
1902  8 12 17  39.5000  68.5000 0.00 6.00 0.00 6.20 5.90 6.20
1902  8 22  3  40.0000  77.0000 0.00 0.00 0.00 8.00 8.60 8.60
1903  5 16  6   5.3600  80.0000 0.00 4.50 0.00 5.00 0.00 5.00

Here it removes duplicates, preferring the row with the most values and the largest value in the last column.
If anyone has an idea, please help me out.
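
One way to read the rule above, judging from the sample output: rows count as duplicates when columns 1 to 4 match; among them, keep the row with the most non-zero entries between column 7 and the second-to-last column, and break ties by the largest last column. That reading is an assumption, as is the function name `pick_fullest_row`. A minimal awk sketch under those assumptions:

```shell
# Hypothetical helper: one row per (col1..col4) key.
# Assumed rule: most non-zero entries in columns 7..NF-1 wins,
# ties are broken by the numerically largest last column.
pick_fullest_row() {
  awk '{
    key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4

    # Count the non-zero entries between column 7 and the second-to-last column
    filled = 0
    for (i = 7; i < NF; i++) if ($i + 0 != 0) filled++
    last = $NF + 0                          # force a numeric comparison

    if (!(key in row)) order[++n] = key     # remember first-seen order
    if (!(key in row) || filled > cnt[key] ||
        (filled == cnt[key] && last > max[key])) {
      row[key] = $0; cnt[key] = filled; max[key] = last
    }
  }
  END { for (i = 1; i <= n; i++) print row[order[i]] }' "$@"
}
```

It could be called as `pick_fullest_row file_name.txt`. Rows are printed unchanged and in the order their key first appears, so no extra sort step should be needed.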

Last edited by reva; 08-26-2009 at 12:06 PM..

reva


08-26-2009


Where did you get this line:

Code:

1901 12  1  0  37.8000  66.2000 0.00 4.60 3.00 0.00 3.50 4.60

in the output you mentioned?

I hope that, with the code we have given, you can work a bit further to achieve your task.

panyam


08-26-2009


Yes, I have corrected my output. Please check it again.

Last edited by reva; 09-01-2009 at 12:35 AM..

reva


09-01-2009


If I have 19 columns and need to check for duplicates in columns 1, 2, 3 and 4 only, taking the largest value of column 18, how do I use awk? Please help me out, and try to explain the code as well; I am very new to Unix.
Thanks in advance.
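
A possible starting point for the question above, as a sketch rather than a tested solution for the real file: treat columns 1 to 4 as the duplicate key and keep, for each key, the whole row whose chosen column is numerically largest. The function name `dedupe_by_key_max_col` and the idea of passing the column number as the first argument are assumptions made for illustration.

```shell
# Hypothetical helper: first argument is the column to maximise,
# remaining arguments are input files (stdin if none are given).
dedupe_by_key_max_col() {
  col=$1; shift
  awk -v col="$col" '{
    # Rows are duplicates when columns 1-4 are identical
    key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4
    val = $col + 0                          # "+ 0" forces a numeric comparison

    if (!(key in row)) order[++n] = key     # remember first-seen order
    # Store the row the first time a key is seen, or when it beats the stored maximum
    if (!(key in row) || val > max[key]) { row[key] = $0; max[key] = val }
  }
  # After the last line, print the winning row for each key
  END { for (i = 1; i <= n; i++) print row[order[i]] }' "$@"
}
```

For the 19-column file it might be run as `dedupe_by_key_max_col 18 file_name.txt`. awk splits each line on whitespace into `$1`, `$2`, and so on, the arrays `row` and `max` are indexed by the four-column key, and the `END` block runs once all input has been read.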

reva

