How to remove duplicated based on longest row & largest value in a column
Post 302459781 by gpsridhar on Tuesday 5th of October 2010 11:07:50 AM
I have this sample text file:

E643E32D00AB58B49926B3C9628793E5,907 ,9999,5/1/2004,867 ,12/31/2006,ACT,1,0,1
CA589E9EC9CDBABA560EE6BF77AA4DBE,907 ,8741,7/1/2006,867 ,7/31/2007,ACT,1,0,1
5DBD6FF7877F5F38C62658DA5E460E64,907 ,5141,10/1/2003,867 ,9/30/2008,ACT,1,0,1
DB392456D01E0BDEE374C7BD62C9301F,907 ,4213,7/1/2009,867 ,12/31/9999,ACT,1,0,1
E1D08EF15E28E729D354B2484DDF5DFB,907 ,1014,6/15/2010,809 ,6/15/2010,DEL,500001,0,500001
86487F19E6275AFAC66279077B94FDE3,907 ,1542,6/1/2009,867 ,12/31/9999,ACT,1,0,1
E45B7371EEC0D1AB00E1750B5BC661F7,907 ,5211,1/1/2004,867 ,12/31/2006,ACT,1,0,1
FCBAFE572C5E4BA29B3F8030BD480A94,907 ,6531,1/1/2003,867 ,12/31/2005,ACT,1,0,1
2345AD5D2BFB29C821C1BC3DE8B746A7,907 ,2711,1/1/2004,827 ,1/31/2305,ACT,1,0,1
2345AD5D2BFB29C821C1BC3DE8B746A7,907 ,2711,1/1/2004,867 ,1/31/2005,ACT,1,0,1
F30641D0918E6BD2BA0B13903B3EA012,907 ,1541,5/1/2007,867 ,8/31/2007,ACT,1,0,1
F30641D0918E6BD2BA0B13903B3EA012,907 ,1541,5/1/2007,867 ,8/31/2007,ACT,1,0,1


The last two lines are exact duplicates. The two lines before them are duplicates only on my keys, which are columns 1 and 2.

When I tried the code provided above, modified like this:

sort -k1,2 f1.txt |sort -mu -k1,2

it removes only the duplicate line for the key F30641D0918E6BD2BA0B13903B3EA012,907

but the duplicate line for the key 2345AD5D2BFB29C821C1BC3DE8B746A7,907 is not removed.

I do not want to use awk, since I will not be able to reuse it.

The keys might not be fixed. I will be passing them in as a variable.

My reusable code might look like this:
pk=1,2
sort -k$pk f1.txt | sort -mu -k$pk

Please help.

Last edited by gpsridhar; 10-05-2010 at 12:25 PM.. Reason: Additional information provided
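
A likely reason the second pair survives: no -t option is given, so sort splits fields at blanks rather than at commas. Because of the space after 907, the key -k1,2 then extends up to the end of 827 or 867, and the two 2345AD5D... lines end up with different keys. Below is a minimal sketch of a reusable version, assuming comma-delimited input. The function name dedup_by_key and its argument order are hypothetical; only the standard sort options -t, -k and -u are used.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): keep one line per key.
# Usage: dedup_by_key KEYSPEC FILE [DELIM]
#   KEYSPEC  sort key range, e.g. 1,2
#   FILE     input file
#   DELIM    field delimiter, assumed to be a comma if omitted
dedup_by_key() {
    pk=$1
    file=$2
    delim=${3:-,}
    # -t makes sort split fields on the delimiter instead of on blanks.
    # -k restricts the comparison to the key fields.
    # -u keeps a single line from each run of lines with equal keys.
    # LC_ALL=C gives plain byte-order comparison, independent of locale.
    LC_ALL=C sort -t "$delim" -k "$pk" -u "$file"
}
```

For example, pk=1,2; dedup_by_key "$pk" f1.txt should leave one line per key. Which of the lines sharing a key is kept depends on the sort implementation (GNU sort keeps the first of each equal run), so this alone does not pick the longest row or the largest value.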
 
