Deleting duplicate lines using uniq while ignoring a column


 
# 1  
Old 06-15-2010

I have a data set with 4 columns, and I want to know whether I can delete duplicate lines while ignoring one of the columns. For example:


Code:
10 chr1 ASF 30
15 chr1 ASF 20
5 chr1 ASF 30
6 chr2 EBC 15
4 chr2 EBC 30
...

I want to delete duplicate lines while ignoring column 1, so that the result looks like this (I will of course sort, etc. before I use uniq):

Code:
10 chr1 ASF 30
15 chr1 ASF 20
6 chr2 EBC 15
4 chr2 EBC 30
...

The 3rd line is deleted because the information in columns 2, 3 and 4 duplicates another line in the file.

I know that uniq has an option that lets you ignore the first n characters, but I don't have a fixed n, so I cannot use that. Thanks.

Last edited by Scott; 06-15-2010 at 06:47 PM.. Reason: Please use code tags
# 2  
Old 06-15-2010
A long shot...

Code:
awk '!A[substr($0, index($0, " "))]++' file1 
10 chr1 ASF 30
15 chr1 ASF 20
6 chr2 EBC 15
4 chr2 EBC 30

# 3  
Old 06-15-2010
Or
Code:
awk '!A[$2,$3,$4]++' infile
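
For readers new to this idiom, here is a minimal sketch of how the one-liner behaves on the sample data from the first post. The file name sample.txt and the variable name result are assumptions made for illustration. The array A counts how often each combination of columns 2, 3 and 4 has been seen, and a line is printed only while that count is still zero, so the first occurrence wins and the input order is kept without sorting.

```shell
# Recreate the sample data from the first post (sample.txt is a hypothetical name).
printf '%s\n' \
  '10 chr1 ASF 30' \
  '15 chr1 ASF 20' \
  '5 chr1 ASF 30' \
  '6 chr2 EBC 15' \
  '4 chr2 EBC 30' > sample.txt

# A[$2,$3,$4]++ yields the count seen so far (0 the first time) and then
# increments it, so !A[...]++ is true only for the first line with that key.
result=$(awk '!A[$2,$3,$4]++' sample.txt)
printf '%s\n' "$result"
```

Because the key is built from $2, $3 and $4, this should work whether the columns are separated by single spaces, several spaces, or tabs.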

# 4  
Old 06-15-2010
Code:
sort -t" " -k2 -k3 -k4 -u file

These 2 Users Gave Thanks to anbu23 For This Post:
# 5  
Old 06-15-2010
I think that could be reduced to:
Code:
sort -uk2 infile

This also sorts on the 2nd, 3rd and 4th fields, because a key given as -k2 with no end position runs from field 2 to the end of the line.
Thanks Anbu, I always suspected that the -u option might apply to the sort keys only, as opposed to the whole line.
I wonder whether all sort implementations work this way, though.
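
One way to check this on a particular system is to compare the result with uniq -f, which skips a number of leading fields rather than characters. The following is only a sketch under some assumptions: sample.txt, by_sort and by_uniq are hypothetical names, LC_ALL=C is set to keep the ordering predictable, and column 1 is cut away before comparing because which of the duplicate lines survives is not something to rely on across implementations.

```shell
# Recreate the sample data from the first post (sample.txt is a hypothetical name).
printf '%s\n' \
  '10 chr1 ASF 30' \
  '15 chr1 ASF 20' \
  '5 chr1 ASF 30' \
  '6 chr2 EBC 15' \
  '4 chr2 EBC 30' > sample.txt

# Use the C locale so both pipelines order the keys byte by byte.
export LC_ALL=C

# Approach from this thread: -u applied to a key that starts at field 2.
by_sort=$(sort -u -k2 sample.txt | cut -d' ' -f2-)

# Alternative: sort on the same key, then let uniq skip the first field.
by_uniq=$(sort -k2 sample.txt | uniq -f 1 | cut -d' ' -f2-)

# If -u honours the key, both variables hold the same four key combinations.
if [ "$by_sort" = "$by_uniq" ]; then
  echo "sort -u -k2 and uniq -f 1 agree"
else
  echo "results differ on this system"
fi
```

If the two results differ, the sort | uniq -f 1 pipeline is probably the safer choice on that system.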

Last edited by Scrutinizer; 06-15-2010 at 07:15 PM..
# 6  
Old 06-16-2010
Thanks, everyone!
 