Removing duplicates except the last occurrence


 
# 8  
Old 11-06-2014
Sometimes the order matters.
For example, your input file is
Code:
1
2
3
2
8
5
3
4
5
9
6
7
8
9

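The request — keep only the last occurrence of each line while preserving order — can be sketched with a two-pass awk; `file` here is a placeholder for the input above:
Code:

```shell
# Pass 1 (NR==FNR): record the line number of each value's last occurrence.
# Pass 2: print a line only when its line number matches that record.
awk 'NR==FNR {last[$0] = FNR; next} FNR == last[$0]' file file
```

For this sample data the output is 1 through 9, one per line, in that order.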
# 9  
Old 11-06-2014
Quote:
Originally Posted by MadeInGermany
Sometimes the order matters.
For example, your input file is
Code:
1
2
3
2
8
5
3
4
5
9
6
7
8
9

Then it essentially becomes a funky sort scenario...assuming that the low values and their duplicates always come before the high ones...
# 10  
Old 11-06-2014
Hi.

Observations, comments.
Quote:
Originally Posted by drl
... If you were to run out of memory, you could use tac file | awk '!($0 in S) {print; S[$0]}' | tac, posted by MadeInGermany.
...
This was a wrong assertion on my part. The tac ... tac pipeline still keeps every distinct line in memory (awk's array stores one copy of each).
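For the record, the pipeline being discussed reverses the file, keeps the first occurrence of each line (which is the last one in the original order), and reverses back; `file` is a placeholder, and tac here is the GNU coreutils tool:
Code:

```shell
# Reverse the file, keep only the first occurrence of each line
# (i.e. the last occurrence in the original order), reverse again.
tac file | awk '!($0 in S) {print; S[$0]}' | tac
```

The S array accumulates one entry per distinct line, which is exactly why the "runs out of memory" claim above was wrong.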

@shamrock:
I am confused by your comment:
Quote:
...assuming that the low values and their duplicates always come before the high ones...
in my solution the line numbers are added specifically so that the ordering is preserved, as the OP requested. Do you have a non-memory solution that does not do something like that?

Best wishes ... cheers, drl
# 11  
Old 11-06-2014
Quote:
Originally Posted by drl
@shamrock:
I am confused by your comment:
...assuming that the low values and their duplicates always come before the high ones
That means that the input data posted by "MadeInGermany" has a pattern...the low values and their duplicates always come before the high values and their duplicates...for example, the last "2" is far away from the first "2" but still comes before the last "3"...and since the OP wants only the last of the dupes, the final output comes out sorted...
Code:
2
3
2
8
5
3

Quote:
Originally Posted by drl
in my solution the line numbers are added specifically so that the ordering is preserved, as the OP requested. Do you have a non-memory solution that does not do something like that?
No I don't...as there is no such thing as a non-memory solution in the computing world...data on disk must first be brought into memory before it can be worked on...firstly because the CPU can only address data that is in memory, not on disk, and secondly because the operations would otherwise be very slow...
# 12  
Old 11-06-2014
Hi, Shamrock.

I think the OP meant that the last value of the duplicates was important because of its position in the file, not because it had some other magical properties; after all, they are duplicates.

Quote:
Originally Posted by MadeInGermany
Sometimes the order matters.
When I wrote about memory, I was referring to all the solutions that kept all of the data in an array (except for the duplicates, of course). My solution was different in that nothing was kept in arrays like that; everything was pipelined, so there would be little or no risk of running out of memory (say, with awk's arrays).
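drl's script itself isn't quoted in this part of the thread, but the line-numbering idea can be sketched as a decorate-sort pipeline. It assumes GNU sort, whose -u keeps the first line of each run of equal keys (the sort is stable there), and sort(1) can spill to temporary files instead of holding everything in RAM:
Code:

```shell
TAB=$(printf '\t')

# Decorate each line with its line number, then let sort(1) do the work:
#  1) group by content; within a group, highest line number first
#  2) -u keeps one line per content value -- the last occurrence
#  3) restore the original order by line number, then strip the numbers
nl -ba file \
  | sort -t "$TAB" -k2 -k1,1nr \
  | sort -t "$TAB" -u -k2 \
  | sort -t "$TAB" -k1,1n \
  | cut -f2-
```

Unlike the awk solutions, no user-level array ever holds the distinct lines; sort does the bookkeeping.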

Does this make my comments clearer? ... cheers, drl