fastest way to remove duplicates.

# 8  
06-24-2005
It's equivalent to uniq, so it won't help you.
If your data is in fact already sorted, just use `uniq` instead of `sort -u`.
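To illustrate the point: `uniq` only collapses *adjacent* repeated lines, so on unsorted input it misses duplicates that are separated by other lines. `sort -u` catches them because sorting makes them adjacent. The sample data below is made up for illustration.

```shell
#!/bin/sh
# uniq compares each line only with the line directly before it.
printf 'b\na\nb\n' | uniq      # prints b, a, b: the two b's are not adjacent
printf 'b\na\nb\n' | sort -u   # prints a, b: sorting brings the duplicates together
printf 'a\na\nb\n' | uniq      # prints a, b: input already sorted, so uniq is enough
```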
# 9  
06-24-2005
No, my data is not sorted.
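For unsorted input, one commonly used alternative that is not mentioned in this thread is awk's hash-based filter. It prints a line only the first time it is seen, so it needs no sort and keeps the original line order, at the cost of holding every distinct line in memory. The function name `dedup_unsorted` is hypothetical; this is a sketch, not a benchmark-backed recommendation.

```shell
#!/bin/sh
# Hypothetical helper: remove duplicate lines from unsorted input,
# keeping the first occurrence of each line and the original order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so
# !seen[$0]++ is true exactly once per distinct line, and awk prints the
# line whenever the pattern is true. Every distinct line stays in memory.
dedup_unsorted() {
  awk '!seen[$0]++' "$@"
}

printf 'b\na\nb\nc\na\n' | dedup_unsorted   # prints b, a, c
```

Whether this beats `sort -u` likely depends on how many distinct lines the file has relative to available RAM; a multi-gigabyte file of mostly unique lines may exhaust memory.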
# 10  
06-24-2005
MySQL

The best approach would be to push all the data into Oracle using SQL*Loader,
create an index on the fly for the key you want to be unique,
and fire a query to get the unique records.

Any better alternatives?
# 11  
06-24-2005
I am not sure I want to reload all that data into another table and .....

I am pulling the data from a table with `select * from <table name>` into a text file and then running `sort -u file1 > file2`.

I could try a `select distinct` on the columns instead and see whether it takes more time than my original approach. Is it worth trying? I don't know.

I just don't have the luxury of trying different options at will, since it is a production database, unless I know it's worth trying.
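If the export-then-`sort -u` route is kept, GNU `sort` has a few options that often speed it up on large files: `LC_ALL=C` switches to plain byte comparison instead of locale-aware collation, `-S` sets the in-memory buffer size, and `-T` chooses where temporary merge files go. These options are GNU coreutils specific (check `sort --help` on your system), the wrapper name `fast_sort_unique` is hypothetical, and any gain depends on the data and hardware. Note that byte order can differ from your locale's sort order.

```shell
#!/bin/sh
# Hypothetical wrapper around GNU sort for large files.
#   $1 = input file, $2 = output file
# LC_ALL=C : compare raw bytes instead of applying locale collation rules
# -u       : output only the first of each run of equal lines
# -S 50%   : allow sort to use up to half of physical memory as its buffer (GNU)
# -T DIR   : write temporary merge files to DIR (pick a disk with free space)
fast_sort_unique() {
  LC_ALL=C sort -u -S 50% -T "${TMPDIR:-/tmp}" "$1" > "$2"
}

# Example, using the file names from the post above:
# fast_sort_unique file1 file2
```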
# 12  
06-24-2005
It's already in a database!
Just add an ORDER BY to the select clause and
index the appropriate fields.
# 13  
06-24-2005
MySQL

Definitely, it's worth a try.

Precautions you can take:

1. Make sure all the distinct columns are indexed.
2. If it is one table, you need not worry about joins; otherwise, make sure the joins are written for maximum throughput rather than the least response time.
3. Run the query at a time when no other big activity is going on in the same table, because if the query runs long, it can fail with a "snapshot too old" (ORA-01555) rollback segment error.

All the best.
# 14  
06-24-2005
Sorry for the late reply....

>> Hi Amit,

>> sed '$!N; /^\(.*\)\n\1$/!P; D'

>> Could you explain the command bit by bit, if you don't mind?

>> Thanks!

I think you can refer to the sed man page and look at the section on addresses.

I think the topic is self-explanatory...
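Since the question asked for a bit-by-bit breakdown, here is the same one-liner wrapped in a function with each command commented. As noted earlier in the thread, it only removes *adjacent* duplicate lines (like `uniq`), which is why it can be fast: it never sorts, but it will not catch duplicates in unsorted data. The function name `dedup_adjacent` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed one-liner from the thread.
# $!N              : on every line except the last ($!), append the next
#                    input line to the pattern space after a newline, so
#                    the pattern space holds "current\nnext".
# /^\(.*\)\n\1$/!P : the regex matches when both halves are identical
#                    (\1 back-references the first line); if it does NOT
#                    match (!), P prints only up to the first newline,
#                    i.e. the current line.
# D                : delete up to the first newline and restart the cycle
#                    with the remaining "next" line, without reading new
#                    input; with no newline left it acts like d and ends
#                    the cycle without printing.
dedup_adjacent() {
  sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

printf 'a\na\nb\na\n' | dedup_adjacent   # prints a, b, a (same as uniq)
```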

BTW ...

I tested this command on a file of more than 1 GB.

It took about 13 minutes to process that file, much, much faster than the sort command.