Checking file for duplicates

05-20-2010

Fixing this data at source is not currently an option. I agree it would make more sense, but for now I have to work with the data I am sent.

A typical example of the data:

Code:

9999,20-NOV-2009,XXX,YYY,0,LF,BUN,EE,L,14,07-NOV-2009,0,1,0,0,0,0,.003
9999,26-OCT-2009,XXX,YYY,0,GU,BUN,LE,L,42,15-SEP-2009,0,1,0,0,0,.131,.131
6666,24-MAR-2010,AAA,BBB,0,BO,MUB,EE,L,1,24-MAR-2010,0,1,0,0,0,.077,.077

Note that although each row contains 18 fields, only the first ten fields make up the unique key.

I am currently toying with the idea of loading all the rows sent so far into an Oracle table and then loading each file received into a temp table, which would let me do some set arithmetic to get what I need (SELECT TABLE2 MINUS TABLE1). If possible, though, I would prefer to do this using only files on the server.
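The same set difference can be done on plain files without a database. The following is only a minimal sketch in Python, assuming the set of keys already sent fits in memory; the function names `row_key` and `rows_not_yet_sent` are hypothetical, chosen for illustration.

```python
import csv

KEY_FIELDS = 10  # per the post: only the first ten fields form the unique key


def row_key(row):
    """Return the unique key of a parsed row as a hashable tuple."""
    return tuple(row[:KEY_FIELDS])


def rows_not_yet_sent(sent_lines, received_lines):
    """Return rows from received_lines whose key is absent from sent_lines.

    Both arguments are iterables of CSV text lines (open file objects work).
    This mirrors TABLE2 MINUS TABLE1, compared on the key fields only.
    """
    # Assumption: all previously sent keys fit in memory as a set.
    sent_keys = {row_key(row) for row in csv.reader(sent_lines) if row}
    return [
        row
        for row in csv.reader(received_lines)
        if row and row_key(row) not in sent_keys
    ]
```

Passing two open file handles, for example the file of rows sent so far and the newly received file, returns the received rows whose ten-field key has not been seen before.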

Thanks for your help so far.

pxy2d1

05-20-2010

One approach:

Use a BDB (Berkeley DB) to store the unique records, keyed on the set of key fields. The BDB need not be loaded into primary memory completely; tie it to a file on disk and treat it as though you were working with a hash (internally it will keep switching between primary and secondary memory).

Code:

- before any new record is processed
- form a key from the record, using the data in the current file
- check whether the key is in the BDB (a hash lookup)
- if it is, the record already exists but may have different values; check for that
- if it is not, this is a totally new entry; store it as a key-value pair
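The steps above can be sketched in Python. Note that this uses the standard-library `dbm` module as a stand-in for Berkeley DB: it is also a disk-backed key-value store, but it is not BDB itself. The function name `classify_records` and the choice to store the remaining fields as the value are assumptions made for illustration.

```python
import dbm

KEY_FIELDS = 10  # the first ten fields form the unique key


def classify_records(lines, db_path):
    """Split records into (new, changed) against a disk-backed key store.

    db_path is a hypothetical location for the store; it is created if
    missing and persists between runs, so it need not fit in memory.
    """
    new, changed = [], []
    with dbm.open(db_path, "c") as db:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            fields = line.split(",")
            # Form the key from the key fields; keep the rest as the value.
            key = ",".join(fields[:KEY_FIELDS]).encode("utf-8")
            value = ",".join(fields[KEY_FIELDS:]).encode("utf-8")
            if key in db:
                # Key already stored: report it only if the values differ.
                # The stored value is left untouched in this sketch.
                if db[key] != value:
                    changed.append(line)
            else:
                # Totally new entry: store it as a key-value pair.
                db[key] = value
                new.append(line)
    return new, changed
```

Calling `classify_records` once per received file, always with the same `db_path`, accumulates the keys seen so far on disk and returns the rows that are new or that repeat a known key with different values.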

matrixmadhan
