Remove somewhat Duplicate records from a flat file


 
# 1  
Old 09-29-2011

I have a flat file that contains records similar to the following two lines:


Code:
1984/11/08            7 700000 123456789 2
1984/11/08 1941/05/19 7 700000 123456789 2


The 123456789 2 represents an account number; this is how I identify a duplicate record.

The ### signs represent blank spaces in the file. This thread keeps stripping them out.

As you can see the second line has a second date in it. This is the line I need to KEEP and need to REMOVE the line before it.

How can I find these situations and then remove the first record?

Thanks for any help.

Jeff
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 09-29-2011 at 12:12 PM.. Reason: code tags, please!
# 2  
Old 09-29-2011
So what's the key? 123456789?

--ahamed
# 3  
Old 09-29-2011
It keeps stripping them out because you're not putting them in code tags.

It could be as simple as
Code:
awk 'NF>5' < infile > outfile

to exclude all records with fewer than 6 fields.
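
Before dropping every short record, it may be worth checking whether any of them lack a longer twin, since those would be lost. The following is only a sketch: it assumes the account number is always the last two whitespace-separated fields, and the third sample record is made up for illustration.

```shell
# Sample data; the third record is a made-up short record with no 6-field twin.
data=$(mktemp)
cat > "$data" <<'EOF'
1984/11/08            7 700000 123456789 2
1984/11/08 1941/05/19 7 700000 123456789 2
2001/02/03            9 900000 345678912 4
EOF

# The file is read twice.
# Pass 1 (NR == FNR): remember the account key of every 6-field record.
# Pass 2: print each short record whose account key was never seen in pass 1.
orphans=$(awk '
NR == FNR { if (NF == 6) full[$(NF-1) SUBSEP $NF] = 1; next }
NF >= 2 && NF < 6 && !(($(NF-1) SUBSEP $NF) in full)
' "$data" "$data")

printf '%s\n' "$orphans"
rm -f "$data"
```

If this prints nothing for the real file, every short record has a longer twin and the simple field-count filter is safe.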

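Or, if some 'short' records do NOT have duplicates, keep the record with more fields for each key:

Code:
awk '{
        # Key = every field except the optional second date
        if (NF == 6) K = $1 FS $3 FS $4 FS $5 FS $6
        else         K = $1 FS $2 FS $3 FS $4 FS $5

        # Compare field counts, not line lengths: in a blank-padded
        # fixed-width file the short and long records are equally long.
        if (!(K in L))      { O[N++] = K; L[K] = $0; F[K] = NF }
        else if (F[K] < NF) { L[K] = $0; F[K] = NF }
}
END { for (M = 0; M < N; M++) print L[O[M]] }' < data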

# 4  
Old 09-29-2011
Code:
nawk '{a[$(NF-1),$NF]=$0}END {for (i in a) print a[i]}' myFile
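
Note that the one-liner above keeps whichever record comes last for each account, and `for (i in a)` returns them in no particular order. A possible variant is sketched below. It assumes the same key (the last two fields), keeps the record with more fields wherever it appears, and prints in first-seen order. The array names are arbitrary, and the second pair is deliberately reversed to show that order does not matter.

```shell
data=$(mktemp)
# Sample records; in the second pair the longer record comes FIRST.
cat > "$data" <<'EOF'
1984/11/08            7 700000 123456789 2
1984/11/08 1941/05/19 7 700000 123456789 2
1999/06/08 1956/11/23 8 800000 234567891 5
1999/06/08            8 800000 234567891 5
EOF

kept=$(awk '
NF < 2 { next }                              # skip blank or malformed lines
{
    key = $(NF-1) SUBSEP $NF                 # account number = last two fields
    if (!(key in rec)) order[++n] = key      # remember first-seen order
    if (!(key in rec) || NF > nf[key]) {     # keep the record with more fields
        rec[key] = $0
        nf[key]  = NF
    }
}
END { for (i = 1; i <= n; i++) print rec[order[i]] }
' "$data")

printf '%s\n' "$kept"
rm -f "$data"
```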

# 5  
Old 09-29-2011
Code:
$
$ cat f29
1984/11/08            7 700000 123456789 2
1984/11/08 1941/05/19 7 700000 123456789 2
1999/06/08            8 800000 234567891 5
1999/06/08 1956/11/23 8 800000 234567891 5
$
$ # print only those lines that have 6 fields ($#F is the last index, so 5 means 6 fields)
$
$ perl -lane 'print if $#F==5' f29
1984/11/08 1941/05/19 7 700000 123456789 2
1999/06/08 1956/11/23 8 800000 234567891 5
$
$ # print only those lines that do have 2 dates at the beginning
$
$ perl -lne 'print if /^(\s*\d{4}\/\d\d\/\d\d){2}/' f29
1984/11/08 1941/05/19 7 700000 123456789 2
1999/06/08 1956/11/23 8 800000 234567891 5
$
$ # If the file is fixed-format, then you could try the following two approaches
$ # print only those lines whose column positions 12 through 21 are not blank spaces
$
$ perl -lne 'print if substr($_,11,10) !~ /^\s+$/' f29
1984/11/08 1941/05/19 7 700000 123456789 2
1999/06/08 1956/11/23 8 800000 234567891 5
$
$ # print only those lines whose column position 12 is not a blank space
$
$ perl -lne 'print if substr($_,11,1) ne " "' f29
1984/11/08 1941/05/19 7 700000 123456789 2
1999/06/08 1956/11/23 8 800000 234567891 5
$
$
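
For anyone who prefers awk, the last fixed-format test could probably be written as below. This is a sketch that assumes the second date always starts in column 12, as in the sample data.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
1984/11/08            7 700000 123456789 2
1984/11/08 1941/05/19 7 700000 123456789 2
EOF

# awk's substr() is 1-based, so column 12 is substr($0, 12, 1);
# Perl's 0-based substr($_, 11, 1) addresses the same character.
# Matching against [^ ] also rejects lines shorter than 12 characters.
with_second_date=$(awk 'substr($0, 12, 1) ~ /[^ ]/' "$data")

printf '%s\n' "$with_second_date"
rm -f "$data"
```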

tyler_durden