Remove duplicate records


 
# 1  
08-30-2013

Hi,

I am working on a script that removes duplicate records (lines) from a flat file. The only difference between the duplicates is the position of the "NOT NULL" keyword. Please see the example input file below.

INPUT FILE:
Code:
CREATE a
(
TRIAL_CLIENT              NOT NULL VARCHAR2(60),
TRIAL_FUND                NOT NULL VARCHAR2(60),
LOCAL_ACC_NO              NOT NULL VARCHAR2(60),
TRIAL_BROKER              NOT NULL VARCHAR2(12),
CORR_ACC_NO               NOT NULL NUMBER(10),
CURRENCY                  NOT NULL VARCHAR2(3),
AS_OF_DATE                NOT NULL DATE,
COUNT_OUR_TRANSACTIONS             NUMBER,
SUM_OUR_POSITIONS                  NUMBER,
SUM_OUR_CASH_TXNS                  NUMBER,
SUM_OUR_TRANSACTIONS               NUMBER,
COUNT_BROKER_TRANSACTIONS          NUMBER,
SUM_BROKER_POSITIONS               NUMBER,
SUM_BROKER_CASH_TXNS               NUMBER,
SUM_BROKER_TRANSACTIONS            NUMBER,
SUM_OUR_CASH_BALS                  NUMBER,
SUM_OUR_UNREAL_BALS                NUMBER,
SUM_OUR_BALANCES                   NUMBER,
SUM_BROKER_CASH_BALS               NUMBER,
SUM_BROKER_UNREAL_BALS             NUMBER,
SUM_UNSETT_INT                     NUMBER,
SUM_OPEN_FWDS                      NUMBER,
SUM_BROKER_BALANCES                NUMBER,
NET_TRANSACTIONS                   NUMBER,
NET_BALANCES                       NUMBER,
TRIAL                              NUMBER,
CASH_ADJ                           NUMBER,
WO_AMT                             NUMBER,
ITEMS                              NUMBER
);

CREATE b
(
TRIAL_CLIENT               VARCHAR2(60) NOT NULL,
TRIAL_FUND                 VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO               VARCHAR2(60) NOT NULL,
TRIAL_BROKER               VARCHAR2(12) NOT NULL,
CORR_ACC_NO                NUMBER(10)   NOT NULL,
CURRENCY                   VARCHAR2(3)  NOT NULL,
AS_OF_DATE                 DATE         NOT NULL,
COUNT_OUR_TRANSACTIONS     NUMBER,
SUM_OUR_POSITIONS          NUMBER,
SUM_OUR_CASH_TXNS          NUMBER,
SUM_OUR_TRANSACTIONS       NUMBER,
COUNT_BROKER_TRANSACTIONS  NUMBER,
SUM_BROKER_POSITIONS       NUMBER,
SUM_BROKER_CASH_TXNS       NUMBER,
SUM_BROKER_TRANSACTIONS    NUMBER,
SUM_OUR_CASH_BALS          NUMBER,
SUM_OUR_UNREAL_BALS        NUMBER,
SUM_OUR_BALANCES           NUMBER,
SUM_BROKER_CASH_BALS       NUMBER,
SUM_BROKER_UNREAL_BALS     NUMBER,
SUM_UNSETT_INT             NUMBER,
SUM_OPEN_FWDS              NUMBER,
SUM_BROKER_BALANCES        NUMBER,
NET_TRANSACTIONS           NUMBER,
NET_BALANCES               NUMBER,
TRIAL                      NUMBER,
CASH_ADJ                   NUMBER,
WO_AMT                     NUMBER,
ITEMS                      NUMBER
);

CREATE c
(
TRIAL_CLIENT  VARCHAR2(60) NOT NULL,
TRIAL_FUND    VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO  VARCHAR2(60) NOT NULL,
TRIAL_BROKER  VARCHAR2(12) NOT NULL,
CORR_ACC_NO   NUMBER(10)   NOT NULL,
CURRENCY      VARCHAR2(3)  NOT NULL,
VALUE_DATE    DATE         NOT NULL
);

OUTPUT:
Code:
CREATE a
(
TRIAL_CLIENT               VARCHAR2(60) NOT NULL,
TRIAL_FUND                 VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO               VARCHAR2(60) NOT NULL,
TRIAL_BROKER               VARCHAR2(12) NOT NULL,
CORR_ACC_NO                NUMBER(10)   NOT NULL,
CURRENCY                   VARCHAR2(3)  NOT NULL,
AS_OF_DATE                 DATE         NOT NULL,
COUNT_OUR_TRANSACTIONS     NUMBER,
SUM_OUR_POSITIONS          NUMBER,
SUM_OUR_CASH_TXNS          NUMBER,
SUM_OUR_TRANSACTIONS       NUMBER,
COUNT_BROKER_TRANSACTIONS  NUMBER,
SUM_BROKER_POSITIONS       NUMBER,
SUM_BROKER_CASH_TXNS       NUMBER,
SUM_BROKER_TRANSACTIONS    NUMBER,
SUM_OUR_CASH_BALS          NUMBER,
SUM_OUR_UNREAL_BALS        NUMBER,
SUM_OUR_BALANCES           NUMBER,
SUM_BROKER_CASH_BALS       NUMBER,
SUM_BROKER_UNREAL_BALS     NUMBER,
SUM_UNSETT_INT             NUMBER,
SUM_OPEN_FWDS              NUMBER,
SUM_BROKER_BALANCES        NUMBER,
NET_TRANSACTIONS           NUMBER,
NET_BALANCES               NUMBER,
TRIAL                      NUMBER,
CASH_ADJ                   NUMBER,
WO_AMT                     NUMBER,
ITEMS                      NUMBER
);

CREATE c
(
TRIAL_CLIENT  VARCHAR2(60) NOT NULL,
TRIAL_FUND    VARCHAR2(60) NOT NULL,
LOCAL_ACC_NO  VARCHAR2(60) NOT NULL,
TRIAL_BROKER  VARCHAR2(12) NOT NULL,
CORR_ACC_NO   NUMBER(10)   NOT NULL,
CURRENCY      VARCHAR2(3)  NOT NULL,
VALUE_DATE    DATE         NOT NULL
);

As you can see from the output file, the definition whose lines have the form TRIAL_CLIENT NOT NULL VARCHAR2(60), ... etc. was removed from the output.

Thanks,
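The thread never settles what counts as a record, so what follows is only a sketch under one assumption: a record is a whole `CREATE name ( ... );` block, and two blocks are duplicates when their column lists match (same columns, same order) after each `COL NOT NULL TYPE` line is rewritten as `COL TYPE NOT NULL` and runs of blanks are squeezed. The function name `dedupe_create` is hypothetical, and the sketch also assumes one column per line and no blanks inside a type (so something like `NUMBER(10, 2)` would not be handled). It keeps the first block of each kind and prints its columns in the normalized form.

```shell
#!/bin/sh
# Hypothetical helper: dedupe_create FILE
# Assumes each record is "CREATE name", "(", one column per line, ");"
# and that no column type contains blanks.
dedupe_create() {
  awk '
    $1 == "CREATE" { header = $0; body = ""; next }   # start a new record
    $1 == "("      { next }                           # opening paren line
    $1 == ");"     {                                  # end of record
      if (!(body in seen)) {                          # first time this column list appears
        seen[body] = 1
        printf "%s\n(\n%s);\n\n", header, body
      }
      next
    }
    NF {                                              # a column line
      comma = sub(/,[ \t]*$/, "") ? "," : ""          # drop and remember a trailing comma
      if (NF == 4 && $2 == "NOT" && $3 == "NULL")
        line = $1 " " $4 " NOT NULL"                  # COL NOT NULL TYPE -> COL TYPE NOT NULL
      else {
        $1 = $1                                       # rebuild $0 with single blanks
        line = $0
      }
      body = body line comma "\n"
    }
  ' "$1"
}
```

Run it as `dedupe_create input_file`. On the sample input above it should keep `CREATE a` and `CREATE c` and drop `CREATE b`, because b's columns match a's once normalized; the column alignment will differ from the padded layout in the desired output.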
# 2  
08-30-2013
And what code did you use to transform your sample input into your sample output?

The title of this thread says you want to delete duplicate records. What constitutes a record? (A line, a create x {...} where x is the same in both records, or what???)

You have shown us your input file and you have shown us what seems to be incorrect output that you're getting from your code. What output do you want?
# 3  
08-30-2013
I am using the bash shell. Below is just part of the code I am using; the script needs a parameter (the input file name) to run.

Code:
#!/bin/bash
Infile=$1
NotFinalOutFile=${Infile%%.*}   # input file name up to the first "."

sed 's/[[:blank:]]*$//' "$Infile" |                 # strip trailing blanks
  sed 's/^$/)\n/' |                                 # GNU sed: a blank line ends a table, emit ")"
  grep -Evw 'Name|-' |                              # drop the "Name Null Type" and dashed header lines
  sed -e 's/^desc /CREATE TABLE /;/CREATE / a\(' |  # "desc x" -> "CREATE TABLE x" followed by "("
  awk '{
         sub(/^\)/, "&;")                           # Replace ")" with ");"
         # If this line and the previous line both start with A-Z and this
         # line is not a CREATE line, end the previous line with ","
         s = ($0 ~ /^[A-Z]/ && a ~ /^[A-Z]/ && !/^CREATE /) ? "," : ""
         printf "%s\n%s", s, $0                     # separator, newline, this line
         a = $0                                     # remember this line
       }
       END { print "" }' >> "$NotFinalOutFile.dat"  # final newline; append to the .dat file

---------- Post updated at 03:01 PM ---------- Previous update was at 02:48 PM ----------

Actually, the script above only creates the CREATE TABLE statements.

---------- Post updated at 03:04 PM ---------- Previous update was at 03:01 PM ----------

Below is a sample input file for the script:

Code:
desc a
Name         Null     Type
------------ -------- ------------
TRIAL_CLIENT NOT NULL VARCHAR2(60)
TRIAL_FUND   NOT NULL VARCHAR2(60)
LOCAL_ACC_NO NOT NULL VARCHAR2(60)
TRIAL_BROKER NOT NULL VARCHAR2(12)
CORR_ACC_NO  NOT NULL NUMBER(10)
CURRENCY     NOT NULL VARCHAR2(3)
VALUE_DATE   NOT NULL DATE

desc b
Name                      Null     Type
------------------------- -------- ------------
TRIAL_CLIENT              NOT NULL VARCHAR2(60)
TRIAL_FUND                NOT NULL VARCHAR2(60)
LOCAL_ACC_NO              NOT NULL VARCHAR2(60)
TRIAL_BROKER              NOT NULL VARCHAR2(12)
CORR_ACC_NO               NOT NULL NUMBER(10)
CURRENCY                  NOT NULL VARCHAR2(3)
AS_OF_DATE                NOT NULL DATE
SUM_OUR_CASH_TXNS                  NUMBER
SUM_OUR_POSITIONS                  NUMBER
COUNT_OUR_TRANSACTIONS             NUMBER
SUM_OUR_CASH_BALS                  NUMBER
SUM_OUR_UNREAL_BALS                NUMBER
SUM_BROKER_CASH_TXNS               NUMBER
SUM_BROKER_POSITIONS               NUMBER
COUNT_BROKER_TRANSACTIONS          NUMBER
SUM_BROKER_CASH_BALS               NUMBER
SUM_BROKER_UNREAL_BALS             NUMBER
SUM_UNSETT_INT                     NUMBER
SUM_OPEN_FWDS                      NUMBER
WO_AMT                             NUMBER
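Another option, sketched here as a swapped-in single-awk approach rather than the poster's pipeline, is to write every column in one form while generating the statements, so identical tables come out as identical text. It assumes each column line in the `desc` output is either `NAME NOT NULL TYPE` or `NAME TYPE`, with no blanks inside the type; the function name `desc_to_create` is hypothetical. It also closes the last table at end of input, even when the file does not end with a blank line.

```shell
#!/bin/sh
# Hypothetical helper: desc_to_create FILE
# Assumes column lines look like "NAME NOT NULL TYPE" or "NAME TYPE",
# with no blanks inside TYPE.
desc_to_create() {
  awk '
    # Print the collected columns of the current table, then reset.
    function flush(   i) {
      if (table == "") return
      printf "CREATE TABLE %s\n(\n", table
      for (i = 1; i <= n; i++)
        printf "%s%s\n", col[i], (i < n ? "," : "")   # comma on all but the last column
      print ");"
      print ""
      table = ""
      n = 0
    }
    $1 == "desc" { flush(); table = $2; next }         # a new table starts
    NF == 0 || $1 == "Name" || $1 ~ /^-+$/ { next }     # skip blanks and header lines
    NF == 4 && $2 == "NOT" && $3 == "NULL" {
      col[++n] = $1 " " $4 " NOT NULL"                  # move NOT NULL after the type
      next
    }
    { $1 = $1; col[++n] = $0 }                          # nullable column, blanks squeezed
    END { flush() }                                     # close the last table
  ' "$1"
}
```

For example, `desc_to_create input.txt > tables.sql` (both file names are placeholders).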

# 4  
08-30-2013
I repeat:
Quote:
The title of this thread says you want to delete duplicate records. What constitutes a record? (A line, a create x {...} where x is the same in both records, or what???)

You have shown us your input file and you have shown us what seems to be incorrect output that you're getting from your code. What output do you want?