Need awk script for removing duplicate records Post: 302303638

Sponsored Content

Operating Systems Linux Need awk script for removing duplicate records Post 302303638 by nmumbarkar on Friday 3rd of April 2009 07:00:14 AM

04-03-2009

Registered User

Need awk script for removing duplicate records

I have huge txt file having millions of trade data.
For e.g
Trade.txt (first 8 lines in the file is header info)

Code:

COB_DATE,TRADE_ID,SOURCE_SYSTEM_TRADE_ID,TRADE_GROUP_ID,
TRADE_TYPE,DEALER_NAME,EXTERNAL_COUNTERPARTY_ID,
EXTERNAL_COUNTERPARTY_NAME,DB_COUNTERPARTY_ID,
DB_COUNTERPARTY_NAME,TRADE_DATE,SOURCE_MTM,
SOURCE_MTM_DATE,PAY_RATE,PAY_AMOUNT,PAY_CURRENCY,
RCV_RATE,RCV_AMOUNT,RCV_CURRENCY,CALL_PUT,BUY_SELL,
MATURITY_DATE,STRIKE,UNDERLYING,PREMIUM,PREMIUM_CURRENCY,
MTM_CCY,COUNTRY,,,,MATURITY_DATE,,
30/03/2009,17981-G,MIDAS,,FX Euro Option,,MELLON ADM,MELLON ADM,DBSA,DBSA,26/03/2009,84450.7476,30/03/2009,,4200000,BRL,,,USD,C,B,26/05/2009,139,USD,199061.35,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17980-G,MIDAS,,FX Euro Option,,MELLON ADM,MELLON ADM,DBSA,DBSA,26/03/2009,183108.5122,30/03/2009,,6600000,BRL,,,USD,C,B,26/05/2009,137,USD,374182.77,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17978-G,MIDAS,,FX Euro Option,,QUEST MACRO 30,QUEST MACRO 30,DBSA,DBSA,24/03/2009,-7841.8551,30/03/2009,,-390000,BRL,,,USD,C,S,26/05/2009,139,USD,-20803.77,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17977-G,MIDAS,,FX Euro Option,,ADVANTAGE QUEST,ADVANTAGE QUEST,DBSA,DBSA,24/03/2009,-1709.1223,30/03/2009,,-85000,BRL,,,USD,C,S,26/05/2009,139,USD,-4534.15,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17976-G,MIDAS,,FX Euro Option,,QUEST90 FIM,QUEST90 FIM,DBSA,DBSA,24/03/2009,-9651.514,30/03/2009,,-480000,BRL,,,USD,C,S,26/05/2009,139,USD,-25604.64,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17975-G,MIDAS,,FX Euro Option,,QUESTX FIM,QUESTX FIM,DBSA,DBSA,24/03/2009,-8042.9283,30/03/2009,,-400000,BRL,,,USD,C,S,26/05/2009,139,USD,-21337.2,BRL,BRL,BRZ,,,,26/05/2009,,

In this file I have duplicate records having same trade id(2nd field) but cob dates(1st field) are different .
I want to write awk script that will remove duplicate record(s) (i.e having older cob date) and keep record having latest cob date.
Can anybody please help me?

Last edited by vgersh99; 04-03-2009 at 08:26 AM.. Reason: BB

nmumbarkar

View Public Profile for nmumbarkar

Find all posts by nmumbarkar

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Issues with filtering duplicate records using gawk script

Hi All, I have huge trade file with milions of trades.I need to remove duplicate records (e.g I have following records) 30/10/2009,trdeId1,..,.. 26/10/2009.tradeId1,..,..,, 30/10/2009,tradeId2,.. In the above case i need to filter duplicate recods and I should get following output....

2. Shell Programming and Scripting

Removing duplicate records from 2 files

Can anyone help me to removing duplicate records from 2 separate files in UNIX? Please find the sample records for both the files cat Monday.dat 3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE 3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE 3MEHM0JG7AR652083MUTLAB NAL-NAFISAH...

3. Linux

Need awk script for removing duplicate records

I have log file having Traffic line 2011-05-21 15:11:50.356599 TCP (6), length: 52) 10.10.10.1.3020 > 10.10.10.254.50404: 2011-05-21 15:11:50.652739 TCP (6), length: 52) 10.10.10.254.50404 > 10.10.10.1.3020: 2011-05-21 15:11:50.652558 TCP (6), length: 89) 10.10.10.1.3020 >...

4. Shell Programming and Scripting

Removing duplicate records in a file based on single column

Hi, I want to remove duplicate records including the first line based on column1. For example inputfile(filer.txt): ------------- 1,3000,5000 1,4000,6000 2,4000,600 2,5000,700 3,60000,4000 4,7000,7777 5,999,8888 expected output: ---------------- 3,60000,4000 4,7000,7777...

5. Shell Programming and Scripting

removing duplicate records comparing 2 csv files

Hi All, I want to remove the rows from File1.csv by comparing a column/field in the File2.csv. If both columns matches then I want that row to be deleted from File1 using shell script(awk). Here is an example on what I need. File1.csv: RAJAK,ACTIVE,1 VIJAY,ACTIVE,2 TAHA,ACTIVE,3...

6. Shell Programming and Scripting

Removing duplicate records in a file based on single column explanation

I was reading this thread. It looks like a simpler way to say this is to only keep uniq lines based on field or column 1. https://www.unix.com/shell-programming-scripting/165717-removing-duplicate-records-file-based-single-column.html Can someone explain this command please? How are there no...

7. Shell Programming and Scripting

Help with removing duplicate entries with awk or Perl

Hi, I have a file which looks like:ke this : chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583...

8. Homework & Coursework Questions

Script: Removing HTML tags and duplicate lines

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted! 1. The problem statement, all variables and given/known data: You will write a script that will remove all HTML tags from an HTML document and remove any consecutive...

9. Shell Programming and Scripting

To select non-duplicate records using awk

Friends, I have data sorted on id like this id addressl 1 abc 2 abc 2 abc 2 abc 3 aabc 4 abc 4 abc I want to pick all ids with addressesses leaving out duplicate records. Desired output would be id address 1 abc 2 abc 3 abc 4 abc

10. Shell Programming and Scripting

Removing specific records from files when duplicate key

Hello I have been trying to remove a row from a file which has the same first three columns as another row - I have tried lots of different combinations of suggestion on this forum but can't get it exactly right. what I have is 900 - 1000 = 0 900 - 1000 = 2562 1000 - 1100 = 0 1000 - 1100...

LEARN ABOUT MINIX

roff

is  a  text formatter.	Its input consists of the text to be out-
put, intermixed with formatting commands.  A  formatting  command
is  a  line  containing  the  control character followed by a two
character command name, and possibly one or more arguments.   The
control  character is initially . (dot).  The formatted output is
produced on standard output.  The formatting commands are  listed
below, with being a number, being a character, and being a title.
A + before n means it may be signed,  indicating  a  positive  or
negative change from the current value.  Initial values for where
relevant, are given in parentheses.
  .ad	  Adjust right margin.
  .ar	  Arabic page numbers.
  .br	  Line break.  Subsequent text will begin on a new line.
  .bl n   Insert n blank lines.
  .bp +n  Begin new page and number it n. No n means +1.
  .cc c   Control character is set to c.
  .ce n   Center the next n input lines.
  .de zz  Define a macro called zz. A line with .. ends definition.
  .ds	  Double space the output. Same as .ls 2.
  .ef t   Even page footer title is set to t.
  .eh t   Even page header title is set to t.
  .fi	  Begin filling output lines as full as possible.
  .fo t   Footer titles (even and odd) are set to t.
  .hc c   The character c (e.g., %) tells roff where hyphens are permitted.
  .he t   Header titles (even and odd) are set to t.
  .hx	  Header titles are suppressed.
  .hy n   Hyphenation is done if n is 1, suppressed if it is 0. Default is 1.
  .ig	  Ignore input lines until a line beginning with .. is found.
  .in n   Indent n spaces from the left margin; force line break.
  .ix n   Same as .in but continue filling output on current line.
  .li n   Literal text on next n lines.  Copy to output unmodified.
  .ll +n  Line length (including indent) is set to n (65).
  .ls +n  Line spacing: n (1) is 1 for single spacing, 2 for double, etc.
  .m1 n   Insert n (2) blank lines between top of page and header.
  .m2 n   Insert n (2) blank lines between header and start of text.
  .m3 n   Insert n (1) blank lines between end of text and footer.
  .m4 n   Insert n (3) blank lines between footer and end of page.
  .na	  No adjustment of the right margin.
  .ne n   Need n lines.  If fewer are left, go to next page.
  .nn +n  The next n output lines are not numbered.
  .n1	  Number output lines in left margin starting at 1.
  .n2 n   Number output lines starting at n.  If 0, stop numbering.
  .ni +n  Indent line numbers by n (0) spaces.
  .nf	  No more filling of lines.
  .nx f   Switch input to file f.
  .of t   Odd page footer title is set to t.
  .oh t   Odd page header title is set to t.
  .pa +n  Page adjust by n (1).  Same as .bp
  .pl +n  Paper length is n (66) lines.
  .po +n  Page offset.	Each line is started with n (0) spaces.
  .ro	  Page numbers are printed in Roman numerals.
  .sk n   Skip n pages (i.e., make them blank), starting with next one.
  .sp n   Insert n blank lines, except at top of page.
  .ss	  Single spacing.  Equivalent to .ls 1.
  .ta	  Set tab stops, e.g., .ta 9 17 25 33 41 49 57 65 73 (default).
  .tc c   Tabs are expanded into c.  Default is space.
  .ti n   Indent next line n spaces; then go back to previous indent.
  .tr ab  Translate a into b on output.
  .ul n   Underline the letters and numbers in the next n lines.

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Issues with filtering duplicate records using gawk script

Discussion started by: nmumbarkar

2. Shell Programming and Scripting

Removing duplicate records from 2 files

Discussion started by: zooby

3. Linux

Need awk script for removing duplicate records

Discussion started by: Rastamed

4. Shell Programming and Scripting

Removing duplicate records in a file based on single column

Discussion started by: G.K.K