Posted by nmumbarkar on 9 April 2009
Issues with filtering duplicate records using gawk script

Hi All,

I have a huge trade file with millions of trades, and I need to remove the duplicate records. For example, given the following records:
30/10/2009,tradeId1,..,..
26/10/2009,tradeId1,..,..
30/10/2009,tradeId2,..

In the above case I need to filter out the duplicates and get the following output, i.e. keep the trade record with the latest COB date (COB = close of business day):
30/10/2009,tradeId1,..,..
30/10/2009,tradeId2,..

I need to handle the following three conditions:
1. The trade file may be sorted in ascending order on the first two columns (COB date and trade ID).
2. The trade file may be sorted in descending order on the first two columns (COB date and trade ID).
3. The trade file may not contain any duplicate records at all.
My code should work in all of the above cases.

I have written the following code, but it doesn't seem to be working. As I am new to awk, can anybody help me get this right?

#!/usr/bin/gawk -f

BEGIN {
    FS = ","
}

# the last remembered line still has to be printed at end of input
END {
    print prevLine
}

{
    if (FNR == 1) {
        # first record: just remember it
        prevDate          = $1
        prevSourceTradeId = $2
        prevLine          = $0
    }
    else {
        if (prevSourceTradeId == $2) {
            # same trade ID as the remembered record: compare COB dates
            if (compareDate(prevDate, $1) == 1) {
                print prevLine
                flag = true    # NB: "true" is an unset awk variable, so flag is actually set to ""
            }
            else {
                prevDate          = $1
                prevLine          = $0
                prevSourceTradeId = $2
                print prevLine
                flag = true
            }
        }
        else {
            # a different trade ID has started
            if (flag) {
                prevDate          = $1
                prevSourceTradeId = $2
                prevLine          = $0
            }
            else {
                print prevLine
            }
            prevDate          = $1
            prevSourceTradeId = $2
            prevLine          = $0
            flag = false
        }
    }
}

# intended to return 1 when lhsDate is later than rhsDate (dates are DD/MM/YYYY,
# so the loop walks from the year field down to the day field)
function compareDate(lhsDate, rhsDate)
{
    lhsSize = split(lhsDate, lhsFields, "/")
    rhsSize = split(rhsDate, rhsFields, "/")
    if (lhsSize != rhsSize) {
        print "Invalid prevDate " lhsDate " " rhsDate
        return 0
    }
    for (i = rhsSize; i > 0; i--) {
        # only ever returns 1 on a "greater" field; it never stops when a
        # more significant field is smaller
        if (lhsFields[i] > rhsFields[i]) return 1
    }
    return 0
}
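
One thing worth noting: a field-by-field comparison of DD/MM/YYYY dates has to stop as soon as any field differs (year first, then month, then day), not only when a field is greater. A minimal corrected sketch of such a function, still assuming three-field DD/MM/YYYY dates, could look like this:

# returns 1 when lhsDate is later than rhsDate, -1 when it is earlier, 0 when equal
# (fields of a DD/MM/YYYY date are compared year first, then month, then day)
function compareDate(lhsDate, rhsDate,    lhsFields, rhsFields, i) {
    split(lhsDate, lhsFields, "/")
    split(rhsDate, rhsFields, "/")
    for (i = 3; i > 0; i--) {
        if (lhsFields[i] + 0 > rhsFields[i] + 0) return 1
        if (lhsFields[i] + 0 < rhsFields[i] + 0) return -1
    }
    return 0
}

The existing call site only checks for a return value of 1, so returning -1 for "earlier" dates does not change anything there.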

Sample Trade.txt file:
30/03/2009,17981-G,MIDAS,,FX Euro Option,,MELLON ADM,MELLON ADM,DBSA,DBSA,26/03/2009,84450.7476,30/03/2009,,4200000,BRL,,,USD,C,B,26/05/2009,139,USD,199061.35,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17980-G,MIDAS,,FX Euro Option,,MELLON ADM,MELLON ADM,DBSA,DBSA,26/03/2009,183108.5122,30/03/2009,,6600000,BRL,,,USD,C,B,26/05/2009,137,USD,374182.77,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17978-G,MIDAS,,FX Euro Option,,QUEST MACRO 30,QUEST MACRO 30,DBSA,DBSA,24/03/2009,-7841.8551,30/03/2009,,-390000,BRL,,,USD,C,S,26/05/2009,139,USD,-20803.77,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17977-G,MIDAS,,FX Euro Option,,ADVANTAGE QUEST,ADVANTAGE QUEST,DBSA,DBSA,24/03/2009,-1709.1223,30/03/2009,,-85000,BRL,,,USD,C,S,26/05/2009,139,USD,-4534.15,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17976-G,MIDAS,,FX Euro Option,,QUEST90 FIM,QUEST90 FIM,DBSA,DBSA,24/03/2009,-9651.514,30/03/2009,,-480000,BRL,,,USD,C,S,26/05/2009,139,USD,-25604.64,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17975-G,MIDAS,,FX Euro Option,,QUESTX FIM,QUESTX FIM,DBSA,DBSA,24/03/2009,-8042.9283,30/03/2009,,-400000,BRL,,,USD,C,S,26/05/2009,139,USD,-21337.2,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17974-G,MIDAS,,FX Euro Option,,MELLONQUEST30,MELLONQUEST30,DBSA,DBSA,24/03/2009,-51173.1316,30/03/2009,,-2545000,BRL,,,USD,C,S,26/05/2009,139,USD,-135757.93,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17973-G,MIDAS,,FX Euro Option,,MELLONQUEST I,MELLONQUEST I,DBSA,DBSA,24/03/2009,-6032.1963,30/03/2009,,-300000,BRL,,,USD,C,S,26/05/2009,139,USD,-16002.9,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17972-G,MIDAS,,FX Euro Option,,QUEST MACRO 30,QUEST MACRO 30,DBSA,DBSA,24/03/2009,-16923.6655,30/03/2009,,-610000,BRL,,,USD,C,S,26/05/2009,137,USD,-34583.55,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17971-G,MIDAS,,FX Euro Option,,QUEST90 FIM,QUEST90 FIM,DBSA,DBSA,24/03/2009,-21085.2226,30/03/2009,,-760000,BRL,,,USD,C,S,26/05/2009,137,USD,-43087.7,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17970-G,MIDAS,,FX Euro Option,,QUESTX FIM,QUESTX FIM,DBSA,DBSA,24/03/2009,-17201.1027,30/03/2009,,-620000,BRL,,,USD,C,S,26/05/2009,137,USD,-35150.49,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17969-G,MIDAS,,FX Euro Option,,MELLONQUEST30,MELLONQUEST30,DBSA,DBSA,24/03/2009,-110974.8559,30/03/2009,,-4000000,BRL,,,USD,C,S,26/05/2009,137,USD,-226777.44,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17968-G,MIDAS,,FX Euro Option,,MELLONQUEST I,MELLONQUEST I,DBSA,DBSA,24/03/2009,-13316.9827,30/03/2009,,-480000,BRL,,,USD,C,S,26/05/2009,137,USD,-27213.28,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17962-G,MIDAS,,FX Euro Option,,ADVANTAGE QUEST,ADVANTAGE QUEST,DBSA,DBSA,23/03/2009,-3606.6828,30/03/2009,,-130000,BRL,,,USD,C,S,26/05/2009,137,USD,-7370.26,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17960-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-149704.8449,30/03/2009,,-15000000,BRL,,,USD,C,S,01/04/2011,2.3,USD,-6880999.5,BRL,BRL,BRZ,,,,01/04/2011,,
30/03/2009,17959-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-435720.3749,30/03/2009,,-20000000,BRL,,,USD,C,S,03/01/2011,2,USD,-12858000,BRL,BRL,BRZ,,,,03/01/2011,,
30/03/2009,17958-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-256346.867,30/03/2009,,-15000000,BRL,,,USD,C,S,03/01/2011,2.2,USD,-7200000,BRL,BRL,BRZ,,,,03/01/2011,,
30/03/2009,17957-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-762701.3198,30/03/2009,,-30000000,BRL,,,USD,C,S,01/07/2010,2,USD,-16455000,BRL,BRL,BRZ,,,,01/07/2010,,
30/03/2009,17956-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-269765.1783,30/03/2009,,-15000000,BRL,,,USD,C,S,01/07/2010,2.2,USD,-5856999,BRL,BRL,BRZ,,,,01/07/2010,,
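
Since the goal is simply to keep, for each trade ID, the record with the latest COB date, an associative array keyed on the trade ID avoids depending on the sort order altogether. Here is a minimal gawk sketch of that approach (the script name and the toYmd helper are just illustrative; it assumes comma-separated records with the COB date in column 1 in DD/MM/YYYY format, the trade ID in column 2, and that one remembered line per trade ID fits in memory):

#!/usr/bin/gawk -f
# Keep, for every trade ID, only the record with the latest COB date.
# Works whatever the sort order of the input, and also when a trade ID
# occurs only once.
BEGIN { FS = "," }

{
    key = $2                 # trade ID
    d   = toYmd($1)          # COB date as a sortable YYYYMMDD number
    if (!(key in best) || d > bestDate[key]) {
        best[key]     = $0
        bestDate[key] = d
    }
}

END {
    for (key in best)
        print best[key]
}

# convert DD/MM/YYYY into a numeric YYYYMMDD value so dates compare correctly
function toYmd(date,    f) {
    split(date, f, "/")
    return f[3] * 10000 + f[2] * 100 + f[1]
}

It can be run as, say, gawk -f dedup.awk Trade.txt > Trade_clean.txt. Note that the for (key in best) loop in END does not preserve the input order, so the output may need to be sorted afterwards if ordering matters.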
 
