Issues with filtering duplicate records using gawk script
Hi All,
I have a huge trade file with millions of trades, and I need to remove duplicate records. For example, given the following records:
30/10/2009,tradeId1,..,..
26/10/2009,tradeId1,..,..,,
30/10/2009,tradeId2,..
I need to filter out the duplicates and get the following output (for each trade, the record with the latest COB date):
30/10/2009,tradeId1,..,..
30/10/2009,tradeId2,..
COB = close of business day.
I need to handle the following three conditions:
1. The trade file is sorted in ascending order on the first two columns (COB date and trade ID).
2. The trade file is sorted in descending order on the first two columns (COB date and trade ID).
3. The trade file may not contain any duplicate records.
My code should work in all of the above conditions.
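One way to cover all three conditions without depending on the sort order is to remember, for each trade ID, the record with the latest COB date seen so far. The following is only a minimal sketch under stated assumptions: the shell function name `dedupe_trades` and the awk helper `cobKey` are made up for illustration, dates are assumed to be DD/MM/YYYY, and one record per distinct trade ID is assumed to fit in memory. It uses plain awk features only, so gawk should accept it as well.

```shell
#!/bin/sh
# Sketch: for each trade ID (column 2), keep the record with the latest COB date (column 1).
# Assumptions: comma-separated input, DD/MM/YYYY dates, one kept record per trade ID fits in memory.
dedupe_trades() {
    awk -F',' '
    # Hypothetical helper: turn DD/MM/YYYY into the sortable number YYYYMMDD.
    function cobKey(date,    p) {
        split(date, p, "/")
        return (p[3] * 10000) + (p[2] * 100) + p[1]
    }
    NF < 2 { next }                      # skip blank or malformed lines
    {
        key = cobKey($1)
        if (!($2 in bestKey)) {
            order[++n] = $2              # remember the first-seen order of trade IDs
        }
        if (!($2 in bestKey) || key > bestKey[$2]) {
            bestKey[$2]  = key           # latest COB date seen so far for this trade ID
            bestLine[$2] = $0            # and the full record that carries it
        }
    }
    END {
        for (i = 1; i <= n; i++) print bestLine[order[i]]
    }
    ' "$@"
}
```

For example, `dedupe_trades Trade.txt` would print one line per trade ID, in the order in which each ID first appears in the file. Because the comparison is done per trade ID rather than against the previous line, ascending input, descending input, and input without duplicates are all handled the same way.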
I have written the following code, but it doesn't seem to be working. As I am new to awk, can anybody help me get this right?
#!/usr/bin/gawk -f
BEGIN {
    FS = ","
}
END {
    print prevLine;
}
{
    if (FNR == 1)
    {
        prevDate = $1;
        prevSourceTradeId = $2;
        prevLine = $0;
    }
    else
    {
        if (prevSourceTradeId == $2)
        {
            if (compareDate(prevDate, $1) == 1)
            {
                print prevLine;
                flag = true
            }
            else
            {
                prevDate = $1;
                prevLine = $0;
                prevSourceTradeId = $2;
                print prevLine;
                flag = true
            }
        }
        else
        {
            if (flag)
            {
                prevDate = $1;
                prevSourceTradeId = $2;
                prevLine = $0;
            }
            else print prevLine;
            prevDate = $1;
            prevSourceTradeId = $2;
            prevLine = $0;
            flag = false;
        }
    }
}
function compareDate(lhsDate, rhsDate)
{
    lhsSize = split(lhsDate, lhsFields, "/");
    rhsSize = split(rhsDate, rhsFields, "/");
    if (lhsSize != rhsSize)
    {
        print "Invalid prevDate " lhsDate " " rhsDate;
        return 0;
    }
    for (i = rhsSize; i > 0; i--)
    {
        if (lhsFields[i] > rhsFields[i]) return 1;
    }
    return 0;
}
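A note on compareDate as posted: it keeps checking the month and day even when the year of the left-hand date is already smaller, so for example 30/10/2008 against 26/10/2009 would return 1 because 30 > 26. It also never reports "earlier". Below is a hedged sketch of a three-way comparison that stops at the first differing field. The shell wrapper `compare_dates` is a made-up name used only to make the function easy to try, and the DD/MM/YYYY layout is an assumption taken from the sample data.

```shell
#!/bin/sh
# Sketch: three-way comparison of two DD/MM/YYYY dates in awk.
# compare_dates is a hypothetical wrapper; it prints 1, -1 or 0.
compare_dates() {
    awk -v lhs="$1" -v rhs="$2" '
    # Returns 1 if lhsDate is later, -1 if it is earlier, 0 if both are equal.
    function compareDate(lhsDate, rhsDate,    l, r, i) {
        if (split(lhsDate, l, "/") != 3 || split(rhsDate, r, "/") != 3) {
            print "Invalid date: " lhsDate " " rhsDate > "/dev/stderr"
            return 0
        }
        # Compare year, then month, then day; stop at the first difference.
        for (i = 3; i >= 1; i--) {
            if (l[i] + 0 > r[i] + 0) return 1     # "+ 0" forces a numeric comparison
            if (l[i] + 0 < r[i] + 0) return -1
        }
        return 0
    }
    BEGIN { print compareDate(lhs, rhs) }
    '
}
```

The extra names after the parameters (`l`, `r`, `i`) are the usual awk idiom for local variables, which avoids the function overwriting globals such as `i` in the calling code. Separately, `true` and `false` are not keywords in awk; in the script they are just uninitialized variables, so `flag=true` and `flag=false` both leave `flag` empty. Using 1 and 0 would make the flag behave as intended.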
Trade.txt file:
30/03/2009,17981-G,MIDAS,,FX Euro Option,,MELLON ADM,MELLON ADM,DBSA,DBSA,26/03/2009,84450.7476,30/03/2009,,4200000,BRL,,,USD,C,B,26/05/2009,139,USD,199061.35,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17980-G,MIDAS,,FX Euro Option,,MELLON ADM,MELLON ADM,DBSA,DBSA,26/03/2009,183108.5122,30/03/2009,,6600000,BRL,,,USD,C,B,26/05/2009,137,USD,374182.77,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17978-G,MIDAS,,FX Euro Option,,QUEST MACRO 30,QUEST MACRO 30,DBSA,DBSA,24/03/2009,-7841.8551,30/03/2009,,-390000,BRL,,,USD,C,S,26/05/2009,139,USD,-20803.77,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17977-G,MIDAS,,FX Euro Option,,ADVANTAGE QUEST,ADVANTAGE QUEST,DBSA,DBSA,24/03/2009,-1709.1223,30/03/2009,,-85000,BRL,,,USD,C,S,26/05/2009,139,USD,-4534.15,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17976-G,MIDAS,,FX Euro Option,,QUEST90 FIM,QUEST90 FIM,DBSA,DBSA,24/03/2009,-9651.514,30/03/2009,,-480000,BRL,,,USD,C,S,26/05/2009,139,USD,-25604.64,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17975-G,MIDAS,,FX Euro Option,,QUESTX FIM,QUESTX FIM,DBSA,DBSA,24/03/2009,-8042.9283,30/03/2009,,-400000,BRL,,,USD,C,S,26/05/2009,139,USD,-21337.2,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17974-G,MIDAS,,FX Euro Option,,MELLONQUEST30,MELLONQUEST30,DBSA,DBSA,24/03/2009,-51173.1316,30/03/2009,,-2545000,BRL,,,USD,C,S,26/05/2009,139,USD,-135757.93,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17973-G,MIDAS,,FX Euro Option,,MELLONQUEST I,MELLONQUEST I,DBSA,DBSA,24/03/2009,-6032.1963,30/03/2009,,-300000,BRL,,,USD,C,S,26/05/2009,139,USD,-16002.9,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17972-G,MIDAS,,FX Euro Option,,QUEST MACRO 30,QUEST MACRO 30,DBSA,DBSA,24/03/2009,-16923.6655,30/03/2009,,-610000,BRL,,,USD,C,S,26/05/2009,137,USD,-34583.55,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17971-G,MIDAS,,FX Euro Option,,QUEST90 FIM,QUEST90 FIM,DBSA,DBSA,24/03/2009,-21085.2226,30/03/2009,,-760000,BRL,,,USD,C,S,26/05/2009,137,USD,-43087.7,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17970-G,MIDAS,,FX Euro Option,,QUESTX FIM,QUESTX FIM,DBSA,DBSA,24/03/2009,-17201.1027,30/03/2009,,-620000,BRL,,,USD,C,S,26/05/2009,137,USD,-35150.49,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17969-G,MIDAS,,FX Euro Option,,MELLONQUEST30,MELLONQUEST30,DBSA,DBSA,24/03/2009,-110974.8559,30/03/2009,,-4000000,BRL,,,USD,C,S,26/05/2009,137,USD,-226777.44,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17968-G,MIDAS,,FX Euro Option,,MELLONQUEST I,MELLONQUEST I,DBSA,DBSA,24/03/2009,-13316.9827,30/03/2009,,-480000,BRL,,,USD,C,S,26/05/2009,137,USD,-27213.28,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17962-G,MIDAS,,FX Euro Option,,ADVANTAGE QUEST,ADVANTAGE QUEST,DBSA,DBSA,23/03/2009,-3606.6828,30/03/2009,,-130000,BRL,,,USD,C,S,26/05/2009,137,USD,-7370.26,BRL,BRL,BRZ,,,,26/05/2009,,
30/03/2009,17960-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-149704.8449,30/03/2009,,-15000000,BRL,,,USD,C,S,01/04/2011,2.3,USD,-6880999.5,BRL,BRL,BRZ,,,,01/04/2011,,
30/03/2009,17959-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-435720.3749,30/03/2009,,-20000000,BRL,,,USD,C,S,03/01/2011,2,USD,-12858000,BRL,BRL,BRZ,,,,03/01/2011,,
30/03/2009,17958-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-256346.867,30/03/2009,,-15000000,BRL,,,USD,C,S,03/01/2011,2.2,USD,-7200000,BRL,BRL,BRZ,,,,03/01/2011,,
30/03/2009,17957-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-762701.3198,30/03/2009,,-30000000,BRL,,,USD,C,S,01/07/2010,2,USD,-16455000,BRL,BRL,BRZ,,,,01/07/2010,,
30/03/2009,17956-G,MIDAS,,FX Euro Option,,MELGLOMKTFICFIM,MELGLOMKTFICFIM,DBSA,DBSA,18/03/2009,-269765.1783,30/03/2009,,-15000000,BRL,,,USD,C,S,01/07/2010,2.2,USD,-5856999,BRL,BRL,BRZ,,,,01/07/2010,,