05-20-2010
Quote:
Originally Posted by
methyl
We must assume you have a mainstream database engine which can handle CSV files and a computer which has capacity for this task. You really don't mention much about the data or the computer.
At a design level, each record must contain a unique key plus whatever information is a parameter to the "purge". Unless you know the source's "purge" rules, your database of "data already processed" will just grow.
It would make more sense to fix the data-feed design at the source. A common convention is to mark each record in the source database with a unique extract-run reference to prevent repeat extracts, whilst still allowing a rerun.
Unique key - this is not required; the record as a whole can serve as the key, so there is no need for a dedicated unique column. Worse, after normalization or some other transformation, two records that were distinct may end up identical (or the other way around).
Fixing the problem at the source is definitely out of the question. Though I agree it makes sense, most of the time it is simply out of scope: not everything flows in a straight line. We have to work with what has been received, or what could realistically be received, in a real-world scenario.
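The "record as a whole is the key" approach above can be sketched with a standard awk one-liner. This is only an illustration, not anyone's production purge logic; the file names and the normalization step (case-folding, trailing-blank stripping) are hypothetical examples.

```shell
# Treat the entire record as the key: print a line only the first
# time it is seen. Input order is preserved and no sort is needed.
awk '!seen[$0]++' input.csv > deduped.csv

# If records must be normalized before comparison, build the key
# from the normalized form instead of the raw line:
awk '{ key = tolower($0); sub(/[ \t]+$/, "", key) }
     !seen[key]++' input.csv > deduped_normalized.csv
```

Note that `seen` holds one entry per distinct record, so memory use grows with the number of unique records processed; that is the same growth problem methyl describes for a "data already processed" store without source-side purge rules.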