Your description and code are not clear enough to be sure that this is what you want, but it works with the sample data provided:
Clearly field #2 by itself is not the key for determining duplicate records; at most, field #2 is the key when, and only when, field #1 is "D". And since you are storing the entire line in the a[] array for some reason, maybe you want to delete identical lines instead of deleting lines with identical keys?
The above code assumes you just want to delete lines with duplicate keys, where the key is the combination of field #1 being "D" and the value of field #2. The line whose field #1 is "T" is written with field #2 replaced by the number of unique "D"-keyed lines seen before it. All lines whose field #1 is neither "D" nor "T" are copied to the output without being counted.
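Since the code itself did not survive the quote, here is a minimal awk sketch of the logic described above; the sample data and file name are invented for illustration:

```shell
# Hypothetical reconstruction of the dedup/trailer logic described above;
# the original code was lost in quoting.  Sample data is made up.
cat > sample.txt <<'EOF'
H header
D key1 first
D key2 second
D key1 duplicate
T 0 trailer
X untouched
EOF

awk '
$1 == "D" {                   # keep only the first line for each field #2
        if (seen[$2]++) next
        count++
}
$1 == "T" { $2 = count }      # rewrite field #2 with the unique-D count
{ print }' sample.txt
```

With this made-up sample, the duplicate "D key1" line is dropped and the trailer is rewritten to "T 2 trailer"; the "H" and "X" lines pass through untouched.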
You should always tell us what operating system and shell you're using when you start a new thread in this forum. The behavior of many utilities varies from operating system to operating system and the features provided by shells vary from shell to shell.
If you want to try the above code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
Hi,
If I have a file in xml format, I would like to remove duplicated records and save to a new file. Is it possible to write a script to do it? (8 Replies)
Hi all,
I have a file containing multiple columns; this file is sorted by col2 and col3.
I want to remove the duplicated lines if col2 and col3 are the same as in another line.
example
fileA
AA BB CC DD
CC XX CC DD
BB CC ZZ FF
DD FF HH HH
the output is
AA BB CC DD
BB CC ZZ FF... (6 Replies)
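Assuming the goal is to keep the first line seen for each (col2, col3) pair, the standard awk idiom is a one-liner. The sample below is hypothetical and adds a genuine duplicate pair, since the fileA shown above has none:

```shell
# Keep only the first line for each combination of columns 2 and 3.
# Sample input (invented) containing one real (col2,col3) duplicate.
cat > fileA <<'EOF'
AA BB CC DD
CC BB CC DD
BB CC ZZ FF
EOF

awk '!seen[$2,$3]++' fileA
```

This prints the first and third lines only. If a single column should decide, change the subscript to $2 or $3 alone.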
Hi,
I need help with a maybe totally simple issue, but somehow I am not getting it.
I am not able to establish a sed or awk command which adds to the first line of a text and removes the "," only from the last line.
The file is looking like follow:
TABLE1,
TABLE2,
.
.
.
TABLE99,... (4 Replies)
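The post does not say what should be added to the first line, so the sketch below assumes the list is being wrapped for a SQL IN clause: prepend "(" to line 1 and turn the trailing "," on the last line into ")". Adjust the inserted text to the actual need:

```shell
# Assumption: wrap the table list in parentheses for SQL; sed touches
# only the first ('1s') and last ('$s') lines of the file.
cat > tables.txt <<'EOF'
TABLE1,
TABLE2,
TABLE99,
EOF

sed -e '1s/^/(/' -e '$s/,$/)/' tables.txt
```

All middle lines keep their trailing commas; only the first and last lines change.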
I am trying to load data into 3 tables simultaneously (which is working fine). Once loaded, it should count the total number of records in all 3 input files and send an e-mail to the user.
The script is working fine, as far as loading all the 3 input files into the database tables, but... (3 Replies)
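For the counting part, summing the line counts of the three files and handing the total to mailx is a common pattern. The file names and address below are placeholders:

```shell
# Hypothetical input files; replace with the real three input files.
printf 'a\nb\n' > file1.dat
printf 'c\n'    > file2.dat
printf 'd\ne\n' > file3.dat

total=$(cat file1.dat file2.dat file3.dat | wc -l)
echo "Loaded $total records"
# Mail delivery depends on site setup, so this is left commented out:
# mailx -s "Load complete: $total records" user@example.com </dev/null
```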
Hi Gurus,
I need to cut a single record in the file (asdf) into multiple records based on the number of bytes (44 characters), so every record will have 44 characters. All the records should be in the same file; to each of these lines I need to append the folder (<date>) name.
I have a dir. in which... (20 Replies)
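A sketch of the splitting step: fold cuts the record into fixed 44-character lines, and awk appends a folder name. The date "20240101" and the 88-character sample record are placeholders:

```shell
# Build an 88-character sample record (two 44-character pieces), split
# it, and append a hypothetical folder name to each resulting line.
printf '%088d\n' 0 > asdf

fold -w 44 asdf | awk -v dir="20240101" '{ print $0, dir }'
```

In the real script, dir would be set from the actual <date> directory name, e.g. dir="$(basename "$PWD")".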
Hi,
I am having a huge comma-delimited file, and I have to prepend the following four lines to the file through a shell script.
FILE NAME = TEST_LOAD
DATETIME = CURRENT DATE TIME
LOAD DATE = CURRENT DATE
RECORD COUNT = TOTAL RECORDS IN FILE
Source data
1,2,3,4,5,6,7... (7 Replies)
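A sketch of the header step: group the four echo commands so a single redirection writes the whole output file. The file names are placeholders and the date formats are one possible choice:

```shell
# Hypothetical source file; replace with the real comma-delimited file.
printf '1,2,3,4,5,6,7\n' > source.csv

{
    echo "FILE NAME = TEST_LOAD"
    echo "DATETIME = $(date '+%Y-%m-%d %H:%M:%S')"
    echo "LOAD DATE = $(date '+%Y-%m-%d')"
    echo "RECORD COUNT = $(wc -l < source.csv)"
    cat source.csv
} > loaded.csv
```

Writing to a second file and renaming it over the original avoids truncating the source while it is still being read.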
Hi,
I need help regarding below concern.
There is a script and it has 7 existing files (in a path, say usr/appl/temp/file1.txt), and I need to create one new blank file, say “file_count.txt”, in the same script itself.
Then the new file <file_count.txt> should store all the 7 filenames and... (1 Reply)
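The (truncated) requirement can be sketched like this, with the directory layout assumed from the post; ls with a glob writes the seven names into the new file:

```shell
# Recreate the assumed layout: seven files under usr/appl/temp.
mkdir -p usr/appl/temp
for i in 1 2 3 4 5 6 7; do : > "usr/appl/temp/file$i.txt"; done

# Create the new file and store the seven file names in it.  The [1-7]
# glob avoids matching file_count.txt itself on a rerun.
ls usr/appl/temp/file[1-7].txt > usr/appl/temp/file_count.txt
wc -l < usr/appl/temp/file_count.txt
```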
I have a file in which a single record spans multiple lines,
File 1
====
14|\n
leave request \n
accepted|Yes|
15|\n
leave request not \n
acccepted|No|
I wanted to remove the '\n' characters. I used the below code (found somewhere in this forum)
perl -e 'while (<>) { if... (1 Reply)
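Since the perl snippet was cut off, here is an awk sketch of the same idea: a literal "\n" at the end of a line marks a continued record, so strip the marker and join the next physical line:

```shell
# Sample data from the post.  Note: "\n" here is two literal characters,
# a backslash and an n, not a newline.
cat > file1 <<'EOF'
14|\n
leave request \n
accepted|Yes|
15|\n
leave request not \n
acccepted|No|
EOF

awk '{
        line = $0
        while (line ~ /\\n$/) {           # record continues on next line
                sub(/\\n$/, "", line)
                if ((getline nxt) > 0) line = line nxt
                else break
        }
        print line
}' file1
```

This joins each record onto one line, e.g. "14|leave request accepted|Yes|" (the "acccepted" spelling in the second record is kept as it appears in the post's data).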
DBS_UPDATE(1p)          User Contributed Perl Documentation          DBS_UPDATE(1p)

NAME
dbs_update - Update SQL Databases
DESCRIPTION
dbs_update is a utility to update SQL databases from text files.
FORMAT OF THE TEXT FILES
dbs_update assumes that each line of the input contains a data record and that the fields within the records are separated by tabs.
You can tell dbs_update about the input format with the --format option.
The first field of the data record is used as the table specification. This consists of the table name and, optionally, the index of the starting column, separated by a dot.
Alternatively dbs_update can read the column names from the first line of input (see the -h/--headline option). These can even be aliases
for the real column names (see the -m/--map option).
COMMAND LINE PARAMETERS
Required command line parameters are the DBI driver ("Pg" for Postgres or "mysql" for MySQL) and the database name. The third parameter is optional and specifies the database user and/or the host where the database resides ("racke", "racke@linuxia.de" or "@linuxia.de").
OPTIONS

--cleanse
Removes all records that remain unaffected by the update process. This gives the same result as deleting all records from the table first and then running dbs_update, but the table is never empty in the meantime.
-c COLUMN,COLUMN,..., --columns=COLUMN,COLUMN,...
Update only the table columns given by the COLUMN parameters. To exclude columns from the update prepend "!" or "^" to the parameters.
--rows=ROW,ROW,...
Update only the input rows given by the ROW parameters. The first row is 1; headlines do not count. To exclude rows from the update prepend "!" or "^" to the parameters.
-f FILE, --file=FILE
Reads records from file FILE instead of from standard input.
--format=FORMAT[SEPCHAR]
Assumes FORMAT as the format for the input. Only CSV can be specified for now; the default is TAB. The default field separator for CSV is a comma; you may change this by appending the separator to the format.
-h, --headline
Reads the column names from the first line of the input instead of deducing them from the database layout. Requires the -t/--table option.
-k COUNT, -k KEY,KEY,..., --keys=COUNT, --keys=KEY,KEY,...
Specifies the keys for the table(s), either as the number of columns used as keys or by specifying them explicitly as comma-separated arguments to the option. This is used for the detection of existing records.
-m ALIASDEF, --map=ALIASDEF
Maps the names found in the first line of input to the actual column names in the database. The alias and the column name are separated with "=" signs and the different entries are separated by ";" signs, e.g. "Art-No.=code;Short Description=shortdescr".
--map-filter=FILTER
Applies a filter to the column names read from the input file. Currently there is only the "lc" filter available.
--match-sql=FIELD:{STATEMENT}
Updates only records where the value of the column FIELD is in the result set of the SQL statement STATEMENT, e.g. "category:{select distinct name from categories}".
-o, --update-only
Updates existing database entries only, stops if it detects new ones.
-r ROUTINE, --routine=ROUTINE
Applies ROUTINE to every data record. ROUTINE must be a subroutine. dbs_update passes the table name and a hash reference to this subroutine. The keys of the hash are the column names and the values are the corresponding field values. If the return value of ROUTINE is not a true value, the data record will be skipped.
"sub { my ($table, $valref) = @_;
       unless (defined $$valref{country} && $$valref{country} =~ /\S/) {
           $$valref{country} = "Germany";
       }
       1; }"
--skipbadlines
Lines not matching the assumed format are ignored. Without this option, dbs_update simply stops.
-t TABLE, --table=TABLE
Uses TABLE as table name for all records instead of the first field name.
AUTHOR
Stefan Hornburg (Racke), racke@linuxia.de
SEE ALSO
perl(1), DBIx::Easy(3)

perl v5.8.8                          2007-02-01                          DBS_UPDATE(1p)