Your description and code are not clear enough to be sure that this is what you want, but it works with the sample data provided:
Clearly field #2 alone is not the key for identifying duplicate records; at the very least, it is field #2 when, and only when, field #1 is "D". And since you are storing the entire line in the a[] array for some reason, maybe you only want to delete identical lines rather than lines with identical keys?
The above code assumes you just want to delete lines with duplicate keys, where the key is field #2 on lines whose field #1 is "D". On the line whose field #1 is "T", field #2 is replaced by the number of unique "D" lines (unique by field #2) seen before that "T" line. All lines whose field #1 is neither "D" nor "T" are copied to the output without being counted.
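The script from the reply itself is not reproduced in this thread. As a rough sketch of the behaviour just described (not necessarily how that script works), it could look like this in awk. It assumes whitespace-separated fields, and the wrapper function name `dedup_d_records` is hypothetical.

```shell
# Hypothetical sketch: fields are assumed to be whitespace-separated.
# On Solaris, replace awk with /usr/xpg4/bin/awk or nawk.
dedup_d_records() {
    awk '
    $1 == "D" {
        if ($2 in seen) next    # drop a "D" line whose field #2 was already seen
        seen[$2] = 1            # remember this key
        count++                 # number of unique "D" records so far
        print
        next
    }
    $1 == "T" {
        $2 = count + 0          # field #2 becomes the unique "D" count (0 if none yet)
        print
        next
    }
    { print }                   # every other line passes through uncounted
    ' "$@"
}
```

Usage would be something like `dedup_d_records infile > outfile`. Because assigning to `$2` makes awk rebuild the "T" line using the default output separator (a single space), any original spacing on that line is not preserved.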
You should always tell us what operating system and shell you're using when you start a new thread in this forum. The behavior of many utilities varies from operating system to operating system and the features provided by shells vary from shell to shell.
If you want to try the above code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
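If you would rather not edit the script by hand on each system, a small sketch like the following picks a POSIX-conforming awk when one is present. The `AWK` variable name and the `/usr/bin/nawk` path are assumptions for illustration.

```shell
# Sketch: prefer the XPG4 awk, then nawk, then whatever "awk" is on PATH.
# The /usr/bin/nawk location is an assumption (typical on Solaris).
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
    AWK=/usr/bin/nawk
else
    AWK=awk
fi
```

The script would then run `"$AWK" '...' file` instead of `awk '...' file`.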
Thanks, Don Cragun.
The result is exactly what I want.
Sorry I didn't explain my request in more detail. You are right: the whole line is identical whenever field #2 is identical.
My OS is Solaris/SunOS. I will include my OS information next time.
Thank you again.
Hi,
If I have a file in XML format, I would like to remove duplicate records and save the result to a new file. Is it possible to write a script to do it? (8 Replies)
Hi all,
I have a file containing multiple columns; it is sorted by col2 and col3.
I want to remove a line if its col2 and col3 are the same as in another line.
Example:
fileA
AA BB CC DD
CC XX CC DD
BB CC ZZ FF
DD FF HH HH
The output is:
AA BB CC DD
BB CC ZZ FF... (6 Replies)
Hi,
I need help with what may be a very simple issue, but somehow I am not getting it.
I am not able to put together a sed or awk command that adds to the first line of a text file and removes the "," from the last line only.
The file looks like this:
TABLE1,
TABLE2,
.
.
.
TABLE99,... (4 Replies)
I am trying to load data into 3 tables simultaneously (which is working fine). Once the data is loaded, the script should count the total number of records in all 3 input files and send an e-mail to the user.
The script is working fine as far as loading all 3 input files into the database tables, but... (3 Replies)
Hi Gurus,
I need to split a single record in the file (asdf) into multiple records of 44 characters (bytes) each, so every record will have 44 characters. All the records should stay in the same file, and I need to add the folder (<date>) name to each of these lines.
I have a directory in which... (20 Replies)
Hi,
I have a huge comma-delimited file, and I have to prepend the following four lines to the start of the file with a shell script.
FILE NAME = TEST_LOAD
DATETIME = CURRENT DATE TIME
LOAD DATE = CURRENT DATE
RECORD COUNT = TOTAL RECORDS IN FILE
Source data
1,2,3,4,5,6,7... (7 Replies)
Hi,
I need help with the following problem.
There is a script that uses 7 existing files (in a path such as usr/appl/temp/file1.txt), and I need to create a new blank file, say “file_count.txt”, in the same script.
Then the new file <file_count.txt> should store all 7 filenames and... (1 Reply)
I have a file in which a single record spans multiple lines:
File 1
====
14|\n
leave request \n
accepted|Yes|
15|\n
leave request not \n
acccepted|No|
I want to remove the '\n' characters. I used the code below (found somewhere in this forum):
perl -e 'while (<>) { if... (1 Reply)
JOIN(1) General Commands Manual JOIN(1)
NAME
join - relational database operator
SYNOPSIS
join [ options ] file1 file2
DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If one of the file names is -, the standard input is used.
File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, then the rest of the line from file1, then the rest of the line from file2.
Input fields are normally separated by spaces or tabs; output fields by a space. In this case, multiple separators count as one, and leading separators are discarded.
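As an illustration of the rules above, here is a minimal sketch with two made-up files (the names `salaries` and `depts` and their contents are purely illustrative). It targets a POSIX `join`, such as the one in GNU coreutils, rather than specifically the Plan 9 implementation documented here, relies on `mktemp -d` being available, and sets `LC_ALL=C` so that `sort` and `join` agree on a byte-order collating sequence.

```shell
LC_ALL=C; export LC_ALL            # byte (ASCII) collation for sort and join
tmp=$(mktemp -d)                   # scratch directory for the example files
printf 'bob 200\nalice 100\ncarol 300\n' > "$tmp/salaries"
printf 'carol eng\nalice ops\ndave qa\n'  > "$tmp/depts"

# Both inputs must be sorted on the join field (field 1 here).
sort -b -k 1,1 "$tmp/salaries" > "$tmp/s.sorted"
sort -b -k 1,1 "$tmp/depts"    > "$tmp/d.sorted"

# Each output line: join field, rest of the file1 line, rest of the file2 line.
# Unpaired lines (bob, dave) are dropped by default.
joined=$(join "$tmp/s.sorted" "$tmp/d.sorted")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

With these inputs the expected result is `alice 100 ops` followed by `carol 300 eng`.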
The following options are recognized, with POSIX syntax.
-a n In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-v n Like -a, omitting output for paired lines.
-e s Replace empty output fields by string s.
-1 m
-2 m Join on the mth field of file1 or file2.
-jn m Archaic equivalent for -n m.
-o fields
Each output line comprises the designated fields. The comma-separated field designators are either 0, meaning the join field, or have the form n.m, where n is a file number and m is a field number. Archaic usage allows separate arguments for field designators.
-tc Use character c as the only separator (tab character) on input and output. Every appearance of c in a line is significant.
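To see several of these options working together, here is a small sketch, again with made-up colon-separated files, written against a POSIX `join` (for example GNU coreutils), which may differ in details from the Plan 9 implementation described on this page. The file names and the `NA` fill string are illustrative assumptions.

```shell
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf 'alice:100\nbob:200\ncarol:300\n' > "$tmp/salaries"   # already sorted on field 1
printf 'alice:ops\ncarol:eng\n'            > "$tmp/depts"

# -t : makes the colon the only separator; -a 1 also prints unpaired lines
# of file1; -e NA fills empty output fields; -o picks join field, 1.2, 2.2.
full=$(join -t : -a 1 -e NA -o 0,1.2,2.2 "$tmp/salaries" "$tmp/depts")

# -v 1 prints only the file1 lines that have no partner in file2.
unmatched=$(join -t : -v 1 "$tmp/salaries" "$tmp/depts")

printf '%s\n' "$full" "$unmatched"
rm -rf "$tmp"
```

Here `full` should hold `alice:100:ops`, `bob:200:NA` and `carol:300:eng`, while `unmatched` holds only `bob:200`.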
EXAMPLES
sort /adm/users | join -t: -a 1 -e "" - bdays
Add birthdays to password information, leaving unknown birthdays empty. The layout of /adm/users is given in users(6); bdays contains sorted lines like
tr : ' ' </adm/users | sort -k 3,3 >temp
join -1 3 -2 3 -o 1.1,2.1 temp temp | awk '$1 < $2'
Print all pairs of users with identical userids.
SOURCE
/sys/src/cmd/join.c
SEE ALSO
sort(1), comm(1), awk(1)
BUGS
With default field separation, the collating sequence is that of sort -b -ky,y; with -t, the sequence is that of sort -tx -ky,y.
One of the files must be randomly accessible.