Remove duplicated records and update last line record counts Post: 303032029

Post 303032029 by Don Cragun in Shell Programming and Scripting, Saturday 9th of March 2019 07:04:23 PM

Your description and code are not clear enough to be sure that this is what you want, but it works with the sample data provided:

Code:

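awk '
BEGIN {	FS = OFS = ","		# comma-separated input and output
}
$1 == "D" {			# lines whose first field is "D"
	if($2 in a)		# key already seen: skip this duplicate
		next
	a[$2]			# remember the key (referencing it creates the element)
	printed++		# count the "D" lines that will be printed
}
$1 == "T" {			# the line carrying the record count
	$2 = printed		# replace the old count with the new one
}
# print every line that gets this far
1' file.CSV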

Clearly, field #2 alone is not the key to determining duplicate records; the key is at least field #2, and only when field #1 is "D". And since you are storing the entire line in the a[] array for some reason, maybe you only want to delete identical lines instead of deleting lines with identical keys?

The above code assumes you just want to delete lines with identical keys, where a line counts as a duplicate if field #1 is "D" and its field #2 has already appeared on an earlier "D" line. On the line with field #1 being "T", whatever was in field #2 is replaced by the number of "D" lines with a unique field #2 seen before that "T" line. All lines that have neither "D" nor "T" in field #1 are copied to the output without being counted.
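To make that behavior concrete, here is a small run on made-up data. The sample data from the original thread is not reproduced in this post, so the input below is purely an assumption, and `dedupe_by_key` is a hypothetical wrapper name used only so the script can read standard input.

```shell
# Hypothetical wrapper around the awk script above; it reads the files
# named as arguments, or standard input when none are given.
dedupe_by_key() {
    awk '
    BEGIN { FS = OFS = "," }
    $1 == "D" {
        if ($2 in a)        # key already seen: skip this duplicate
            next
        a[$2]
        printed++
    }
    $1 == "T" {
        $2 = printed        # overwrite the old count
    }
    1' "$@"
}

# Made-up input: one other line, four "D" lines (key 1 appears twice),
# and a "T" line whose count is deliberately stale.
dedupe_by_key <<'EOF'
H,header
D,1,a
D,2,b
D,1,c
D,3,d
T,5
EOF
```

With this input the second line with key 1 (`D,1,c`) is dropped, the `H` line passes through untouched, and the last line comes out as `T,3`.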

You should always tell us what operating system and shell you're using when you start a new thread in this forum. The behavior of many utilities varies from operating system to operating system and the features provided by shells vary from shell to shell.

If you want to try the above code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
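Coming back to the question of identical lines versus identical keys: if it turns out that only fully identical lines should be removed, a variant might look like the sketch below. This is a guess at a requirement that has not been confirmed. The function name `dedupe_identical_lines` and the array name `seen` are made up for illustration, and the `+ 0` is an addition so that a file with no "D" lines gets a count of 0 rather than an empty field.

```shell
# Hypothetical variant: a "D" line is a duplicate only when the whole
# line has been seen before, not just field #2.
dedupe_identical_lines() {
    awk '
    BEGIN { FS = OFS = "," }
    $1 == "D" {
        if ($0 in seen)     # the entire line was already printed
            next
        seen[$0]
        printed++
    }
    $1 == "T" {
        $2 = printed + 0    # + 0 forces a number even with no "D" lines
    }
    1' "$@"
}

# Made-up input: the second line is an exact copy of the first; the
# third shares the key but differs in field #3, so it is kept.
printf '%s\n' 'D,1,a' 'D,1,a' 'D,1,b' 'T,0' | dedupe_identical_lines
```

Here the output is `D,1,a`, `D,1,b` and `T,2`, whereas the key-based code would keep only `D,1,a` and report a count of 1.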

LEARN ABOUT DEBIAN

srec_emon52

srec_emon52(5)							File Formats Manual						    srec_emon52(5)

NAME

       srec_emon52 - Elektor Monitor (EMON52) file format

DESCRIPTION

       This  format is used by the monitor EMON52, developed by the European electronics magazine Elektor (Elektuur in Holland).  Elektor wouldn't
       be Elektor if they didn't try to reinvent the wheel.  It's a mystery why they didn't use an existing format  for  the  project.	 Only  the
       Elektor Assembler will produce this file format, reducing the choice of development tools dramatically.

   Records
       All data lines are called records, and each record contains the following four fields:

       +----+------+---+-----------+------+
       | cc | aaaa | : | dd ... dd | ssss |
       +----+------+---+-----------+------+

       The fields are defined as follows:

       cc      The  byte  count.   A two digit hex value (1 byte), counting the actual data bytes in the record.  The byte count is separated from
	       the next field by a space.

       aaaa    The address field.  A four hex digit (2 byte) number representing the first address to be used by this record.

       :       The address field and the data field are separated by a colon.

       dd      The actual data of this record.  There can be 1 to 255 data bytes per record (see cc).  All bytes in the record are separated from
	       each other (and from the checksum) by a space.

       ssss    The data checksum: a 16 bit value formed by adding all data bytes of the record together.  It covers only the data bytes of this
	       record.

       Please note that there is no End Of File record defined.

   Byte Count
       The byte count cc counts the actual data bytes in the current record.  Usually records have 16 data bytes.  I don't know what  the  maximum
       number of data bytes is.  It depends on the size of the data buffer in the EMON52.

   Address Field
       This  is the address where the first data byte of the record should be stored.  After storing that data byte, the address is incremented by
       1 to point to the address for the next data byte of the record.	And so on, until all data bytes are stored.

       The address is represented by a 4 digit hex number (2 bytes), with the MSD first.

   Data Field
       The payload of the record is formed by the Data field.  The number of data bytes expected is given by the Byte Count field.

   Checksum
       The checksum is a 16 bit result from adding all data bytes of the record together.

   Size Multiplier
       In general, binary data will expand in size by approximately 3.8 times when represented with this format.

EXAMPLE

       Here is an example of an EMON52 file:
	      10 0000:57 6F 77 21 20 44 69 64 20 79 6F 75 20 72 65 61 0564
	      10 0010:6C 6C 79 20 67 6F 20 74 68 72 6F 75 67 68 20 61 05E9
	      10 0020:6C 6C 20 74 68 69 73 20 74 72 6F 75 62 6C 65 20 05ED
	      10 0030:74 6F 20 72 65 61 64 20 74 68 69 73 20 73 74 72 05F0
	      04 0040:69 6E 67 21 015F
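
       The checksum rule can be checked against the records above with a short awk script. This is only a sketch based on the field
       descriptions in this page, not part of SRecord; the function names `emon52_check` and `hex` are made up for illustration, and
       the hex conversion is hand-written because POSIX awk has no built-in for it.

```shell
# Hypothetical helper: for each EMON52 record, print its address followed
# by "ok" when both the byte count and the checksum match, else "BAD".
emon52_check() {
    awk '
    # Convert a string of hex digits to a number.
    function hex(s,    i, n) {
        s = toupper(s)
        n = 0
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
        return n
    }
    BEGIN { FS = "[ :]+" }              # fields are split on spaces and the colon
    {
        gsub(/^[ \t]+|[ \t]+$/, "")     # trim indentation; fields are re-split
        if (NF < 4)                     # need cc, aaaa, one data byte, ssss
            next
        sum = 0
        for (i = 3; i < NF; i++)        # $1 = cc, $2 = aaaa, $NF = ssss
            sum += hex($i)
        sum %= 65536                    # keep the checksum to 16 bits
        status = (sum == hex($NF) && NF - 3 == hex($1)) ? "ok" : "BAD"
        print $2, status
    }' "$@"
}

# The last record of the example: 0x69 + 0x6E + 0x67 + 0x21 = 0x015F.
printf '%s\n' '04 0040:69 6E 67 21 015F' | emon52_check
```

       Run on the five records of the example, the script reports ok for each address from 0000 to 0040.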

SEE ALSO

       http://sbprojects.fol.nl/knowledge/fileformats/emon52.htm

AUTHOR

       This man page was taken from the above Web page.  It was written by San Bergmans <sanmail@bigfoot.com>

Reference Manual						      SRecord							    srec_emon52(5)
