Remove duplicated records and update last line record counts Post: 303032040


Shell Programming and Scripting: Post 303032040 by Don Cragun on Sunday 10th of March 2019 01:37:35 AM

03-10-2019


Quote:

Originally Posted by nezabudka

Code:

awk -F, '/^T/ {for(i in A) sum+=(A[i]-1); $2=$2-sum} !A[$0]++' file

Hi nezabudka,
Nice approach. My code counts the number of lines output and ignores the value originally found in the "T" line field #2; your code subtracts the number of duplicates found.

If there were to be input files with multiple "T" lines, mine would output all of them each containing the number of unique "D" lines seen up to that point while yours will only print the first one found. I assume that an input file will only contain one "T" line, so this difference shouldn't matter.

If there are lines other than "D" and "T" lines, my code will copy them to the output but not include them in the count written to the "T" line; your code will include a count of non-duplicated non-"D" lines (except for the first "T" line) in its calculations. I have no idea whether the actual data to be processed might contain header lines that should not be counted in the "T" line output. If header lines are present and should be ignored in the "T" line output, that should have been mentioned in the requirements.
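To make the behavior described above concrete, here is a minimal sketch of the "count what is output" approach. It is an illustration only, not the script posted earlier in the thread. The sample input is hypothetical, and the sketch assumes the record type is exactly "D" or "T" in field 1.

```shell
# Hypothetical input: one header line, a duplicated "D" line, and two "T" lines.
input='H,header
D,1,a
D,1,a
D,2,b
T,9
D,3,c
T,9'

counted=$(printf '%s\n' "$input" | awk '
BEGIN { FS = OFS = "," }                # keep commas when a field is reassigned
$1 == "D" {                             # data line: print and count first occurrences only
    if (!seen[$0]++) { n++; print }
    next
}
$1 == "T" { $2 = n + 0; print; next }   # trailer: overwrite field 2 with the running count
{ print }                               # anything else is copied but not counted
')
printf '%s\n' "$counted"
```

With this input, the header line passes through uncounted, the first "T" line becomes T,2 and the second becomes T,3. The original value 9 in field 2 is never consulted.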

Note that your code replaces the commas in the "T" line output with <space>s because you didn't set OFS to a comma.
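The difference is easy to see by running the quoted program with and without OFS set. The sample input below is hypothetical; the awk program text is the one quoted above, unchanged.

```shell
# Hypothetical input: three unique "D" lines, one duplicate, and a "T" line claiming 4 records.
input='D,1,a
D,2,b
D,1,a
D,3,c
T,4'

# The program quoted above, stored unchanged in a shell variable.
prog='/^T/ {for(i in A) sum+=(A[i]-1); $2=$2-sum} !A[$0]++'

# Assigning to $2 rebuilds $0 using OFS, which defaults to a single space.
without_ofs=$(printf '%s\n' "$input" | awk -F, "$prog" | tail -n 1)

# Setting OFS to a comma keeps the "T" line in its original format.
with_ofs=$(printf '%s\n' "$input" | awk -F, -v OFS=, "$prog" | tail -n 1)

printf '%s\n' "$without_ofs" "$with_ofs"
```

The first result is "T 3" and the second is "T,3". Only the "T" line is affected, because it is the only line whose fields are reassigned.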



