Hi nezabudka,
Nice approach. My code counts the number of lines output and ignores the value originally found in field #2 of the "T" line; your code subtracts the number of duplicates found.
If there were to be input files with multiple "T" lines, mine would output all of them, each containing the number of unique "D" lines seen up to that point, while yours would only print the first one found. I assume that an input file will only contain one "T" line, so this difference shouldn't matter.
If there are lines other than "D" and "T" lines, my code will copy them to the output but not include them in the count written to the "T" line; your code will include non-duplicated non-"D" lines (except for the first "T" line) in its calculations. I have no idea whether the actual data to be processed might contain header lines that should be excluded from the count in the "T" line output. If header lines are present and should be ignored in the "T" line output, that should have been mentioned in the requirements.
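To make the comparison concrete, here is a minimal sketch of the "count what you output" approach. It is not the code posted earlier in this thread. It assumes comma-separated input whose first field is the record type, with the "T" line coming after the "D" lines; the function name `dedupe_and_count` and the sample records are made up for illustration.

```shell
# Hypothetical helper: drop duplicate "D" lines and rewrite field 2 of each
# "T" line with the number of unique "D" lines printed so far.
dedupe_and_count() {
    awk '
    BEGIN { FS = OFS = "," }
    $1 == "D" {
        if (!seen[$0]++) { print; count++ }   # keep only the first copy of each D line
        next
    }
    $1 == "T" { $2 = count + 0; print; next } # ignore the original count, use our own
    { print }                                  # copy any other line through, uncounted
    '
}

# Assumed sample data: one header, three D lines (one duplicate), one T line.
printf '%s\n' 'H,header' 'D,1,a' 'D,2,b' 'D,1,a' 'T,3,end' | dedupe_and_count
```

With this sample the header passes through unchanged, the repeated `D,1,a` is dropped, and the "T" line is printed with a count of 2.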
Note that your code replaces the commas in the "T" line output with <space>s because you didn't set OFS to a comma.
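The effect is easy to reproduce on a one-line example. The record `T,5,end` below is invented for the demonstration; the point is only that assigning to a field makes awk rebuild the record using OFS, which defaults to a single space.

```shell
# Without OFS, assigning to $2 rebuilds the record with spaces between fields.
echo 'T,5,end' | awk -F, '{ $2 = 3; print }'

# With OFS set to a comma, the rebuilt record keeps its comma separators.
echo 'T,5,end' | awk 'BEGIN { FS = OFS = "," } { $2 = 3; print }'
```

The first command prints `T 3 end`; the second prints `T,3,end`.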
Hi,
If I have a file in XML format, I would like to remove duplicated records and save the result to a new file. Is it possible to write a script to do it? (8 Replies)
Hi all,
I have a file containing multiple columns; this file is sorted by col2 and col3.
I want to remove the duplicated columns if col2 and col3 are the same in another line.
example
fileA
AA BB CC DD
CC XX CC DD
BB CC ZZ FF
DD FF HH HH
the output is
AA BB CC DD
BB CC ZZ FF... (6 Replies)
Hi,
I need help with what may be a totally simple issue, but somehow I am not getting it.
I am not able to come up with a sed or awk command that adds to the first line of a text and removes the "," only from the last line.
The file looks as follows:
TABLE1,
TABLE2,
.
.
.
TABLE99,... (4 Replies)
I am trying to load data into 3 tables simultaneously (which is working fine). Then when loaded, it should count the total number of records in all the 3 input files and send an e-mail to the user.
The script is working fine, as far as loading all the 3 input files into the database tables, but... (3 Replies)
Hi Gurus,
I need to cut a single record in the file (asdf) into multiple records based on the number of bytes (44 characters), so every record will have 44 characters. All the records should be in the same file. To each of these lines I need to add the folder (<date>) name.
I have a dir. in which... (20 Replies)
Hi,
I have a huge comma-delimited file, and I have to insert the following four lines before the start of the file through a shell script.
FILE NAME = TEST_LOAD
DATETIME = CURRENT DATE TIME
LOAD DATE = CURRENT DATE
RECORD COUNT = TOTAL RECORDS IN FILE
Source data
1,2,3,4,5,6,7... (7 Replies)
Hi,
I need help with the concern below.
There is a script that has 7 existing files (in a path such as usr/appl/temp/file1.txt), and I need to create one new blank file, say “file_count.txt”, in the same script itself.
Then the new file <file_count.txt> should store all the 7 filenames and... (1 Reply)
I have a file, in which a single record spans across multiple lines,
File 1
====
14|\n
leave request \n
accepted|Yes|
15|\n
leave request not \n
acccepted|No|
I wanted to remove the '\n' characters. I used the code below (found somewhere in this forum)
perl -e 'while (<>) { if... (1 Reply)
getcol(1) General Commands Manual getcol(1)
Name
getcol - Extract specified columns from an ASCII table file
Synopsis
getcol [-amv][-n num][-r lines][-s num] filename [column number range]
Description
Extract specified columns from an ASCII table file
Options
filename
Name of an ASCII table file. At least one of these must be present for any values to be printed. If it is stdin or STDIN, an ASCII
table is expected as standard input. If there is no input file, standard input is assumed.
@filename
Name of a file containing a list of ASCII table files. If this is present, any other file names on the command line will be
ignored.
field range
Print value of these columns for the number of lines of the table specified by the -n argument after skipping the number of
lines specified by the -s argument. A value of 0 causes the entire input line to be printed.
-a Sum all numeric columns selected, printing the sum on the line following the result. Columns with no sum are filled with ___.
(Added in version 2.6.9)
-b Input is bar-separated table file
-c Add count of number of lines in each column at end
-d <number>
Number of decimal places in f.p. output
-e Compute medians of selected columns
-f Print range of values in selected columns
-h Print Starbase tab table header
-i Input is tab-separated table file
-k Print number of columns on first line
-l <number>
Number of lines to add to each line
-m Compute the means of all numeric columns selected, printing the mean on the line following the result (or the line following the sum
if -a is used). Columns with no mean are filled with ___. (Added in version 2.6.9)
-n num Print selected columns for this many lines. If not specified, all lines will be read after the number of lines specified by -s have
been skipped.
-o OR conditions instead of ANDing them
-p Print only sum, mean, sigma, median, or range, not entries
-r @listfile
-r line range Print columns from the lines specified as either the first nonzero number on each line of the file listfile or the
comma- and hyphen-delimited range; i.e. 1-5,10-12 will print values from lines 1, 2, 3, 4, 5, 10, 11, and 12. (Added in version
2.6.12)
-s num Skip this many lines before starting to print values. If not specified, no lines will be skipped.
-t Starbase (tab-separated) table output
-v Print more information about process.
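The line-range syntax accepted by -r can be illustrated with a short awk sketch. This is not how getcol is implemented and it does not call getcol; `select_lines` is a hypothetical helper that only mimics the documented expansion of a range such as 1-5,10-12 into individual line numbers.

```shell
# Hypothetical helper: print only the input lines named by a comma- and
# hyphen-delimited range such as 1-5,10-12.
select_lines() {
    awk -v spec="$1" '
    BEGIN {
        n = split(spec, parts, ",")
        for (i = 1; i <= n; i++) {
            m = split(parts[i], ends, "-")
            lo = ends[1] + 0
            hi = (m > 1 ? ends[2] : ends[1]) + 0   # a bare number is a one-line range
            for (j = lo; j <= hi; j++) want[j] = 1
        }
    }
    NR in want                                     # print lines whose number was requested
    '
}

# Assumed sample input: the numbers 1 through 15, one per line.
seq 1 15 | select_lines 1-5,10-12
```

Run on this sample, the helper prints lines 1 through 5 and 10 through 12, matching the example in the -r description.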
Web Page
http://tdc-www.harvard.edu/software/wcstools/getcol.html
Author
Doug Mink, SAO (dmink@cfa.harvard.edu)
8 November 2001 WCSTools getcol(1)