12-15-2008
Thanks a lot for your amazing code!
It worked for the sample data I provided.
However, your first code gives the following error:
awk: 0602-590 Internal software error in the tostring function on
and the second code really worked:
It took 3 min 16 seconds to process 3407871 records.
Really cool! I was racking my brains over the sort and uniq commands!
Once again, thank you!
Regards
Sumit
LEARN ABOUT OPENDARWIN
uniq
UNIQ(1) BSD General Commands Manual UNIQ(1)
NAME
uniq -- report or filter out repeated lines in a file
SYNOPSIS
uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]
DESCRIPTION
The uniq utility reads the specified input_file comparing adjacent lines, and writes a copy of each unique input line to the output_file. If
input_file is a single dash ('-') or absent, the standard input is read. If output_file is absent, standard output is used for output. The
second and succeeding copies of identical adjacent input lines are not written. Repeated lines in the input will not be detected if they are
not adjacent, so it may be necessary to sort the files first.
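The adjacency rule described above can be seen with a small experiment. This is only a sketch; the sample lines and the variable names (input, unsorted, sorted) are made up for illustration.

```shell
# Made-up sample input: "apple" appears three times, but the first
# occurrence is not adjacent to the other two.
input=$(printf 'apple\nbanana\napple\napple\n')

# uniq alone collapses only the adjacent pair at the end.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first makes every duplicate adjacent, so uniq removes them all.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
```

Without the sort, "apple" still shows up twice in the output, because its first occurrence is separated from the others by "banana".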
The following options are available:
-c        Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.
-d        Only output lines that are repeated in the input.
-f num    Ignore the first num fields in each input line when doing comparisons. A field is a string of non-blank characters separated from adjacent fields by blanks. Field numbers are one based, i.e. the first field is field one.
-s chars  Ignore the first chars characters in each input line when doing comparisons. If specified in conjunction with the -f option, the first chars characters after the first num fields will be ignored. Character numbers are one based, i.e. the first character is character one.
-u        Only output lines that are not repeated in the input.
-i        Case insensitive comparison of lines.
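A short sketch of how the options listed above behave. All sample data and variable names here are hypothetical. The amount of padding that -c prints before the count differs between implementations, so the example normalizes it with awk.

```shell
# Hypothetical sample data, already sorted so repeated lines are adjacent.
data=$(printf 'a\na\nb\nc\nc\nc\n')

# -c: prefix each output line with its occurrence count.
# awk re-prints count and line to drop implementation-specific padding.
counts=$(printf '%s\n' "$data" | uniq -c | awk '{print $1, $2}')

# -d: only lines that are repeated; -u: only lines that are not repeated.
repeated=$(printf '%s\n' "$data" | uniq -d)
singles=$(printf '%s\n' "$data" | uniq -u)

# -f 1: ignore the first field (here a made-up id) when comparing.
by_colour=$(printf '1 red\n2 red\n3 blue\n' | uniq -f 1)

# -s 1: ignore the first character when comparing.
skipped=$(printf 'x1\ny1\nz2\n' | uniq -s 1)

# -i: compare lines without regard to case.
folded=$(printf 'Foo\nfoo\nbar\n' | uniq -i)

printf '%s\n' "$counts" "$repeated" "$singles" "$by_colour" "$skipped" "$folded"
```

In each group of lines that compare equal, the first line of the group is the one written, which is why -f 1 keeps "1 red" and -i keeps "Foo".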
DIAGNOSTICS
The uniq utility exits 0 on success, and >0 if an error occurs.
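A script can act on that exit status. The following is a minimal sketch; the path in "missing" is hypothetical and is assumed not to exist.

```shell
# Hypothetical input path, assumed not to exist on this system.
missing=/nonexistent/input.txt

# Capture the exit status of a failing run (unreadable input file).
fail_status=0
uniq "$missing" >/dev/null 2>&1 || fail_status=$?

# Capture the exit status of a successful run on standard input.
ok_status=0
printf 'a\na\n' | uniq >/dev/null || ok_status=$?

printf 'failing run: %s\nsuccessful run: %s\n' "$fail_status" "$ok_status"
```

The exact non-zero value is not specified by the text above, so a script should only test for zero versus non-zero.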
COMPATIBILITY
The historic +number and -number options have been deprecated but are still supported in this implementation.
SEE ALSO
sort(1)
STANDARDS
The uniq utility is expected to be IEEE Std 1003.2 (``POSIX.2'') compatible.
HISTORY
A uniq command appeared in Version 3 AT&T UNIX.
BSD June 6, 1993 BSD