05-13-2014
Quote:
Originally Posted by
yifangt
I did not forget this discussion, the left challenge is the speed. The script by vgersh99 took ~45hours to finish the file of 76,235 rows. Probably awk is not the best choice to process big file. Thank you all anyway!
I don't think it's a matter of a 'tool of choice', but rather of an algorithm to deal with the cached/stored matches in order to minimize the 'full table scan' for each newly found row/line....
I couldn't think of an easy non-convoluted way of minimizing the full table scan...
Maybe others would have better ideas.....
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
Hi all,
I have a file that contains a list of codes (shown below).
I want to 'uniq' the file using only the first field. Anyone know an easy way of doing it?
Cheers,
Dave
##### Input File #####
1xr1 1xws 1yxt 1yxu 1yxv 1yxx 2o3p 2o63 2o64 2o65
1xr1 1xws 1yxt 1yxv 1yxx 2o3p 2o63 2o64... (8 Replies)
Discussion started by: Digby
8 Replies
2. UNIX for Dummies Questions & Answers
Hi ;
I have a question regarding the uniq command in unix
How do I uniq 3rd field in a file ?
original file :
zoom coord 39 18652 39 18652
zoom coord 39 18653 39 18653
zoom coord 39 18818 39 18818
zoom coord 39 18840 39 18840
zoom coord 41 15096 41 15096
zoom... (1 Reply)
Discussion started by: babycakes
1 Replies
3. Shell Programming and Scripting
How can I use uniq on a certain field or what else could I use? If I want to use uniq on the second field and the output would remove one of the lines with a 5.
bob 5 hand
jane 3 leg
jon 4 head
chris 5 lungs (1 Reply)
Discussion started by: Bandit390
1 Replies
4. Shell Programming and Scripting
Anyone can help for filter the uniq record for below example? Thank you very much
Input file
20090503011111|test|abc
20090503011112|tet1|abc|def
20090503011112|test1|bcd|def
20090503011131|abc|abc
20090503011131|bbc|bcd
20090503011152|bcd|abc
20090503011151|abc|abc... (8 Replies)
Discussion started by: bleach8578
8 Replies
5. Shell Programming and Scripting
Hi New to unix.
I want to display only the unrepeated lines from a file using first field.
Ex:
1234 uname1 status1
1235 uname2 status2
1234 uname3 status3
1236 uname5 status5
I used
sort filename | uniq -u
output:
1234 uname1 status1
1235 uname2 status2
1234 uname3 status3
1236... (10 Replies)
Discussion started by: venummca
10 Replies
6. Shell Programming and Scripting
I have a flatfile A.txt
2012/12/04 14:06:07 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 17:07:22 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 17:13:27 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 14:07:39 |rain|Boards 1|tampa|merced|merced11
How do i sort and get... (3 Replies)
Discussion started by: sabercats
3 Replies
7. Shell Programming and Scripting
Hi All,
I am searching for a script which will produce an output file with the uniq first field with the second field having highest value among all the duplicates..
The output file will produce only the uniqs which are duplicate 3 times..
Input file
X 9
B 5
A 1
Z 9
T 4
C 9
A 4... (13 Replies)
Discussion started by: ailnilanjan
13 Replies
8. Shell Programming and Scripting
Hii,
I am reading data from files by defining path as *.log etc,
Files names are like app1a_test2_heep.log , cdc2a_test3_heep.log etc
How to configure logstash so that the part of string that is string before underscore (app1a, cdc2a..) should be grepped and added to host field and... (7 Replies)
Discussion started by: Ravi Kishore
7 Replies
9. Shell Programming and Scripting
Hi All,
I am trying to output uniq values per column. see file below. can you please assist? Thank you in advance.
cat names
joe allen ibm
joe smith ibm
joe allen google
joe smith google
rachel allen google
desired output is:
joe allen google
rachel smith ibm (5 Replies)
Discussion started by: Apollo
5 Replies
10. Shell Programming and Scripting
In the awk below I am trying to set/update the value of $14 in file2 in
bold, using the matching NM_ in $12 or $9 in file2
with the NM_ in $2 of file1.
The lengths of $9 and $12 can be variable but what is consistent is the start pattern
will always be NM_ and the end pattern is always ;... (2 Replies)
Discussion started by: cmccabe
2 Replies
UNIQ(1) User Commands UNIQ(1)
NAME
uniq - report or omit repeated lines
SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
With no options, matching lines are merged to the first occurrence.
Mandatory arguments to long options are mandatory for short options too.
-c, --count
prefix lines by the number of occurrences
-d, --repeated
only print duplicate lines
-D, --all-repeated[=delimit-method]
print all duplicate lines delimit-method={none(default),prepend,separate} Delimiting is done with blank lines
-f, --skip-fields=N
avoid comparing the first N fields
-i, --ignore-case
ignore differences in case when comparing
-s, --skip-chars=N
avoid comparing the first N characters
-u, --unique
only print unique lines
-z, --zero-terminated
end lines with 0 byte, not newline
-w, --check-chars=N
compare no more than N characters in lines
--help display this help and exit
--version
output version information and exit
A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use `sort -u' without
`uniq'. Also, comparisons honor the rules specified by `LC_COLLATE'.
AUTHOR
Written by Richard M. Stallman and David MacKenzie.
REPORTING BUGS
Report uniq bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report uniq translation bugs to <http://translationproject.org/team/>
COPYRIGHT
Copyright (C) 2011 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
comm(1), join(1)
The full documentation for uniq is maintained as a Texinfo manual. If the info and uniq programs are properly installed at your site, the
command
info coreutils 'uniq invocation'
should give you access to the complete manual.
GNU coreutils 8.12.197-032bb September 2011 UNIQ(1)