Since the strings tested aren't regular expressions, using the regular expression operator is, at best, unnecessarily expensive. At worst, if the strings are allowed to contain regular expression metacharacters, it can lead to an erroneous result.
I suggest using index() instead. For non-trivial data sets, it will also speed things up dramatically.
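To illustrate the difference, here is a minimal sketch. The function names `contains_regex` and `contains_index` are made up for this example and are not from the original script. The needle `a.c` contains a regular expression metacharacter, so the `~` operator reports a match against `abc`, while index() correctly treats it as a literal string:

```shell
# Hypothetical helpers: does the string $2 occur inside the string $1?

# Uses the regex match operator: the needle is interpreted as a pattern.
contains_regex() {
    awk -v hay="$1" -v needle="$2" \
        'BEGIN { print ((hay ~ needle) ? "match" : "no match") }'
}

# Uses index(): the needle is compared as a literal string.
contains_index() {
    awk -v hay="$1" -v needle="$2" \
        'BEGIN { print (index(hay, needle) ? "match" : "no match") }'
}

contains_regex "abc" "a.c"    # prints "match" (the dot matches any character)
contains_index "abc" "a.c"    # prints "no match"
```

Note that `-v` assignments process backslash escapes, so strings containing backslashes would need to be passed another way.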
I tested a near-worst-case scenario: the file contains 1501 lines, and only the last line contains a string that is a substring of another. Although gawk was used, testing with mawk and nawk showed similar improvements.
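The original test file and timing output are not reproduced here. As a rough sketch of how a comparable input could be built and checked with index(), assuming a made-up line format and the hypothetical helper names `make_test_file` and `find_substrings`:

```shell
# Hypothetical generator: 1500 distinct lines of equal length, then a
# final line that is a substring of line 1.
make_test_file() {
    awk 'BEGIN {
        for (i = 1; i <= 1500; i++) printf "line-%04d-end\n", i
        print "0001-e"
    }'
}

# Print every line that occurs literally inside some other line.
# This is a simple quadratic scan, not the script from the thread.
find_substrings() {
    awk '
        { line[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++)
                for (j = 1; j <= NR; j++)
                    if (i != j && index(line[j], line[i])) {
                        print line[i]
                        break
                    }
        }' "$@"
}

make_test_file | find_substrings    # prints "0001-e"
```

Wrapping the last command in `time` and swapping index() for the `~` operator would give a comparison on your own system.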
Regards,
Alister
tail
TAIL(1) General Commands Manual TAIL(1)
NAME
tail - deliver the last part of a file
SYNOPSIS
tail [ +-number[lbc][rf] ] [ file ]
tail [ -fr ] [ -n nlines ] [ -c ncharacters ] [ file ]
DESCRIPTION
Tail copies the named file to the standard output beginning at a designated place. If no file is named, the standard input is copied.
Copying begins at position +number measured from the beginning, or -number from the end of the input. Number is counted in lines, 1K
blocks or characters, according to the appended flag l, b, or c. Default is -10l (ten ell).
The further flag r causes tail to print lines from the end of the file in reverse order; f (follow) causes tail, after printing to the end, to
keep watch and print further data as it appears.
The second syntax is that promulgated by POSIX, where the numbers rather than the options are signed.
EXAMPLES
tail file
Print the last 10 lines of a file.
tail +0f file
Print a file, and continue to watch data accumulate as it grows.
sed 10q file
Print the first 10 lines of a file.
SOURCE
/sys/src/cmd/tail.c
BUGS
Tails relative to the end of the file are treasured up in a buffer, and thus are limited in length.
According to custom, option +number counts lines from 1, and counts blocks and characters from 0.