Shell Programming and Scripting: Filter uniq field values (non-substring)
Post 302900799 by alister on Thursday 8th of May 2014 07:35:54 PM
Never mind me. I didn't register the "next".

Regards,
Alister

---------- Post updated at 07:35 PM ---------- Previous update was at 06:59 PM ----------

Quote:
Originally Posted by vgersh99
see simplified version - with no deletes - just next-ing...
You are right to correct me; not every line is added. However, if, as in the original problem, substrings can precede their superstrings, then your suggestion is inadequate.

Consider:
Code:
1 abcd    idx01    ijklm
2 abc    idx03    klm
3 abcd    idx05    jkl
4 cdef    idx06    ijklm
5 efgh    idx07    abcd
6 efg    idx09    abc
7 efx    idx11    abcd
8 fgh    idx12    bcd
9 fefx  blah  zabcdz

If, as in the original data sample, substrings can precede their superstrings, line 7 should be excluded because both its $2 and $4 are substrings of the corresponding fields of line 9 (efx in fefx, abcd in zabcdz). Your code won't catch that, since line 9 comes after line 7.
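A minimal two-pass sketch of one way to catch line 7, assuming the rule is "drop a line when both its $2 and its $4 are substrings of the $2 and $4 of some other line, wherever that line appears in the file". The function name filter_superstrings and the handling of exact duplicates (the first copy is kept) are assumptions for illustration, not something stated in the thread. The file is read twice: the first pass stores the fields, the second pass filters.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print only the lines whose
# $2 and $4 are NOT both substrings of $2 and $4 of some other line.
# The file is read twice, so it must be a regular file, not a pipe.
# Every line is compared with every other line: O(n^2), fine for small files.
filter_superstrings() {
    awk '
        # Pass 1 (NR == FNR): remember $2 and $4 of every line.
        NR == FNR { a[FNR] = $2; b[FNR] = $4; n = FNR; next }

        # Pass 2: drop this line if another line contains both fields.
        {
            for (j = 1; j <= n; j++) {
                if (j == FNR) continue
                # index(s, t) > 0 means t occurs somewhere in s.
                if (index(a[j], $2) && index(b[j], $4)) {
                    # Assumption: an exact duplicate pair only covers later
                    # copies, so its first occurrence is kept.
                    if (a[j] != $2 || b[j] != $4 || j < FNR) next
                }
            }
            print
        }
    ' "$1" "$1"
}

# Usage with the sample data above (line 9 comes after line 7).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 abcd    idx01    ijklm
2 abc    idx03    klm
3 abcd    idx05    jkl
4 cdef    idx06    ijklm
5 efgh    idx07    abcd
6 efg    idx09    abc
7 efx    idx11    abcd
8 fgh    idx12    bcd
9 fefx  blah  zabcdz
EOF
filter_superstrings "$tmp"    # prints lines 1, 4, 5 and 9
rm -f "$tmp"
```

Reading the file twice costs an extra pass, but it lets a superstring appear anywhere in the file, which is exactly the case a single pass that only looks back at earlier lines can miss.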

Again, I could be mistaken. yifangt has not been strictly comprehensive in describing the problem.

I hope my nitpicking isn't getting on your last nerve.

Regards,
Alister

fnmatch(5)						Standards, Environments, and Macros						fnmatch(5)

NAME
fnmatch - file name pattern matching

DESCRIPTION
The pattern matching notation described below is used to specify patterns for matching strings in the shell. Historically, pattern matching notation is related to, but slightly different from, the regular expression notation. For this reason, the description of the rules for this pattern matching notation is based on the description of regular expression notation described on the regex(5) manual page.

Patterns Matching a Single Character
The following patterns matching a single character match a single character: ordinary characters, special pattern characters and pattern bracket expressions. The pattern bracket expression will also match a single collating element.

An ordinary character is a pattern that matches itself. It can be any character in the supported character set except for NUL, those special shell characters that require quoting, and the following three special pattern characters. Matching is based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern will match the character itself. The shell special characters always require quoting.

When unquoted and outside a bracket expression, the following three characters will have special meaning in the specification of patterns:

?    A question-mark is a pattern that will match any character.

*    An asterisk is a pattern that will match multiple characters, as described in Patterns Matching Multiple Characters, below.

[    The open bracket will introduce a pattern bracket expression.

The description of basic regular expression bracket expressions on the regex(5) manual page also applies to the pattern bracket expression, except that the exclamation-mark character (!) replaces the circumflex character (^) in its role in a non-matching list in the regular expression notation. A bracket expression starting with an unquoted circumflex character produces unspecified results.
The restriction on a circumflex in a bracket expression is to allow implementations that support pattern matching using the circumflex as the negation character in addition to the exclamation-mark. A portable application must use something like [\^!] to match either character.

When pattern matching is used where shell quote removal is not performed (such as in the argument to the find -name primary when find is being called using one of the exec functions, or in the pattern argument to the fnmatch(3C) function), special characters can be escaped to remove their special meaning by preceding them with a backslash character. This escaping backslash will be discarded. The sequence \\ represents one literal backslash. All of the requirements and effects of quoting on ordinary, shell special and special pattern characters will apply to escaping in this context.

Both quoting and escaping are described here because pattern matching must work in three separate circumstances:

o Calling directly upon the shell, such as in pathname expansion or in a case statement. All of the following will match the string or file abc:

      abc "abc" a"b"c a\bc a[b]c a["b"]c a[\b]c a["\b"]c a?c a*c

  The following will not:

      "a?c" a\*c a\[b]c

o Calling a utility or function without going through a shell, as described for find(1) and the function fnmatch(3C).

o Calling utilities such as find, cpio, tar or pax through the shell command line. In this case, shell quote removal is performed before the utility sees the argument. For example, in:

      find /bin -name "e\c[\h]o" -print

  after quote removal, the backslashes are presented to find and it treats them as escape characters. Both precede ordinary characters, so the c and h represent themselves and echo would be found on many historical systems (that have it in /bin). To find a file name that contained shell special characters or pattern characters, both quoting and escaping are required, such as:

      pax -r ... "*a\(\?"

  to extract a filename ending with a(?.
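The first circumstance (patterns typed directly in a case statement) can be checked by letting the shell evaluate each label itself. This is a minimal POSIX sh sketch; the helper name matches is hypothetical, and eval is used only so that the label text, quotes and backslashes included, can be passed in as data. It should not be fed untrusted input.

```shell
#!/bin/sh
# matches LABEL STRING: succeed if LABEL, written exactly as it would appear
# in a case statement, matches STRING. eval turns the label text back into
# shell syntax, so its quotes and backslashes are processed by the shell.
matches() {
    eval "case \$2 in $1) return 0 ;; esac"
    return 1
}

# The two lists from the text above: the first ten labels, then the three
# that the man page says must not match abc.
for pat in 'abc' '"abc"' 'a"b"c' 'a\bc' 'a[b]c' 'a["b"]c' 'a[\b]c' \
           'a["\b"]c' 'a?c' 'a*c' '"a?c"' 'a\*c' 'a\[b]c'; do
    if matches "$pat" abc; then
        printf 'matches abc:        %s\n' "$pat"
    else
        printf 'does not match abc: %s\n' "$pat"
    fi
done
```

Per the lists above, a conforming shell should report a match for the first ten labels and no match for the last three; a\[b]c, for instance, matches only the literal string a[b]c.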
Conforming applications are required to quote or escape the shell special characters (sometimes called metacharacters). If used without this protection, syntax errors can result or implementation extensions can be triggered. For example, the KornShell supports a series of extensions based on parentheses in patterns; see ksh(1).

Patterns Matching Multiple Characters
The following rules are used to construct patterns matching multiple characters from patterns matching a single character:

o The asterisk (*) is a pattern that will match any string, including the null string.

o The concatenation of patterns matching a single character is a valid pattern that will match the concatenation of the single characters or collating elements matched by each of the concatenated patterns.

o The concatenation of one or more patterns matching a single character with one or more asterisks is a valid pattern. In such patterns, each asterisk will match a string of zero or more characters, matching the greatest possible number of characters that still allows the remainder of the pattern to match the string.

Since each asterisk matches zero or more occurrences, the patterns a*b and a**b have identical functionality.

Examples:

a[bc]    matches the strings ab and ac.

a*d      matches the strings ad, abd and abcd, but not the string abc.

a*d*     matches the strings ad, abcd, abcdef, aaaad and adddd.

*a*d     matches the strings ad, abcd, efabcd, aaaad and adddd.

Patterns Used for Filename Expansion
The rules described so far in Patterns Matching Multiple Characters and Patterns Matching a Single Character are qualified by the following rules that apply when pattern matching notation is used for filename expansion.

1. The slash character in a pathname must be explicitly matched by using one or more slashes in the pattern; it cannot be matched by the asterisk or question-mark special characters or by a bracket expression. Slashes in the pattern are identified before bracket expressions; thus, a slash cannot be included in a pattern bracket expression used for filename expansion. For example, the pattern a[b/c]d will not match such pathnames as abd or a/d. It will only match a pathname of literally a[b/c]d.

2. If a filename begins with a period (.), the period must be explicitly matched by using a period as the first character of the pattern or immediately following a slash character. The leading period will not be matched by:

   o the asterisk or question-mark special characters

   o a bracket expression containing a non-matching list, such as [!a], a range expression, such as [%-0], or a character class expression, such as [[:punct:]]

   It is unspecified whether an explicit period in a bracket expression matching list, such as [.abc], can match a leading period in a filename.

3. Specified patterns are matched against existing filenames and pathnames, as appropriate. Each component that contains a pattern character requires read permission in the directory containing that component. Any component, except the last, that does not contain a pattern character requires search permission. For example, given the pattern:

      /foo/bar/x*/bam

   search permission is needed for directories / and foo, search and read permissions are needed for directory bar, and search permission is needed for each x* directory. If the pattern matches any existing filenames or pathnames, the pattern will be replaced with those filenames and pathnames, sorted according to the collating sequence in effect in the current locale. If the pattern contains an invalid bracket expression or does not match any existing filenames or pathnames, the pattern string is left unchanged.

SEE ALSO
find(1), ksh(1), fnmatch(3C), regex(5)

SunOS 5.10                    28 Mar 1995                    fnmatch(5)
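The Examples under Patterns Matching Multiple Characters can be checked with an ordinary case statement, since an unquoted parameter expansion in a case label keeps its *, ? and [ special. A minimal POSIX sh sketch; glob_match is a hypothetical helper, and the extra test strings (ad for a[bc], abc for a*d, and the a**b row) are added only to show the non-matching cases and the stated a*b / a**b equivalence.

```shell
#!/bin/sh
# glob_match PATTERN STRING: succeed only if the whole STRING matches
# PATTERN. $1 is deliberately unquoted in the case label, so any
# pattern characters it contains keep their special meaning.
glob_match() {
    case $2 in
        $1) return 0 ;;
    esac
    return 1
}

set -f    # stop the unquoted $strings below from being pathname-expanded
# Each row: a pattern, then the strings to test against it.
while read -r pat strings; do
    for s in $strings; do
        if glob_match "$pat" "$s"; then r=match; else r='no match'; fi
        printf '%-6s %-7s %s\n' "$pat" "$s" "$r"
    done
done <<'EOF'
a[bc] ab ac ad
a*d ad abd abcd abc
a*d* ad abcd abcdef aaaad adddd
*a*d ad abcd efabcd aaaad adddd
a**b ab axb
EOF
```

Only ad (against a[bc]) and abc (against a*d) should report no match; every other row lists strings the man page says the pattern matches.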
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.