Find duplicates in 2 & 3rd column and their ID
Post: 303000201

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find duplicates in the first column of text file

Hello, My text file has input of the form

abc dft45.xml
ert rt653.xml
abc ert57.xml

I need to write a Perl script or shell script to find duplicates in the first column and write them to a text file of the form...

abc dft45.xml
abc ert57.xml

Can someone help me, please?

2. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this, I need to remove the duplicates.

00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt
00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt
0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt
0624-01 RUT CORPORATION ...

3. Shell Programming and Scripting

AWK script to create max value of 3rd column, grouping by first column

Hi, I need an awk script (or whatever shell construct) that would take data like the sample below and get the max value of the 3rd column when grouping by the 1st column.

clientname,day-of-month,max-users
-----------------------------------
client1,20120610,5
client2,20120610,2
client3,20120610,7...

4. Shell Programming and Scripting

Find duplicates in column 1 and merge their lines (awk?)

Hi, I have a file (sorted by sort) with 8 tab delimited columns. The first column contains duplicated fields and I need to merge all these identical lines. My input file:

comp100002 aaa bbb ccc ddd eee fff ggg
comp100003 aba aba aba aba aba aba aba
comp100003 fff fff fff fff fff fff fff...

5. UNIX for Dummies Questions & Answers

Search word in 3rd column and move it to next column (4th)

Hi, I have a file with +/- 13000 lines and 4 columns. I need to search the 3rd column for a word that begins with "SAP-" and move it to the next column (4th), because the 3rd column needs to stay empty. Thanks in advance. :)

89653 36891 OTR-60 SAP-2
89653 36892 OTR-10 SAP-2...

6. Shell Programming and Scripting

Find smallest & largest in every column

Dear All, I have input like this, J_15TEST_ASH05_33A22.13885.txt: $$ 1 MAKE SP1501 1 1 4 6101 7392 2 2442 2685 18 3201 4008 20 120 4158 J_15TEST_ASH05_33A22.13885.txt: $$ 1 MAKE SP1502 1 1 4 5125 6416 2 ...

7. Shell Programming and Scripting

Changing values only in 3rd column and 4th column

#cat file
testing test! nipw asdkjasjdk ok! what !ok
host server1 check_ssh_disk!102.56.1.101!30!50!/ other
host server 2 des check_ssh_disk!192.6.1.10!40!30!/
#grep check file| awk -F! '{print $3,$4}'|awk '{gsub($1,"",$1)}1'
 50
 30
# Output:

8. Shell Programming and Scripting

Solution for replacement of 4th column with 3rd column in a file using awk/sed preserving delimters

Input:

"A","B","C,D","E","F"
"S","T","U,V","W","X"
"AA","BB","CC,DD","EEEE","FFF"

Required output:

"A","B","C,D","C,D","F"
"S","T","U,V","U,V","X"
"AA","BB","CC,DD","CC,DD","FFF"

I tried using awk, but the double quotes are not preserved for every field. Any help to solve this is much...

9. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, I need help. I am learning awk and am stuck on one issue. First point: I want to sum up the values in columns 7, 9, 11, 13 and 15 if rows in column 5 are duplicates. No action is to be taken for rows where the value in column 5 is unique. Second point: For...

10. UNIX for Beginners Questions & Answers

UNIX script to compare 3rd column value with first column and display

Hello Team, My source data (input) is like below: EPIC1 router EPIC2 Targetdefinition Exp1 Expres rtr1 Router SQL SrcQual Exp1 Expres rtr1 Router EPIC1 Targetdefinition. My output should look like: SQL SrcQual Exp1 Expres Exp1 Expres rtr1 Router rtr1 Router EPIC1 Targetdefinition...
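
As one possible approach to the first discussion in the list above (duplicates in the first column), here is a minimal awk sketch. The sample lines come from that post; the temporary file name input.txt is hypothetical, and this is not necessarily the solution given in that thread.

```shell
# Sample input taken from the first discussion; the file name is hypothetical.
tmpdir=$(mktemp -d)
printf '%s\n' 'abc dft45.xml' 'ert rt653.xml' 'abc ert57.xml' > "$tmpdir/input.txt"

# The file is read twice. Pass 1 (NR == FNR) counts each first-column key.
# Pass 2 prints every line whose key was seen more than once, keeping the
# original line order.
dups=$(awk 'NR == FNR { count[$1]++; next } count[$1] > 1' \
    "$tmpdir/input.txt" "$tmpdir/input.txt")

printf '%s\n' "$dups"
rm -rf "$tmpdir"
```

Reading the file twice avoids having to sort it first, at the cost of holding one counter per distinct key in memory.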

LEARN ABOUT ULTRIX

sort5

sort5(1) General Commands Manual sort5(1)

Name
sort5 - internationalized System 5 sort and/or merge files

Syntax
sort5 [-cmu] [-ooutput] [-ykmem] [-zrecsz] [-X] [-dfiMnr] [-btx] [+pos1 [-pos2]] [files]

Description
The sort5 command sorts lines of the named files together and writes the result on the standard output. The standard input is read if a hyphen
(-) is used as a file name or if no input files are named.

Comparisons are based on one or more sort keys extracted from each line of input. By default, there is one sort key, the entire input
line, and ordering is determined by the collating sequence specified by the LC_COLLATE locale. The LC_COLLATE locale is controlled by the
settings of either the LANG or LC_COLLATE environment variables. See the entries listed under See Also for more information.
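
To illustrate how the collating sequence affects the default ordering, here is a minimal sketch. It assumes a current POSIX sort rather than sort5 itself, which may not be available outside ULTRIX, and it sets LC_ALL (which takes precedence over both LANG and LC_COLLATE) to force the C locale for a single invocation.

```shell
# Assumption: a POSIX "sort" stands in for sort5.
# In the C locale lines are compared byte by byte, so every upper-case
# ASCII letter sorts before every lower-case one.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

Under another locale the same input may come out in a different order, because the comparison then follows that locale's collation rules.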

Options
The following options alter the default behavior:

-c Checks that the input file is sorted according to the ordering rules; gives no output unless the file is out of order.

-m Merges only; the input files are already sorted.

-u Suppresses all but one in each set of lines having equal keys.

-ooutput
Specifies the name of an output file to use instead of the standard output. The file may be the same as one of the inputs. Blanks
between -o and output are optional.

-ykmem
Specifies the number of kilobytes of memory to use when sorting a file. If this option is omitted, sort5 begins using a system
default memory size, and continues to use more space as needed. If kmem is specified, sort5 starts using that number of kilobytes of
memory. If the administrative minimum or maximum is violated, the value of the corresponding minimum or maximum is used. Thus, -y0
is guaranteed to start with minimum memory. By convention, -y (with no argument) starts with maximum memory.

-zrecsz
Records the size of the longest line read in the sort phase so buffers can be allocated during the merge phase. If the sort phase is
omitted using either the -c or -m options, a system default size is used. Lines longer than the buffer size cause sort5 to terminate
abnormally. Supplying the actual number of bytes (or some larger value) in the longest line to be merged prevents abnormal termination.

-X Sorts using tags. Upon input each key is converted to a tag value which is sorted efficiently. This option makes international sorting
faster but it consumes more memory since both key and tag must be stored.
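
The -c, -u, -o and -m options described above can be tried out with a short sketch. It assumes a current POSIX sort in place of sort5, and the file names fruit.txt and more.txt are hypothetical. The -y, -z and -X options are specific to this manual page and are not shown.

```shell
# Assumption: a POSIX "sort" stands in for sort5; file names are made up.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple pear fig > "$tmpdir/fruit.txt"

# -c only checks the order: the exit status is non-zero if the file is unsorted.
if LC_ALL=C sort -c "$tmpdir/fruit.txt" 2>/dev/null; then
    sorted_before=yes
else
    sorted_before=no
fi

# -u drops duplicate lines; -o writes the result back over the input file.
LC_ALL=C sort -u -o "$tmpdir/fruit.txt" "$tmpdir/fruit.txt"
unique_lines=$(cat "$tmpdir/fruit.txt")

# -m merges inputs that are already sorted, without re-sorting them.
printf '%s\n' banana cherry > "$tmpdir/more.txt"
merged=$(LC_ALL=C sort -m "$tmpdir/fruit.txt" "$tmpdir/more.txt")

printf '%s\n' "$sorted_before" "$unique_lines" "$merged"
rm -rf "$tmpdir"
```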

The following options override the default ordering rules:

-d Specifies Dictionary order. Only letters, digits and blanks (spaces and tabs) are significant in comparisons.

-f Folds lower case letters into upper case.

-i Ignores characters outside the ASCII range 040-0176 in non-numeric comparisons.

-n Sorts an initial numeric string, consisting of optional blanks, optional minus sign, and zero or more digits with optional decimal
point, by arithmetic value. The -n option implies the -b option, which tells the command to ignore leading blanks when determining
the starting and ending positions of a restricted sort key.

-r Reverses the sense of comparisons.

When ordering options appear before restricted sort key specifications, the requested ordering rules are applied globally to all sort keys.
When attached to a specific sort key (described below), the specified ordering options override all global ordering options for that key.
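
The effect of the -n and -r ordering options can be seen in a minimal sketch, again assuming a current POSIX sort in place of sort5 and the C locale for the non-numeric comparison.

```shell
# Assumption: a POSIX "sort" stands in for sort5.
# Without -n the lines are compared as characters, so "100" sorts before "9".
text_order=$(printf '%s\n' 10 9 100 | LC_ALL=C sort)

# -n compares by arithmetic value.
numeric_order=$(printf '%s\n' 10 9 100 | LC_ALL=C sort -n)

# Adding -r reverses the sense of the comparison.
reverse_numeric=$(printf '%s\n' 10 9 100 | LC_ALL=C sort -nr)

printf '%s\n' "$text_order" "$numeric_order" "$reverse_numeric"
```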

The notation +pos1 -pos2 restricts a sort key to one beginning at pos1 and ending at pos2. The characters at positions pos1 and pos2 are
included in the sort key (provided that pos2 does not precede pos1). A missing -pos2 means the end of the line.

Specifying pos1 and pos2 involves the notion of a field, that is, a minimal sequence of characters followed by a field separator or a
new-line. By default, the first blank of a sequence of blanks acts as the field separator. The blank can be either a space or a tab. All
blanks in a sequence of blanks are interpreted as a part of the next field; for example, all blanks at the beginning of a line are
considered to be part of the first field. The treatment of field separators is altered using the following options:

-tx Uses x as the field separator character. Although it may be included in a sort key, x is not considered part of a field. Each
occurrence of x is significant (for example, xx delimits an empty field).

-b Ignores leading blanks when determining the starting and ending positions of a restricted sort key. If the -b option is specified
before the first +pos1 argument, it is applied to all +pos1 arguments. Otherwise, the b flag may be attached independently to each
+pos1 or -pos2 argument.

Pos1 and pos2 each have the form m.n optionally followed by one or more of the flags bdfinr. A starting position specified by +m.n is
interpreted to mean the n+1st character in the m+1st field. A missing .n means .0, indicating the first character of the m+1st field. If
the b flag is in effect n is counted from the first non-blank in the m+1st field; +m.0b refers to the first non-blank character in the
m+1st field.

A last position specified by -m.n is interpreted to mean the nth character (including separators) after the last character of the mth
field. A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect n is counted from the last
leading blank in the m+1st field; -m.1b refers to the first non-blank in the m+1st field.

When there are multiple sort keys, later keys are compared only after all earlier keys are found to be equal. Lines that otherwise compare
equal are ordered with all bytes significant.
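
The +pos1 -pos2 notation counts fields from zero and is not accepted by every current sort implementation. As a sketch under that assumption, the same restricted keys can be written with the POSIX -k option, which counts fields from one, so +1 -2 corresponds to -k 2,2. The sample data is invented for illustration.

```shell
# Assumption: a POSIX "sort" with -k stands in for sort5 and its +pos1 -pos2 keys.

# Sort on the second field only (the equivalent of "+1 -2").
by_second=$(printf '%s\n' 'x pear 3' 'y apple 1' 'z fig 2' | LC_ALL=C sort -k 2,2)

# Two keys: field 1 as text, then field 2 numerically. The second key is
# consulted only for lines whose first key compares equal.
two_keys=$(printf '%s\n' 'b 2' 'a 10' 'a 9' | LC_ALL=C sort -k 1,1 -k 2,2n)

printf '%s\n' "$by_second" "$two_keys"
```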

Examples
Sort the contents of infile with the second field as the sort key:

sort5 +1 -2 infile

Sort, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the first character of the second
field as the sort key:

sort5 -r -o outfile +1.0 -1.2 infile1 infile2

Sort, in reverse order, the contents of infile1 and infile2 using the first non-blank character of the second field as the sort key:

sort5 -r +1.0b -1.1b infile1 infile2

Print the password file sorted by the numeric user ID (the third colon-separated field):

sort5 -t: +2n -3 /etc/passwd

Print the lines of the already sorted file infile, suppressing all but the first occurrence of lines having the same third field (the
options -um with just one input file make the choice of a unique representative from a set of equal lines predictable):

sort5 -um +2 -3 infile
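
On a system without sort5, the last two examples might be written with the POSIX -k option as sketched below. This is an assumed translation, not part of the original manual page, and the sample records are invented so that the commands can be run without touching /etc/passwd.

```shell
# Assumption: a POSIX "sort" stands in for sort5; the sample data is made up.

# Equivalent of "sort5 -t: +2n -3": numeric sort on the third colon-separated field.
by_uid=$(printf '%s\n' 'carol:x:1002' 'alice:x:1000' 'bob:x:1001' |
    LC_ALL=C sort -t: -k 3,3n)

# Equivalent of "sort5 -um +2 -3": the input is already sorted on its third
# field; keep one line for each distinct value of that field.
one_per_key=$(printf '%s\n' 'a x 1' 'b y 1' 'c z 2' |
    LC_ALL=C sort -u -m -k 3,3)

printf '%s\n' "$by_uid" "$one_per_key"
```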

Diagnostics
Comments and exits with non-zero status for various trouble conditions (for example, when input lines are too long), and for disorder
discovered under the -c option.

When the last line of an input file is missing a new-line character, sort5 appends one, prints a warning message, and continues.

Files
/usr/tmp/stm???

See Also
comm(1), join(1), uniq(1), setlocale(3int), strcoll(3int)

sort5(1)
