Full Discussion: Sort and Remove duplicates
Post 302935490 by RudiC on Tuesday 17th of February 2015 05:09:42 AM
A few comments on your statement:

- The -c option does not sort; it only checks:
Quote:
-c, --check, --check=diagnose-first
check for sorted input; do not sort
- As Don Cragun surmises, any white space before character 96 increases the field count and invalidates your key definitions. Set the field separator with -t to an exotic character that never occurs in the data, so that the whole line is treated as a single field.
- You can use the short form -k more than once in a single command.
- If lines longer than 250 characters can occur (again Don Cragun's suspicion), your printf format will not truncate them, because a field width is only a minimum and the output line grows with the input; use the precision field as well: "%-250.250s\n"
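
To illustrate the points above, here is a minimal sketch. It is not the original poster's script: the record width of 6 (instead of 250), the key columns, the sample records and the function names are all made up for the demonstration, and the separator is assumed never to occur in the data.

```shell
#!/bin/sh
# Sketch only: WIDTH, the key columns and the sample records are hypothetical.
LC_ALL=C; export LC_ALL
WIDTH=6
SEP=$(printf '\001')   # "exotic" separator, assumed absent from the data

sample() {
    printf '%s\n' 'bb  20' 'a c 10' 'aa  20'
}

# Sort on characters 5-6, then on characters 1-2. With -t "$SEP" the whole
# line is field 1, so embedded blanks cannot shift the key positions.
sort_fixed() {
    sort -t "$SEP" -k 1.5,1.6 -k 1.1,1.2
}

# Pad short lines and truncate long ones to exactly WIDTH characters:
# the width pads, the precision truncates.
fix_width() {
    while IFS= read -r line; do
        printf "%-${WIDTH}.${WIDTH}s\n" "$line"
    done
}

sample | sort_fixed | fix_width

# -c only checks: exit status 0 if sorted, non-zero if not; nothing is reordered.
if sample | sort -c 2>/dev/null; then
    echo "input already sorted"
else
    echo "input not sorted"
fi
```

For the real data, WIDTH would be 250 and the -k positions would be whatever character ranges hold the keys.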
 

SORT(1)                      General Commands Manual                      SORT(1)

NAME
       sort - sort lines of text files

SYNOPSIS
       sort [-cmus] [-t separator] [-o output-file] [-T tempdir] [-bdfiMnr]
            [+POS1 [-POS2]] [-k POS1[,POS2]] [file...]
       sort {--help,--version}

DESCRIPTION
       This manual page documents the GNU version of sort. sort sorts,
       merges, or compares all the lines from the given files, or the
       standard input if no files are given. A file name of `-' means
       standard input. By default, sort writes the results to the standard
       output.

       sort has three modes of operation: sort (the default), merge, and
       check for sortedness. The following options change the operation
       mode:

       -c     Check whether the given files are already sorted: if they are
              not all sorted, print an error message and exit with a status
              of 1.

       -m     Merge the given files by sorting them as a group. Each input
              file should already be individually sorted. It always works to
              sort instead of merge; merging is provided because it is
              faster, in the case where it works.

       A pair of lines is compared as follows: if any key fields have been
       specified, sort compares each pair of fields, in the order specified
       on the command line, according to the associated ordering options,
       until a difference is found or no fields are left.

       If any of the global options Mbdfinr are given but no key fields are
       specified, sort compares the entire lines according to the global
       options.

       Finally, as a last resort when all keys compare equal (or if no
       ordering options were specified at all), sort compares the lines byte
       by byte in machine collating sequence. The last resort comparison
       honors the -r global option. The -s (stable) option disables this
       last-resort comparison so that lines in which all fields compare
       equal are left in their original relative order. If no fields or
       global options are specified, -s has no effect.

       GNU sort has no limits on input line length or restrictions on bytes
       allowed within lines. In addition, if the final byte of an input file
       is not a newline, GNU sort silently supplies one.

       If the environment variable TMPDIR is set, sort uses it as the
       directory in which to put temporary files instead of the default,
       /tmp. The -T tempdir option is another way to select the directory
       for temporary files; it overrides the environment variable.

       The following options affect the ordering of output lines. They may
       be specified globally or as part of a specific key field. If no key
       fields are specified, global options apply to comparison of entire
       lines; otherwise the global options are inherited by key fields that
       do not specify any special options of their own.

       -b     Ignore leading blanks when finding sort keys in each line.

       -d     Sort in `phone directory' order: ignore all characters except
              letters, digits and blanks when sorting.

       -f     Fold lower case characters into the equivalent upper case
              characters when sorting so that, for example, `b' is sorted
              the same way `B' is.

       -i     Ignore characters outside the ASCII range 040-0176 octal
              (inclusive) when sorting.

       -M     An initial string, consisting of any amount of white space,
              followed by three letters abbreviating a month name, is folded
              to UPPER case and compared in the order `JAN' < `FEB' < ... <
              `DEC.' Invalid names compare low to valid names.

       -n     Compare according to arithmetic value an initial numeric
              string consisting of optional white space, an optional - sign,
              and zero or more digits, optionally followed by a decimal
              point and zero or more digits.

       -r     Reverse the result of comparison, so that lines with greater
              key values appear earlier in the output instead of later.

       Other options are:

       -o output-file
              Write output to output-file instead of to the standard output.
              If output-file is one of the input files, sort copies it to a
              temporary file before sorting and writing the output to
              output-file.

       -t separator
              Use character separator as the field separator when finding
              the sort keys in each line. By default, fields are separated
              by the empty string between a non-whitespace character and a
              whitespace character. That is to say, given the input line
              ` foo bar', sort breaks it into fields ` foo' and ` bar'. The
              field separator is not considered to be part of either the
              field preceding or the field following it.

       -u     For the default case or the -m option, only output the first
              of a sequence of lines that compare equal. For the -c option,
              check that no pair of consecutive lines compares equal.

       +POS1 [-POS2]
              Specify a field within each line to use as a sorting key. The
              field consists of the portion of the line starting at POS1 and
              up to (but not including) POS2 (or to the end of the line if
              POS2 is not given). The fields and character positions are
              numbered starting with 0.

       -k POS1[,POS2]
              An alternate syntax for specifying sorting keys. The fields
              and character positions are numbered starting with 1.

       A position has the form f.c, where f is the number of the field to
       use and c is the number of the first character from the beginning of
       the field (for +pos) or from the end of the previous field (for
       -pos). The .c part of a position may be omitted in which case it is
       taken to be the first character in the field. If the -b option has
       been given, the .c part of a field specification is counted from the
       first nonblank character of the field (for +pos) or from the first
       nonblank character following the previous field (for -pos).

       A +pos or -pos argument may also have any of the option letters
       Mbdfinr appended to it, in which case the global ordering options are
       not used for that particular field. The -b option may be
       independently attached to either or both of the +pos and -pos parts
       of a field specification, and if it is inherited from the global
       options it will be attached to both. If a -n or -M option is used,
       thus implying a -b option, the -b option is taken to apply to both
       the +pos and the -pos parts of a key specification. Keys may span
       multiple fields.

       In addition, when GNU sort is invoked with exactly one argument, the
       following options are recognized:

       --help Print a usage message on standard output and exit
              successfully.

       --version
              Print version information on standard output then exit
              successfully.

COMPATIBILITY
       Historical (BSD and System V) implementations of sort have differed
       in their interpretation of some options, particularly -b, -f, and -n.
       GNU sort follows the POSIX behavior, which is usually (but not
       always!) like the System V behavior. According to POSIX -n no longer
       implies -b. For consistency, -M has been changed in the same way.
       This may affect the meaning of character positions in field
       specifications in obscure cases. If this bites you the fix is to add
       an explicit -b.

BUGS
       The different meaning of field numbers depending on whether -k is
       used is confusing. It's all POSIX's fault!

FSF                           GNU Text Utilities                          SORT(1)
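
As a rough illustration of how the -t, -k, -u and -o options described in the man page combine to sort a file and remove duplicates on one key, in place: the function name, the temporary file name and the '|'-separated sample rows below are invented for the example. Which line of a group of equal keys survives is deliberately not relied upon here.

```shell
#!/bin/sh
# Sketch only: dedupe_by_field2, the demo file and its rows are hypothetical.
LC_ALL=C; export LC_ALL

# -t '|'  : fields are separated by '|'
# -k 2,2  : compare on field 2 only (with -k, fields are numbered from 1)
# -u      : output one line from each run of lines whose keys compare equal
# -o "$1" : the output file may be the same as the input file
dedupe_by_field2() {
    sort -t '|' -k 2,2 -u -o "$1" "$1"
}

demo=${TMPDIR:-/tmp}/sort_demo.$$
printf '%s\n' 'c.doc|Report' 'a.doc|Memo' 'b.doc|Report' > "$demo"

dedupe_by_field2 "$demo"
cat "$demo"      # two lines remain: one with key Memo, one with key Report
rm -f "$demo"
```

Writing the result with -o rather than redirecting with > matters when input and output are the same file, since the shell would truncate the file before sort could read it.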