Full Discussion: sort split merge -u unique
Post 302551979 by binlib on Thursday 1st of September 2011 09:26:09 AM
Make sure you are sorting in the C locale. Sorting in other locales can be 10 times slower.
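
As a minimal sketch of that advice, the locale can be set for a single command by prefixing it with `LC_ALL=C`. The file name `words.txt` and its contents are made up for illustration, and the actual speedup will depend on the system and the data.

```shell
# Hypothetical sample data, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' banana Apple cherry apple Banana apple > "$workdir/words.txt"

# LC_ALL=C applies to this one command only; lines are compared by byte value,
# so all uppercase letters sort before all lowercase letters.
c_sorted=$(LC_ALL=C sort -u "$workdir/words.txt")
printf '%s\n' "$c_sorted"

rm -rf "$workdir"
```

Note that byte order differs from dictionary order, so any later `sort -m`, `comm`, or `join` step on the same data should run under the same locale.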
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sort and Unique in Perl

Hi, may I know, if a pipe-separated file is large, what is the best method to calculate the unique row count of the 3rd column and get a list of the unique values of the 3rd column? Thanks in advance! (20 Replies)
Discussion started by: deepakwins
20 Replies

2. UNIX for Dummies Questions & Answers

split a file with unique sets

This may sound like a trivial problem, but I still need some help: I have a file with ids and I want to split it 'n' ways (could be any number) into files: 1 1 1 2 2 3 3 4 5 5 Let's assume 'n' is 3, and we cannot have the same id in two different partitions. So the partitions may... (8 Replies)
Discussion started by: ChicagoBlues
8 Replies

3. Shell Programming and Scripting

Awk sort and unique

Input file --------- 12:name1:|host1|host1|host2|host1 13:name2:|host1|host1|host2|host3 14:name3: ...... Required output --------------- 12:name1:host1(2)|host1(1) 13:name2:host1(2)|host2(1)|host3(1) 14:name3: where (x) - Count how many times field appears in last column ... (3 Replies)
Discussion started by: greycells
3 Replies

4. Shell Programming and Scripting

How to merge columns into lines, using unique keys?

I would really appreciate a solution for this: invoice# client# 5929 231 4358 231 2185 231 6234 231 1166 464 1264 464 3432 464 1720 464 9747 464 1133 791 4930 791 5496 791 6291 791 8681 989 3023 989 (2 Replies)
Discussion started by: hemo21
2 Replies

5. UNIX for Dummies Questions & Answers

How to count specific columns and merge with unique ones?

Hi. I am not sure the title gives an optimal description of what I want to do. I have several text files that contain data in many columns. All the files are organized the same way, but the data in the columns might differ. I want to count the number of times data occur in specific columns,... (0 Replies)
Discussion started by: JamesT
0 Replies

6. UNIX for Dummies Questions & Answers

Print unique lines without sort or unique

I would like to print unique lines without sort or unique. Unfortunately the server I am working on does not have sort or unique. I have not been able to contact the administrator of the server to ask him to add it for several weeks. (7 Replies)
Discussion started by: cokedude
7 Replies

7. Shell Programming and Scripting

How to merge two files with unique values matching.?

I have one script as below: #!/bin/ksh Outputfile1="/home/OutputFile1.xls" Outputfile2="/home/OutputFile2.xls" InputFile1="/home/InputFile1.sql" InputFile2="/home/InputFile2.sql" echo "Select hobby, class, subject, sports, rollNumber from Student_Table" >> InputFile1 echo "Select rollNumber... (3 Replies)
Discussion started by: Sharma331
3 Replies

8. Shell Programming and Scripting

Sort unique

Hi, I have an input file that I have sorted in a previous stage by $1 and $4. I now need something that will take the first record from each group of data based on the key being $1 Input file 1000AAA|"ZZZ"|"Date"|"1"|"Y"|"ABC"|""|AA 1000AAA|"ZZZ"|"Date"|"2"|"Y"|"ABC"|""|AA... (2 Replies)
Discussion started by: Ads89
2 Replies

9. UNIX for Beginners Questions & Answers

Merge 4 bim files by keeping only the overlapping variants (unique rs values )

Dear community, I am facing a problem and I kindly ask for your help: I have 4 different data sets consisting of 3 different types of array. In each file, column 1 is chromosome position, column 2 is SNP id etc... Let's say I have the following (bim) datasets: x2014: 1 rs3094315... (4 Replies)
Discussion started by: fondan
4 Replies
sort(1) 						      General Commands Manual							   sort(1)

Name
       sort - sort file data

Syntax
       sort [options] [-k keydef] [+pos1[-pos2]] [file...]

Description
       The sort command sorts lines of all the named files together and writes the result on the standard output.  The name `-' means the standard
       input.  If no input files are named, the standard input is sorted.

Options
       The default sort key is an entire line.	Default ordering is lexicographic by  bytes  in  machine  collating  sequence.	 The  ordering	is
       affected globally by the following options, one or more of which may appear.

       -b	   Ignores leading blanks (spaces and tabs) in field comparisons.

       -d	   Sorts data according to dictionary ordering:  letters, digits, and blanks only.

       -f	   Folds uppercase to lowercase while sorting.

       -i	   Ignores characters outside the ASCII range 040-0176 in nonnumeric comparisons.

       -k keydef   The keydef argument is a key field definition. The format is field_start, [field_end] [type], where field_start and field_end
		   define the restricted sort key, and type is a modifier from the option list [bdfinr]. These modifiers have the
		   functionality, for this key only, that their command line counterparts have for the entire record.

       -n	   Sorts fields with numbers numerically.  An initial numeric string, consisting of optional blanks, optional minus sign, and zero
		   or more digits with optional decimal point, is sorted by arithmetic value.  (Note that -0 is taken to be equal to 0.)  Option n
		   implies option b.

       -r	   Reverses the sense of comparisons.

       -tx	   Uses specified character as field separator.

       The  notation  +pos1 -pos2 restricts a sort key to a field beginning at pos1 and ending just before pos2.  Pos1 and pos2 each have the form
       m.n, optionally followed by one or more of the options bdfinr, where m tells a number of fields to skip from the beginning of the line  and
       n tells a number of characters to skip further.	If any options are present they override all the global ordering options for this key.	If
       the b option is in effect n is counted from the first nonblank in the field; b is attached independently to pos2.  A missing .n means .0; a
       missing	-pos2  means the end of the line.  Under the -tx option, fields are strings separated by x; otherwise fields are nonempty nonblank
       strings separated by blanks.
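
The +pos1 -pos2 notation counts fields to skip, while -k counts fields from 1, so +m corresponds to field m+1. Many current sort implementations no longer accept the +pos form by default, so the sketch below shows only the -k equivalents. The file names and sample data are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'carol:x:1002' 'alice:x:1000' 'bob:x:999' > "$workdir/users.txt"
printf '%s\n' 'b 2' 'a 9' 'a 1' > "$workdir/pairs.txt"

# Old form: sort -t: +2n   (skip 2 fields, compare numerically to end of line)
by_id=$(LC_ALL=C sort -t: -k3n "$workdir/users.txt")

# Old form: sort +0 -1     (key starts at field 1 and ends before field 2)
by_first=$(LC_ALL=C sort -k1,1 "$workdir/pairs.txt")

printf '%s\n' "$by_id" "$by_first"
rm -rf "$workdir"
```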

       When there are multiple sort keys, later keys are compared only after all earlier keys compare equal.  Lines that otherwise  compare  equal
       are ordered with all bytes significant.
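
A small sketch of multiple keys, using the -k form and invented sample data: the first field is compared as text, and the second field is consulted numerically only when the first fields are equal.

```shell
# Field 1 as text, then field 2 as a number to break ties.
two_keys=$(printf '%s\n' 'b 2' 'a 10' 'a 9' 'b 11' | LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$two_keys"
```

Without the n modifier on the second key, 'a 10' would sort before 'a 9', because the comparison would be byte by byte.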

       These are additional options:

       -c	   Checks sorting order and displays output only if out of order.

       -m	   Merges previously sorted data.

       -o name	   Uses specified file as output file.	This file may be the same as one of the inputs.

       -T dir	   Uses specified directory to build temporary files.

       -u	   Suppresses all duplicate entries.  Ignored bytes and bytes outside keys do not participate in this comparison.
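
The -m, -o, and -u options combine naturally when a file is too large to sort comfortably in one pass. The following is a sketch only, assuming a POSIX `split` is available; the chunk size, file names, and data are hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' pear apple fig apple kiwi pear date fig > "$workdir/input.txt"

# Split into 3-line pieces named chunk.aa, chunk.ab, ...
split -l 3 "$workdir/input.txt" "$workdir/chunk."

# Sort each piece in place; -o may name one of the inputs as the output.
for piece in "$workdir"/chunk.*; do
    LC_ALL=C sort -u -o "$piece" "$piece"
done

# -m merges inputs that are already sorted; -u drops duplicates across pieces.
merged=$(LC_ALL=C sort -m -u "$workdir"/chunk.*)
printf '%s\n' "$merged"

rm -rf "$workdir"
```

Every sort and merge step uses the same locale, since -m assumes its inputs were ordered by the same rules it applies.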

Examples
       Print in alphabetical order all the unique spellings in a list of words.  Capitalized words differ from uncapitalized.
	       sort -u +0f +0 list

       Print the password file, sorted by user id number (the 3rd colon-separated field).
	       sort -t: +2n /etc/passwd

       Print the first instance of each month in an already sorted file of (month day) entries.  The options -um with just one input file make the
       choice of a unique representative from a set of equal lines predictable.
	       sort -um +0 -1 dates
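
On systems where the +pos form is rejected, the last example can be sketched with -k instead. The sample `dates` file below is invented, and it is already ordered on the month field under byte comparison, which is what -m expects. Which line of an equal run survives is implementation behavior; GNU sort documents that it keeps the first.

```shell
workdir=$(mktemp -d)
# Already sorted on field 1 in the C locale (Feb sorts before Jan).
printf '%s\n' 'Feb 2' 'Feb 9' 'Jan 3' 'Jan 17' > "$workdir/dates"

# -k1,1 replaces +0 -1: only the month field takes part in the -u comparison.
first_per_month=$(LC_ALL=C sort -u -m -k1,1 "$workdir/dates")
printf '%s\n' "$first_per_month"

rm -rf "$workdir"
```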

Restrictions
       Very long lines are silently truncated.

Diagnostics
       Comments and exits with nonzero status for various trouble conditions and for disorder discovered under option c.

Files
       /usr/tmp/stm*, /tmp/*	first and second tries for temporary files

See Also
       comm(1), join(1), rev(1), uniq(1)

																	   sort(1)