Sorting based on a particular colum


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sorting based on a particular colum
# 1  
Old 07-30-2012
Question Sorting based on a particular colum

Hi,

I want a flat file(pipe delimited) to be sorted based on 2nd column only. Below is input file to be sorted.

Code:
AVERS|K50034|A|Y|N|N|Y|Y|Y|||N|N
AVERS|K50035|A|Y|N|N|Y|Y|Y|||N|N
ARTCL|K50034||K50034|||FE024|FE024|FE024|FE024|||01|20131|20131|50|20120724|RKPPLM|40|27|00|1|1|202|22|538|63|K32|000|00227|00||00000|||000|000|PAA|||||||000||||A|M|N|N|N|N|Y|N|04|00|00|20130301|20130801|90|20130101||N|0|T917|20|02|12|||||||SF|TZ|30|390||||
ARTCL|K50035||K50035|||FE025|FE025|FE025|FE025|||01|20131|20131|20|20120724|RKPPLM|40|27|00|1|1|202|22|538|63|K32|000|00227|00||00000|||000|000|PAA|||||||000||||A|M|N|N|N|N|Y|N|04|00|00|20130301|20130801|90|20130101||Y|0|T917|20|02|12|||||||SF|TZ|30|390||||
ALSIZ|K50034|58|670|13M|Y|0|||||0|12|||||0
ALSIZ|K50034|59|680|13W|Y|0|||||0|12|||||0
ALSIZ|K50034|60|690|13XW|Y|0|||||0|12|||||0
ALSIZ|K50034|62|710|14M|Y|0|||||0|12|||||0
ALSIZ|K50034|63|720|14W|Y|0|||||0|12|||||0
ALSIZ|K50035|66|750|15M|Y|0|||||0|12|||||0
ALSIZ|K50035|67|760|15W|Y|0|||||0|12|||||0
ASKUS|K50034|N|11|200|USA|6.5||886396816834|Y
ASKUS|K50034|N|14|230|USA|7M||886396995782|YADESC|K50034|02|2 EYE 45 MOC 1|2 EYE 45 MOC 1
AFACT|K50034|A|199501
ACOMP|K50034|400|LEATHER LOW NON ATHLETIC||N|
ACOMP|K50035|400|LEATHER LOW NON ATHLETIC||N|
ACUST|K50034|400|1|99|999|||6403996075|
ACUST|K50035|400|1|99|999|||6403996075|
ACWDL|K50034|02|BLACK|BLACK
ACWDL|K50035|02|BLACK|BLACK
ALRMH|K50034|8|1|1|39|32|81|811|81139|8113932
ALRMH|K50035|8|1|1|39|32|81|811|81139|8113932


and the desired output file is

Code:
AVERS|K50034|A|Y|N|N|Y|Y|Y|||N|N
ARTCL|K50034||K50034|||FE024|FE024|FE024|FE024|||01|20131|20131|50|20120724|RKPPLM|40|27|00|1|1|202|22|538|63|K32|000|00227|00||00000|||000|000|PAA|||||||000||||A|M|N|N|N|N|Y|N|04|00|00|20130301|20130801|90|20130101||N|0|T917|20|02|12|||||||SF|TZ|30|390||||
ALSIZ|K50034|58|670|13M|Y|0|||||0|12|||||0
ALSIZ|K50034|59|680|13W|Y|0|||||0|12|||||0
ALSIZ|K50034|60|690|13XW|Y|0|||||0|12|||||0
ALSIZ|K50034|62|710|14M|Y|0|||||0|12|||||0
ALSIZ|K50034|63|720|14W|Y|0|||||0|12|||||0
ASKUS|K50034|N|11|200|USA|6.5||886396816834|Y
ASKUS|K50034|N|14|230|USA|7M||886396995782|YADESC|K50034|02|2 EYE 45 MOC 1|2 EYE 45 MOC 1
AFACT|K50034|A|199501
ACOMP|K50034|400|LEATHER LOW NON ATHLETIC||N|
ACWDL|K50034|02|BLACK|BLACK
ALRMH|K50034|8|1|1|39|32|81|811|81139|8113932
AVERS|K50035|A|Y|N|N|Y|Y|Y|||N|N
ARTCL|K50035||K50035|||FE025|FE025|FE025|FE025|||01|20131|20131|20|20120724|RKPPLM|40|27|00|1|1|202|22|538|63|K32|000|00227|00||00000|||000|000|PAA|||||||000||||A|M|N|N|N|N|Y|N|04|00|00|20130301|20130801|90|20130101||Y|0|T917|20|02|12|||||||SF|TZ|30|390||||
ALSIZ|K50035|66|750|15M|Y|0|||||0|12|||||0
ALSIZ|K50035|67|760|15W|Y|0|||||0|12|||||0
ACOMP|K50035|400|LEATHER LOW NON ATHLETIC||N|
ACUST|K50035|400|1|99|999|||6403996075|
ACWDL|K50035|02|BLACK|BLACK
ALRMH|K50035|8|1|1|39|32|81|811|81139|8113932

Tried the below options but the output is not same as desired output. Sorting is being done even on 1st column using below commands.

sort -t "|" -k 2,2 sampleinput.dat
sort -t "|" -k 2.1,2.6 sampleinput.dat

Please help!

Moderator's Comments:
Mod Comment edit by bakunin: Please view this code tag video for how to use code tags when posting code and data.

Last edited by bakunin; 07-30-2012 at 02:47 AM..
# 2  
Old 07-30-2012
That is the way all of the UNIX and Linux sort utilities work. This is clearly documented on the POSIX sort man page:
Quote:
When there are multiple key fields, later keys are
compared only after all earlier keys compare equal.
Except when the -u option is specified, lines that
otherwise compare equal are ordered as if none of
the options -d, -f, -i, -n or -k were present (but
with -r still in effect, if it was specified) and
with all bytes in the lines significant to the com-
parison.
Unfortunately, this is not mentioned on the BSD, OS/X, and Linux man pages available in the Man Pages section of this site.
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 07-30-2012
hmm....The output you posted is in fact sorted on the second field. Notice the penultimate line with "50034", which moved up in the output.

Still, there are only two distinct keys in your input and the question arises how sort should handle lines with identical keys. If you don't want the lines with identical second fields to be sorted upon the contents of the first field (or in a different manner, like reversed) you should define a compund sort key based on several fields.

Take your second sort key for example: it says the lines should be sorted upon a key composed from a string starting at the first character of the second field to the 6th character of the second field. Therefore, the sorting routine of the "sort" command "sees" these keys (i include line numbers for reference):


Code:
...
1  K50034
2  K50034
3  K50034
4  K50034
5  K50035
6  K50035
7  K50035
8  K50035
...

How is it supposed to determine, if, for instance, line 2 should go before or after line 3? This is why the file seems to be sorted based on field 1, but this is only the case because so many contents of your specified key are identical and for these the default sort (start of line) kicks in.

If you want to override this define a sort key which leads to distinct sort key contents.

Tip: In your "-k ..." definition you don't have to keep the line order. You may specify for instance "first field 2, then field 4, only then field 3" as well.

I hope this helps.

bakunin
# 4  
Old 07-30-2012
Try...
Code:
sort -t'|' -k2 test.txt

# 5  
Old 07-30-2012
stable sort

Hi,

Try this one,

Code:
sort -s -t "|" -k2,2 sampleinput.dat

Thanks,
Rangarasan N.
# 6  
Old 07-30-2012
Hi,

Thanks for the reply but sort -s is not working. Its showing below error.

Code:
sort: illegal option -- s
usage: sort [-cmu] [-o output] [-T directory] [-S mem] [-z recsz]
        [-dfiMnr] [-b] [-t char] [-k keydef] [+pos1 [-pos2]] files...


Last edited by Scott; 07-30-2012 at 07:01 AM.. Reason: Code tags
# 7  
Old 07-30-2012
What is your operating system and sort version?

Code:
uname -a
sort --version

Cheers,
RangaSmilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sorting based on File name

Hi All I have a requirement to list all the files in chronological order based on the date value in the file name.For ex if I have three files as given below ABC_TEST_20160103_1012.txt ABC_TEST_20160229_1112.txt ABC_TEST_20160229_1112.txt I have written code as given below to list out... (2 Replies)
Discussion started by: ginrkf
2 Replies

2. Shell Programming and Scripting

Sorting based on filename

Hello , I have to write a bash script. I will explain the logic based on a scenario. Scenario : Suppose I have few files in a Folder X : FILE_201508.list FILE_201510.list FILE_201507.list abc_201510.csv xyz_201508.csv abc_201507.csv def_201507.csv 1) Now ,... (3 Replies)
Discussion started by: smamrm
3 Replies

3. Shell Programming and Scripting

Sorting based on the second field

Oracle Enterprise Linux 6 This is my file. Two fields separated by space $ cat testfile.txt MARCH9 MARCH4 MARCH1 MARCH5 MARCH2 MARCH326 MARCH821 MARCH7 MARCH6 MARCH2 $ $ The following numeric sort, based on the first field's 6th character works as expected. $ $ sort -n -k 1.6... (7 Replies)
Discussion started by: John K
7 Replies

4. Shell Programming and Scripting

Sorting file based on name

Hi team, We have few files landing to our server based on sequence number. These files have to be processed in the sequence number order. Once the sequence number has reached its maximum, the files with sequence number 0000 has to be processed. For example: IN9997 IN9998 IN9999 IN0000... (7 Replies)
Discussion started by: anijan
7 Replies

5. Shell Programming and Scripting

sorting based on a field

the below is sorted as it is. the fields that i'm interested in are the 4th and 5th field. i want to sort the based on the 4th field. my past attempt to do this was to do something like this: awk '{print $4}'| awk '{print $1":"$2}' datafile | sort | uniq however, if i do that, i lose... (2 Replies)
Discussion started by: SkySmart
2 Replies

6. UNIX for Dummies Questions & Answers

Sorting words based on length

i need to write a bash script that recive a list of varuables kaka pele ronaldo beckham zidane messi rivaldo gerrard platini i need the program to print the longest word of the list. word in the output appears on a separate line and word order in the output is in the order Llachsicografi costs.... (1 Reply)
Discussion started by: yairpg
1 Replies

7. Shell Programming and Scripting

sorting based on alternative lines

Hello, I have a file with multiple entries. @SFGF-GA2-1_58:5:36:11009:999#0/1 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN +SFGF-GA2-1_58:5:36:11009:999#0/1 ################################################################################... (10 Replies)
Discussion started by: Diya123
10 Replies

8. Shell Programming and Scripting

Sorting based on multiple delimiters

Hello, I have data where words are separated by a delimiter. In this case "=" The number of delimiters in a line can vary from 4to 8. The norm is 4. Is it possible to have a script where the file could be separated starting with highest number of delimiters and ending with the lowest An... (8 Replies)
Discussion started by: gimley
8 Replies

9. Shell Programming and Scripting

Sorting based on Multiple columns

Hi, I have a requirement whereby I have to sort a flat file based on Multiple Columns (similar to ORDER BY Clause of Oracle). I am getting 10 columns in the flat file and I want the file to be sorted on 1st, 3rd, 4th, 7th and 9th columns in ascending order. The flat file is pipe seperated. Any... (15 Replies)
Discussion started by: dharmesht
15 Replies

10. Shell Programming and Scripting

Sorting based on columns

Hi, I want a list of entries in 3 space delimited columns. I want to sort entries based on the very first column. Rows can't be changed. For example: If I have... Abc Abc Acc Bca Bda Bdd Cab Cab Cbc Dbc Dca Dda Abc Abc Acc the output should be... Abc Abc Acc Abc Abc Acc Bca... (7 Replies)
Discussion started by: MobileUser
7 Replies
Login or Register to Ask a Question