difference in unix vs. linux sort


 
# 1  
Old 08-18-2010

Hi,

I am using some code that has been ported from Unix to Linux, and the sorting no longer produces the desired ordering. I'm hoping to find a way to mimic the Unix sort command on Linux. The input file is structured as follows:

$> cat file.txt
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59


It is semicolon-separated (although that doesn't particularly matter). Please note that in the 3rd and 10th rows, column three appears to be "missing" a value. It isn't; it is simply two blanks, "<space><space>". This is a real entry in this file. The output should be sorted into a specified format, keyed in order on each column. On Unix, the default sort command (with -u to remove duplicate lines) is what we've always used. The result is

unix> cat file.txt | sort -u
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59


There are two main entries, as determined by columns one and two: "US BSU2" and "US KSU1". For each of these, the "blank" in column three has been sorted first, then the numerical values in order, followed by the alphabetical values. This is the correct ordering for this file. However, if I sort the same file on Linux, the output is quite different.

linux$> cat file.txt | sort -t';' -u
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59


In this case, the rows with the "blanks" are no longer sorted first, and instead slot between HR and XX.

Is there a way to emulate the behaviour of the Unix sort command on Linux? I imagine there is a difference in the precedence of the characters, but I don't know how the <space><space> is interpreted so that it fits between HR and XX.

Thanks for any help.
# 2  
Old 08-18-2010
I think something's up with your data files. I can't get a sorting order anything like any of those.
# 3  
Old 08-18-2010
Hmmm.

I just copied and pasted the output of "cat file.txt" from my browser into an empty gedit window, saved it, then ran "cat file.txt | sort -t';' -u" from the command line, and it produced the same output as the bottom output (the Linux one). I'm not sure what I'm doing to the files by copying and pasting.

What sorted output did you get?
# 4  
Old 08-18-2010
Code:
$ sort -u < file.txt
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59
$ sort -t';' -u < file.txt
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59

I'm guessing that maybe, those spaces aren't actually spaces. Try transferring the data with scp instead of copy-paste.
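One way to test that guess is to dump the bytes of the suspicious field. This is only a sketch: the file name `check.txt` and its single line are made up for illustration, and the real file would be substituted in practice.

```shell
# Hypothetical one-line sample whose third field is two ordinary spaces.
printf '%s\n' 'US;KSU1;  ;LHZ' > check.txt

# Print the third field as hex bytes. Ordinary spaces appear as 20;
# the trailing 0a is the newline that cut emits.
cut -d';' -f3 check.txt | od -An -tx1
```

If something other than `20` shows up there (a UTF-8 non-breaking space, for instance, would appear as `c2 a0`), the copy-paste step changed the data.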
# 5  
Old 08-18-2010
I created the file file.txt within linux in gedit, manually putting in two spaces. I then ran the sort on it, with the same incorrect output.

I then scp'd the file to a sun computer, ran sort, and the output was sorted in the same order as yours.

Then I scp'd the file back to a new file on the linux computer, and ran sort again, but got the same output as the original sort I had carried out on linux (Ubuntu 9.10 x86_64). What *nix are you running?
# 6  
Old 08-18-2010
How about, instead of copy/pasting into gedit, you copy the original file from the original source and try it on that? Not copy-paste, get the file itself. Copy-pasting is likely where the change is happening. Character sets might be getting changed slightly (or maybe you have a different character set than me), or whitespace mangled (things like tabs usually get copy/pasted as spaces). Copy-pasting from a web browser especially tends to eat multiple spaces.

Gentoo linux.
Code:
 $ sort --version
sort (GNU coreutils) 7.5
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.


# 7  
Old 08-19-2010
Just to clarify, the text file created on the Sun computer was created by hand (i.e. typed out) simply to test some of the routines. The original full text files are station lists that are obtained by copying and pasting from a web browser into a text file. In the past, on the Sun system, this had never been a problem. After I scp'd the files from the Sun system to the Linux system, the sort command no longer ordered them the same way.

I've scp'd the file from the Sun computer (generated there) to the Linux computer and run the sort, but it still produces the same incorrect output. As far as I know, there is no copy and paste going on now.

Code:
$ sort --version
sort (GNU coreutils) 7.4
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

I can't imagine that this is caused by the difference between GNU 7.4 and 7.5.

Thanks!

---------- Post updated 08-19-10 at 11:36 AM ---------- Previous update was 08-18-10 at 11:10 PM ----------

I have managed to get this figured out now, after coming across the following post: sort (GNU coreutils) 7.4 not sorting in ascii order (asked and answered).

My LC_COLLATE environment variable was set to "en_IE.utf8". I set it to "C" instead, for POSIX ASCII sorting as in Unix environments, and the sort command now works as I would expect on my input file.
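For anyone landing here later, a minimal sketch of that fix, assuming a POSIX shell and a standard sort. The file `sample.txt` and its three lines are invented for illustration. Setting `LC_ALL=C` on the command line overrides `LC_COLLATE` (and every other locale category) for that one command only, so the rest of the session keeps its locale.

```shell
# Hypothetical three-line sample; field 3 of the last line is two real spaces.
printf '%s\n' 'US;KSU1;HR;LHZ' 'US;KSU1;00;LHZ' 'US;KSU1;  ;LHZ' > sample.txt

# Show the variables that influence collation in this session.
echo "LC_ALL=${LC_ALL:-unset} LC_COLLATE=${LC_COLLATE:-unset} LANG=${LANG:-unset}"

# Sort by raw byte value for this command only: space (0x20) sorts
# before digits (0x30-0x39), which sort before upper-case letters.
LC_ALL=C sort -u sample.txt
```

With byte ordering the blank field comes first, then 00, then HR. As for why the blanks landed between HR and XX before: a likely explanation (an assumption, not confirmed in this thread) is that the en_IE.utf8 collation rules gave spaces and punctuation very little weight, so the blank rows were effectively compared on the following "LHZ", and L falls between H and X.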

Thanks for your help!