sort truncates lines when they contain nulls | Unix Linux Forums | Shell Programming and Scripting








    #1  
04-05-2007
ArthurWaik
Registered User
 
Join Date: Apr 2007
Last Activity: 4 January 2009, 2:34 PM EST
Location: England
Posts: 4
sort truncates lines when they contain nulls

When I try to sort a file where some records contain nulls (i.e. hex 00), sort truncates the record when it reaches the null and writes this message:

"sort: warning: missing NEWLINE added at end of input file myfile"

I'm assuming from this that sort sees the null as a special character and acts accordingly. I could hack the file to replace the nulls with spaces, but it would be great if I could tell sort to accept the null as just another character in the record and not truncate it.

Anybody got any ideas on options?
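Since the behaviour differs between implementations (later in this thread it is reported to work on Linux but truncate on AIX), a quick check is to sort a small file containing NULs and compare the byte counts before and after. A minimal sketch; the helper name check_sort_keeps_nul and the temporary file name are hypothetical, and it assumes printf understands octal escapes such as \000.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the local sort keeps NUL bytes.
# The test file holds two records, each with a NUL (\000) in the middle.
check_sort_keeps_nul() {
    tmp=${TMPDIR:-/tmp}/nultest.$$
    printf 'b\000second\na\000first\n' > "$tmp"

    # tr -d ' ' strips the padding some wc implementations add.
    in_bytes=$(wc -c < "$tmp" | tr -d ' ')
    out_bytes=$(sort "$tmp" 2>/dev/null | wc -c | tr -d ' ')
    rm -f "$tmp"

    if [ "$in_bytes" -eq "$out_bytes" ]; then
        echo "sort preserved all $in_bytes bytes"
    else
        echo "sort output $out_bytes of $in_bytes bytes (data lost)"
    fi
}

check_sort_keeps_nul
```

If the second message appears, that sort is dropping the data after each NUL.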
    #2  
04-17-2008
massrobe
Registered User
 
Join Date: Apr 2008
Last Activity: 21 April 2008, 5:51 PM EDT
Posts: 3
sort: warning: missing NEWLINE added

Hello Arthur. I have the same problem. Were you able to fix it? Thanks


    #3  
04-18-2008
jgrogan
Registered User
 
Join Date: Apr 2008
Last Activity: 31 July 2014, 9:20 AM EDT
Posts: 28
Hi

Most Unix utilities will have this problem...

If x'00' is to be considered a valid character in the body of your file, how would sort identify a 'true' end-of-line?

Do your records have an end-of-line marker other than x'00'?
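One way to answer that for a given file is to count the NUL bytes and the newline (x'0A') bytes separately, using tr to delete everything except the byte of interest. A minimal sketch; the function name count_nul_and_newlines is hypothetical, and it assumes tr understands backslashed octal.

```shell
#!/bin/sh
# Hypothetical helper: count NUL bytes and newline bytes in a file.
# tr -cd '\000' deletes every byte except NUL, so wc -c counts the NULs;
# tr -cd '\n' does the same for newline (x'0A') bytes.
count_nul_and_newlines() {
    nuls=$(tr -cd '\000' < "$1" | wc -c | tr -d ' ')
    newlines=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    echo "NUL bytes: $nuls, newline bytes: $newlines"
}
```

If every record ends in a newline and the NUL count is non-zero, the NULs are data rather than record terminators. For the opposite case, where NUL is the record terminator, GNU sort has a -z (--zero-terminated) option; it is a GNU extension and may not be available on other systems such as AIX.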

Just my 2 cents...

JG
    #4  
04-19-2008
era (Forum Advisor)
Herder of Useless Cats (On Sabbatical)
 
Join Date: Mar 2008
Last Activity: 28 March 2011, 6:41 AM EDT
Location: /there/is/only/bin/sh
Posts: 3,653
If the files are pure 7-bit ASCII, you can replace the NUL with an 8-bit (extended) character. Just make sure you don't pick one which already exists in the file, and make sure you use it as a single raw byte rather than its UTF-8 representation, which is by definition multiple bytes.

Or, if you can find a 7-bit printable character which doesn't occur in the file, try that. (Tab? Tilde? Underscore? @?)


Code:
tr '\000' @ <file | sort | tr @ '\000' >output

... assuming your tr understands backslashed octal.

Grepping for special characters can be tricky, too; presumably, your grep will also treat NUL as end of string. Instead, try deleting all occurrences of your candidate character and comparing the result against the original; if they are binary identical, you have found a character which doesn't occur in the file.


Code:
 tr -d @ <file | cmp - file

... assuming your cmp accepts - to mean standard input.
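The two commands above can be combined into one guarded step: check that the placeholder is absent, and only then run the round trip. A minimal sketch; the function name sort_with_nuls and its argument order are hypothetical, and it assumes tr accepts backslashed octal and cmp accepts - for standard input, as noted above. Forcing LC_ALL=C (plain byte order) is an added assumption, not part of the original pipeline.

```shell
#!/bin/sh
# Hypothetical wrapper around the tr | sort | tr pipeline.
# Usage: sort_with_nuls PLACEHOLDER INPUT OUTPUT
# PLACEHOLDER must be a single byte that never occurs in INPUT
# (and not a character tr treats specially, such as - or \).
sort_with_nuls() {
    ph=$1 src=$2 dst=$3
    # If deleting the placeholder changes the file, it already occurs in the data.
    if ! tr -d "$ph" < "$src" | cmp -s - "$src"; then
        echo "placeholder '$ph' already occurs in $src" >&2
        return 1
    fi
    # Swap NUL for the placeholder, sort byte-wise, then swap it back.
    LC_ALL=C tr '\000' "$ph" < "$src" | LC_ALL=C sort | LC_ALL=C tr "$ph" '\000' > "$dst"
}
```

For example, `sort_with_nuls @ myfile myfile.sorted`. One caveat: the placeholder is compared like any other byte, so records that differ only around a NUL may come out in a different order than if the NULs themselves were compared. In the C locale \001 sorts immediately after NUL, so if \001 does not occur in the data (pass it as `"$(printf '\001')"`), the order is the same as plain byte order on the original records.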
    #5  
04-21-2008
massrobe
Registered User
 
Join Date: Apr 2008
Last Activity: 21 April 2008, 5:51 PM EDT
Posts: 3
Era,
I can not change the byte because it is part of my data.
It works fine on Linux, but on AIX the data is truncated.
Thanks
    #6  
04-21-2008
massrobe
Registered User
 
Join Date: Apr 2008
Last Activity: 21 April 2008, 5:51 PM EDT
Posts: 3
jgrogan,
My file has x'0A' at the end of each record.
Thanks
    #7  
04-22-2008
era (Forum Advisor)
Herder of Useless Cats (On Sabbatical)
 
Join Date: Mar 2008
Last Activity: 28 March 2011, 6:41 AM EDT
Location: /there/is/only/bin/sh
Posts: 3,653
Quote:
Originally Posted by massrobe View Post
I can not change the byte because it is part of my data.
The idea is to change it temporarily so sort can work, then change it back. You just need to take care to use a byte which doesn't occur in your data.

For example, octal \200 or \001 might work if they don't occur in the data file already. So you'd change the NULs to (something unique), sort, and change (something unique) back to NUL. Now the data should be sorted, with the NULs preserved.

(\200 might be problematic too, because it's NUL with the eighth bit set, and some procedure might still live in 7-bit land and strip the 8th bit internally; try some other high-value byte between \201 and \377 if it doesn't work.)
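Searching the candidate bytes by hand is tedious, so here is a sketch that tries \001 first and then \201 through \377, skipping \200 for the reason given above, and reports the first one that does not occur in the file. It reuses the tr -d | cmp test from earlier in the thread; the function name find_unused_byte is hypothetical, and it assumes tr understands backslashed octal and cmp accepts - for standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the octal code of the first candidate byte
# (\001, then \201..\377) that does not occur in the file given as $1.
# Returns 1 if every candidate is already present.
find_unused_byte() {
    candidates="001"            # in byte order, \001 sorts right after NUL
    i=129                       # decimal 129 = octal 201
    while [ "$i" -le 255 ]; do  # decimal 255 = octal 377
        candidates="$candidates $(printf '%03o' "$i")"
        i=$((i + 1))
    done
    for oct in $candidates; do
        # Deleting a byte that is absent leaves the file unchanged.
        if tr -d "\\$oct" < "$1" | cmp -s - "$1"; then
            echo "$oct"
            return 0
        fi
    done
    return 1
}
```

A possible usage, with LC_ALL=C added as an assumption so that tr and sort work on raw bytes: `oct=$(find_unused_byte myfile) && LC_ALL=C tr '\000' "\\$oct" < myfile | LC_ALL=C sort | LC_ALL=C tr "\\$oct" '\000' > myfile.sorted`.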




All times are GMT -4.