Shell Programming and Scripting: sort truncates line when they contain nulls
Post 302187110 by era on Saturday 19th of April 2008 09:52:22 AM
If the files are pure 7-bit ASCII, you can replace the NUL with an extended (8-bit) character. Just make sure you don't pick one which already exists in the file, and make sure you don't use its UTF-8 representation, which is by definition multiple bytes and so cannot stand in one-for-one for the single NUL byte.

Or, if you can find a 7-bit printable character which doesn't occur in the file, try that (tab? tilde? underscore? @?).

Code:
tr '\000' @ <file | sort | tr @ '\000' >output

... assuming your tr understands backslashed octal.
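For comparison, here is the same round trip sketched in C on an in-memory buffer. It is a minimal illustration, not part of the original answer: swap_byte(), protect_nuls() and restore_nuls() are hypothetical names, the sort step in between is left out, and protect_nuls() refuses a placeholder that already occurs, because the way back would otherwise turn those genuine bytes into NULs as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every byte equal to `from` with `to` in buf[0..len), the way
 * tr '\000' @ does on a stream. The buffer is length-counted, so
 * embedded NULs are ordinary data here. */
void swap_byte(unsigned char *buf, size_t len, unsigned char from, unsigned char to)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == from)
            buf[i] = to;
    }
}

/* Hypothetical helper: map NUL -> placeholder before sorting.
 * Returns -1 and leaves buf untouched if the placeholder already occurs,
 * since the reverse mapping could not tell the two apart. */
int protect_nuls(unsigned char *buf, size_t len, unsigned char placeholder)
{
    if (memchr(buf, placeholder, len) != NULL)
        return -1;
    swap_byte(buf, len, '\0', placeholder);
    return 0;
}

/* Hypothetical helper: map placeholder -> NUL after sorting. */
void restore_nuls(unsigned char *buf, size_t len, unsigned char placeholder)
{
    swap_byte(buf, len, placeholder, '\0');
}
```

The two directions are the same byte mapping run with the arguments swapped, which is exactly why the mapping is only lossless when the placeholder never appears in the original data.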

Grepping for special characters can be tricky, too; presumably, your grep will also treat NUL as end of string. Instead, try deleting all occurrences of your candidate character and comparing the result against the original; if the two are byte-for-byte identical, the character doesn't occur in the file.

Code:
 tr -d @ <file | cmp - file

... assuming your cmp accepts - to mean standard input.
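In C, the same check is a scan with memchr(), which works on an explicit byte count rather than a NUL-terminated string, so NULs in the data do not end the search early. A minimal sketch; pick_placeholder() is a hypothetical helper, and the candidate list simply mirrors the suggestions above (tab, tilde, underscore, @).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first byte in `candidates` (an ordinary C string) that never
 * occurs in buf[0..len), or -1 if every candidate is present.
 * memchr() scans exactly len bytes, so NULs inside buf are searched
 * past, unlike string functions such as strchr(). */
int pick_placeholder(const unsigned char *buf, size_t len, const char *candidates)
{
    for (const char *p = candidates; *p != '\0'; p++) {
        if (memchr(buf, (unsigned char)*p, len) == NULL)
            return (unsigned char)*p;
    }
    return -1;
}
```

Given data containing a tab, a NUL and a tilde, pick_placeholder(data, len, "\t~_@") skips the first two candidates and returns '_'.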
 

9 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

GREPing for Nulls

I just had a filesystem / file corruption issue on my HSP's server due to disk capacity limits and fileswapping. I discovered that certain files got corrupted when fileswapping was not successful and they ended up with a string of control characters, or what I believe to be nulls, in them. Does... (4 Replies)
Discussion started by: Dr. DOT

2. Shell Programming and Scripting

PS truncates the o/p

Hi, I have faced a strange situation in Solaris. The command ps -eo pid,args | grep 'SOMEPROCESS' truncates the output. The output looks like:
111 xxxxxxxxxxxxx SOMEPROCES
123 xxxxxxxxxxxxx SOMEPROCES
323 xxxxxxxxxxxxx SOMEPROCES
The above doesn't return the complete command/args; in fact, if... (1 Reply)
Discussion started by: braindrain

3. UNIX for Advanced & Expert Users

who truncates the output? redirection? tty? Bug?

Hi, the output of running Berkeley ps is truncated to 80 chars when using redirections.
$ /usr/ucb/ps -e 12490|cat #truncated to 80 chars
PID TT S TIME COMMAND
12490 pts/24 S 0:00 sleep 4000 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Getting longer lines is done by changing the stty $... (7 Replies)
Discussion started by: fredy

4. Shell Programming and Scripting

Sort a file line by line alphabetically

infile:
z y x
c b a
desired output:
x y z
a b c
I don't want to sort the lines into this:
a b c
x y z
nor this:
c b a
z y x
The number of fields per line and number of lines is indeterminate. The field separator is always a space. Thanks for the use of your collective brains.... (11 Replies)
Discussion started by: H2OBoodle

5. Programming

Blanks vs: Nulls

I'm relatively new to Pro*C programming. In the following example: char name; EXEC SQL SELECT 'John Doe' INTO :name FROM DUAL; "John Doe" is in positions 0-7, blanks in 8-19, and a null in 20. I would really prefer the null to be in position 8 and I don't care what's after that. I wrote a... (1 Reply)
Discussion started by: ebock

6. Shell Programming and Scripting

include NULLs in line length check

Hello, I am checking the length of each line of a fixed-length file and making sure all lines are 161 characters long. My problem is that some files contain null characters, which get stripped out of my echo. How do I have the NULLs included in my check? (and I cannot replace or sub the NULL values with... (10 Replies)
Discussion started by: ironmix

7. Shell Programming and Scripting

Sort a line and Insert sorted word(s) in a line

Hello, I am looking to automate a task, which is updating an existing access control instruction of a server and making sure that the attributes defined in the instruction are in sorted order. The instructions will be of a specific syntax. For example, let's assume below listed is one of an... (6 Replies)
Discussion started by: sanjayroc

8. Shell Programming and Scripting

Replace nulls with a value in a file

Hi, I have a pipe-delimited file with about 5 fields. Sometimes the 4th field is null, so I want to replace it based on the value we get in the 2nd field in the same file. Following is an example:
ABCD|X-TYPE 3.0|2010|X-TYPE|20000
CDEF|C-TYPE 2.5|2011|C-TYPE|10000
XYZ|LX... (4 Replies)
Discussion started by: rudoraj

9. Shell Programming and Scripting

/usr/bin/expect script truncates data

I have a script on a Linux machine that connects remotely via telnet to a Windows machine to launch several commands and collect their output. On the Linux machine, the output of these commands is redirected to a file. The script:
#!/usr/bin/expect
log_user 0
spawn telnet 10.10.10.10... (6 Replies)
Discussion started by: black_fender
VIS(3)                   BSD Library Functions Manual                   VIS(3)

NAME
     vis -- visually encode characters

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <vis.h>

     char *
     vis(char *dst, int c, int flag, int nextc);

     int
     strvis(char *dst, const char *src, int flag);

     int
     strvisx(char *dst, const char *src, size_t len, int flag);

DESCRIPTION
     The vis() function copies into dst a string which represents the
     character c. If c needs no encoding, it is copied in unaltered. The
     string is null terminated, and a pointer to the end of the string is
     returned. The maximum length of any encoding is four characters (not
     including the trailing NUL); thus, when encoding a set of characters into
     a buffer, the size of the buffer should be four times the number of
     characters encoded, plus one for the trailing NUL. The flag argument is
     used for altering the default range of characters considered for
     encoding and for altering the visual representation. The additional
     character, nextc, is only used when selecting the VIS_CSTYLE encoding
     format (explained below).

     The strvis() and strvisx() functions copy into dst a visual
     representation of the string src. The strvis() function encodes
     characters from src up to the first NUL. The strvisx() function encodes
     exactly len characters from src (this is useful for encoding a block of
     data that may contain NULs). Both forms NUL terminate dst. The size of
     dst must be four times the number of characters encoded from src (plus
     one for the NUL). Both forms return the number of characters in dst (not
     including the trailing NUL).

     The encoding is a unique, invertible representation composed entirely of
     graphic characters; it can be decoded back into the original form using
     the unvis(3) or strunvis(3) functions.

     There are two parameters that can be controlled: the range of characters
     that are encoded, and the type of representation used. By default, all
     non-graphic characters except space, tab, and newline are encoded (see
     isgraph(3)). The following flags alter this:

     VIS_GLOB    Also encode magic characters ('*', '?', '[' and '#')
                 recognized by glob(3).
     VIS_SP      Also encode space.
     VIS_TAB     Also encode tab.
     VIS_NL      Also encode newline.
     VIS_WHITE   Synonym for VIS_SP | VIS_TAB | VIS_NL.
     VIS_SAFE    Only encode "unsafe" characters. Unsafe means control
                 characters which may cause common terminals to perform
                 unexpected functions. Currently this form allows space, tab,
                 newline, backspace, bell, and return (in addition to all
                 graphic characters) unencoded.

     There are four forms of encoding. Most forms use the backslash character
     '\' to introduce a special sequence; two backslashes are used to
     represent a real backslash. These are the visual formats:

     (default)
             Use an 'M' to represent meta characters (characters with the 8th
             bit set), and use caret '^' to represent control characters (see
             iscntrl(3)). The following formats are used:

             \^C     Represents the control character 'C'. Spans characters
                     '\000' through '\037', and '\177' (as '\^?').
             \M-C    Represents character 'C' with the 8th bit set. Spans
                     characters '\241' through '\376'.
             \M^C    Represents control character 'C' with the 8th bit set.
                     Spans characters '\200' through '\237', and '\377' (as
                     '\M^?').
             \040    Represents ASCII space.
             \240    Represents Meta-space.

     VIS_CSTYLE
             Use C-style backslash sequences to represent standard
             non-printable characters. The following sequences are used to
             represent the indicated characters:

                   \a - BEL (007)
                   \b - BS (010)
                   \f - NP (014)
                   \n - NL (012)
                   \r - CR (015)
                   \s - SP (040)
                   \t - HT (011)
                   \v - VT (013)
                   \0 - NUL (000)

             When using this format, the nextc argument is looked at to
             determine if a NUL character can be encoded as '\0' instead of
             '\000'. If nextc is an octal digit, the latter representation is
             used to avoid ambiguity.

     VIS_HTTPSTYLE
             Use URI encoding as described in RFC 1808. The form is '%dd'
             where d represents a hexadecimal digit.

     VIS_OCTAL
             Use a three digit octal sequence. The form is '\ddd' where d
             represents an octal digit.

     There is one additional flag, VIS_NOSLASH, which inhibits the doubling of
     backslashes and the backslash before the default format (that is, control
     characters are represented by '^C' and meta characters as 'M-C'). With
     this flag set, the encoding is ambiguous and non-invertible.

SEE ALSO
     unvis(1), unvis(3)

     R. Fielding, Relative Uniform Resource Locators, RFC1808.

HISTORY
     These functions first appeared in 4.4BSD.

BUGS
     The vis family of functions do not recognize multibyte characters, and
     thus may consider them to be non-printable when they are in fact
     printable (and vice versa).

BSD                             April 9, 2006                              BSD
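The default format is easiest to follow as code. Below is a minimal, hypothetical re-implementation in C for illustration only: vis_default_sketch() and visx_sketch() are invented names, only the default format is covered (no VIS_CSTYLE, VIS_OCTAL, VIS_HTTPSTYLE, VIS_SAFE, VIS_GLOB or VIS_NOSLASH), and the encode_white argument stands in for VIS_WHITE. Where <vis.h> is available, the real functions should be used instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the vis(3) default format, not the libc code.
 * Encodes byte c into dst, NUL-terminates it, and returns the number of
 * characters written (never more than 4). encode_white != 0 behaves like
 * VIS_WHITE; otherwise space, tab and newline are copied unaltered. */
size_t vis_default_sketch(char *dst, unsigned char c, int encode_white)
{
    size_t n = 0;
    int white = (c == ' ' || c == '\t' || c == '\n');

    if ((c < 0x80 && isgraph(c)) || (white && !encode_white)) {
        dst[n++] = (char)c;            /* needs no encoding */
        if (c == '\\')
            dst[n++] = '\\';           /* a real backslash is doubled */
    } else if ((c & 0x7f) == ' ') {    /* space, meta-space: \040, \240 */
        dst[n++] = '\\';
        dst[n++] = (char)('0' + ((c >> 6) & 7));
        dst[n++] = (char)('0' + ((c >> 3) & 7));
        dst[n++] = (char)('0' + (c & 7));
    } else {
        unsigned char low = c & 0x7f;
        dst[n++] = '\\';
        if (c & 0x80)
            dst[n++] = 'M';            /* 8th bit set: meta */
        if (low < 0x20 || low == 0x7f) {
            dst[n++] = '^';            /* control: \^C or \M^C */
            dst[n++] = (low == 0x7f) ? '?' : (char)(low + '@');
        } else {
            dst[n++] = '-';            /* printable with 8th bit: \M-C */
            dst[n++] = (char)low;
        }
    }
    dst[n] = '\0';
    return n;
}

/* strvisx()-style loop: encodes exactly len bytes, so embedded NULs are
 * encoded instead of ending the input. dst must hold 4 * len + 1 chars.
 * Returns the length of dst, not counting the trailing NUL. */
size_t visx_sketch(char *dst, const unsigned char *src, size_t len)
{
    size_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += vis_default_sketch(dst + total, src[i], 0);
    dst[total] = '\0';                 /* also covers len == 0 */
    return total;
}
```

For example, visx_sketch() turns the five bytes a, NUL, b, 0xC3, 0xA9 (the last two being é in UTF-8) into a\^@b\M-C\M-), 13 characters, comfortably inside the 4 * 5 + 1 buffer the sizing rule asks for. The é coming out as two meta characters is the behaviour the BUGS section describes.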
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.