How to remove special characters? Post 302830811 by ken6503 on Tuesday 9th of July 2013 02:14:15 PM
Old 07-09-2013
Quote:
Originally Posted by wisecracker
Be very careful, do not make the assumption that it is a single byte in size.

For a quick assessment use this command to check:-

Code:
hexdump -C /full/path/to/your/filename

This is an example; I copied your character and put it into an editor:-

Code:
This is the _byte_ ü _end_.

Note the character is between two spaces...

Now using the above command:-

Code:
Last login: Tue Jul  9 18:46:23 on ttys000
AMIGA:barrywalker~> hexdump -C /Users/barrywalker/byte_test.txt
00000000  54 68 69 73 20 69 73 20  74 68 65 20 5f 62 79 74  |This is the _byt|
00000010  65 5f 20 c3 bc 20 5f 65  6e 64 5f 2e 0a           |e_ .. _end_..|
0000001d
AMIGA:barrywalker~>

Note that at positions 00000013 and 00000014 the two bytes c3 and bc (the UTF-8 encoding of ü) have appeared instead of the single byte you are expecting...

So be very, very careful...

Hope this helps...
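
A quick way to confirm the byte count without reading the dump by eye is `wc -c`, which counts bytes regardless of locale. This is only a sketch: the octal escapes below are my own spelling of the quoted test line, once with ü as the two UTF-8 bytes c3 bc and once as the single ISO-8859-1 byte fc.

```shell
# The quoted test line with ü as two UTF-8 bytes (octal 303 274 = hex c3 bc).
utf8_count=$(printf 'This is the _byte_ \303\274 _end_.\n' | wc -c | tr -d ' ')
echo "UTF-8:      $utf8_count bytes"    # 29, i.e. 0x1d, the final offset in the dump

# The same line with ü as one ISO-8859-1 byte (octal 374 = hex fc).
latin1_count=$(printf 'This is the _byte_ \374 _end_.\n' | wc -c | tr -d ' ')
echo "ISO-8859-1: $latin1_count bytes"  # 28
```

The one-byte difference is exactly the extra byte that UTF-8 needs for ü.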
Thanks for your quick reply.
I ran the following command and got this result:
Code:
 # echo 'ADDÜL' |hexdump -C
00000000  41 44 44 dc 4c 0a                                 |ADD.L.|
00000006
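
The single byte dc is how Ü is stored in one-byte encodings such as ISO-8859-1; in UTF-8 the same character would be the two bytes c3 9c. If the rest of the toolchain expects UTF-8, one option is to convert the data first. The sketch below assumes `iconv` is installed and that the input really is ISO-8859-1, which the dump suggests but does not prove.

```shell
# Input as in the post: Ü stored as the single byte 0xdc (octal 334).
# Assumption: the data is ISO-8859-1; change -f if the real encoding differs.
utf8_hex=$(printf 'ADD\334L\n' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "$utf8_hex"   # 414444c39c4c0a: Ü has become the two bytes c3 9c
```

`od -An -tx1` is used here as a portable stand-in for `hexdump -C`.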

Actually, I was running the following command to split a file containing one long line into separate lines of a fixed length. When it hits the character Ü, it stops.
What should I do to make the command split the file without stopping?
Code:
awk -v L="$2" '{for (i=1; i<=length($0); i+=L) print substr($0, i, L)}' "$1" > "$1"_split

Thanks in advance
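
One possible fix, offered as a sketch rather than a confirmed diagnosis: if awk is running in a UTF-8 locale, the lone byte dc is not a valid multibyte sequence, and some awk implementations handle that badly. Forcing the C locale makes `length()` and `substr()` work on bytes. The temporary file, the `split_bytes` wrapper and the chunk length of 2 are made up for the demo.

```shell
# Hypothetical demo file: one line containing the ISO-8859-1 byte 0xdc (Ü).
tmp=$(mktemp)
printf 'ADD\334L\n' > "$tmp"

# Same awk program as in the post, but run in the C locale so that
# length() and substr() operate on bytes, not multibyte characters.
split_bytes() {   # usage: split_bytes FILE CHUNK_LENGTH
    LC_ALL=C awk -v L="$2" '{for (i=1; i<=length($0); i+=L) print substr($0, i, L)}' "$1" > "$1"_split
}

split_bytes "$tmp" 2
od -An -tx1 "$tmp"_split    # inspect the resulting bytes
```

This is safe for single-byte encodings. If the file is UTF-8 instead, splitting by bytes can cut a two-byte character such as c3 bc in half, which is the situation the quoted warning is about.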
 
