01-29-2020
Quote:
Originally Posted by bmk123
... always it will remove the non printable characters?
No. It will remove characters that are not in the "ASCII range" from 0x00 up to 0x7F. ASCII control chars (non-printable, incl. white space) will NOT be removed. It will remove everything above ASCII, starting with 0x80 (= 128), including "extended ASCII" or any other character set / encoding like UTF-8; how those bytes are interpreted may depend on your locale setting.
You may want to consider starting over, making up your mind which chars you need and which you don't, and rephrasing your specification. Do you have examples of "target" chars?
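To see the difference on a small sample, here is a minimal sketch. It assumes the command under discussion behaves like a plain byte-range filter, and it uses tr as a stand-in rather than your actual sed expression. The names strip_non_ascii, strip_non_printable and sample are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Delete every byte from 0x80 (octal 200) to 0xFF (octal 377).
# ASCII control characters such as BEL or TAB pass through untouched.
strip_non_ascii() { LC_ALL=C tr -d '\200-\377'; }

# Keep only TAB, newline and printable ASCII (0x20-0x7E); delete everything else.
strip_non_printable() { LC_ALL=C tr -cd '\11\12\40-\176'; }

# Sample bytes: a, TAB, b, BEL (0x07), "e acute" as UTF-8 (0xC3 0xA9), c, newline.
sample() { printf 'a\tb\007\303\251c\n'; }

# Dump the result of each filter byte by byte.
sample | strip_non_ascii | od -An -tx1
sample | strip_non_printable | od -An -tx1
```

The first dump should still contain the 07 (BEL) byte, the second should not. Both drop the two UTF-8 bytes c3 a9. LC_ALL=C is set so that tr works on single bytes regardless of your locale.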
UTF8(5) BSD File Formats Manual UTF8(5)
NAME
utf8 -- UTF-8, a transformation format of ISO 10646
SYNOPSIS
ENCODING "UTF-8"
DESCRIPTION
The UTF-8 encoding represents UCS-4 characters as a sequence of octets, using between 1 and 6 for each character. It is backwards compatible
with ASCII, so 0x00-0x7f refer to the ASCII character set. The multibyte encoding of non-ASCII characters consists entirely of bytes whose
high order bit is set. The actual encoding is represented by the following table:
[0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
[0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
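The table can be read as a bit-packing recipe. The following is a minimal sketch in POSIX shell arithmetic, covering only the first four rows (the 5- and 6-octet rows follow the same pattern). utf8_encode is a hypothetical helper name chosen for this illustration, not part of any standard tool.

```shell
#!/bin/sh
# utf8_encode CODEPOINT
# Print the UTF-8 octets of a code point as lowercase hex, one line per call.
# Sketch only: handles values below 0x200000 (rows 1-4 of the table).
utf8_encode() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ]; then
        # Row 1: 0bbbbbbb
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # Row 2: 110bbbbb, 10bbbbbb
        printf '%02x %02x\n' \
            $(( 0xC0 | ($cp >> 6) )) \
            $(( 0x80 | ($cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        # Row 3: 1110bbbb, 10bbbbbb, 10bbbbbb
        printf '%02x %02x %02x\n' \
            $(( 0xE0 | ($cp >> 12) )) \
            $(( 0x80 | (($cp >> 6) & 0x3F) )) \
            $(( 0x80 | ($cp & 0x3F) ))
    elif [ "$cp" -lt 2097152 ]; then
        # Row 4: 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
        printf '%02x %02x %02x %02x\n' \
            $(( 0xF0 | ($cp >> 18) )) \
            $(( 0x80 | (($cp >> 12) & 0x3F) )) \
            $(( 0x80 | (($cp >> 6) & 0x3F) )) \
            $(( 0x80 | ($cp & 0x3F) ))
    else
        echo "utf8_encode: value not covered by this sketch" >&2
        return 1
    fi
}

utf8_encode 0x41      # ASCII "A", row 1
utf8_encode 0xE9      # Latin small e with acute, row 2
utf8_encode 0x20AC    # euro sign, row 3
```

Each continuation octet carries six payload bits behind the fixed 10 prefix, which is why every step shifts by a multiple of 6 and masks with 0x3F.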
If more than a single representation of a value exists (for example, 0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always
used. Longer ones are detected as an error as they pose a potential security risk, and destroy the 1:1 character:octet sequence mapping.
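A rough sketch of how such over-long forms can be spotted from the first two octets alone. It assumes the input is otherwise well formed (correct length, continuation octets in 0x80-0xBF) and only covers the 2-, 3- and 4-octet forms. is_overlong is a hypothetical helper name, and the octets are passed as hex arguments purely for illustration.

```shell
#!/bin/sh
# is_overlong LEAD [SECOND ...]
# Octets are given as two-digit hex strings. Returns 0 (true) when the
# sequence encodes a value that would fit into a shorter form.
is_overlong() {
    lead=$(( 0x$1 ))
    second=$(( 0x${2:-00} ))
    # 0xC0 / 0xC1: 2-octet form holding a value below 0x80.
    if [ "$lead" -eq $(( 0xC0 )) ] || [ "$lead" -eq $(( 0xC1 )) ]; then
        return 0
    fi
    # 0xE0 followed by 0x80-0x9F: 3-octet form holding a value below 0x800.
    if [ "$lead" -eq $(( 0xE0 )) ] && [ "$second" -lt $(( 0xA0 )) ]; then
        return 0
    fi
    # 0xF0 followed by 0x80-0x8F: 4-octet form holding a value below 0x10000.
    if [ "$lead" -eq $(( 0xF0 )) ] && [ "$second" -lt $(( 0x90 )) ]; then
        return 0
    fi
    return 1
}

# The two over-long encodings of 0x00 from the text, plus one shortest-form sequence.
for seq in "c0 80" "e0 80 80" "c3 a9"; do
    # $seq is left unquoted on purpose so it splits into separate octets.
    if is_overlong $seq; then
        echo "$seq: over-long"
    else
        echo "$seq: shortest form"
    fi
done
```

Checking the lead octet together with the high bits of the second octet is enough here, because those are exactly the bits that must be non-zero for a value to need the longer form.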
SEE ALSO
euc(5)
Rob Pike and Ken Thompson, "Hello World", Proceedings of the Winter 1993 USENIX Technical Conference, USENIX Association, January 1993.
F. Yergeau, UTF-8, a transformation format of ISO 10646, January 1998, RFC 2279.
The Unicode Standard, Version 3.0, The Unicode Consortium, 2000, as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode
Standard Annex #28: Unicode 3.2.
STANDARDS
The utf8 encoding is compatible with RFC 2279 and Unicode 3.2.
BSD April 7, 2004 BSD