I posted a question yesterday and got an answer. Thanks for your help.
My question today is: what are the ASCII codes for the following non-printable characters, which we need to filter out in another process: ^MM-^E^MM-^E
This is the kind of situation where we'd need to see an attachment. The ^M's are probably carriage returns and the ^E's are probably escapes, but if this junk is color escape sequences and the like, as I suspect, there's bound to be more.
Assuming that ^MM-^E^MM-^E is using the usual scheme to represent non-printable characters:
For the first block of control characters and the DEL character:
A byte value B in the range 00 to 1F and 7F (in decimal, 0 to 31 and 127) is represented by ^X, where B = (X+64)%128. Except for 7F, this is equivalent to B = X-64.
Analogously, for the second block of control characters and the highest valued byte:
A byte value B in the range 80 to 9F and FF (in decimal, 128 to 159 and 255) is represented by M-^X, where B = ((X+64)%128)+128, i.e. 128 plus the value that ^X alone would represent. Except for FF, this is equivalent to B = X+64.
This leaves two ranges. 20 to 7E (in decimal, 32 to 126) are the printable characters (including alphanumerics and punctuation); a byte value in this range represents itself. Its high-order counterpart is A0 to FE (in decimal, 160 to 254). A byte value B in this range is represented by M-X, where B = X+128.
To recap, there are three types of encoding: ^X, M-^X, M-X. The two beginning with M have the high bit set. The two which include a ^ are the two blocks of control characters.
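To make the three encodings concrete, here is a minimal Python sketch that renders bytes in this notation. It is written from the rules above, not taken from the OpenBSD or GNU source, and unlike cat -v it does not pass newline or tab through unchanged. The names cat_v and cat_v_bytes are made up for illustration.

```python
def cat_v(byte: int) -> str:
    """Render one byte (0-255) in ^X / M-^X / M-X notation."""
    if not 0 <= byte <= 255:
        raise ValueError(f"not a byte value: {byte}")
    prefix = ""
    if byte >= 128:               # high bit set: emit M- and clear the bit
        prefix = "M-"
        byte -= 128
    if byte < 32 or byte == 127:  # control char or DEL: X = (B+64)%128
        return prefix + "^" + chr((byte + 64) % 128)
    return prefix + chr(byte)     # printable: the character itself


def cat_v_bytes(data: bytes) -> str:
    """Render a byte string; unlike cat -v, newline and tab are encoded too."""
    return "".join(cat_v(b) for b in data)


print(cat_v_bytes(b"\r\x85\r\x85"))  # ^MM-^E^MM-^E
print(cat_v(0x7F), cat_v(0xFF))      # ^? M-^?
```

Because adding 64 twice is a full turn modulo 128, the same expression maps a byte to its letter and a letter back to its byte.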
Under this scheme, ^MM-^E^MM-^E represents four bytes: a two-byte sequence repeated twice. The first control character does not have the high bit set, while the second one does. ^M is the one that does not. M is decimal 77 in ASCII, so ^M is decimal 13 (hex 0D). This is a carriage return.
I'll leave the other byte, M-^E, as an exercise for you.
NOTE: Although I doubt it, without any context there is a chance that those could just be literal characters.
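Going the other way, a small parser can turn the notation back into byte values. This is a hypothetical helper (parse_cat_v is an illustrative name), and it shares the ambiguity in the note above: a literal ^ or M- in the text cannot be told apart from the notation, so the sketch assumes every ^ introduces a control character.

```python
def parse_cat_v(text: str) -> bytes:
    """Convert ^X / M-^X / M-X notation back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        # "M-" followed by at least one more character sets the high bit.
        if text.startswith("M-", i) and i + 2 < len(text):
            high = 128
            i += 2
        if text[i] == "^" and i + 1 < len(text):
            # ^X: control character or DEL, B = (X+64)%128
            value = (ord(text[i + 1]) + 64) % 128
            i += 2
        else:
            # anything else stands for itself
            value = ord(text[i])
            i += 1
        if value > 127:
            raise ValueError("notation must be plain ASCII")
        out.append(high + value)
    return bytes(out)


data = parse_cat_v("^MM-^E^MM-^E")
print(list(data))    # [13, 133, 13, 133]
print(oct(data[1]))  # 0o205
```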
Regards,
Alister
---------- Post updated at 01:50 PM ---------- Previous update was at 01:33 PM ----------
For those who enjoy a peek behind the curtain, OpenBSD's and GNU coreutils' cat -v implementation:
Thanks for your reply; it was a great help. Right now I can remove ^M by using the condition CHR(13), but M-^E is still there. I tried CHR(133) (from searching the internet, CHR(133) matches octal code 205), but somehow it doesn't work, and I cannot remove these special characters in UNIX. I must remove them before dumping the file into UNIX. Would you please take a look at which CHR value I should use to remove these characters?
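One way to sidestep character-set questions is to filter on raw byte values rather than characters. The Python sketch below (function names are illustrative) deletes every byte 13 (^M) and 133 (M-^E, octal 205). Whether a database's CHR(133) yields that same byte depends on the database character set, which the post doesn't state: in Windows-1252, for example, byte 0x85 is the ellipsis character, and in UTF-8 a lone 0x85 is not a valid character at all, which may be why a character-level match fails.

```python
def strip_cr_and_0x85(data: bytes) -> bytes:
    """Delete every byte 13 (^M, carriage return) and 133 (M-^E, octal 205)."""
    return data.translate(None, b"\r\x85")


def clean_file(src: str, dst: str) -> None:
    # Binary mode: no character-set decoding, so raw byte values are compared.
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(strip_cr_and_0x85(data))


print(strip_cr_and_0x85(b"abc\r\x85\r\x85def"))  # b'abcdef'
```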
Hello
I get this special character after retrieving rows from SQL Server:
"....spasses: • Entrem al valort 6050108002811 • El donem..."
I would like a sed command to remove it, or just to know its ASCII code so I can replace it in my SQL statement. I hope someone knows how to do that.
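The sequence "•" is most likely not one character at all: it is what the three UTF-8 bytes of the bullet character U+2022 (E2 80 A2) look like when they are read as Windows-1252. Assuming that is what happened here, a short Python check confirms the byte match and shows that the bullet has no ASCII code, so a replacement has to target the whole sequence. The helper name remove_bullets is hypothetical.

```python
garbled = "\u00e2\u20ac\u00a2"      # the three characters shown in the post

raw = garbled.encode("cp1252")      # back to the bytes that were misread
print(raw)                          # b'\xe2\x80\xa2'

fixed = raw.decode("utf-8")         # decode the same bytes as UTF-8 instead
print(hex(ord(fixed)))              # 0x2022


def remove_bullets(text: str) -> str:
    """Drop the bullet in both its garbled and its correctly decoded form."""
    return text.replace(garbled, "").replace("\u2022", "")
```

U+2022 is far above 127, which is why no ASCII code exists for it.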
Hi,
In my file, for a few fields I have to replace every character with the next ASCII character.
In the file below, I have to do this for the 2nd, 3rd and 5th fields.
Input File
========
1|abc|def|5|ghi
2|jkl|mno|6|pqr
Expected Output File
====================
1|bcd|efg|5|hij
2|klm|nop|6|qrs
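A minimal Python sketch of the transformation, assuming fields are numbered from 1 and that "next ASCII character" means the character code plus one with no wrap-around (so z would become {). The names shift_chars and shift_fields are illustrative.

```python
def shift_chars(s: str) -> str:
    """Replace each character with the one whose code is one higher."""
    return "".join(chr(ord(c) + 1) for c in s)


def shift_fields(line: str, fields=(2, 3, 5), sep="|") -> str:
    """Shift only the selected 1-based fields of a delimited line."""
    parts = line.split(sep)
    for n in fields:
        if n <= len(parts):
            parts[n - 1] = shift_chars(parts[n - 1])
    return sep.join(parts)


for line in ["1|abc|def|5|ghi", "2|jkl|mno|6|pqr"]:
    print(shift_fields(line))
# 1|bcd|efg|5|hij
# 2|klm|nop|6|qrs
```

When reading from a file, strip the trailing newline from each line first; otherwise, because field 5 is the last field, the newline would be shifted too.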
I have a .dat file on a Windows server containing the following text:
"Bürki"
Now, when I use the FTP get command from the UNIX server, the text appears as "Bürki".
I want the text in the file on the UNIX server to stay exactly as it is in the source file.
Could you please suggest some...
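The post does not say which encoding the file uses, so the following only illustrates the usual cause. In binary (image) mode, FTP copies bytes unchanged; what you then see depends on which character encoding the UNIX side uses to interpret them. The Python sketch shows that "ü" is one byte in Latin-1 but two bytes in UTF-8, what UTF-8 bytes turn into when misread as Latin-1, and a hypothetical latin1_to_utf8 conversion helper.

```python
name = "B\u00fcrki"                  # "Bürki"

latin1 = name.encode("latin-1")      # b'B\xfcrki'     (one byte for the umlaut)
utf8 = name.encode("utf-8")          # b'B\xc3\xbcrki' (two bytes for the umlaut)

# UTF-8 bytes misread as Latin-1 give the familiar "BÃ¼rki" garbling.
misread = utf8.decode("latin-1")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (assumes the input really is Latin-1)."""
    return data.decode("latin-1").encode("utf-8")


print(latin1_to_utf8(latin1) == utf8)  # True
```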
A very simple question, but I have scoured the web and can't find an answer. How do I search for a character by its ASCII code in a regular expression using grep?
For example, we use the End of Medium character as a delimiter in certain files (it is ASCII 031 in octal and displays as ^Y). I want to grep...
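In a shell, one option is to build the character with printf's octal escape and pass it to grep as a literal, e.g. grep "$(printf '\031')" file. The same idea in Python, as a sketch: the re module accepts \xhh escapes, so a pattern can be generated from the numeric code. grep_code is a hypothetical name.

```python
import re


def grep_code(lines, code):
    """Return the lines containing the character with the given code (0-255)."""
    pattern = re.compile(r"\x%02x" % code)  # e.g. code 0o31 -> \x19
    return [line for line in lines if pattern.search(line)]


EM = "\x19"                                 # End of Medium: octal 031, shown as ^Y
records = ["a\x19b\x19c", "no delimiter here"]
print(grep_code(records, 0o31) == records[:1])  # True
print(records[0].split(EM))                 # ['a', 'b', 'c']
```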
Hi, I need to do a global search and replace of a non-ASCII character. Let me first give the background of my problem.
Very frequently, I need to copy sets of references from different sources. Typically, a reference would look like this:
Banumathy et al., 2002 G. Banumathy, V. Singh and U....
Can someone help me write a script or command that reads a file character by character, replaces any unknown ASCII characters with a space, and then writes the result to a new file?
Thanks!
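A minimal Python sketch, assuming "unknown" means any byte outside printable ASCII (32 to 126) other than tab and newline. Carriage returns, and every byte of a multi-byte UTF-8 character, are therefore turned into spaces as well. The function names are illustrative.

```python
# Byte values kept as-is: printable ASCII (32-126), tab (9) and newline (10).
KEEP = set(range(32, 127)) | {9, 10}
# 256-entry translation table: every other byte value maps to a space (32).
TABLE = bytes(b if b in KEEP else 32 for b in range(256))


def blank_unknown(data: bytes) -> bytes:
    return data.translate(TABLE)


def blank_unknown_file(src: str, dst: str) -> None:
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(blank_unknown(data))


print(blank_unknown(b"ok\x00\xe9!\n"))  # b'ok  !\n'
```

Using a translation table processes the whole file in one call instead of looping character by character, with the same result.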
Is there a way to determine the ASCII value of a character? For example, say a shell variable has the value 'A'. I would like its ASCII value (65 in this case), and I would like to do this from a script (preferably ksh).
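In Python the built-in ord() does this, and chr() goes back. For ksh specifically, POSIX printf treats a numeric argument that starts with a single quote as the code of the character that follows, so printf '%d\n' "'A" should print 65. A small Python sketch (ascii_value is an illustrative name):

```python
def ascii_value(ch: str) -> int:
    """Return the code of a single character (its ASCII value if it is ASCII)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ord(ch)


print(ascii_value("A"))  # 65
print(chr(65))           # A
```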
Hi,
How can I grep for lines containing non-ASCII characters in a file?
If not with grep, can we at least do it with command-line perl or awk? I tried perl, but still could not get the result. Any help?
PS: I was sure someone must have asked this question...
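A Python sketch that reads the file in binary mode, so no decoding errors can occur, and reports each line containing a byte above 0x7F (that is, outside 7-bit ASCII). bytes.isascii() needs Python 3.7 or later; the function names are illustrative.

```python
def find_non_ascii(lines):
    """Yield (line number, line) for every bytes line with a byte above 0x7F."""
    for lineno, line in enumerate(lines, start=1):
        if not line.isascii():
            yield lineno, line


def non_ascii_lines(path: str):
    # Binary mode: lines are bytes, so no decoding errors can occur.
    with open(path, "rb") as f:
        return list(find_non_ascii(f))


print(list(find_non_ascii([b"plain\n", b"caf\xc3\xa9\n", b"ok\n"])))
# [(2, b'caf\xc3\xa9\n')]
```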
Hey all,
Just found your forum... It looks super rich with info! Can't wait to get through it all.
I am currently writing a web app in .NET that telnets into a UNIX server (which requires a user ID and password), runs a command, and returns the output to be displayed on the web page.
I have gotten through the...