What is ASCII character?


 
# 1  
Old 10-31-2012
What is ASCII character?

Hi Guru,

I posted a question yesterday and got an answer. Thanks for your help.

My question today is: what are the ASCII codes for the following non-printable characters? (We need to filter these characters out in another process.)
^MM-^E^MM-^E.

Old post link: https://www.unix.com/shell-programmin...haracters.html

Thanks in advance

ken002
# 2  
Old 10-31-2012
This is the kind of situation where we'd need to see an attachment. The ^M's are probably carriage returns. The ^E's are not escapes (an escape would show up as ^[); each one follows M-, which marks a byte with the high bit set. If this junk is color escape sequences and the like, as I suspect it is, there's bound to be more.
# 3  
Old 10-31-2012
Assuming that ^MM-^E^MM-^E is using the usual scheme to represent non-printable characters:

For the first block of control characters and the del character:
A byte value of B in the range 00 to 1F and 7F (in decimal, 0 to 31 and 127) is represented by ^X where B = (X+64)%128. Except for 7F, this is equivalent to B = X-64

Analogously, for the second block of control characters and the highest valued byte:
A byte value of B in the range 80 to 9F and FF (in decimal, 128 to 159 and 255) is represented by M-^X where B = 128 + (X+64)%128. Except for FF, this is equivalent to B = X+64

This leaves two ranges. 20 to 7E (in decimal, 32 to 126) are the printable characters (including alphanumerics and punctuation). A byte value in this range represents itself. Its high-order counterpart is A0 to FE (in decimal, 160 to 254). A byte value B in this range is represented by M-X, where B = X+128.

To recap, there are three types of encoding: ^X, M-^X, M-X. The two beginning with M have the high bit set. The two which include a ^ are the two blocks of control characters.
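The notations can be checked by feeding known bytes to cat -v. This is only a sketch: show_v is a made-up helper name, and it assumes a cat that supports -v (GNU and BSD cat do, though POSIX does not require it).

```shell
# show_v is a hypothetical helper: it emits the bytes described by a
# printf format string (octal escapes) and pipes them through cat -v.
show_v() {
    printf "$1" | cat -v
    echo
}

show_v '\001'   # ^A    control block, 00 to 1F
show_v '\177'   # ^?    DEL
show_v '\201'   # M-^A  high control block, 80 to 9F
show_v '\377'   # M-^?  highest byte
show_v 'a'      # a     printable, represents itself
show_v '\341'   # M-a   high-order counterpart of a
```

The octal escapes are used so that the input bytes are unambiguous regardless of the terminal's encoding.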

Under this scheme, ^MM-^E^MM-^E represents 4 characters: a two-character sequence that appears twice. The first control character does not have the high bit set while the second one does. ^M is the one that does not. M is decimal 77 in ASCII. ^M is then (77+64)%128 = decimal 13 (hex 0D). This is a carriage return.
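The same arithmetic can be done in the shell. A rough sketch, with ctrl_value and meta_ctrl_value as made-up helper names; it assumes the argument is the single character that follows the ^:

```shell
# ctrl_value and meta_ctrl_value are hypothetical helper names.

# Byte value of ^X: (X + 64) % 128, where X is the character's code.
ctrl_value() {
    x=$(printf '%d' "'$1")      # a leading quote makes printf yield the code, e.g. 77 for M
    echo $(( (x + 64) % 128 ))
}

# Byte value of M-^X: the same value with the high bit (128) set.
meta_ctrl_value() {
    echo $(( $(ctrl_value "$1") + 128 ))
}

ctrl_value M          # 13  (carriage return)
ctrl_value '?'        # 127 (DEL)
meta_ctrl_value '?'   # 255
```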

I'll leave the other byte, M-^E as an exercise for you.

NOTE: Although I doubt it, without any context there is a chance that those could just be literal characters.

Regards,
Alister

---------- Post updated at 01:50 PM ---------- Previous update was at 01:33 PM ----------

For those who enjoy a peek behind the curtain, OpenBSD's and GNU coreutils' cat -v implementation:

OpenBSD cat.c:
Code:
		} else if (vflag) {
			if (!isascii(ch)) {
				if (putchar('M') == EOF || putchar('-') == EOF)
					break;
				ch = toascii(ch);
			}
			if (iscntrl(ch)) {
				if (putchar('^') == EOF ||
				    putchar(ch == '\177' ? '?' :
				    ch | 0100) == EOF)
					break;
				continue;
			}
		}

GNU coreutils cat.c:
Code:
      if (show_nonprinting)
        {
          while (true)
            {
              if (ch >= 32)
                {
                  if (ch < 127)
                    *bpout++ = ch;
                  else if (ch == 127)
                    {
                      *bpout++ = '^';
                      *bpout++ = '?';
                    }
                  else
                    {
                      *bpout++ = 'M';
                      *bpout++ = '-';
                      if (ch >= 128 + 32)
                        {
                          if (ch < 128 + 127)
                            *bpout++ = ch - 128;
                          else
                            {
                              *bpout++ = '^';
                              *bpout++ = '?';
                            }
                        }
                      else
                        {
                          *bpout++ = '^';
                          *bpout++ = ch - 128 + 64;
                        }
                    }
                }
              else if (ch == '\t' && !show_tabs)
                *bpout++ = '\t';
              else if (ch == '\n')
                {
                  newlines = -1;
                  break;
                }
              else
                {
                  *bpout++ = '^';
                  *bpout++ = ch + 64;
                }

              ch = *bpin++;
            }
        }

Regards,
Alister
# 4  
Old 10-31-2012
Here is a way to determine the characters

Code:
$ echo hello this is weird | od -An -t dC -w10
  104  101  108  108  111   32  116  104  105  115
   32  105  115   32  119  101  105  114  100   10

The od command at the end of the pipeline prints the decimal code of each byte it reads: 104=h, 101=e, 108=l, and so on. The final 10 is the newline added by echo.
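One caveat for bytes with the high bit set: -t dC prints signed values, so a byte such as hex E9 would show up as -23. A variation that prints unsigned values instead; byte_codes is a made-up helper name and the input bytes are only an illustration:

```shell
# byte_codes is a hypothetical helper: unsigned decimal value of each
# byte read from standard input (-v keeps od from collapsing repeats).
byte_codes() {
    od -An -v -t u1
}

printf 'A\r\351' | byte_codes    # prints 65 13 233, padded by od
```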
# 5  
Old 11-04-2012
Hi Alister,

Thanks for your reply, it is a great help. Right now I can remove ^M with the condition CHR(13), but M-^E is still there. I tried CHR(133) (searching the internet, I found that CHR(133) matches octal code 205), but somehow it doesn't work and I cannot remove these special characters in unix. I must remove them before dumping the file into unix. Would you please take a look at which CHR value I should use to remove these characters?

Thanks in advance

ken002
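If the stray bytes really are decimal 13 (^M, octal 015) and decimal 133 (M-^E, octal 205), which nobody can confirm without seeing the file, they could also be dropped on the unix side with tr. A minimal sketch; strip_cr_m85 is a made-up name, and LC_ALL=C is set on the assumption that tr should treat its input as single bytes rather than multibyte characters:

```shell
# strip_cr_m85 is a hypothetical helper: delete every byte 015 (octal)
# and 205 (octal) from standard input, pass everything else through.
strip_cr_m85() {
    LC_ALL=C tr -d '\015\205'
}

printf 'ab\015\205cd\015\205\n' | strip_cr_m85 | cat -v    # abcd
```

Piping the result through cat -v (or od) is a quick way to check that nothing unexpected is left.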
# 6  
Old 11-04-2012
https://www.unix.com/shell-programmin...#post302726395

Please attach (the relevant part of) your mainframe text file so it can be analysed.