What is ASCII character?

10-31-2012

ken002

Hi Guru,

I posted a question yesterday and got an answer. Thanks for your help.

My question today is: what are the ASCII codes for the following non-printable characters? (We need to filter these characters out in another process.)
^MM-^E^MM-^E.

Old post link: https://www.unix.com/shell-programmin...haracters.html

Thanks in advance

ken002

10-31-2012

This is the kind of situation where we'd need to see an attachment. The ^M's are probably carriage returns. The ^E's are control characters too (strictly, ^E is ENQ, byte 05; an escape would show up as ^[), but if this junk is color escape sequences and the like, as I suspect it is, there's bound to be more.

Corona688

10-31-2012

Assuming that ^MM-^E^MM-^E is using the usual scheme to represent non-printable characters:

For the first block of control characters and the del character:
A byte value of B in the range 00 to 1F and 7F (in decimal, 0 to 31 and 127) is represented by ^X where B = (X+64)%128. Except for 7F, this is equivalent to B = X-64

Analogously, for the second block of control characters and the highest valued byte:
A byte value of B in the range 80 to 9F and FF (in decimal, 128 to 159 and 255) is represented by M-^X where B = 128 + (X+64)%128. Except for FF, this is equivalent to B = X+64

This leaves two ranges. 20 to 7E (in decimal, 32 to 126) are the printable characters (including alphanumerics and punctuation). A byte value in this range represents itself. Its high-order counterpart is A0 to FE (in decimal, 160 to 254). A byte value, B, in this range is represented by M-X, where B = X+128.

To recap, there are three types of encoding: ^X, M-^X, M-X. The two beginning with M have the high bit set. The two which include a ^ are the two blocks of control characters.
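
The three encodings can be reversed mechanically. Below is a minimal Python sketch, not taken from any cat implementation: the function name decode_cat_v is made up for illustration, and it assumes the input really is cat -v style notation in plain ASCII. A literal "^" or "M-" in the original data cannot be told apart from an encoded byte, so treat the result as a best guess.

```python
def decode_cat_v(text: str) -> bytes:
    """Turn cat -v style notation (^X, M-^X, M-X) back into raw bytes.

    Hypothetical helper for illustration; assumes ASCII input.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        # "M-" prefix: the byte that follows has its high bit set.
        if text.startswith("M-", i) and i + 2 < len(text):
            high = 128
            i += 2
        if text[i] == "^" and i + 1 < len(text):
            # Control character: ^X stands for (X + 64) % 128.
            out.append(high + (ord(text[i + 1]) + 64) % 128)
            i += 2
        else:
            # Printable character: stands for itself (plus 128 after M-).
            out.append(high + ord(text[i]))
            i += 1
    return bytes(out)


print(list(decode_cat_v("^M^?M-^@M-^?M-a")))  # [13, 127, 128, 255, 225]
```

The sample string covers each case once: ^M and ^? from the first control block, M-^@ and M-^? from the second, and M-a from the high printable range.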

Under this scheme, ^MM-^E^MM-^E represents 4 characters: a two-character sequence that occurs twice. The first control character does not have the high bit set, while the second one does. ^M is the one that does not. M is decimal 77 in ASCII, so ^M is decimal 13 (hex 0D). This is a carriage return.

I'll leave the other byte, M-^E, as an exercise for you.

NOTE: Although I doubt it, without any context there is a chance that those could just be literal characters.

Regards,
Alister

---------- Post updated at 01:50 PM ---------- Previous update was at 01:33 PM ----------

For those who enjoy a peek behind the curtain, here are OpenBSD's and GNU coreutils' cat -v implementations:

OpenBSD cat.c:

Code:

		} else if (vflag) {
			if (!isascii(ch)) {
				if (putchar('M') == EOF || putchar('-') == EOF)
					break;
				ch = toascii(ch);
			}
			if (iscntrl(ch)) {
				if (putchar('^') == EOF ||
				    putchar(ch == '\177' ? '?' :
				    ch | 0100) == EOF)
					break;
				continue;
			}
		}

GNU coreutils cat.c:

Code:

      if (show_nonprinting)
        {
          while (true)
            {
              if (ch >= 32)
                {
                  if (ch < 127)
                    *bpout++ = ch;
                  else if (ch == 127)
                    {
                      *bpout++ = '^';
                      *bpout++ = '?';
                    }
                  else
                    {
                      *bpout++ = 'M';
                      *bpout++ = '-';
                      if (ch >= 128 + 32)
                        {
                          if (ch < 128 + 127)
                            *bpout++ = ch - 128;
                          else
                            {
                              *bpout++ = '^';
                              *bpout++ = '?';
                            }
                        }
                      else
                        {
                          *bpout++ = '^';
                          *bpout++ = ch - 128 + 64;
                        }
                    }
                }
              else if (ch == '\t' && !show_tabs)
                *bpout++ = '\t';
              else if (ch == '\n')
                {
                  newlines = -1;
                  break;
                }
              else
                {
                  *bpout++ = '^';
                  *bpout++ = ch + 64;
                }

              ch = *bpin++;
            }
        }
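
For readers who don't read C, the branch structure of the GNU excerpt can be paraphrased in Python. This is only a sketch of the logic shown above, not the real implementation: the function name cat_v is made up, the buffer handling is dropped, and a newline is simply passed through where the C code breaks out to its outer loop.

```python
def cat_v(data: bytes, show_tabs: bool = False) -> str:
    """Render bytes the way the GNU excerpt above does (illustrative port)."""
    out = []
    for ch in data:
        if ch >= 32:
            if ch < 127:
                out.append(chr(ch))                # printable: itself
            elif ch == 127:
                out.append("^?")                   # DEL
            elif ch >= 128 + 32:
                if ch < 128 + 127:
                    out.append("M-" + chr(ch - 128))   # high printable
                else:
                    out.append("M-^?")             # byte 255
            else:
                out.append("M-^" + chr(ch - 128 + 64))  # 128..159
        elif ch == 9 and not show_tabs:
            out.append("\t")
        elif ch == 10:
            out.append("\n")                       # simplified: pass through
        else:
            out.append("^" + chr(ch + 64))         # 0..31
    return "".join(out)


print(cat_v(b"\r\x85\r\x85"))  # ^MM-^E^MM-^E
```

Feeding it the bytes 0D 85 0D 85 reproduces the string from the original question.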

Regards,
Alister

alister

10-31-2012

Here is a way to determine the characters:

Code:

$ echo hello this is weird | od -An -t dC -w10
  104  101  108  108  111   32  116  104  105  115
   32  105  115   32  119  101  105  114  100   10

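The od command at the end of the pipeline prints the decimal ASCII code of each character: 104=h, 101=e, 108=l, and so on. Note that -t dC prints signed values, so a byte above 127 shows up as a negative number; -t uC prints it unsigned.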
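
If od isn't handy, the same listing can be produced in Python, which always reports byte values unsigned (0 to 255). A minimal sketch; the function name dump_decimal and the row width of 10 are chosen only to mirror the od call above.

```python
def dump_decimal(data: bytes, width: int = 10) -> str:
    """List each byte as an unsigned decimal number, `width` values per row."""
    rows = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        rows.append(" ".join(f"{b:4d}" for b in chunk))
    return "\n".join(rows)


# Same sample text as the od example, including the newline echo adds.
print(dump_decimal(b"hello this is weird\n"))
```

To inspect a real file, read it in binary mode (open(path, "rb")) so that no bytes are decoded or translated on the way in.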

joeyg

11-04-2012

ken002

Hi Alister,

Thanks for your reply; it is a great help. Right now I can remove ^M by using the condition CHR(13), but M-^E is still there. I tried CHR(133) (I searched the internet, and CHR(133) matches octal code 205), but somehow it doesn't work, and I cannot remove these special characters in Unix. I must remove them before dumping the file into Unix. Would you please take a look at which CHR value I should use to remove these characters?
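
As a way to check this on the Unix side, here is a minimal Python sketch that deletes both bytes from raw data. It assumes the file really contains the single bytes 13 (hex 0D) and 133 (hex 85); that is only what the cat -v notation suggests and has not been confirmed against the actual file. The names UNWANTED and strip_bytes are made up for illustration.

```python
# Assumed from the cat -v notation: ^M = byte 13 (0x0D), M-^E = byte 133 (0x85).
UNWANTED = bytes([0x0D, 0x85])


def strip_bytes(data: bytes, unwanted: bytes = UNWANTED) -> bytes:
    """Return data with every byte listed in `unwanted` deleted."""
    # bytes.translate with a table of None only performs the deletion.
    return data.translate(None, unwanted)


sample = b"field1\r\x85,field2\r\x85\n"
print(strip_bytes(sample))  # b'field1,field2\n'
```

Working on bytes rather than decoded text matters here: if the data were decoded with some character set first, byte 133 could turn into a different character, and a comparison against CHR(133) might then never match.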

Thanks in advance

ken002

11-04-2012

https://www.unix.com/shell-programmin...#post302726395

Please attach (the relevant part of) your mainframe text file so it can be analysed.

RudiC
