Print byte position of extended ascii character

07-16-2018

What's your shell?
And please show the entire error message, including context. What is "line 4"?

RudiC

07-16-2018

Let's try another way to get you to understand the problem.

What Don said was correct and very polite. Here is what you have to do.

In a single-byte encoding (plain ASCII, or an 8-bit code page such as ISO-8859-1), every character is one byte: 256 possibilities, ranging from 0 to 255.
So if we read byte by byte, each read produces one character we can check. This is how computers work, which can be annoying.

Now if there are multibyte characters (UTF-8 uses 2 to 4 bytes for every character above 127; UTF-16 uses 2 or 4 bytes for every character) and we do not know where they live on a line, we cannot tell their bytes apart from their single-byte neighbors. It takes 2 or more bytes to create one character. Bottom line: if we think the byte we read is a whole character, but it is really one piece of a UTF-8 or UTF-16 character, we cannot tell the difference.
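To make the single-byte versus multibyte difference concrete, here is a small bash sketch (not from this thread) that dumps the bytes of the letter é in two encodings. It assumes bash's printf, which understands \xHH escapes, and an od that supports the POSIX -A, -t and -v options; bytes_of is a hypothetical helper name.

```shell
#!/bin/bash
# bytes_of (hypothetical helper): print the unsigned decimal value of each
# byte read from stdin, with od's column padding squeezed to single spaces.
bytes_of() {
    od -An -v -tu1 | tr -s ' ' | sed 's/^ *//; s/ *$//'
}

latin1=$(printf '\xe9' | bytes_of)       # é in ISO-8859-1: one byte
utf8=$(printf '\xc3\xa9' | bytes_of)     # é in UTF-8: two bytes

echo "ISO-8859-1: $latin1"
echo "UTF-8:      $utf8"
```

The same two bytes, 195 and 169, read as ISO-8859-1 are the two characters Ã and ©. Both are above 127, so a byte-by-byte check would report two "high ASCII" positions for what UTF-8 treats as one character.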

In order to do what you want, we need to know which of these applies:

1. If multibyte characters do exist, we have to know ahead of time where they live.

2. If what you are seeing as a problem is really just single-byte "high ASCII" characters, then any single byte with a value > 127 is a problem and should be reported.

3. If there are embedded NUL (ASCII 0) characters, then we have to read the file in a completely different manner.

Got it? We need information to help. So please help us to help you.
Want a correct answer? Then tell us which applies: choice 1, choice 2, or choice 3.
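Choices 2 and 3 can both be handled by never putting the data into a shell variable. A minimal sketch, assuming a POSIX od (with -A, -t and -v) and awk: od turns the file into decimal byte values and awk counts line numbers and byte positions. The function name find_high_bytes is hypothetical, and it reports positions as 0-based byte offsets within each line.

```shell
#!/bin/bash
# find_high_bytes FILE  (hypothetical helper, not from this thread)
# Prints "line position value" for every byte > 127 in FILE.
#   line:     1-based line number (a byte of value 10, newline, ends a line)
#   position: 0-based byte offset within the line, NUL bytes included
#   value:    the byte's unsigned decimal value
find_high_bytes() {
    # -An: no offsets; -v: do not collapse repeated lines; -tu1: unsigned bytes
    od -An -v -tu1 "$1" | awk '
        BEGIN { line = 1; pos = 0 }
        {
            for (f = 1; f <= NF; f++) {
                b = $f + 0
                if (b == 10) { line++; pos = 0; continue }
                if (b > 127) print line, pos, b
                pos++
            }
        }'
}

# Usage (file name taken from the script later in this thread):
# find_high_bytes Test.TXT
```

Because od reads raw bytes, a NUL is just the value 0 and counts as one position instead of cutting the line short. The function body itself uses no bash-only syntax.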

jim mcnamara

07-16-2018

To demonstrate Jim's 3rd point and Don's similar point:
Longhand, OSX 10.13.5, default bash terminal running ksh:

Code:

Last login: Mon Jul 16 17:38:24 on ttys000
AMIGA:amiga~> ksh
AMIGA:uw> 
AMIGA:uw> text=$'abcd\007efgh'
AMIGA:uw> printf "%b\n" "${text}"
abcdefgh
AMIGA:uw> # A sound should be generated!
AMIGA:uw> 
AMIGA:uw> echo "${#text}"
9
AMIGA:uw> text=$'abcd\000efgh'
AMIGA:uw> printf "%b\n" "${text}"
abcd
AMIGA:uw> 
AMIGA:uw> # HUH? where are the other characters?
AMIGA:uw> 
AMIGA:uw> echo "${#text}"
4
AMIGA:uw> hexdump -C <<< "${text}"
00000000  61 62 63 64 0a                                    |abcd.|
00000005
AMIGA:uw> # Forever lost due to NULL!
AMIGA:uw> exit
AMIGA:amiga~> _

You now see why we are mentioning these subtle details...
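The bytes are only lost once they go into a shell variable; a pipe or a file keeps them. A small sketch of the same experiment in bash rather than the ksh above, assuming bash's $'...' strings stop at the NUL the same way:

```shell
#!/bin/bash
# In a variable, the string ends at the NUL byte (as in the ksh transcript).
text=$'abcd\000efgh'
echo "characters kept in the variable: ${#text}"

# In a pipe, every byte survives; printf turns \000 in its format into a real NUL.
# $((...)) strips any padding that some wc implementations print.
bytes=$(printf 'abcd\000efgh' | wc -c)
echo "bytes passed through the pipe: $((bytes))"
```

The pipe still carries all 9 bytes, including the NUL, while the variable keeps only abcd.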

wisecracker

07-16-2018

All,

Truly appreciate your inputs in solving the issue.

At this point I cannot confirm the source encoding, as the file passes through various locations before it reaches me.

I am assuming that it is UTF-8, but the sample data suggests that it is NOT.

Can we go with choice number 2 from post #16?

Thank you

rosebud123

07-17-2018

RudiC's post #6 in this thread does exactly what choice #2 does for you. It finds byte values > 127.

Use the bash shell; his example will not work in all shells. Put a shebang as the very first line of your shell script. When the script is executed directly, this makes the system run it with bash. I don't even know if you have bash available as a shell or not...

Code:

#!/bin/bash

This will cause an immediate error message if you do not have bash. Some other shells may work okay, but since that is still secret we can't help.

jim mcnamara

07-17-2018

KSH : Version M-11/16/88

Am I missing anything here? I am still receiving an error.

Version M-11/16/88f

Code:

 admin@t1g(/opt/test)$ sh -x test127.sh
  test127.sh[2]: 0403-057 Syntax error at line 4 : `(' is not expected.

Here is my script.

Code:

#!/bin/bash
while IFS= read -r T
  do    ((CNT++))
        for ((i=0; i<${#T}; i++))
          do    LC_ALL=C TMP=$(printf "%d\n" "'${T:i:1}")
                [ "$TMP" -gt 127 ] && printf "%d %c %d\n" "$i" "${T:i:1}" "$CNT"
          done
  done <Test.TXT

Please advise

rosebud123

07-17-2018

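Oh. Do not run the script the way you did. Running it as sh -x test127.sh hands it to sh (apparently ksh88 on your system), which treats the #!/bin/bash line as an ordinary comment. If you want the trace output, use bash -x test127.sh instead.
Otherwise, just run it like this:

Code:

cd /place/where/script/lives # probably /opt/test
chmod +x test127.sh
./test127.sh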

chmod allows the system to execute the file directly. Does your system have bash?

To find out

Code:

which bash

should print something like /usr/bin/bash. It may be in another directory, but if bash exists you will not get a "not found" message. If the path is not /bin/bash, change the shebang line to match.
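which bash shows where bash lives, but the system only uses the exact path written after #! when it executes the script. A hypothetical POSIX sh helper (check_shebang is not a standard command) that reads a script's first line and checks whether that interpreter path exists and is executable:

```shell
#!/bin/sh
# check_shebang SCRIPT  (hypothetical helper)
# Reports whether the interpreter named on SCRIPT's "#!" line exists and is
# executable at that exact path. Returns non-zero if not.
check_shebang() {
    first=$(head -n 1 "$1")
    case $first in
        '#!'*) ;;
        *) echo "no #! line in $1"; return 1 ;;
    esac
    interp=${first#??}                        # drop the leading "#!"
    interp=${interp#"${interp%%[! ]*}"}       # drop spaces after "#!"
    interp=${interp%% *}                      # drop any interpreter arguments
    if [ -x "$interp" ]; then
        echo "ok: $interp"
    else
        echo "missing: $interp"
        return 1
    fi
}
```

For example, check_shebang test127.sh should print ok: /bin/bash if the shebang matches an installed bash.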

Last edited by jim mcnamara; 07-17-2018 at 08:37 PM.

jim mcnamara

UNIX for Beginners Questions & Answers

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removal Extended ASCII using awk

Discussion started by: tostay2003

2. Programming

How to read extended ASCII characters from stdin?

Discussion started by: sanzee007

3. Shell Programming and Scripting

Search and Replace Extended Ascii Characters

Discussion started by: ysvsr1

4. Shell Programming and Scripting

Print the next ASCII character

Discussion started by: machomaddy

5. AIX

Printing extended ASCII

Discussion started by: petervg

6. Shell Programming and Scripting

Print lines with specific character at nth position in a file

Discussion started by: manaswinig

7. UNIX for Advanced & Expert Users

Processing extended ascii character file names in UNIX (BASH scripts)

Discussion started by: peli

8. Shell Programming and Scripting

extended ascii problem

Discussion started by: smooth

9. Programming

Extended ascii

Discussion started by: avis