Print byte position of extended ascii character

 
# 15  
Old 07-16-2018
What's your shell?
And please show the entire error message, including context. What is on "line 4"?
# 16  
Old 07-16-2018
Let's try another way to get you to understand the problem.

What Don said was correct and very polite. Here is what you have to do.

A single-byte encoding (ASCII, ISO 8859-1 and the like) means every character is one byte, 256 possibilities ranging from 0 to 255.
So if we read byte-by-byte, each read produces a character we can check. This is how computers work, which can be annoying.

Now if there are wide characters - say 2 bytes wide - and we do not know where they live on a line of single-byte characters, we cannot tell them apart from their single-byte neighbors. It takes 2 bytes to create one character. Bottom line: if we think the byte we read is a complete single-byte character, but it is really part of a multibyte character (UTF-8 or UTF-16, for example), we cannot tell the difference without knowing the encoding.
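
To make that concrete, here is a small sketch (not from the thread) that dumps the same letter, "é", in two encodings. The function names are made up for illustration; only printf and od are assumed.

```shell
# The same letter "é" in two encodings, shown as raw hex bytes.
# utf8_bytes and latin1_bytes are hypothetical names used only for this demo.
utf8_bytes()   { printf '\303\251' | od -A n -t x1; }   # UTF-8: two bytes, c3 a9
latin1_bytes() { printf '\351'     | od -A n -t x1; }   # ISO 8859-1: one byte, e9

utf8_bytes
latin1_bytes
```

A program reading one byte at a time sees c3 and then a9 in the first case, and nothing in the bytes themselves says whether that is one UTF-8 character or two ISO 8859-1 characters.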

In order to do what you want:

Code:
1. We have to know ahead of time where multibyte characters live, if they exist at all.

2. If what you are seeing as a problem is really just single-byte "high ASCII" characters, then any single byte value that is > 127 is a problem and should be reported.

3. If there are embedded NUL (ASCII 0) characters, then we have to read the file in a completely different manner.

Got it? We need information to help. So please help us to help you.
Want a correct answer? Then provide us with choice 1, or choice 2, or choice 3.
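
For choice 2, a quick first check is whether the file contains any bytes above 127 at all. This is only a sketch, assuming standard tr and wc; the function name count_high_bytes is made up for illustration.

```shell
# Hypothetical helper: count the bytes with a value above 127 in a file.
# tr deletes every byte in the range 0-127, so only "high" bytes reach wc.
count_high_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

A result of 0 means the file is plain 7-bit ASCII and there is nothing to report. Anything else means the positions still have to be located.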
# 17  
Old 07-16-2018
To demonstrate Jim's 3rd point and Don's similar point:
Longhand, OSX 10.13.5, default bash terminal running ksh:
Code:
Last login: Mon Jul 16 17:38:24 on ttys000
AMIGA:amiga~> ksh
AMIGA:uw> 
AMIGA:uw> text=$'abcd\007efgh'
AMIGA:uw> printf "%b\n" "${text}"
abcdefgh
AMIGA:uw> # A sound should be generated!
AMIGA:uw> 
AMIGA:uw> echo "${#text}"
9
AMIGA:uw> text=$'abcd\000efgh'
AMIGA:uw> printf "%b\n" "${text}"
abcd
AMIGA:uw> 
AMIGA:uw> # HUH? where are the other characters?
AMIGA:uw> 
AMIGA:uw> echo "${#text}"
4
AMIGA:uw> hexdump -C <<< "${text}"
00000000  61 62 63 64 0a                                    |abcd.|
00000005
AMIGA:uw> # Forever lost due to NULL!
AMIGA:uw> exit
AMIGA:amiga~> _

You now see why we are mentioning these subtle details...
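
Since a shell variable cannot hold a NUL byte, one way to check a file for them is to stream it through tools that work on raw bytes instead. A minimal sketch, assuming standard tr and wc; count_nul_bytes is a made-up name.

```shell
# Hypothetical helper: count the NUL (value 0) bytes in a file.
# tr -cd deletes every byte EXCEPT NUL, so wc counts only the NULs.
count_nul_bytes() {
    LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

If this prints anything other than 0, reading the file line by line into shell variables will silently lose data, as the session above shows.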
# 18  
Old 07-16-2018
All,

I truly appreciate your input in solving the issue.

At this point I cannot confirm the source encoding, as the file passes through various locations before it reaches me.

I am assuming that it is UTF-8 but the sample data suggests that it is NOT.

Can we proceed with choice number 2 from post #16?

Thank you
# 19  
Old 07-17-2018
RudiC's post #6 in this thread does exactly what choice #2 does for you. It finds byte values > 127.

Use the bash shell; his example will not work in all shells. Put a shebang as the very first line of your shell script. This invokes bash. I don't even know whether you have bash available as a shell or not....

Code:
#!/bin/bash

This will cause an immediate error message if you do not have bash. Some other shells may work okay, but since your shell is still a secret, we can't help.
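
If bash turns out not to be available, the same kind of report can be sketched with only od and awk, which do not depend on bash features. This is not RudiC's script, just an illustration under that assumption: high_bytes_by_line is a made-up name, and it prints the zero-based byte offset within the line, the byte's decimal value, and the line number.

```shell
# Hypothetical helper: report every byte > 127 as "offset value line".
# od prints each byte as an unsigned decimal; awk tracks lines by
# watching for the newline byte (value 10).
high_bytes_by_line() {
    od -A n -v -t u1 "$1" | awk '
        BEGIN { line = 1; col = 0 }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == 10) { line++; col = 0; continue }   # newline: next line
                if ($i > 127) print col, $i, line             # high byte found
                col++
            }
        }'
}
```

Because od reads raw bytes, the result does not depend on the locale, and the offsets are byte offsets, not character offsets.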
# 20  
Old 07-17-2018
KSH : Version M-11/16/88

---------- Post updated at 03:53 PM ---------- Previous update was at 03:35 PM ----------

Am I missing anything here? I am still receiving an error.

Version M-11/16/88f

Code:
 admin@t1g(/opt/test)$ sh -x test127.sh
  test127.sh[2]: 0403-057 Syntax error at line 4 : `(' is not expected.

Here is my script.

Code:
#!/bin/bash
while read T
  do    ((CNT++))
        for ((i=0; i<${#T}; i++))
          do    LC_ALL=C TMP=$(printf "%d\n" "'"${T:i:1})
                [ $TMP -gt 127 ] && printf "%d %c %d\n" $i ${T:i:1} $CNT
          done
  done <Test.TXT

Please advise
# 21  
Old 07-17-2018
Oh. Do not run the script the way you did. Running it as sh -x test127.sh hands it to sh, which treats the #!/bin/bash line as a comment.
Just run it like this:
Code:
cd /place/where/script/lives # probably /opt/test
chmod +x test127.sh
./test127.sh

chmod allows the system to execute the file directly. Does your system have bash?

To find out:
Code:
which bash

should produce something like /usr/bin/bash. It may be in another directory, but if bash exists you will not get a "not found" notice.

Last edited by jim mcnamara; 07-17-2018 at 08:37 PM..