Let's try another way to get you to understand the problem.
What Don said was correct and very polite. Here is what you have to do.
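A single-byte encoding (7-bit ASCII, or an 8-bit extended set such as ISO 8859-1) means every character is exactly one byte: at most 256 possibilities, ranging from 0 to 255. UTF-8 is not such an encoding; only its ASCII range (0 to 127) is one byte per character.
So if we read byte-by-byte in a single-byte encoding, each read produces a character we can check. This is how computers work, which can be annoying.
Now if there are wide characters, say 2 bytes wide, and we do not know where they live on a line of single-byte characters, we cannot tell them apart from their single-byte neighbors. It takes 2 bytes to create one character. Bottom line: if we think the byte we read is a complete single-byte character, but it is really half of a 2-byte character (UTF-16, or a 2-byte UTF-8 sequence), we cannot tell the difference without knowing the encoding.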
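To make the point concrete, here is a minimal sketch showing that the same character occupies different bytes in different encodings, and that the same bytes read under the wrong encoding become different characters. The helper name `hex_bytes` is made up for this illustration, and the sketch assumes `od`, `tr` and `iconv` are installed.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes arriving on stdin as one hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# "ö" encoded as UTF-8 is the two bytes 0xC3 0xB6 (octal 303 266).
printf '\303\266' | hex_bytes; echo                                 # c3b6

# The same character in ISO 8859-1 is the single byte 0xF6.
printf '\303\266' | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes; echo  # f6

# Misreading the two UTF-8 bytes as ISO 8859-1 turns one character into two
# ("Ã" and "¶"), shown here re-encoded as UTF-8.
printf '\303\266' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes; echo  # c383c2b6
```

Nothing in the bytes themselves says which reading is the right one; that has to come from whoever produced the file.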
In order to do what you want:
Got it? We need information to help. So please help us to help you.
Want a correct answer? Then provide us with choice 1, or choice 2, or choice 3.
To demonstrate Jim's 3rd point and Don's similar point:
Longhand, OSX 10.13.5, default bash terminal running ksh:
You now see why we are mentioning these subtle details...
RudiC's post #6 in this thread does exactly what choice #2 does for you. It finds byte values > 127, that is, anything outside the 7-bit ASCII range.
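For readers who cannot see that post, the idea of finding bytes above 127 can be sketched as follows. This is not RudiC's code, just one possible approach using `tr` in the C locale; the function names `count_high_bytes` and `lines_with_high_bytes` are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes in file "$1" whose value is above 127.
# In the C locale tr works on single bytes, so deleting 0-127 leaves the rest.
count_high_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: print the number of every line in file "$1" that
# contains at least one byte above 127.
lines_with_high_bytes() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        rest=$(printf '%s' "$line" | LC_ALL=C tr -d '\000-\177')
        if [ -n "$rest" ]; then
            echo "$n"
        fi
    done < "$1"
}

# Demo on a throwaway file: line 2 holds "ö" as two UTF-8 bytes.
sample=$(mktemp)
printf 'plain ascii\nfilf\303\266rval\n' > "$sample"
count_high_bytes "$sample"        # 2
lines_with_high_bytes "$sample"   # 2
rm -f "$sample"
```

Note that this only reports that high bytes exist. It still cannot say whether they are single-byte extended characters or pieces of multi-byte ones.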
Use the bash shell; his example will not work in all shells. Put a shebang that names bash as the absolutely first line of your shell script, so that the script is run by bash. I don't even know if you have bash available as a shell or not....
This will cause an immediate error message if you do not have bash. Some other shells may work okay, but since the shell you use is still secret we can't help.
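If you are unsure which shells are installed, a quick check along these lines may help before you pick a shebang. The helper name `have_shell` is made up for this sketch. A shebang for bash usually looks like `#!/bin/bash`, but the path is an assumption and varies between systems.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a program named "$1" is found on PATH.
have_shell() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of a few common shells are available on this machine.
for s in bash ksh; do
    if have_shell "$s"; then
        echo "$s: available at $(command -v "$s")"
    else
        echo "$s: not found"
    fi
done
```

The path printed for bash is the one to put after `#!` on the first line of the script.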
This User Gave Thanks to jim mcnamara For This Post:
Hi All,
I am trying to remove (SELECTIVE - passed as argument) Extended ASCII characters using awk on an ad hoc basis. Can you please let me know how to do it? I have to implement this using awk only.
Thanks & Regards (14 Replies)
Hi,
I want to read extended ASCII characters from keyboard using c language on unix/linux. How to read extended characters from keyboard or by copy-paste in terminal irrespective of locale set in the system. I want to read the input characters from keyboard, store it in an array or some local... (3 Replies)
We are getting extended Ascii characters in the input file and my requirement is to search and replace them with a space. I am using the following command
LANG=C sed -e 's// /g'
It is doing a good job, but in some cases it is replacing the extended characters with two spaces. So my input... (12 Replies)
Hi,
In my file, for a few fields I have to print the next ASCII character for every character.
In the file below, I have to do this for the 2nd, 3rd and 5th fields.
Input File
========
1|abc|def|5|ghi
2|jkl|mno|6|pqr
Expected
Output file
=======
1|bcd|efg|5|hij
2|klm|nop|6|qrs (2 Replies)
Hi All,
I'm trying to send extended ASCII characters to my HP2055 as part of PCL printer control codes. What I want to do is select a bar code font, print the bar code and reset the printer to the default font.
Selecting the bar code font works well. Printing the bar code goes almost OK too. ... (5 Replies)
Hi, I have an accented letter (ö) in a script for an installer. It's a file name. This is not working and I'm told to try using the octal value for the extended ASCII character. Does anyone know how to do this? If I had the word "filförval", can I just put in the value between the letters, like... (9 Replies)
Hi, I would like to check whether text files contain extended ASCII characters or not. I really don't have any idea how to start. Your kind help would be very much appreciated. Thanks. (7 Replies)
Hi all,
I would like to change the extended ASCII codes (128 - 255).
I tried changing LC_ALL and LANG in the current session (using values from locale -a), but to no avail.
Thanks. (0 Replies)