Well your BYTE is much more complex than that!
You meant CHARACTER, BUT, take a look at your file snippet:
With the results, (OSX 10.13.5, default bash terminal.):
As you can see, there are multi-byte sequences and low byte values too, for example '[0x]03' and '[0x]02'. '[0x]0a' is the newline, so that can be ignored here...
This is not straightforward, as we have no idea what these low-value bytes do. Are they hidden characters, or something else?
Sometimes the extended character has 2 bytes and sometimes there are more ( 03 c3 bb c3 81 ), with those strange low byte values added, which none of us knew about until I looked.
As I pointed out before, 'hexdump' (or 'od' or 'xxd') is your first friend here...
This combination is particularly hard to catch: c3 ac 07 c2 a9. What does the '[0x]07' do here?
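For anyone following along, here is a minimal sketch of that first look at the bytes. It assumes a POSIX shell with 'od', 'tr' and 'sed' available; the function name hex_bytes is made up for illustration, and the sample input is the 03 c3 bb c3 81 sequence quoted in this post, not the real file.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print every byte of standard
# input as two hex digits, one byte per line, so that control bytes and
# multi-byte sequences are easy to spot.
hex_bytes() {
    # od emits space-separated hex bytes; turn the separators into newlines
    # and drop the empty line left by od's leading padding.
    od -An -v -tx1 | tr -s ' \t' '\n\n' | sed '/^$/d'
}

# Sample bytes from the post: 03 c3 bb c3 81, followed by a newline (0a).
# Prints 03, c3, bb, c3, 81 and 0a, one per line.
printf '\003\303\273\303\201\n' | hex_bytes
```

To look at a real file, something like `hex_bytes < yourfile | head` would do the same job.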
Much more information is needed before we can proceed, assuming there is a solution.
CHARACTERS and HIDDEN characters are not the same as bytes as you have now discovered...
And finally, the bizarre thing is that your last line here does NOT have a low byte value, so what are those low bytes needed for on the other lines? They ARE technically ASCII characters, albeit control ones.
EDIT:
I have just noticed this: 02 3d. Are these 2 real ASCII characters, or one _imaginary_ and one real?
I have a sneaking suspicion that the BYTES following the spaces should be 4 BYTE pointers of some description AND have become corrupted by those _EXTENDED_ characters! Hence the varying number of bytes before the numerical ?DATE? value.
Last edited by wisecracker; 07-15-2018 at 03:30 AM..
Reason: See above...
My input file contains a combination of ASCII/extended ASCII/unprintable/double-byte characters.
The idea is to find these extended ascii/unprintable/doublebyte characters and provide an output file in the required format.
Is there a way to do the converse operation, where all good characters are replaced with some constant value and the problem children are left as is? From there we could do another operation to get the desired output.
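A rough sketch of that converse operation, under a big assumption: "good" here means printable ASCII, bytes 0x20 to 0x7e, and everything else (control bytes, bytes above 0x7f, newlines) is passed through untouched. The function name mask_good and the '.' placeholder are made up for illustration; adjust the range once the real file format is known.

```shell
#!/bin/sh
# Hypothetical helper (name and "good" range are assumptions): replace every
# printable ASCII byte (octal 040-176) with '.', leaving all other bytes as is.
mask_good() {
    # LC_ALL=C makes tr work on single bytes instead of locale characters.
    LC_ALL=C tr '\040-\176' '[.*]'
}

# 'ab', then 03 c3 bb, then ' cd' and a newline. After masking, only
# 03 c3 bb and the newline (0a) survive; the rest become 2e ('.').
printf 'ab\003\303\273 cd\n' | mask_good | od -An -v -tx1
```

Because the byte count does not change, the positions of the surviving bytes still line up with the original file, which may help with the follow-up step.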
By definition, a text file cannot contain NUL bytes.
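As a quick check of that point, here is a small sketch that counts the NUL bytes on standard input. It assumes a POSIX shell with 'tr' and 'wc'; the function name count_nul is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): count NUL bytes on stdin.
count_nul() {
    # Delete every byte except NUL, count what is left, and strip the
    # padding that some wc implementations add.
    LC_ALL=C tr -cd '\000' | wc -c | tr -d ' '
}

# Two NUL bytes hidden between ordinary characters: prints 2.
printf 'a\000b\000\n' | count_nul
```

Running `count_nul < yourfile` and getting anything other than 0 means the file is not a text file in that sense.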
If the file you're reading contains pointers or other binary values, you need to really understand the format of the data you are processing and use tools appropriate to your task. Without understanding the format of the data you're reading, all bets are off. Note that the format includes not only knowing where there are binary values in your data (if there are any), but also knowing what codeset is being used to encode characters in your file. (For example, there is obviously a big difference between extended ASCII characters encoded in ISO 8859-1 and extended ASCII characters encoded in UTF-8.)
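To make the codeset point concrete, here is a small sketch that writes one character, 'ö' (U+00F6), in both encodings and dumps the bytes. The function names are made up for illustration, and the character is just an example, not something taken from the file being discussed.

```shell
#!/bin/sh
# The same character, o with diaeresis (U+00F6), in two codesets.
# Function names are hypothetical, chosen only for this illustration.
latin1_o_umlaut() { printf '\366'; }      # ISO 8859-1: one byte, 0xf6
utf8_o_umlaut()   { printf '\303\266'; }  # UTF-8: two bytes, 0xc3 0xb6

# Dump each encoding as hex bytes.
latin1_o_umlaut | od -An -v -tx1
utf8_o_umlaut   | od -An -v -tx1
```

A tool that assumes one codeset while the file uses the other will see either a stray high byte or two unrelated characters, which is why the codeset has to be known up front.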
These 2 Users Gave Thanks to Don Cragun For This Post:
Hi All,
I am trying to remove (SELECTIVE - passed as argument) extended ASCII characters using awk on an ad hoc basis. Can you please let me know how to do it? I have to implement this using awk only.
Thanks & Regards (14 Replies)
Hi,
I want to read extended ASCII characters from the keyboard using the C language on Unix/Linux. How can I read extended characters from the keyboard, or by copy-paste in a terminal, irrespective of the locale set on the system? I want to read the input characters from keyboard, store it in an array or some local... (3 Replies)
We are getting extended Ascii characters in the input file and my requirement is to search and replace them with a space. I am using the following command
LANG=C sed -e 's// /g'
It is doing a good job, but in some cases it is replacing the extended characters with two spaces. So my input... (12 Replies)
Hi,
In my file, for a few fields I have to print the next ASCII character for every character.
In the file below, I have to do this for the 2nd, 3rd and 5th fields.
Input File
========
1|abc|def|5|ghi
2|jkl|mno|6|pqr
Expected
Output file
=======
1|bcd|efg|5|hij
2|klm|nop|6|qrs (2 Replies)
Hi All,
I'm trying to send extended ascii characters to my HP2055 as part of PCL printer control codes. What I want to do is select a bar code font, print the bar code and reset the printer to the default font.
Selecting the bar code font works good. Printing the bar code goes almost ok too. ... (5 Replies)
Hi, I have an accented letter (ö) in a script for an installer. It's a file name. This is not working and I'm told to try using the octal value for the extended ASCII character. Does anyone know how to do this? If I had the word "filförval", can I just put in the value between the letters, like... (9 Replies)
Hi, I would like to check text files to see whether they contain extended ASCII characters or not. I really don't have any idea how to start. Your kind help would be very much appreciated. Thanks. (7 Replies)
Hi all,
I would like to change the extended ASCII code (128 - 255).
I tried changing LC_ALL and LANG in the current session (values from locale -a), but to no avail.
Thanks. (0 Replies)