Print byte position of extended ascii character

 
# 8  
Old 07-14-2018
What locale are you using? UTF-8, Unicode?

Without more information we can't help very much.
# 9  
Old 07-14-2018
Yes, UTF-8.
# 10  
Old 07-15-2018
Well, your BYTE problem is much more complex than that!
You meant CHARACTER, but take a look at your file snippet:
Code:
#!/bin/bash
text='MDQ ŸD201803132018031400
MDQ "ã201707112018071100
MDQ =ÿ201605202018052000
MDQ "Ä201605202018052000
QDX ûÁ201705012018050200
MDQ ì©201708102018081000
QDU ìc-201708092018080900'
hexdump -C <<< "$text"

With these results (OSX 10.13.5, default bash terminal):
Code:
Last login: Sun Jul 15 05:28:32 on ttys000
AMIGA:amiga~> cd Desktop/Code/Shell
AMIGA:amiga~/Desktop/Code/Shell> ./byte_or_char.sh
00000000  4d 44 51 20 03 c5 b8 44  32 30 31 38 30 33 31 33  |MDQ ...D20180313|
00000010  32 30 31 38 30 33 31 34  30 30 0a 4d 44 51 20 02  |2018031400.MDQ .|
00000020  22 c3 a3 32 30 31 37 30  37 31 31 32 30 31 38 30  |"..2017071120180|
00000030  37 31 31 30 30 0a 4d 44  51 20 02 3d c3 bf 32 30  |71100.MDQ .=..20|
00000040  31 36 30 35 32 30 32 30  31 38 30 35 32 30 30 30  |1605202018052000|
00000050  0a 4d 44 51 20 02 22 c3  84 32 30 31 36 30 35 32  |.MDQ ."..2016052|
00000060  30 32 30 31 38 30 35 32  30 30 30 0a 51 44 58 20  |02018052000.QDX |
00000070  03 c3 bb c3 81 32 30 31  37 30 35 30 31 32 30 31  |.....20170501201|
00000080  38 30 35 30 32 30 30 0a  4d 44 51 20 c3 ac 07 c2  |8050200.MDQ ....|
00000090  a9 32 30 31 37 30 38 31  30 32 30 31 38 30 38 31  |.201708102018081|
000000a0  30 30 30 0a 51 44 55 20  c3 ac 63 2d 32 30 31 37  |000.QDU ..c-2017|
000000b0  30 38 30 39 32 30 31 38  30 38 30 39 30 30 0a     |08092018080900.|
000000bf
AMIGA:amiga~/Desktop/Code/Shell> _

As you can see, there are multi-byte sequences and low byte values too, for example '[0x]03' and '[0x]02'. '[0x]0a' is the newline, so that can be ignored here.
This is not straightforward, as we have no idea what these low-value bytes do. Are they hidden characters?
Each extended character takes 2 bytes in UTF-8, but the runs vary in length ( 03 c3 bb c3 81 is a '[0x]03' followed by two 2-byte characters), and they are mixed with those strange low byte values that nobody here knew about until I looked.
As I pointed out before, 'hexdump' (or 'od' or 'xxd') is your first friend here.
This combination is particularly hard to catch: c3 ac 07 c2 a9. What does the '[0x]07' do here?
Much more information is needed before we can proceed, assuming there is a solution.
CHARACTERS and HIDDEN characters are not the same as BYTES, as you have now discovered.
And finally, the bizarre thing is that your last line does NOT have a low byte value, so what are those bytes needed for on the other lines? They ARE technically ASCII characters, albeit control ones.
EDIT:
I have just noticed this: 02 3d. Are these 2 real ASCII characters, or one _imaginary_ and one real?

I have a sneaking suspicion that the BYTES following the space should be 4-BYTE pointers of some description and have been corrupted by those _EXTENDED_ characters! Hence the varying number of bytes before the numerical ?DATE? value.
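
For what it's worth, here is a minimal sketch of how the offsets of such bytes could be listed with 'od' and 'awk'. The function name flag_bytes is made up for illustration, and the rule it applies (flag anything below 0x20 except newline, plus 0x7f and above) is only an assumption about what counts as a problem byte, not something confirmed for this data. Offsets are decimal and count from 0.

```shell
#!/bin/sh
# flag_bytes (hypothetical name): read stdin and print "offset 0xHH" for
# every byte that is a control byte other than newline, or is >= 0x7f.
# The choice of which bytes to flag is an assumption made for this sketch.
flag_bytes() {
    # od prints each byte as an unsigned decimal number, 16 per row;
    # -An drops the address column, -v stops od collapsing repeated rows.
    od -An -v -tu1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                v = $i + 0
                if ((v < 32 && v != 10) || v >= 127)
                    printf "%d 0x%02x\n", n, v
                n++    # running byte offset, starts at 0
            }
        }'
}

# Demo on the start of the first record: "MDQ ", 0x03, 0xc5 0xb8, "D", newline
printf 'MDQ \003\305\270D\n' | flag_bytes
```

To run it against a real file, redirect the file into the function, e.g. flag_bytes < infile (the file name is a placeholder). Note that this reports BYTE positions, so a 2-byte UTF-8 character shows up as two consecutive offsets.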

Last edited by wisecracker; 07-15-2018 at 03:30 AM.. Reason: See above...
# 11  
Old 07-15-2018
My input file contains a combination of ASCII, extended ASCII, unprintable, and double-byte characters.

The idea is to find these extended ASCII/unprintable/double-byte characters and write an output file in the required format.

Is there a way to do the converse operation, where all good characters are replaced with some constant value and the problem children are left as is? From there we could do another operation to get the desired output.

Please advise
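
A rough sketch of that converse operation, assuming "good" means printable ASCII (space through tilde) and that '.' is an acceptable constant. Both are assumptions, and the function name mask_good is invented for this example. It uses 'tr' in the C locale so that it works on single bytes rather than characters.

```shell
#!/bin/sh
# mask_good (hypothetical name): replace every printable ASCII byte
# (octal 040 to 176) with '.', leaving control bytes and bytes >= 0x7f
# untouched. "[.*]" asks tr to repeat '.' for the whole of the first set.
mask_good() {
    LC_ALL=C tr '\040-\176' '[.*]'
}

# Demo: only 0x03, 0xc5, 0xb8 and the newline keep their original values.
printf 'MDQ \003\305\270D\n' | mask_good | od -An -v -tx1
```

Because every byte is mapped one-to-one, positions in the output match positions in the input, so a later step can still work out where each problem byte sits.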
# 12  
Old 07-15-2018
By definition, a text file cannot contain NUL bytes.

If the file you're reading contains pointers or other binary values, you need to really understand the format of the data you are processing and use tools appropriate to your task. Without understanding the format of the data you're reading, all bets are off. Note that the format includes not only knowing where there are binary values in your data (if there are any), but also knowing what codeset is being used to encode characters in your file. (For example, there is obviously a big difference between extended ASCII characters encoded in ISO 8859-1 and extended ASCII characters encoded in UTF-8.)
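
To make the codeset point concrete, here is a small sketch showing the same character, ã, written out by hand in each encoding and dumped with 'od'. The variable names are made up for the example, and the bytes are supplied as octal escapes rather than read from the poster's file.

```shell
#!/bin/sh
# The character ã (U+00E3) in two encodings, given as octal escapes:
#   ISO 8859-1: one byte,  0xe3       (octal 343)
#   UTF-8:      two bytes, 0xc3 0xa3  (octal 303 243)
latin1_bytes=$(printf '\343' | od -An -v -tx1 | tr -d ' \n')
utf8_bytes=$(printf '\303\243' | od -An -v -tx1 | tr -d ' \n')

echo "ISO 8859-1: $latin1_bytes"
echo "UTF-8:      $utf8_bytes"
```

The UTF-8 form is the c3 a3 pair visible in the hexdump earlier in the thread, which is why a byte position and a character position stop agreeing after the first such character on a line.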
# 13  
Old 07-15-2018
Quote:
Originally Posted by rosebud123
. . .
Please advise

Did you consider / adapt post#6?
# 14  
Old 07-15-2018
RudiC,

I tried, but for some strange reason I am receiving a syntax error:

0403-057 Syntax error at line 4 : `(' is not expected.

I am on AIX.

Thank you