Search and Replace Extended Ascii Characters


 
# 8  
Old 10-31-2014
RudiC, here is the sample output:

Code:
0000000  41  36  31  31  34  30  39  32  39  30  30  30  30  30  30  30
          A   6   1   1   4   0   9   2   9   0   0   0   0   0   0   0
0000020  30  30  34  33  30  30  30  31  30  30  35  30  38  32  37  36
          0   0   4   3   0   0   0   1   0   0   5   0   8   2   7   6
0000040  31  30  32  30  31  34  2d  30  39  2d  32  38  31  36  3a  34
          1   0   2   0   1   4   -   0   9   -   2   8   1   6   :   4
0000060  32  3a  31  34  31  30  30  33  37  34  39  30  31  30  30  35
          2   :   1   4   1   0   0   3   7   4   9   0   1   0   0   5
0000100  30  38  32  37  36  31  30  31  30  20  20  20  20  20  20  20
          0   8   2   7   6   1   0   1   0
0000120  20  20  20  20  20  20  20  20  20  20  20  20  20  34  20  20
                                                              4
0000140  20  20  20  20  20  20  20  20  20  20  20  20  20  20  20  20

*
0000200  20  20  20  20  20  20  3f  23  42  32  4e  30  39  32  38  31
                                  ?   #   B   2   N   0   9   2   8   1
0000220  34  30  52  42  51  30  32  36  44  4d  6a  4d  33  4e  6a  59
          4   0   R   B   Q   0   2   6   D   M   j   M   3   N   j   Y
0000240  31  4d  6a  63  79  41  44  4a  4b  76  4d  57  39  65  61  53
          1   M   j   c   y   A   D   J   K   v   M   W   9   e   a   S
0000260  74  37  65  71  50  7a  46  76  37  5a  59  73  52  6d  6a  61
          t   7   e   q   P   z   F   v   7   Z   Y   s   R   m   j   a
0000300  42  36  45  44  52  61  31  6c  78  4b  33  77  49  30  67  61
          B   6   E   D   R   a   1   l   x   K   3   w   I   0   g   a
0000320  76  55  79  7a  76  69  31  54  59  72  47  34  39  32  38  6a
          v   U   y   z   v   i   1   T   Y   r   G   4   9   2   8   j
0000340  71  74  47  6d  35  30  41  3d  3d  4d  54  4d  77  4f  54  63
          q   t   G   m   5   0   A   =   =   M   T   M   w   O   T   c
0000360  77  4f  54  67  32  4e  77  44  33  31  57  4d  6b  56  6c  32
          w   O   T   g   2   N   w   D   3   1   W   M   k   V   l   2
0000400  52  39  65  43  7a  7a  4e  51  71  43  54  33  51  4a  6e  62
          R   9   e   C   z   z   N   Q   q   C   T   3   Q   J   n   b
0000420  69  79  6a  73  33  4a  70  65  74  67  46  31  56  71  5a  43
          i   y   j   s   3   J   p   e   t   g   F   1   V   q   Z   C
0000440  73  38  77  3d  3d  35  31  32  31  30  37  0a
          s   8   w   =   =   5   1   2   1   0   7  \n
0000454

Code:
0000000  41  36  31  31  34  30  39  32  39  30  30  30  30  30  30  30
          A   6   1   1   4   0   9   2   9   0   0   0   0   0   0   0
0000020  30  33  32  35  30  30  30  31  30  30  35  35  32  31  31  31
          0   3   2   5   0   0   0   1   0   0   5   5   2   1   1   1
0000040  36  32  32  30  31  34  2d  30  39  2d  32  38  31  34  3a  30
          6   2   2   0   1   4   -   0   9   -   2   8   1   4   :   0
0000060  30  3a  32  30  31  30  38  32  31  35  36  30  31  30  30  35
          0   :   2   0   1   0   8   2   1   5   6   0   1   0   0   5
0000100  35  32  31  31  31  36  32  31  30  20  20  20  20  20  20  20
          5   2   1   1   1   6   2   1   0
0000120  20  20  20  20  20  20  20  20  20  20  20  20  20  34  20  20
                                                              4
0000140  20  20  20  20  20  20  20  20  20  20  20  20  20  20  20  20

*
0000200  20  20  20  20  20  20  3f  23  42  32  4e  30  39  32  38  31
                                  ?   #   B   2   N   0   9   2   8   1
0000220  34  30  52  41  53  31  39  30  44  4d  6a  4d  33  4e  6a  59
          4   0   R   A   S   1   9   0   D   M   j   M   3   N   j   Y
0000240  31  4d  6a  63  79  41  45  65  51  52  70  58  46  68  6b  37
          1   M   j   c   y   A   E   e   Q   R   p   X   F   h   k   7
0000260  41  74  38  6f  56  37  4b  46  56  66  48  41  37  66  70  6a
          A   t   8   o   V   7   K   F   V   f   H   A   7   f   p   j
0000300  4f  6b  78  32  73  4e  7a  65  37  79  63  37  4b  5a  59  43
          O   k   x   2   s   N   z   e   7   y   c   7   K   Z   Y   C
0000320  70  78  51  59  4c  73  47  5a  36  79  72  65  50  34  42  67
          p   x   Q   Y   L   s   G   Z   6   y   r   e   P   4   B   g
0000340  73  68  35  4c  4c  37  41  3d  3d  4d  54  4d  77  4f  54  63
          s   h   5   L   L   7   A   =   =   M   T   M   w   O   T   c
0000360  77  4f  54  67  32  4e  77  43  50  4a  56  46  45  7a  64  35
          w   O   T   g   2   N   w   C   P   J   V   F   E   z   d   5
0000400  61  5a  5a  50  77  64  58  2b  51  75  44  71  6a  7a  79  34
          a   Z   Z   P   w   d   X   +   Q   u   D   q   j   z   y   4
0000420  77  35  4e  77  69  39  2b  2b  6b  35  79  77  30  62  5a  45
          w   5   N   w   i   9   +   +   k   5   y   w   0   b   Z   E
0000440  45  53  77  3d  3d  35  31  32  31  30  37  0a
          E   S   w   =   =   5   1   2   1   0   7  \n
0000454

---------- Post updated at 10:29 AM ---------- Previous update was at 10:26 AM ----------

This C program was developed some 20 years ago, and it is so complex that it would take a lot of time to make the code changes, test them, and deploy them. Our project went live this week, and I am looking for a quick, temporary solution for now.
# 9  
Old 10-31-2014
Where in your last post are the bytes under discussion?


The octal sequence 357 277 275 (hex: EF BF BD) is the three-byte UTF-8 encoding of the Unicode code point U+FFFD, which is (from Wikipedia):

Quote:
Replacement character

The replacement character � (often a black diamond with a white question mark) is a symbol found in the Unicode standard at codepoint U+FFFD in the Specials table. It is used to indicate problems when a system is not able to render a stream of data to a correct symbol. It is most commonly seen when a font does not contain a character, but is also seen when the data is invalid and does not match any character:
Looks like it is a leftover from a recent (incorrect) character set conversion?
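
To make the byte values concrete, here is a minimal Python sketch. Python is not used anywhere in this thread, and the sample bytes and the helper name `find_replacement_chars` are made up for illustration. It shows that U+FFFD encodes to EF BF BD in UTF-8, that those bytes are 357 277 275 in octal, and how a lenient decode of non-UTF-8 data followed by a re-encode could plant exactly that sequence in a file.

```python
# U+FFFD (REPLACEMENT CHARACTER) and its UTF-8 byte sequence.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")

print(REPLACEMENT_UTF8.hex(" ").upper())             # EF BF BD
print(" ".join(f"{b:o}" for b in REPLACEMENT_UTF8))  # 357 277 275


def find_replacement_chars(data: bytes) -> list:
    """Return the byte offsets where EF BF BD starts (hypothetical helper)."""
    offsets = []
    pos = data.find(REPLACEMENT_UTF8)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(REPLACEMENT_UTF8, pos + 1)
    return offsets


# Made-up sample: the Latin-1 byte 0xF5 ("õ") is never valid in UTF-8.
latin1_bytes = b"sec\xf5nd"

# Decoding as UTF-8 with errors="replace" substitutes U+FFFD for the bad byte;
# re-encoding then writes the three-byte sequence into the output.
converted = latin1_bytes.decode("utf-8", errors="replace").encode("utf-8")

print(converted)                          # b'sec\xef\xbf\xbdnd'
print(find_replacement_chars(converted))  # [3]
```

If the scan finds nothing in the real file, as seems to be the case for the two dumps posted above, the bytes in question are in some other record.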
# 10  
Old 10-31-2014
Aaagh. 2? That means that your assumption about fixed length is not quite right.

Or - there are several flavors of records, like HEADER, DATA, and TRAILER, and the HEADER and DATA records have an extra byte.

Or - the file layout is broken.
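
One way to test the fixed-length assumption is to tally record lengths in bytes. What follows is only a sketch: the function name and the sample data are made up, and Python is an assumption since the thread only uses shell tools. For reference, both od dumps posted earlier end at octal offset 0000454, which is 300 bytes per record including the newline.

```python
import io
from collections import Counter


def record_lengths(data: bytes) -> Counter:
    """Count records by byte length (newline included), splitting on \\n only."""
    return Counter(len(line) for line in io.BytesIO(data))


# The end offset printed by od is octal.
print(int("454", 8))  # 300

# Made-up sample: two 10-byte records and one in which a single byte
# became the 3-byte sequence EF BF BD, so that record is 2 bytes longer.
sample = b"AAAAAAAAA\n" + b"BBBB\xef\xbf\xbdBBBB\n" + b"CCCCCCCCC\n"

for length, count in sorted(record_lengths(sample).items()):
    print(length, count)
# 10 2
# 12 1
```

For a real file, read it in binary mode and pass the bytes in; more than one distinct length in the result means the records are not all the same size.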

Your sed cannot ever fix something that is already broken. I do not get how this was pushed into production with a data flaw like that. It should have broken things in earlier testing. Assuming testing went well, I would look to see that everything that was pushed and tested as good matches exactly what is in PROD.

BTW - junk like this usually originates in C code where somebody does something that overwrites a trailing NUL, or never puts one there to start with. Example: using memcpy rather than strcpy, since memcpy copies only the bytes it is told to and does not add a terminating NUL. It starts with the questionable practice of not initializing C strings.

The junk comes from what was on the stack earlier.
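
The effect is easy to reproduce safely. The sketch below uses Python's `ctypes` as a stand-in for the C program, which has not been posted, so the buffer size, the `#` filler that plays the role of stale stack contents, and the helper name are all assumptions. Copying a field without its terminating NUL (what memcpy with the string length does) leaves the old bytes visible to anything that reads up to the next NUL, while copying the NUL too (what strcpy does) cuts them off.

```python
import ctypes


def copy_field(payload: bytes, with_nul: bool) -> bytes:
    """Copy payload into a 16-byte junk-filled buffer; return it as a C string."""
    if len(payload) > 15:
        raise ValueError("payload must fit in the 16-byte buffer")
    buf = ctypes.create_string_buffer(16)
    # Simulate stale memory: 15 junk bytes plus a final NUL so reads stay in bounds.
    buf.raw = b"#" * 15 + b"\0"
    if with_nul:
        # Like strcpy: the terminating NUL is copied along with the data.
        ctypes.memmove(buf, payload + b"\0", len(payload) + 1)
    else:
        # Like memcpy(dst, src, strlen(src)): data only, no terminator.
        ctypes.memmove(buf, payload, len(payload))
    # .value reads up to the first NUL, as printf("%s") would in C.
    return buf.value


print(copy_field(b"4", with_nul=True))   # b'4'
print(copy_field(b"4", with_nul=False))  # b'4' followed by 14 '#' bytes
```

In the real program the leftover bytes would be whatever happened to be in memory, which is why the junk can look random and differ from record to record.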

Why do I say all this? I do not know for sure, but I believe you are going to have to run your C code in a debugger, locate the problem, and fix it.

This has now gotten past a trivial sed one-liner. Or anything we can fix by remote control for you. Maybe someone else here has a better idea. I hope.
# 11  
Old 10-31-2014
Maybe the LANG=C setting is not the best? What locale do the files come from?
# 12  
Old 10-31-2014
Why don't you post your C code here if it ain't too long and maybe some forumite can locate the problem and suggest a fix...
# 13  
Old 10-31-2014
RudiC - I think it is just bad C code leaving stack detritus in a string variable.