The UNIX and Linux Forums  


Post #6, 09-02-2008
era (Forum Advisor)
The big question is which character set you expect to see the trademark sign in, or, put the other way around, which ISO-8859-1 character you are seeing as a "SUB character" (whatever that means here).

In ASCII there is a control character SUB (ctrl-Z) with character code 26 decimal (octal 032, hex 0x1A) -- is that what you have in your file? And what would be a useful encoding to convert it to? The following translates every occurrence of that byte into the Unicode trademark sign, U+2122, encoded as UTF-8 (the three bytes E2 84 A2):

Code:
perl -pe 's/\x1A/\xE2\x84\xA2/g' file.orig > file.utf8
Or, if the result has to stay in ISO-8859-1, which has no trademark sign, there is the registered sign ® at code point 0xAE; would that be a useful substitute?

Code:
perl -pe 's/\x1A/\xAE/g' file.orig > file.iso-8859-1
This assumes that the SUB character really is character code 0x1A; if it's not, but you can find out what it is instead, it should be trivial to adapt either of these one-liners to something which works for you. Some Windows code pages (Windows-1252, for one) have the trademark sign at 0x99, so that might be worth trying if 0x1A doesn't work for you (but again, if you can look at the raw bytes in the file, you don't have to guess).

Last edited by era; 09-02-2008 at 02:59 AM. Reason: Add ISO8859-1 ® substitution; remark on Windows 0x99 character