The UNIX and Linux Forums  


  #1 (permalink)  
08-31-2008 | cosec (Registered User) | Join Date: Sep 2007 | Posts: 12
unable to extract Trademark(™) Character

Hello All,

I am trying to extract a trademark character (™) from a varchar column in a DB2 Table. The result is to be placed in a sequential file in an AIX environment.

After the extraction completes and I view the extracted file, I notice that a highlighted SUB character appears in place of the (™) character.

It would be great if anyone could shed some light on why the (™) character is not displayed. Could it be because the AIX character set does not allow this special character?

I would appreciate your advice. Thank you.
  #2 (permalink)  
08-31-2008 | Annihilannic (Forum Advisor) | Join Date: May 2008 | Location: Sydney, Australia | Posts: 1,009
Quote:
Originally Posted by cosec
Could it be because the AIX character set does not allow this special character?

Yes. However, if you were to load the output file in an environment similar to the one where you originally viewed the data, you should still see the TM character, since at the binary level it should be unchanged.
  #3 (permalink)  
09-01-2008 | era (Forum Advisor, Herder of Useless Cats (On Sabbatical)) | Join Date: Mar 2008 | Location: /there/is/only/bin/sh | Posts: 3,652
There is no such thing as an "AIX character set". There are bytes in the file, and there is your terminal, and there are multiple conventions for how to display the bytes in the file on any particular terminal. If you know the character set encoding of the file, and the character repertoire of your terminal, you can predict how any particular byte sequence will be displayed, but if one or the other is unknown, it's pretty hard to say what you should expect (or even indeed what you are talking about).

Plain 7-bit data is usually displayed as ASCII, which is completely well-defined, but the (tm) character is not part of the 7-bit ASCII character set; you are apparently viewing the file under two different interpretations of the character-set encoding in the file, perhaps using two different terminals, or different tools which impose different assumptions. (On AIX perhaps you have the option to add EBCDIC into the mix, but let's not go there.)

Anyway, to troubleshoot this, you might want to use a hex dump tool (od, xxd, hexdump, or even just cat -A) to inspect what the actual bytes in the file are. Once you know that, it should not be hard to figure out which encoding gives the interpretation you want, and/or convert the file to the representation you want.
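
A minimal sketch of that inspection step using od, which should be present on most UNIX systems (cat -A is a GNU option and may be missing elsewhere). The helper name `hex_bytes` and the sample file are hypothetical; the real input would be the extracted file.

```shell
# hex_bytes FILE: print the file's bytes as hex pairs on a single line.
hex_bytes() {
    od -An -v -tx1 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical sample standing in for the extracted file:
# "Acme", the SUB control byte (octal 032 = hex 1a), then " Corp"
sample=$(mktemp)
printf 'Acme\032 Corp\n' > "$sample"
hex_bytes "$sample"    # prints 41 63 6d 65 1a 20 43 6f 72 70 0a
```

If the byte where the trademark sign should be shows up as 1a, the file really contains SUB; if it shows up as something else (99, or the sequence e2 84 a2), the original character is probably still there and only the display is wrong.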
  #4 (permalink)  
09-01-2008 | cosec (Registered User) | Join Date: Sep 2007 | Posts: 12
Thanks for the response. I found out that the character set encoding in use is ISO8859-1, which does not have the trademark sign. The trademark character is a reserved word and is therefore displayed as SUB.

Is it possible to convert the file to another file with a different character set encoding, so that the trademark character can be displayed?

If so, how could it be done via UNIX?

Thanks
  #5 (permalink)  
09-01-2008 | Annihilannic (Forum Advisor) | Join Date: May 2008 | Location: Sydney, Australia | Posts: 1,009
Where do you need to display the Trademark character? In an AIX terminal session of some kind? Or in some application or client that connects to the server? Or on some other system that the data will eventually be transferred to?
  #6 (permalink)  
09-02-2008 | era (Forum Advisor, Herder of Useless Cats (On Sabbatical)) | Join Date: Mar 2008 | Location: /there/is/only/bin/sh | Posts: 3,652
The big question is in which character set you do see the trademark sign, or which ISO-8859-1 character you are seeing as a SUB character (whatever that is).

In ASCII there is a control character SUB (ctrl-Z) which has the character code 26 decimal (octal 032, hex 0x1A) -- is that what you have in your file? What would be a useful encoding to transfer it to? The following will translate all occurrences of this character code into the Unicode trade mark symbol character U+2122 in the UTF-8 encoding:

Code:
perl -pe 's/\x1A/\xE2\x84\xA2/g' file.orig > file.utf8
Or, in ISO-8859-1, there is the Registered sign ® at code point 0xAE; would that be a useful substitute?

Code:
perl -pe 's/\x1A/\xAE/g' file.orig > file.iso-8859-1
This assumes that the SUB character really is character code 0x1A; if it's not, but you can find out what it is instead, it should be trivial to adapt either of these one-liners to something which works for you. Some Windows code pages have the trademark symbol at 0x99 so that might be a thing to try if 0x1A doesn't work for you (but again, if you can look at the raw bytes in the file, you don't have to guess).

Last edited by era; 09-02-2008 at 02:59 AM. Reason: Add ISO8859-1 ® substitution; remark on Windows 0x99 character
Tags
character set, encoding

All times are GMT -4.

The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.