unable to extract Trademark(™) Character: Post 302231243 by era on Tuesday 2nd of September 2008 01:50:32 AM
The big question is in which character set you actually see the trademark sign, or which ISO-8859-1 character you are seeing as a SUB character (whatever that is).

In ASCII there is a control character SUB (ctrl-Z) which has the character code 26 decimal (octal 032, hex 0x1A) -- is that what you have in your file? What would be a useful encoding to convert it to? The following will translate all occurrences of this character code into the Unicode trade mark sign U+2122, encoded in UTF-8 as the three bytes 0xE2 0x84 0xA2:

Code:
perl -pe 's/\x1A/\xE2\x84\xA2/g' file.orig > file.utf8

Or, in ISO-8859-1 (which has no trade mark sign of its own), there is the registered sign ® at code point 0xAE; would that be a useful substitute?

Code:
perl -pe 's/\x1A/\xAE/g' file.orig > file.iso-8859-1

This assumes that the SUB character really is character code 0x1A; if it's not, but you can find out what it is instead, it should be trivial to adapt either of these one-liners to something which works for you. Some Windows code pages have the trademark symbol at 0x99 so that might be a thing to try if 0x1A doesn't work for you (but again, if you can look at the raw bytes in the file, you don't have to guess).

Last edited by era; 09-02-2008 at 02:59 AM. Reason: Add ISO8859-1 ® substitution; remark on Windows 0x99 character
 

TCS(1)                      General Commands Manual                      TCS(1)

NAME
       tcs - translate character sets

SYNOPSIS
       tcs [ -slcv ] [ -f ics ] [ -t ocs ] [ file ... ]

DESCRIPTION
       Tcs interprets the named file(s) (standard input default) as a stream of characters from the ics character set or format, converts them to runes, and then converts them into a stream of characters from the ocs character set or format on the standard output. The default value for ics and ocs is utf, the UTF encoding described in utf(6). The -l option lists the character sets known to tcs. Processing continues in the face of conversion errors (the -s option prevents reporting of these errors). The -c option forces the output to contain only correctly converted characters; otherwise, 0x80 characters will be substituted for UTF encoding errors and 0xFFFD characters will be substituted for unknown characters. The -v option generates various diagnostic and summary information on standard error, or makes the -l output more verbose.

       Tcs recognizes an ever changing list of character sets. In particular, it supports a variety of Russian and Japanese encodings. Some of the supported encodings are

       utf         The Plan 9 UTF encoding, known by ISO as UTF-8
       utf1        The deprecated original UTF encoding from ISO 10646
       ascii       7-bit ASCII
       8859-1      Latin-1 (Central European)
       8859-2      Latin-2 (Czech .. Slovak)
       8859-3      Latin-3 (Dutch .. Turkish)
       8859-4      Latin-4 (Scandinavian)
       8859-5      Part 5 (Cyrillic)
       8859-6      Part 6 (Arabic)
       8859-7      Part 7 (Greek)
       8859-8      Part 8 (Hebrew)
       8859-9      Latin-5 (Finnish .. Portuguese)
       koi8        KOI-8 (GOST 19769-74)
       jis-kanji   ISO 2022-JP
       ujis        EUC-JX: JIS 0208
       ms-kanji    Microsoft, or Shift-JIS
       jis         (from only) guesses between ISO 2022-JP, EUC or Shift-Jis
       gb          Chinese national standard (GB2312-80)
       big5        Big 5 (HKU version)
       unicode     Unicode Standard 1.0
       tis         Thai character set plus ASCII (TIS 620-1986)
       msdos       IBM PC: CP 437
       atari       Atari-ST character set

EXAMPLES
       tcs -f 8859-1
              Convert 8859-1 (Latin-1) characters into UTF format.

       tcs -s -f jis
              Convert characters encoded in one of several shift JIS encodings into UTF format. Unknown Kanji will be converted into 0xFFFD characters.

       tcs -lv
              Print an up to date list of the supported character sets.

SOURCE
       /sys/src/cmd/tcs

SEE ALSO
       ascii(1), rune(2), utf(6).

                                                                         TCS(1)