Find Unicode Character in File
Post 302184573 by fpmurphy on Friday 11th of April 2008 09:46:53 PM
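"0x17" is a single byte, so it is not a complete Unicode (UTF-16 or
UTF-32) character per se. The code point U+0017 does exist (it is the
ETB control character), but it takes more than one byte in either encoding.

For those not familiar with Unicode, UTF-16 basically means that
every "character" is stored as 2 bytes (4 bytes for characters outside
the Basic Multilingual Plane), whereas UTF-32 means every "character"
is always stored as 4 bytes.

On a practical level, it means that each standard ASCII character is
accompanied by a single NUL (0x00) in UTF-16 or by 3 NULs in UTF-32.
The NULs precede the ASCII byte when the data is stored Big-Endian
and follow it when the data is stored Little-Endian.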
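To make the NUL padding concrete, here is a minimal Python sketch that encodes one ASCII character in each fixed-endian variant and prints the bytes. The helper name `layouts` is made up for illustration; the codec names are the standard Python ones.

```python
# Show where the NUL (0x00) padding lands for an ASCII character
# in the fixed-endian UTF-16 and UTF-32 encodings.
# "layouts" is a hypothetical helper name, not part of any library.
def layouts(ch="A"):
    result = {}
    for codec in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        data = ch.encode(codec)
        # Render each byte as two hex digits, space separated.
        result[codec] = " ".join(f"{b:02x}" for b in data)
    return result

for codec, hexbytes in layouts().items():
    print(f"{codec}: {hexbytes}")
# utf-16-be: 00 41
# utf-16-le: 41 00
# utf-32-be: 00 00 00 41
# utf-32-le: 41 00 00 00
```

The same pattern applies to 0x17: as U+0017 it would appear as 00 17 or 17 00 in UTF-16, never as a lone byte.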

Which Unicode "format" is your file using?
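If you are not sure, one rough way to find out is to check the first bytes of the file for a byte order mark (BOM). The sketch below is only a heuristic, written in Python: the function name `detect_bom` is made up for illustration, many files carry no BOM at all, and a UTF-16 LE file whose first character is U+0000 would be misreported as UTF-32 LE.

```python
import codecs

# Hypothetical helper: guess the encoding from a leading byte order mark.
# UTF-32 must be tested before UTF-16, because the UTF-32 LE BOM
# (ff fe 00 00) starts with the UTF-16 LE BOM (ff fe).
def detect_bom(data):
    candidates = (
        (codecs.BOM_UTF32_BE, "UTF-32 BE"),
        (codecs.BOM_UTF32_LE, "UTF-32 LE"),
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_BE, "UTF-16 BE"),
        (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None  # no BOM found; the encoding cannot be told this way

# In-memory sample: UTF-16 LE BOM followed by U+0017.
sample = codecs.BOM_UTF16_LE + "\x17".encode("utf-16-le")
print(detect_bom(sample))
```

For a real file, pass in the first four bytes, for example `open(path, "rb").read(4)`, where `path` is whatever file you are inspecting.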
 

Tcl_UniCharIsAlpha(3)					      Tcl Library Procedures					     Tcl_UniCharIsAlpha(3)

__________________________________________________________________________________________________________________________________________________

NAME
       Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsControl, Tcl_UniCharIsDigit, Tcl_UniCharIsGraph, Tcl_UniCharIsLower,
       Tcl_UniCharIsPrint, Tcl_UniCharIsPunct, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar - routines for
       classification of Tcl_UniChar characters

SYNOPSIS
       #include <tcl.h>

       int Tcl_UniCharIsAlnum(ch)
       int Tcl_UniCharIsAlpha(ch)
       int Tcl_UniCharIsControl(ch)
       int Tcl_UniCharIsDigit(ch)
       int Tcl_UniCharIsGraph(ch)
       int Tcl_UniCharIsLower(ch)
       int Tcl_UniCharIsPrint(ch)
       int Tcl_UniCharIsPunct(ch)
       int Tcl_UniCharIsSpace(ch)
       int Tcl_UniCharIsUpper(ch)
       int Tcl_UniCharIsWordChar(ch)

ARGUMENTS
       int ch (in)    The Tcl_UniChar to be examined.
_________________________________________________________________

DESCRIPTION
       All of the routines described examine Tcl_UniChars and return a boolean value. A non-zero return value means that the
       character does belong to the character class associated with the called routine. The rest of this document just
       describes the character classes associated with the various routines.

       Note: A Tcl_UniChar is a Unicode character represented as an unsigned, fixed-size quantity.

CHARACTER CLASSES
       Tcl_UniCharIsAlnum tests if the character is an alphanumeric Unicode character.

       Tcl_UniCharIsAlpha tests if the character is an alphabetic Unicode character.

       Tcl_UniCharIsControl tests if the character is a Unicode control character.

       Tcl_UniCharIsDigit tests if the character is a numeric Unicode character.

       Tcl_UniCharIsGraph tests if the character is any Unicode print character except space.

       Tcl_UniCharIsLower tests if the character is a lowercase Unicode character.

       Tcl_UniCharIsPrint tests if the character is a Unicode print character.

       Tcl_UniCharIsPunct tests if the character is a Unicode punctuation character.

       Tcl_UniCharIsSpace tests if the character is a whitespace Unicode character.

       Tcl_UniCharIsUpper tests if the character is an uppercase Unicode character.

       Tcl_UniCharIsWordChar tests if the character is alphanumeric or a connector punctuation mark.

KEYWORDS
       unicode, classification

Tcl 8.1                                                                                                      Tcl_UniCharIsAlpha(3)