Find Unicode Character in File

10 More Discussions You Might Find Interesting

1. Programming

How to display unicode characters / unicode string

I have a stream of characters like "\u8BBE\u5907\u7BA1" and I want to display it. I have already tried the following without any luck: 1) printf("%s",L("\u8BBE\u5907\u7BA1")); 2) printf("%lc",0x8BBE); 3) setlocale followed by fwide followed by wprintf; 4) also changed the locale manually...

2. UNIX for Dummies Questions & Answers

find and remove rows from file where multi occurrences of character found

I have a '~' delimited file of 6-7 million rows. Each row should contain 13 columns delimited by 12 ~'s. Where there are 13 tildes, the row needs to be removed. Each row contains alphanumeric data, and occasionally a ~ ends up in a descriptive field and therefore acts as a delimiter, resulting in...

3. UNIX for Dummies Questions & Answers

How to find the ^M(control M) character in unix file?

Can anyone suggest a command to find "^M" (Control-M) characters in a Unix text file? ^M characters appear when a file is transferred by FTP from Windows to Unix in binary mode instead of ASCII mode. I need a command to find them in a file like this one, ex.txt: ------------------------------ ...,name,time^M go^M ...file,end^M...

4. Solaris

An invalid XML character (Unicode: 0x1a)

While I was uploading an exl file to my application on Solaris 10, the upload failed with this error: Error! Parsing Error: /SPLM/TC83/tcdata83/model/model_dbextract.xml Line:65576 Column:73 An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is ...

5. Shell Programming and Scripting

How do I replace a unicode character using sed

I have a Unicode character {Unicode: 0x1C} in my file and I need to replace it with a blank. What would the sed command look like? cat file1 | sed "s/(//g;" > file2 Is X28 the right value for this Unicode character?

6. HP-UX

How to find the character encoding of a file in HP-UX

7. Shell Programming and Scripting

How to find character position in file?

How do I find a character's position in a file? E.g., for string = "123X568" I want to find the position of the character "X". Thanks

8. Shell Programming and Scripting

Find position of character in multiple strings in a file

Greetings. I have a file with information like this: AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU? AMNDHRKEEU?AMNDHREOEU? AMNDHREU?AHRKEOEU?AMNDHRKEU?AMNDKEOEU? What I need to extract is the position, in every line, of every occurrence of '?' A desired output would be something...

9. Shell Programming and Scripting

Find string in a file and append character

Hi Experts, Is there a way to find a string in a file, append a character to that string, and then save the file or save to another file? Here is an example. >cat test.txt NULL NULL NULL 9,800.00 NULL 1,234,567.01 I want to find all non-NULL strings and add a dollar sign to those...

10. Shell Programming and Scripting

Find character and Replace character for given position

Hi, I want to find the character '-' in a file at positions 284-298, and if it occurs I need to replace it with 'O ' at that position in the file. How can I do that with sed? Thanks in advance, Sara
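Several of the threads above (finding ^M, finding the position of a character, replacing U+001C) come down to locating one character by its code point. The following is a minimal Python sketch of that, assuming the file is UTF-8 text; the function names are hypothetical and do not come from any of the threads. Note that U+001C is hex 1C (decimal 28), whereas hex 28 is the character '('.

```python
def find_char_positions(lines, target):
    """Return 1-based (line, column) pairs for every occurrence of `target`."""
    positions = []
    for lineno, line in enumerate(lines, start=1):
        col = line.find(target)
        while col != -1:
            positions.append((lineno, col + 1))
            col = line.find(target, col + 1)
    return positions


def find_char_in_file(path, target, encoding="utf-8"):
    # newline="\n" splits lines only on LF and leaves any CR (^M) in place,
    # so U+000D can be searched for like any other character.
    with open(path, encoding=encoding, newline="\n") as f:
        return find_char_positions(f, target)


def replace_char_in_file(src, dst, target, replacement=" ", encoding="utf-8"):
    # newline="" disables newline translation on read and write,
    # so only `target` is changed in the output file.
    with open(src, encoding=encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=encoding, newline="") as fout:
        fout.write(text.replace(target, replacement))
```

For example, `find_char_in_file("ex.txt", "\r")` would list every ^M, and `replace_char_in_file("file1", "file2", "\x1c")` would write a copy with each U+001C replaced by a blank. Columns count characters, not bytes, so they can differ from byte offsets in non-ASCII text.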

LEARN ABOUT OSX

unicode

unicode(n)						       Unicode normalization							unicode(n)

__________________________________________________________________________________________________________________________________________________

NAME

       unicode - Implementation of Unicode normalization

SYNOPSIS

       package require Tcl  8.3

       package require unicode	1.0

       ::unicode::fromstring string

       ::unicode::tostring uclist

       ::unicode::normalize form uclist

       ::unicode::normalizeS form string

_________________________________________________________________

DESCRIPTION

       This is an implementation in Tcl of the Unicode normalization forms.

COMMANDS

       ::unicode::fromstring string
	      Converts string to a list of integer Unicode character codes, which the unicode package uses as its internal string representation.

       ::unicode::tostring uclist
	      Converts the list of integers uclist back to a Tcl string.

       ::unicode::normalize form uclist
	      Normalizes the Unicode character list uclist according to form and returns the normalized list. The form argument takes one of the
	      following values: D (canonical decomposition), C (canonical decomposition, followed by canonical composition), KD (compatibility
	      decomposition), or KC (compatibility decomposition, followed by canonical composition).

       ::unicode::normalizeS form string
	      A shortcut for ::unicode::tostring [::unicode::normalize $form [::unicode::fromstring $string]]. Normalizes the Tcl string and
	      returns the normalized string.
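To show how the four forms differ, here is a short sketch using Python's standard `unicodedata.normalize`, which names the forms NFD, NFC, NFKD and NFKC. It is not the Tcl package itself; treating those names as equivalent to this package's D, C, KD and KC is an assumption based on both following Unicode Standard Annex #15.

```python
import unicodedata

# Assumed mapping from the unicode package's form names to Python's.
FORMS = {"D": "NFD", "C": "NFC", "KD": "NFKD", "KC": "NFKC"}


def code_points_by_form(s):
    """Return {form: [code points]} for each of the four normalization forms."""
    return {
        form: [ord(ch) for ch in unicodedata.normalize(py_form, s)]
        for form, py_form in FORMS.items()
    }


# U+FB01 (the "fi" ligature) followed by U+00E9 (precomposed "e" with acute).
for form, cps in code_points_by_form("\ufb01\u00e9").items():
    print(form, cps)
```

For this input, D gives [64257, 101, 769], C gives [64257, 233], KD gives [102, 105, 101, 769] and KC gives [102, 105, 233]: only the compatibility forms split the ligature, and only the composing forms recombine "e" with the acute accent.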

EXAMPLES

       % ::unicode::fromstring "\u0410\u0411\u0412\u0413"
       1040 1041 1042 1043
       % ::unicode::tostring {49 50 51 52 53}
       12345
       %

       % ::unicode::normalize D {7692 775}
       68 803 775
       % ::unicode::normalizeS KD "\u1d2c"
       A
       %
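The examples above can be cross-checked outside Tcl. This Python sketch defines hypothetical counterparts of the four commands on top of the standard `unicodedata` module; the names fromstring, tostring, normalize and normalize_s are illustrative only and are not an existing Python API.

```python
import unicodedata

# Hypothetical Python counterparts of the ::unicode:: commands (not a real package).
_FORMS = {"D": "NFD", "C": "NFC", "KD": "NFKD", "KC": "NFKC"}


def fromstring(s):
    """String -> list of integer code points (like ::unicode::fromstring)."""
    return [ord(ch) for ch in s]


def tostring(uclist):
    """List of integer code points -> string (like ::unicode::tostring)."""
    return "".join(chr(cp) for cp in uclist)


def normalize(form, uclist):
    """Normalize a code-point list; form is one of D, C, KD, KC."""
    return fromstring(unicodedata.normalize(_FORMS[form], tostring(uclist)))


def normalize_s(form, s):
    """String-in, string-out shortcut (like ::unicode::normalizeS)."""
    return tostring(normalize(form, fromstring(s)))


print(fromstring("\u0410\u0411\u0412\u0413"))  # [1040, 1041, 1042, 1043]
print(tostring([49, 50, 51, 52, 53]))          # 12345
print(normalize("D", [7692, 775]))             # [68, 803, 775]
print(normalize_s("KD", "\u1d2c"))             # A
```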

REFERENCES

       [1]    "Unicode Standard Annex #15: Unicode Normalization Forms", (http://unicode.org/reports/tr15/)

AUTHORS

       Sergei Golovan

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category
       stringprep of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements
       you may have for either package and/or documentation.

SEE ALSO

       stringprep(n)

KEYWORDS

       normalization, unicode

COPYRIGHT

       Copyright (c) 2007, Sergei Golovan <sgolovan@nes.ru>

stringprep							       1.0.0								unicode(n)
