04-11-2008
Find Unicode Character in File
I have a very large file on Unix that I would like to search for all instances of the Unicode character 0x17. I need to remove these characters because they cause my SAX parser to throw an exception. Does anyone know how to find a Unicode character in a file?
Thank you for your assistance.
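A minimal Python sketch of one way to do this, assuming the file is UTF-8 or a single-byte encoding such as ASCII or Latin-1. In those encodings 0x17 is always the single byte 0x17 (UTF-8 never uses bytes below 0x80 inside multi-byte sequences), so it can be found and removed at the byte level without decoding. The same approach would not be safe for UTF-16 files. The names `read_chunks`, `find_offsets` and `strip_target` are hypothetical, and the file is streamed in 1 MiB blocks so a very large file never has to fit in memory.

```python
from typing import Iterable, Iterator, List

# 0x17 (the ETB control character) is the single byte b"\x17" in ASCII, Latin-1 and UTF-8.
TARGET = b"\x17"


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's raw bytes in fixed-size blocks so memory use stays flat."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def find_offsets(chunks: Iterable[bytes], target: bytes = TARGET) -> List[int]:
    """Return the 0-based byte offset of every occurrence of a one-byte target.

    A one-byte target can never straddle two chunks, so no overlap handling is needed.
    """
    if len(target) != 1:
        raise ValueError("this sketch only handles single-byte targets")
    offsets: List[int] = []
    base = 0
    for chunk in chunks:
        i = chunk.find(target)
        while i != -1:
            offsets.append(base + i)
            i = chunk.find(target, i + 1)
        base += len(chunk)
    return offsets


def strip_target(chunks: Iterable[bytes], target: bytes = TARGET) -> Iterator[bytes]:
    """Yield each chunk with every occurrence of the one-byte target removed."""
    if len(target) != 1:
        raise ValueError("this sketch only handles single-byte targets")
    for chunk in chunks:
        yield chunk.replace(target, b"")
```

To write a cleaned copy, something like `with open("out.xml", "wb") as out: out.writelines(strip_target(read_chunks("in.xml")))` streams the whole file; `in.xml` and `out.xml` are placeholder names.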
10 More Discussions You Might Find Interesting
1. Programming
I have a stream of characters like "\u8BBE\u5907\u7BA1"
and I want to display it.
I have already tried the following, without any luck:
1) printf("%s",L("\u8BBE\u5907\u7BA1"));
2) printf("%lc",0x8BBE);
3) setlocale followed by fwide followed by wprintf
4) also changed the locale manually... (3 Replies)
Discussion started by: jackdorso
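The C attempts above are left as posted. As a sketch of the decoding step only, in Python, and assuming the "stream" is ASCII text that literally contains backslash-u escapes (rather than already-decoded characters), the built-in `unicode_escape` codec turns each `\uXXXX` sequence into the corresponding character. Whether the characters then display correctly still depends on the terminal's locale and font. `decode_u_escapes` is a hypothetical name.

```python
def decode_u_escapes(text: str) -> str:
    r"""Convert literal \uXXXX escape sequences in ASCII text into real characters.

    Note: the 'unicode_escape' codec also interprets other backslash escapes
    such as \n and \t, so only use it on text whose backslashes are all escapes.
    """
    return text.encode("ascii").decode("unicode_escape")
```

For example, `print(decode_u_escapes(r"\u8BBE\u5907\u7BA1"))` prints the three decoded characters on a UTF-8 terminal.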
2. UNIX for Dummies Questions & Answers
I have a '~'-delimited file of 6-7 million rows. Each row should contain 13 columns separated by 12 tildes. Where there are 13 tildes, the row needs to be removed. Each row contains alphanumeric data, and occasionally a ~ ends up in a descriptive field and therefore acts as a delimiter, resulting in... (1 Reply)
Discussion started by: kpd
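A Python sketch of the filtering step, assuming a row is well formed exactly when it contains 12 tildes once its line ending is stripped. Keeping only rows with exactly 12 delimiters (rather than dropping only rows with 13) also discards rows with any other wrong count, which may or may not be what is wanted. `well_formed_rows` is a hypothetical name; because it is a generator, the 6-7 million rows are streamed rather than held in memory.

```python
from typing import Iterable, Iterator

EXPECTED_DELIMITERS = 12  # 13 columns are separated by 12 '~' characters


def well_formed_rows(lines: Iterable[str], delimiter: str = "~",
                     expected: int = EXPECTED_DELIMITERS) -> Iterator[str]:
    """Yield only the lines that contain exactly `expected` delimiters."""
    for line in lines:
        if line.rstrip("\r\n").count(delimiter) == expected:
            yield line
```

Usage, with placeholder file names: `with open("in.txt") as src, open("out.txt", "w") as dst: dst.writelines(well_formed_rows(src))`.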
3. UNIX for Dummies Questions & Answers
Can anyone suggest a command to find "^M" (Control-M) characters in a Unix text file?
^M characters appear when a file is transferred by FTP from Windows to Unix in binary mode, because the Windows CR-LF line endings are then copied unchanged.
I need a command that finds lines like these:
ex.txt:
------------------------------
...,name,time^M
go^M
...file,end^M... (5 Replies)
Discussion started by: prsam
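A Python sketch, assuming the file is read in binary mode (`open(path, "rb")`): Python's default text mode translates CR-LF to LF and would hide the ^M characters. Iterating over a binary file splits on b"\n", so a Windows line keeps its trailing b"\r". `lines_with_cr` and `remove_trailing_cr` are hypothetical names.

```python
from typing import Iterable, List


def lines_with_cr(lines: Iterable[bytes]) -> List[int]:
    """Return the 1-based numbers of lines containing a carriage return (^M, 0x0D)."""
    return [n for n, line in enumerate(lines, start=1) if b"\r" in line]


def remove_trailing_cr(line: bytes) -> bytes:
    """Turn a CR-LF line ending into a plain LF; other lines are returned unchanged."""
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"
    return line
```

For example, `with open("ex.txt", "rb") as f: print(lines_with_cr(f))` lists the affected line numbers.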
4. Solaris
While uploading an XML file to my application on Solaris 10, the upload failed with this error: Error! Parsing Error: /SPLM/TC83/tcdata83/model/model_dbextract.xml Line:65576 Column:73 An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is ... (12 Replies)
Discussion started by: karghum
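The XML 1.0 specification's `Char` production allows only tab, line feed, carriage return, and the ranges U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF. A control character such as 0x1a therefore makes the document ill-formed even inside an attribute value. A Python sketch that deletes every disallowed character from already-decoded text follows; `strip_invalid_xml_chars` is a hypothetical name, and silently deleting characters is only appropriate if they carry no meaning for the data.

```python
import re

# Complement of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that XML 1.0 does not allow anywhere in a document."""
    return _INVALID_XML_CHAR.sub("", text)
```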
5. Shell Programming and Scripting
I have a Unicode character {Unicode: 0x1C} in my file and I need to replace it with a blank. What would a sed command look like?
cat file1 | sed "s/(//g;" > file2
Is X28 the right value for this Unicode character? (4 Replies)
Discussion started by: Hangman2
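For reference, 0x28 is the code for "(", which is what the sed command above deletes; 0x1C is a different character, the FS (file separator) control character. A Python sketch using `str.translate` to turn 0x1C into a blank, assuming the text is already decoded; further code points can be added to the hypothetical `BLANK_OUT` table in the same way.

```python
# Translation table mapping unwanted code points to a single blank.
BLANK_OUT = str.maketrans({0x1C: " "})


def blank_out(text: str) -> str:
    """Replace every character listed in BLANK_OUT with a space."""
    return text.translate(BLANK_OUT)
```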
6. HP-UX
How do I find the character encoding of a file on HP-UX? (1 Reply)
Discussion started by: alokjyotibal
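Bytes alone cannot identify an encoding with certainty, but a rough Python heuristic is sketched here, assuming the whole file (or a prefix cut at a line boundary) is passed in: check for a byte-order mark, then try strict ASCII and UTF-8 decoding. Anything else is reported as unknown, because single-byte encodings such as ISO 8859-1 accept every byte sequence and cannot be told apart this way. `guess_encoding` is a hypothetical name.

```python
import codecs

# Byte-order marks checked first, in this order.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def guess_encoding(data: bytes) -> str:
    """Return a BOM-based name, 'ascii' or 'utf-8' if the bytes decode cleanly, else 'unknown'."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass
    return "unknown"
```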
7. Shell Programming and Scripting
How do I find the position of a character in a file?
For example:
string = "123X568"
I want to find the position of the character "X".
Thanks (6 Replies)
Discussion started by: LiorAmitai
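In Python, `str.find` returns the 0-based index of the first match or -1, so adding 1 gives a 1-based position with 0 meaning "not found". `char_position` is a hypothetical name, and the 1-based convention is an assumption about what "position" means here.

```python
def char_position(text: str, ch: str) -> int:
    """Return the 1-based position of the first `ch` in `text`, or 0 if it is absent."""
    return text.find(ch) + 1
```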
8. Shell Programming and Scripting
Greetings.
I have a file with information like this:
AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?
AMNDHRKEEU?AMNDHREOEU?
AMNDHREU?AHRKEOEU?AMNDHRKEU?AMNDKEOEU?
What I need to extract is the position, in every line, of every occurrence of '?'
A desired output would be something... (6 Replies)
Discussion started by: Twinklefingers
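A Python sketch that reports, for each line, the 1-based positions of every "?". `positions_per_line` is a hypothetical name, and the 1-based convention is an assumption, since the desired output format is cut off in the post.

```python
from typing import Iterable, List


def positions_per_line(lines: Iterable[str], ch: str = "?") -> List[List[int]]:
    """For each line, list the 1-based positions of every occurrence of `ch`."""
    return [[i for i, c in enumerate(line, start=1) if c == ch] for line in lines]
```

Applied to the first two sample lines, this gives `[[12, 24, 36, 48], [11, 22]]`.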
9. Shell Programming and Scripting
Hi Experts,
Is there a way to find a string in a file, append a character to that string, and then save the file or save the result to another file?
Here is an example.
>cat test.txt
NULL
NULL
NULL
9,800.00
NULL
1,234,567.01
I want to find all non-NULL strings and add a dollar sign to those... (9 Replies)
Discussion started by: brichigo
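The post is truncated, so this Python sketch assumes the dollar sign goes in front of each non-NULL value (so `9,800.00` becomes `$9,800.00`) and that blank lines are left alone. `add_dollar_sign` is a hypothetical name; writing its output to a second file and then renaming that file over the original is one way to "save the file".

```python
from typing import Iterable, Iterator


def add_dollar_sign(lines: Iterable[str], skip: str = "NULL") -> Iterator[str]:
    """Prefix '$' to every non-blank line whose value is not `skip`."""
    for line in lines:
        value = line.rstrip("\r\n")
        if value and value != skip:
            yield "$" + line
        else:
            yield line
```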
10. Shell Programming and Scripting
Hi,
I want to find the character '-' in a file at positions 284-298 and, wherever it occurs, replace it with 'O ' at that same position. How can I do that with a sed command?
Thanks in advance,
Sara (9 Replies)
Discussion started by: Sara183
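Not a sed answer, but a Python sketch of the same operation, assuming the positions are 1-based columns within each line (fixed-width records) and that the replacement is the single character "O", so every later column stays in place. `replace_in_columns` is a hypothetical name.

```python
def replace_in_columns(line: str, start: int = 284, end: int = 298,
                       old: str = "-", new: str = "O") -> str:
    """Replace `old` with `new` only within 1-based columns start..end (inclusive).

    Lines shorter than `start` are returned unchanged.
    """
    if len(new) != len(old):
        raise ValueError("replacement must be the same length to keep column positions")
    head, field, tail = line[:start - 1], line[start - 1:end], line[end:]
    return head + field.replace(old, new) + tail
```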
unicode(n) Unicode normalization unicode(n)
__________________________________________________________________________________________________________________________________________________
NAME
unicode - Implementation of Unicode normalization
SYNOPSIS
package require Tcl 8.3
package require unicode 1.0
::unicode::fromstring string
::unicode::tostring uclist
::unicode::normalize form uclist
::unicode::normalizeS form string
_________________________________________________________________
DESCRIPTION
This is an implementation in Tcl of the Unicode normalization forms.
COMMANDS
::unicode::fromstring string
Converts string to a list of integer Unicode character codes, the form the unicode package uses as its internal string representation.
::unicode::tostring uclist
Converts the list of integers uclist back to a Tcl string.
::unicode::normalize form uclist
Normalizes the list of Unicode characters uclist according to form and returns the normalized list. The form argument takes one of the following values: D (canonical decomposition), C (canonical decomposition, followed by canonical composition), KD (compatibility decomposition), or KC (compatibility decomposition, followed by canonical composition).
::unicode::normalizeS form string
A shortcut for ::unicode::tostring [::unicode::normalize $form [::unicode::fromstring $string]]. Normalizes the Tcl string given as string and returns the normalized string.
EXAMPLES
% ::unicode::fromstring "\u0410\u0411\u0412\u0413"
1040 1041 1042 1043
% ::unicode::tostring {49 50 51 52 53}
12345
%
% ::unicode::normalize D {7692 775}
68 803 775
% ::unicode::normalizeS KD "\u1d2c"
A
%
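For comparison only, Python's standard `unicodedata.normalize` supports the same four forms under the names NFD, NFC, NFKD and NFKC. The sketch below mirrors the four commands with hypothetical Python functions (`fromstring`, `tostring`, `normalize`, `normalize_string`); it is not part of the Tcl package, and results could differ slightly if the two implementations are based on different Unicode versions.

```python
import unicodedata
from typing import List

_FORMS = ("D", "C", "KD", "KC")


def fromstring(s: str) -> List[int]:
    """Counterpart of ::unicode::fromstring: string to list of integer code points."""
    return [ord(c) for c in s]


def tostring(uclist: List[int]) -> str:
    """Counterpart of ::unicode::tostring: list of code points back to a string."""
    return "".join(chr(c) for c in uclist)


def normalize(form: str, uclist: List[int]) -> List[int]:
    """Counterpart of ::unicode::normalize; form D, C, KD or KC maps to NFD, NFC, NFKD or NFKC."""
    if form not in _FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")
    return fromstring(unicodedata.normalize("NF" + form, tostring(uclist)))


def normalize_string(form: str, string: str) -> str:
    """Counterpart of ::unicode::normalizeS."""
    return tostring(normalize(form, fromstring(string)))
```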
REFERENCES
[1] "Unicode Standard Annex #15: Unicode Normalization Forms", (http://unicode.org/reports/tr15/)
AUTHORS
Sergei Golovan
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report them in the category stringprep of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either the package or its documentation.
SEE ALSO
stringprep(n)
KEYWORDS
normalization, unicode
COPYRIGHT
Copyright (c) 2007, Sergei Golovan <sgolovan@nes.ru>
stringprep 1.0.0 unicode(n)