04-11-2008
Find Unicode Character in File
I have a very large file on a Unix system that I would like to search for all instances of the Unicode character 0x17. I need to remove these characters because they cause my SAX parser to throw an exception. Does anyone know how to find a Unicode character in a file?
Thank you for your assistance.
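One way to do the removal is a byte filter. U+0017 (the ETB control character) is encoded as the single byte 0x17 in ASCII, Latin-1 and UTF-8, so no character decoding is needed. The sketch below assumes one of those encodings (it would be wrong for UTF-16 input). The function name strip_byte is hypothetical and not part of any library:

```c
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out`, dropping every byte equal to
 * `target`. Returns the number of bytes removed, or -1 on an I/O error.
 * Assumes the unwanted character is a single byte in the file's
 * encoding (true for 0x17 in ASCII, Latin-1 and UTF-8). */
long strip_byte(FILE *in, FILE *out, int target)
{
    long removed = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == target) {
            removed++;              /* skip the unwanted byte */
            continue;
        }
        if (putc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : removed;
}
```

Called as strip_byte(stdin, stdout, 0x17) from a small main, it streams the file without loading it into memory. From the shell, the standard tr utility does the same job with an octal escape (0x17 is octal 027): tr -d '\027' < big.xml > clean.xml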
10 More Discussions You Might Find Interesting
1. Programming
I have a stream of characters like "\u8BBE\u5907\u7BA1"
and I want to display it.
I have already tried the following, without any luck:
1) printf("%s",L("\u8BBE\u5907\u7BA1"));
2) printf("%lc",0x8BBE);
3) setlocale followed by fwide followed by wprintf
4) also changed the locale manually... (3 Replies)
Discussion started by: jackdorso
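If the stream really contains the literal text backslash, 'u', four hex digits (rather than C source escapes), one approach is to decode each escape to UTF-8 bytes and print those with an ordinary printf("%s") on a UTF-8 terminal. This is a sketch under two assumptions: every escape is a Basic Multilingual Plane code point (no surrogate pairs), and the terminal uses UTF-8. The function names are hypothetical:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Encode one BMP code point (at most U+FFFF) as UTF-8 into out[];
 * returns the number of bytes written. Surrogates are not handled. */
static size_t utf8_put(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: rewrite literal "\uXXXX" escapes in `in` as UTF-8
 * in `out` (capacity `cap`, always NUL-terminated on success). Other bytes
 * are copied unchanged. Returns the output length, or -1 if `out` is
 * too small. */
long decode_u_escapes(const char *in, char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return -1;
    while (*in != '\0') {
        char tmp[3];
        size_t n;
        if (in[0] == '\\' && in[1] == 'u' &&
            isxdigit((unsigned char)in[2]) && isxdigit((unsigned char)in[3]) &&
            isxdigit((unsigned char)in[4]) && isxdigit((unsigned char)in[5])) {
            char hex[5] = { in[2], in[3], in[4], in[5], '\0' };
            n = utf8_put(strtoul(hex, NULL, 16), tmp);
            in += 6;                /* consume "\uXXXX" */
        } else {
            tmp[0] = *in++;         /* ordinary byte: copy through */
            n = 1;
        }
        if (len + n >= cap)         /* keep room for the terminating NUL */
            return -1;
        for (size_t i = 0; i < n; i++)
            out[len++] = tmp[i];
    }
    out[len] = '\0';
    return (long)len;
}
```

If the text is instead a wide string literal in C source, the usual route is setlocale(LC_ALL, "") followed by wprintf(L"%ls\n", ...), without also calling printf on stdout, because a stream's orientation is fixed by the first output call.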
2. UNIX for Dummies Questions & Answers
I have a '~'-delimited file of 6-7 million rows. Each row should contain 13 columns delimited by 12 tildes. Where a row has 13 tildes, it needs to be removed. Each row contains alphanumeric data, and occasionally a ~ ends up in a descriptive field and therefore acts as a delimiter, resulting in... (1 Reply)
Discussion started by: kpd
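The rule can be checked one row at a time by counting delimiters. The sketch below keeps only rows with exactly the expected count (12 here). As a design choice this drops every malformed row, not only the 13-tilde ones. It uses a hand-rolled line reader so it needs only ISO C, and the function names are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line (keeping its '\n', if any) into *buf, growing the buffer
 * as needed. Returns the length, 0 at end of input, or -1 if out of memory. */
static long read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (len + 1 >= *cap) {
            size_t ncap = *cap ? *cap * 2 : 256;
            char *p = realloc(*buf, ncap);
            if (p == NULL)
                return -1;
            *buf = p;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;
        if (c == '\n')
            break;
    }
    return (long)len;
}

/* Hypothetical helper: copy to `out` only the rows of `in` containing
 * exactly `expected` copies of `delim` (12 for 13 '~'-separated columns).
 * Returns the number of rows dropped, or -1 on error. */
long filter_rows(FILE *in, FILE *out, char delim, size_t expected)
{
    char *line = NULL;
    size_t cap = 0;
    long len, dropped = 0;

    while ((len = read_line(in, &line, &cap)) > 0) {
        size_t delims = 0;
        for (long i = 0; i < len; i++)
            if (line[i] == delim)
                delims++;
        if (delims != expected) {
            dropped++;              /* e.g. a row with a stray 13th '~' */
            continue;
        }
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
    }
    free(line);
    return (len < 0 || ferror(in)) ? -1 : dropped;
}
```

The equivalent shell one-liner is awk -F'~' 'NF == 13' file > cleaned, since a row with 12 tildes splits into 13 fields.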
3. UNIX for Dummies Questions & Answers
Can anyone suggest a command to find "^M" (Control-M) characters in a Unix text file?
^M characters appear when a file is FTPed from Windows to Unix in binary mode instead of ASCII mode.
I need a command to find lines like these:
ex.txt:
------------------------------
...,name,time^M
go^M
...file,end^M... (5 Replies)
Discussion started by: prsam
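^M is how many tools display the carriage-return byte 0x0D. A small scanner can report which lines contain one. This sketch numbers lines from 1, also checks a final line without a trailing newline, and uses a hypothetical function name:

```c
#include <stdio.h>

/* Hypothetical helper: print to `out` the number of each line of `in`
 * that contains a carriage return (0x0D, shown as ^M). Lines are
 * numbered from 1. Returns how many lines matched. */
long report_cr_lines(FILE *in, FILE *out)
{
    long lineno = 1, matches = 0;
    int c, seen_cr = 0;

    while ((c = getc(in)) != EOF) {
        if (c == '\r')
            seen_cr = 1;
        if (c == '\n') {
            if (seen_cr) {
                fprintf(out, "%ld\n", lineno);
                matches++;
            }
            seen_cr = 0;
            lineno++;
        }
    }
    if (seen_cr) {                  /* last line had no trailing newline */
        fprintf(out, "%ld\n", lineno);
        matches++;
    }
    return matches;
}
```

From the shell, grep -n "$(printf '\r')" ex.txt lists the same lines with their numbers.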
4. Solaris
While I was uploading an exl file to my application on Solaris 10, the upload failed with this error: Error! Parsing Error: /SPLM/TC83/tcdata83/model/model_dbextract.xml Line:65576 Column:73 An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is ... (12 Replies)
Discussion started by: karghum
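Both this error and the original 0x17 problem come from the same rule. The XML 1.0 Char production allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF], so most C0 control characters (including 0x17 and 0x1A) make a conforming parser reject the document. The sketch below encodes that production and adds a hypothetical byte scanner that locates the first forbidden control byte. The scanner skips bytes at or above 0x80, which would need a full UTF-8 decoder to check:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `cp` is allowed by the XML 1.0 "Char" production:
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
bool is_xml10_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical helper: offset of the first byte in buf[0..n) that is a
 * C0 control character forbidden by XML 1.0 (e.g. 0x17 or 0x1A), or -1
 * if there is none. Bytes >= 0x80 are not checked. */
long first_bad_control(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] < 0x20 && !is_xml10_char(buf[i]))
            return (long)i;
    return -1;
}
```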
5. Shell Programming and Scripting
I have a Unicode character {Unicode: 0x1C} in my file and I need to replace it with a blank. What would the sed command look like?
cat file1 | sed "s/(//g;" > file2
Is X28 the right value for this Unicode character? (4 Replies)
Discussion started by: Hangman2
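0x28 is the ASCII code for '(', which is why that sed command removes parentheses. 0x1C is a different character, the "file separator" control character. Assuming the file is ASCII, Latin-1 or UTF-8 (where 0x1C is one byte), the replacement is a plain byte substitution. The function name below is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: replace every byte equal to `from` with `to` in
 * buf[0..n) and return the number of bytes changed. 0x1C is the ASCII
 * "file separator" control character; 0x28 is '(' (a different one). */
size_t replace_byte(unsigned char *buf, size_t n,
                    unsigned char from, unsigned char to)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] == from) {
            buf[i] = to;
            changed++;
        }
    }
    return changed;
}
```

With standard tools the same substitution is tr '\034' ' ' < file1 > file2 (0x1C is octal 034).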
6. HP-UX
How can I find the character encoding of a file on HP-UX? (1 Reply)
Discussion started by: alokjyotibal
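A file does not record its own encoding, so any answer is a heuristic. One common check is whether the bytes form well-formed UTF-8. Pure ASCII passes, while Latin-1 text with accented letters usually fails. This sketch follows the UTF-8 rules (no overlong forms, no surrogates, nothing above U+10FFFF), and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..n) is well-formed UTF-8.
 * Rejects overlong forms, surrogates and code points above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;

        if (c < 0x80) {             /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07;
        } else {
            return false;           /* stray continuation byte or 0xF8-0xFF */
        }
        if (n - i < len)
            return false;           /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```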
7. Shell Programming and Scripting
How can I find the position of a character in a file?
i.e.
string = "123X568"
I want to find the position of the character "X".
Thanks (6 Replies)
Discussion started by: LiorAmitai
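In C the standard strchr function finds the first occurrence, and pointer subtraction turns it into a position. This sketch reports 1-based positions, with 0 meaning "not found", which matches awk's index() convention. The wrapper name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1-based position of the first `c` in `s`,
 * or 0 if `c` does not occur (or is the NUL terminator). */
size_t char_position(const char *s, char c)
{
    const char *p;

    if (c == '\0')
        return 0;
    p = strchr(s, c);
    return p ? (size_t)(p - s) + 1 : 0;
}
```

For "123X568" this gives 4. The shell equivalent for each line of a file is awk '{ print index($0, "X") }' file.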
8. Shell Programming and Scripting
Greetings.
I have a file with information like this:
AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?AMNDHRKEOEU?
AMNDHRKEEU?AMNDHREOEU?
AMNDHREU?AHRKEOEU?AMNDHRKEU?AMNDKEOEU?
What I need to extract is the position of every occurrence of '?' in every line.
A desired output would be something... (6 Replies)
Discussion started by: Twinklefingers
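Extending the single-character lookup to every occurrence just means scanning the whole line. This sketch stores 1-based positions in a caller-supplied array and stops at a newline, so it can be fed lines read with fgets. The function name and the array-plus-limit interface are hypothetical choices:

```c
#include <stddef.h>

/* Hypothetical helper: store the 1-based positions of every `c` in
 * `line` into pos[], at most `max` entries. Scanning stops at '\n' or
 * the end of the string. Returns the total number of occurrences,
 * which may exceed `max`. */
size_t char_positions(const char *line, char c, size_t *pos, size_t max)
{
    size_t count = 0;

    for (size_t i = 0; line[i] != '\0' && line[i] != '\n'; i++) {
        if (line[i] == c) {
            if (count < max)
                pos[count] = i + 1;
            count++;
        }
    }
    return count;
}
```

For the second sample line, AMNDHRKEEU?AMNDHREOEU?, the positions are 11 and 22.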
9. Shell Programming and Scripting
Hi Experts,
Is there a way to find a string in a file, append a character to that string, and then save the result to the same file or to another file?
Here is an example.
>cat test.txt
NULL
NULL
NULL
9,800.00
NULL
1,234,567.01
I want to find all non-NULL strings and add a dollar sign to those... (9 Replies)
Discussion started by: brichigo
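The desired output is cut off, so where the sign goes is an assumption. The sketch below puts it in front ("$9,800.00"), as is usual for currency amounts. It handles one line at a time with the trailing newline already removed, and the function name is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `line` into out[] (capacity `cap`),
 * prefixing "$" unless the line is exactly "NULL". Assumes `line` has
 * no trailing newline. Returns the output length, or -1 if `out` is
 * too small. */
int tag_amount(const char *line, char *out, size_t cap)
{
    int n = strcmp(line, "NULL") == 0
          ? snprintf(out, cap, "%s", line)
          : snprintf(out, cap, "$%s", line);

    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Swapping the format to "%s$" would append the sign instead. The same transformation as a one-liner is awk '$0 != "NULL" { $0 = "$" $0 } 1' test.txt > out.txt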
10. Shell Programming and Scripting
Hi,
I want to find the character '-' in positions 284-298 of a file; wherever it occurs, I need to replace it with 'O ' at that position. How can I do that using a sed command?
thanks in advance,
Sara (9 Replies)
Discussion started by: Sara183
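For fixed-width records, a column-range substitution is a short loop. This sketch assumes the positions are 1-based, inclusive columns within each line, and that the replacement is the single character 'O'. A one-character replacement keeps every later column in place; the trailing space in 'O ' is ambiguous in the post. The function name is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace `from` with `to` in the 1-based,
 * inclusive column range [first, last] of `line`. Columns past the
 * end of the line are ignored. Returns the number of replacements. */
size_t replace_in_columns(char *line, size_t first, size_t last,
                          char from, char to)
{
    size_t len = strlen(line), count = 0;

    if (first == 0)
        first = 1;
    for (size_t col = first; col <= last && col <= len; col++) {
        if (line[col - 1] == from) {
            line[col - 1] = to;
            count++;
        }
    }
    return count;
}
```

Applied to each line read from the file, replace_in_columns(line, 284, 298, '-', 'O') would cover the range in the question.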
getunimap
GETUNIMAP(8) Linux GETUNIMAP(8)
NAME
getunimap - dump the unicode map for the current console to stdout
SYNOPSIS
getunimap [ -s ] [ -C console ]
DESCRIPTION
The getunimap program is old and obsolete. It is now part of setfont(1).
The getunimap program outputs the unicode map (also called a "Screen Font Map") for the current console to standard output.
The -C option may be used with Linux 2.6.1 and later to get the map for a console different from the current one. Its argument is a pathname.
The output of getunimap is of the form
0xAA U+1234 # comment
where 0xAA is the font character code and U+1234 is a Unicode character that, if displayed, will be rendered using glyph 0xAA in the font.
Many Unicode characters may be mapped to the same glyph.
The hash symbol (#) is used as a comment delimiter; characters after a hash sign (to the end of the line) are comments.
The -s option will sort and merge elements, sorting on font character. Hence, it will produce output of the form:
0x22 U+1234 U+5678 U+3456
0x23 U+0023
etc., listing the multiple Unicode characters that map to a font glyph.
The output of getunimap is of the form accepted by setfont and psfaddtable.
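The line format described above is simple enough to parse directly. This is a sketch of the documented format only, not the parser that setfont or psfaddtable actually use. It reads the font position (with its 0x prefix), then each U+ code point, and stops at a # comment. The function name and interface are hypothetical:

```c
#include <stdlib.h>

/* Hypothetical helper: parse one getunimap-style line such as
 * "0x22 U+1234 U+5678 # comment". Stores the font position in *glyph
 * and up to `max` code points in cps[]. Returns the number of code
 * points parsed, or -1 if the line has no font position (blank or
 * comment-only). */
int parse_unimap_line(const char *line, unsigned long *glyph,
                      unsigned long *cps, int max)
{
    const char *p = line;
    char *end;
    int count = 0;

    *glyph = strtoul(p, &end, 0);   /* base 0 accepts the "0x" prefix */
    if (end == p)
        return -1;
    p = end;

    while (*p != '\0' && *p != '#' && count < max) {
        if (p[0] == 'U' && p[1] == '+') {
            cps[count++] = strtoul(p + 2, &end, 16);
            p = end;
        } else {
            p++;                    /* skip spaces and other separators */
        }
    }
    return count;
}
```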
SEE ALSO
psfaddtable(1), setfont(1).
Console Tools 2004-01-01 GETUNIMAP(8)