grep and UNICODE (utf-16) file, Post 302101553 by matrixmadhan in UNIX for Dummies Questions & Answers, Wednesday 3rd of January 2007 02:35:43 AM
I think UTF-16 formatted files don't carry the plain single-byte end-of-line marker that byte-oriented tools expect: every character, including the newline, is stored as a two-byte unit, so NUL bytes end up interleaved through the text. In such a case the usual tools applied to other text files (grep, sed, awk and so on) cannot be used directly. UTF-8 is usually fine, since it is ASCII-compatible and keeps the newline as a single 0x0A byte.

Try running wc -l on the file and post the number of lines it reports; for a pure UTF-16 file the count can come out as zero or otherwise misleading, because the counting is done on raw bytes rather than decoded characters. For such situations the file usually has to be converted first, or customized code has to be written.
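If it does turn out to be UTF-16, here is a minimal sketch of a workaround (assuming GNU iconv is installed; data.txt and 'pattern' are just placeholder names):

# Diagnose first: 'file' usually identifies UTF-16 and reports a BOM if present.
file data.txt
# Compare byte and line counts; a byte count roughly twice the visible text size
# is a strong hint that the file is UTF-16.
wc -l -c data.txt

# Re-encode to UTF-8, then the usual byte-oriented tools behave normally.
# Plain "UTF-16" lets iconv honour the BOM; use UTF-16LE or UTF-16BE if the file has none.
iconv -f UTF-16 -t UTF-8 data.txt > data.utf8
grep 'pattern' data.utf8

# Or in one go, without a temporary file:
iconv -f UTF-16 -t UTF-8 data.txt | grep 'pattern'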

 

9 More Discussions You Might Find Interesting

1. Programming

How to display unicode characters / unicode string

I have a stream of characters like "\u8BBE\u5907\u7BA1" and I want to display it. I tried the following things already without any luck: 1) printf("%s",L("\u8BBE\u5907\u7BA1")); 2) printf("%lc",0x8BBE); 3) setlocale followed by fwide followed by wprintf; 4) also changed the locale manually... (3 Replies)
Discussion started by: jackdorso

2. Shell Programming and Scripting

Help with Converting UTF-8 data to Unicode

How can I get an error when converting the 3rd line, since it has invalid characters? abcde a®cdée a�cd� Unicode for ® = ® é = é I used "iconv -f UTF-8 -t ISO-8859-15 in.txt > out.txt" (2 Replies)
Discussion started by: arunbs

3. Shell Programming and Scripting

Unicode file validation

I don't want the HTML_CONTENT, RICH_CONTENT, TEXT_CONTENT columns' data in the file; the rest of the data we need to extract. Find the attached file. Need to extract the data in between the DI_UX_ROW_END tags. Can you help me with a unix command using AWK? Thanks, (2 Replies)
Discussion started by: bmk

4. UNIX for Dummies Questions & Answers

Issue with UTF-8 BOM character in text file

Sometimes we receive some Excel files containing French/Japanese characters over the mail, and these files are manually transferred to the server by using SFTP (security is not a huge concern here). The data is changed to text format using Notepad before transferring it. Problem is: when saving... (4 Replies)
Discussion started by: jawsnnn

5. UNIX for Advanced & Expert Users

[ask]unicode utf-8 for arabic font

Hello all, I want to read Arabic text in the CLI (cat, vi, etc). In Windows I can see it, so why can't I see it in Linux? This is an example: وَمَنْ يَشْكُرْ فَإِنَّمَا يَشْكُرُ لِنَفْسِهِ What should I do? I need your advice for reading that text in the CLI... thx before (0 Replies)
Discussion started by: zvtral

6. Linux

Help to Convert file from UNIX UTF-8 to Windows UTF-16

Hi, I have tried to convert a UTF-8 file to a Windows UTF-16 format file as below from a unix machine: unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt and I am getting some chinese characters, as below, when I open the converted file on a Windows machine. LANG=en_US.UTF-8... (3 Replies)
Discussion started by: phanidhar6039

7. Shell Programming and Scripting

Copying a file with UTF char on UNIX server

Hi, I need to run a SQL query which checks for special UTF chars in the DB. When I try to copy that into a UNIX file it changes them to some weird characters. How can I retain the UTF chars in my script? e.g. ο|π|ρ|σ|τ|υ|φ|χ|ψ Any help will be appreciated. Thanks, (14 Replies)
Discussion started by: varun22486

8. Shell Programming and Scripting

Convert UTF-8 file to ASCII/ISO8859-1 OR replace characters

I am trying to develop a script which will work on a source UTF-8 file and perform one or more of the following. It will accept the target encoding as an argument, e.g. US-ASCII or ISO-8859-1, etc. 1. It should replace all occurrences of characters outside the target character set by " " (space) or... (3 Replies)
Discussion started by: hemkiran.s

9. Shell Programming and Scripting

Create .nfo file in ISO-8859-1 or UTF-8

Hey guys, I have a little problem. Let's say I create this script: #!/bin/sh nfo_file="/home/admin/info.nfo" echo "▒▒█ Hello █▒▒" > $nfo_file It seems to be okay: cat /home/admin/info.nfo prints ▒▒█ Hello █▒▒ and file -bi /home/admin/info.nfo reports text/plain; charset=utf-8. But when I open it in a... (7 Replies)
Discussion started by: antoinelomb
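Threads 2, 6 and 8 above all come down to iconv(1) conversions in one direction or another. A rough, hedged sketch of the usual invocations: the file names are placeholders, //TRANSLIT is a GNU iconv extension, and the exact BOM and byte-order behaviour of -t UTF-16 depends on the iconv build, so inspect the output with od -c or file before relying on it.

# UTF-8 to a Latin-1 style encoding, transliterating instead of aborting
# on characters the target set cannot represent (GNU iconv only).
iconv -f UTF-8 -t ISO-8859-15//TRANSLIT in.txt > out.txt

# UTF-8 to a Windows-friendly UTF-16 file: convert line endings to CRLF
# first, then re-encode (check the result for a BOM and byte order).
unix2dos < in.txt | iconv -f UTF-8 -t UTF-16 > out_utf16.txt

# Validate a UTF-8 file: iconv exits non-zero and reports the position
# of the first illegal input sequence, which helps locate bad bytes.
iconv -f UTF-8 -t UTF-8 in.txt > /dev/null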