Full Discussion: Grep MS Word document
Post 302255433 by mkris on Thursday 6th of November 2008 11:34:35 AM
Thank you for your time, but it didn't work. I am getting an "Invalid codeset" error when I issue the following command:

iconv -f UTF-16 -t UTF-8 filename > tempfilename

Error
iconv: Invalid codeset: UTF-8: The system cannot find the file specified.
iconv: Invalid codeset: UTF-16: The system cannot find the file specified.

When I issue iconv -l, I get the following codesets:

Character sets: ISO8859-1:1987 8859 ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4
ISO8859-5 ISO8859-6 ISO8859-7 ISO8859-8 ISO8859-9 CP037 EBCDIC CP273 CP277
CP278 CP280 CP284 CP285 CP297 CP437 CP500 CP850 CP852 CP857 CP860 CP863 CP865
CP866 CP870 CP871 CP905 ISO646 646 C


Thanks in advance
Regards
kris
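
The `iconv -l` listing above names neither UTF-8 nor UTF-16, which is consistent with the "Invalid codeset" errors. Below is a minimal sketch of the same conversion done in Python instead. It assumes a Python 3 interpreter is available on the machine and that the file really is UTF-16 text; neither is stated in the post. The helper names `missing_codesets` and `utf16_to_utf8` are made up for illustration.

```python
from pathlib import Path

# Codeset names copied from the `iconv -l` output quoted in the post.
ICONV_LISTING = (
    "ISO8859-1:1987 8859 ISO8859-1 ISO8859-2 ISO8859-3 ISO8859-4 ISO8859-5 "
    "ISO8859-6 ISO8859-7 ISO8859-8 ISO8859-9 CP037 EBCDIC CP273 CP277 CP278 "
    "CP280 CP284 CP285 CP297 CP437 CP500 CP850 CP852 CP857 CP860 CP863 CP865 "
    "CP866 CP870 CP871 CP905 ISO646 646 C"
)


def missing_codesets(listing, wanted):
    """Return the wanted codeset names that do not appear in the listing."""
    available = {name.upper() for name in listing.split()}
    return [name for name in wanted if name.upper() not in available]


def utf16_to_utf8(src, dst):
    """Rewrite a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec uses the byte-order mark, if present, to pick the byte order.
    text = Path(src).read_bytes().decode("utf-16")
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Both names requested on the iconv command line are absent from the listing.
    print(missing_codesets(ICONV_LISTING, ["UTF-16", "UTF-8"]))
```

If the conversion succeeds, the UTF-8 output can then be searched with grep like any other text file.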
 

code_page(5)							File Formats Manual						      code_page(5)

NAME
code_page, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861, cp862, cp863, cp865, cp866, cp869, cp874, cp932, cp936, cp949, cp950, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, dingbats, symbol - Coded character sets that are used on Microsoft Windows and NT systems

DESCRIPTION
Code pages are coded character sets that are used on Microsoft Windows, Windows 95, and NT systems. Just as there are different UNIX codesets, there are different PC code pages, each supporting a particular set of character encodings.

A Tru64 UNIX system supplies one locale, en_US.cp850, that directly supports a PC code-page format (MS-DOS Latin 1). For all other locales, data in code-page format is supported only through codeset converters. These converters can be run directly by users or by software or applications that exchange data between PC and Tru64 UNIX systems. Fonts and other kinds of character support are available only for the native UNIX codeset to which a code page can be converted.

See the i18n_intro(5) reference page for introductory information on locales and codesets. See the iconv_intro(5) reference page for an introduction to codeset conversion and the name format and location of codeset converters.

The following table lists and describes the code pages that have conversion support on a Tru64 UNIX system. An asterisk (*) follows the names of code pages that include support for the Euro currency sign (€).

------------------------------------------------------
Code Page    Description
------------------------------------------------------
cp437        MS-DOS United States
cp737        Greek
cp775        Baltic languages (1)
cp850        MS-DOS Multilingual (Latin-1)
cp852        MS-DOS Slavic (Latin-2)
cp855        IBM Cyrillic
cp857        IBM Turkish
cp860        MS-DOS Portuguese
cp861        MS-DOS Icelandic
cp862        Hebrew
cp863        MS-DOS Canadian French
cp865        MS-DOS Nordic languages
cp866        MS-DOS Russian
cp869        IBM Modern Greek
cp874 *      MS-DOS Thai
cp932        Japanese
cp936        Chinese (People's Republic of China)
cp949        Korean
cp950        Chinese (Hong Kong)
cp1250 *     Windows Latin-2
cp1251 *     Windows Cyrillic
cp1252 *     Windows Latin-1
cp1253 *     Windows Greek
cp1254 *     Windows Turkish
cp1255 *     Windows Hebrew
cp1256 *     Windows Arabic
cp1257 *     Windows Baltic (1)
cp1258 *     Windows Vietnamese
dingbats     Microsoft dingbat characters
symbol       Microsoft miscellaneous symbol characters
------------------------------------------------------

(1) Baltic languages include Estonian, Latvian, and Lithuanian.
(2) Latin-2 languages include Albanian, Croatian, Czech, Faeroese, Hungarian, Polish, Romanian, Latin Serbian, Slovak, and Slovenian.
(3) Cyrillic languages include Byelorussian, Bulgarian, and Russian.

In all cases, a code page can be converted to and from the UCS-2, UCS-4, and UTF-8 codesets. In addition, some code pages can be converted directly to ISO codesets as shown in the following table, although some data loss may occur.

------------------------------------------
Code Page    Can Be Converted Directly to:
------------------------------------------
cp437        ISO8859-1
cp737        ISO8859-7
cp775        ISO8859-4
cp850        ISO8859-1
cp852        ISO8859-2
cp855        ISO8859-5
cp857        ISO8859-9
cp860        ISO8859-1
cp861        ISO8859-1
cp862        ISO8859-8
cp863        ISO8859-1
cp865        ISO8859-1
cp866        ISO8859-5
cp869        ISO8859-7
cp874        TACTIS
cp1252       ISO8859-1, ISO8859-15
------------------------------------------

See Unicode(5) for information about UCS-2, UCS-4, and UTF-8.
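
The direct-conversion table and the warning about data loss can be illustrated with a short sketch. It uses Python's built-in codecs as a stand-in for the Tru64 UNIX codeset converters, which is an assumption: the two sets of mapping tables are not guaranteed to agree in every detail. The names `DIRECT_TARGETS` and `convert` are invented for this example.

```python
# Direct conversions copied from the table in the reference page.
# cp874 -> TACTIS is omitted because this sketch has no codec under that name.
DIRECT_TARGETS = {
    "cp437": ["iso8859-1"],
    "cp737": ["iso8859-7"],
    "cp775": ["iso8859-4"],
    "cp850": ["iso8859-1"],
    "cp852": ["iso8859-2"],
    "cp855": ["iso8859-5"],
    "cp857": ["iso8859-9"],
    "cp860": ["iso8859-1"],
    "cp861": ["iso8859-1"],
    "cp862": ["iso8859-8"],
    "cp863": ["iso8859-1"],
    "cp865": ["iso8859-1"],
    "cp866": ["iso8859-5"],
    "cp869": ["iso8859-7"],
    "cp1252": ["iso8859-1", "iso8859-15"],
}


def convert(data, source, target):
    """Decode bytes from one codeset and re-encode them in another.

    Characters that the target codeset cannot represent are replaced
    with "?", which is one form the data loss can take.
    """
    return data.decode(source).encode(target, errors="replace")


if __name__ == "__main__":
    euro_cp1252 = b"\x80"  # the Euro sign in cp1252

    for target in DIRECT_TARGETS["cp1252"]:
        # ISO8859-15 has a Euro sign; ISO8859-1 does not, so the sign is lost there.
        print(target, convert(euro_cp1252, "cp1252", target))

    # Going through UTF-8 and back keeps the character intact.
    as_utf8 = convert(euro_cp1252, "cp1252", "utf-8")
    print(convert(as_utf8, "utf-8", "cp1252") == euro_cp1252)
```

The Euro sign is used here because cp1252 is marked with an asterisk in the first table and is the only code page listed with two direct ISO targets.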

Reference pages for UNIX implementations of the ISO codesets have the name format iso8859-number(5).

For Traditional Chinese and Japanese, there are no codeset converters whose names include the name of a code page because identical character encoding is provided in existing UNIX codesets. For Traditional Chinese, character encoding in PC code-page format (cp950) is identical to that in the Big-5 (big5) codeset. For Japanese, character encoding in PC code-page format (cp932) is identical to that in the Shift JIS (SJIS) codeset. Therefore, the codeset converters whose names include big5 and SJIS can be used to convert data in and out of PC code-page format for the supported languages.

Caution for Conversion of Korean and Simplified Chinese

Conversion of text that starts out in code-page format (cp949) to the DEC Korean (deckorean) codeset may result in loss of data. All of the Tru64 UNIX codeset equivalents for cp949 support all the Hanja and miscellaneous characters also supported by the code page. However, only the UCS-2, UCS-4, and UTF-8 codesets support the complete set of Hangul characters supported by the cp949 code page. The deckorean codeset supports only a subset of these Hangul characters. Therefore, if data is converted from cp949 format to UCS-2, UCS-4, or UTF-8, no data is lost. However, if the data is then converted from UCS-2, UCS-4, or UTF-8 to deckorean, the unsupported Hangul characters will be lost.

The DEC Hanzi (dechanzi) codeset uses the same encoding format as the PC code page used for Simplified Chinese (cp936) but does not support all the characters supported by the code page. Therefore, you can use converters with dechanzi in the converter name to convert text to and from cp936 format, but the operation may result in some loss of data.

SEE ALSO
Commands: iconv(1)

Functions: iconv(3), iconv_close(3), iconv_open(3)

Others: i18n_intro(5), iconv_intro(5), iso8859-1(5), iso8859-2(5), iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5), iso8859-15(5), Unicode(5)

code_page(5)