Operating Systems > Linux: Help to Convert file from UNIX UTF-8 to Windows UTF-16
Post 302886875 by phanidhar6039, Tuesday 4 February 2014
Usually, files should be transferred in binary mode so that nothing is altered in transit and we do not end up with unknown characters.

Let us take the input file name as Orgdata_UTF8.txt and the output file as Orgdata.txt.
Code:
unix2dos < Orgdata_UTF8.txt | iconv -f UTF-8 -t UTF-16LE > Orgdata.txt
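
One way to sanity-check the conversion is to convert the output back and compare it with the source. A minimal round-trip sketch, assuming dos2unix is installed alongside unix2dos (no output from diff means the data survived intact):
Code:
iconv -f UTF-16LE -t UTF-8 < Orgdata.txt | dos2unix | diff - Orgdata_UTF8.txt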

Some systems add the BOM by default and others do not, depending on the operating system. The situation is similar with the target encoding name: depending on the iconv version it may be recognised as UTF-16 or as UTF-16LE, so use whichever name works on your system.
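
To see whether your iconv already wrote a BOM, inspect the first two bytes of the output. A small sketch, using the output file name from above:
Code:
head -c 2 Orgdata.txt | od -An -tx1    # prints "ff fe" when a little-endian BOM is present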

Adding the BOM manually

Create a new output file Orgdata.txt containing just the BOM as shown below, check its type with the file command to confirm that it is reported as UTF-16LE, and then append the converted data:

Code:
printf "\xff\xfe" > Orgdata.txt
file Orgdata.txt 
unix2dos < Orgdata_UTF8.txt |iconv -f UTF-8 -t UTF-16LE>>Orgdata.txt
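
For repeated use, the two steps above can be wrapped in a small script. This is only a sketch with the file names from this thread hard-coded; it writes the BOM with octal escapes (\377\376 is ff fe) so printf behaves the same across shells:
Code:
#!/bin/sh
in=Orgdata_UTF8.txt
out=Orgdata.txt
printf '\377\376' > "$out"                                # write the UTF-16LE BOM (ff fe)
unix2dos < "$in" | iconv -f UTF-8 -t UTF-16LE >> "$out"   # append CRLF-terminated UTF-16LE text
file "$out"                                               # confirm it is reported as UTF-16 little-endian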

Use a hex dump tool to check that the file starts with the desired ff fe bytes. How the bytes are displayed varies with the tool used.
Code:
cat Orgdata.txt | hexdump | less    # shows the BOM as fe ff
xxd < Orgdata.txt | less            # the same file shows ff fe
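
If you prefer a dump that always lists bytes in file order, hexdump's canonical mode (-C) avoids the word swapping shown above; a small example with the same file name:
Code:
hexdump -C Orgdata.txt | head -n 1    # the line should start with "ff fe"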

You can also make the carriage returns and tabs visible with cat (-v shows CR as ^M, -T shows tabs as ^I):
Code:
cat -vT Orgdata.txt

In reality both tools show the same bytes; hexdump groups them into 16-bit words in host byte order, which makes the pair appear reversed, while xxd lists them in file order.

This has resolved my issue.
 
