Operating Systems > Linux: Help to Convert file from UNIX UTF-8 to Windows UTF-16
Post 302886875 by phanidhar6039, Tuesday 4 February 2014
Usually, files should be transferred in binary mode so that nothing is altered in transit and we do not end up with unknown characters.

Let us take the input file name as Orgdata_UTF8.txt and the output file as Orgdata.txt.
Code:
unix2dos < Orgdata_UTF8.txt | iconv -f UTF-8 -t UTF-16LE > Orgdata.txt
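
One way to sanity-check the conversion is to convert the output back and compare it with the source. A minimal round-trip sketch, assuming dos2unix is installed alongside unix2dos (no output from diff means the data survived intact):
Code:
iconv -f UTF-16LE -t UTF-8 < Orgdata.txt | dos2unix | diff - Orgdata_UTF8.txt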

Some systems add the BOM by default and others do not, depending on the operating system. The situation is similar with the target encoding name: depending on the iconv version it may be recognised as UTF-16 or as UTF-16LE, so use whichever name works on your system.
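
To see whether your iconv already wrote a BOM, inspect the first two bytes of the output. A small sketch, using the output file name from above:
Code:
head -c 2 Orgdata.txt | od -An -tx1    # prints "ff fe" when a little-endian BOM is present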

Adding the BOM manually

Create a new output file Orgdata.txt containing just the BOM as shown below, check its type with the file command to confirm that it is reported as UTF-16LE, and then append the converted data:

Code:
printf "\xff\xfe" > Orgdata.txt
file Orgdata.txt 
unix2dos < Orgdata_UTF8.txt |iconv -f UTF-8 -t UTF-16LE>>Orgdata.txt
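
For repeated use, the two steps above can be wrapped in a small script. This is only a sketch with the file names from this thread hard-coded; it writes the BOM with octal escapes (\377\376 is ff fe) so printf behaves the same across shells:
Code:
#!/bin/sh
in=Orgdata_UTF8.txt
out=Orgdata.txt
printf '\377\376' > "$out"                                # write the UTF-16LE BOM (ff fe)
unix2dos < "$in" | iconv -f UTF-8 -t UTF-16LE >> "$out"   # append CRLF-terminated UTF-16LE text
file "$out"                                               # confirm it is reported as UTF-16 little-endian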

Use a hex dump tool to check that the file starts with the desired ff fe bytes. How the bytes are displayed varies with the tool used.
Code:
cat Orgdata.txt | hexdump | less    # shows the BOM as fe ff
xxd < Orgdata.txt | less            # the same file shows ff fe
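
If you prefer a dump that always lists bytes in file order, hexdump's canonical mode (-C) avoids the word swapping shown above; a small example with the same file name:
Code:
hexdump -C Orgdata.txt | head -n 1    # the line should start with "ff fe"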

You can also make the carriage returns and tabs visible with cat (-v shows CR as ^M, -T shows tabs as ^I):
Code:
cat -vT Orgdata.txt

In reality both tools show the same bytes; hexdump groups them into 16-bit words in host byte order, which makes the pair appear reversed, while xxd lists them in file order.

This has resolved my issue.
 
