Problem With UTF8 Byte Order Mark

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Utf8-utf16

Hi All, When we create a flat file using a PLSQL program, the flat file is created in UTF8 format. This file has a lot of German characters. When we use this file to load data into MS SQL Server, the German characters come out as junk. When we create a flat file in oracle it is being ...

2. Shell Programming and Scripting

problem with 0 byte and large files

How to remove all zero-byte files in a particular directory, and also files that are more than 1GB? Please let me know.

3. Shell Programming and Scripting

Check if 2 files are identical byte-to-byte?

In my server migration requirement, I need to compare if one file on old server is exactly the same as the corresponding file on the new server. For diff and comm, the inputs need to be sorted. But I do not want to disturb the content of the file and need to find byte-to-byte match. Please...

4. Shell Programming and Scripting

Remove a byte(Last byte from the last line)

Hi All, can anyone please suggest how to remove the last byte from a flat file? This is the last byte of the last line. Please suggest something. Thanks and regards, Vinay

5. Programming

Byte order question

Hi, The structure that will follow is supposed to hold the following RTP header field 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ...

6. Shell Programming and Scripting

UTF8 encoding

Hi experts, I have a gz file from another system (Solaris), which is ftped to our system (Solaris). After gunzip, the file is an XML file and we are using the ORACLE built-in XML transforming tool ORAXSL to transform XML to TXT. Now we have come across an issue regarding UTF8, as below:...

7. Programming

How to use sigmask in order to make signals can be processed by a thread

Hi, I have a UDP server and client program, and they must run within a program, so I decided on two threads, one for the UDP server and another for the UDP client. The simple architecture is shown in the attachment. However, I can't send the packets out on the UDP client, no any time message and...

8. Shell Programming and Scripting

redirect stdout and stderr to file wrong order problem with subshell

Hello, I read a lot of posts related to this topic, but nothing helped me. I'm running a ksh script with a subshell that processes some ldap commands. I need to check the output for possible errors. #!/bin/ksh ... readinput < $QCHAT_INPUT |& while read -p line do echo $line ...

9. Shell Programming and Scripting

Help with sort data based on descending order problem

Input file: 9.99331e-13 8.98451e-65 9.98418e-34 7.98319e-08 365592 111669 74942.9 0
Desired output: 365592 111669 74942.9 7.98319e-08 1.99331e-13 6.98418e-34

10. Debian

Locales UTF8 - not working

Hello, I'm facing a strange problem on one of my Debian servers. I have run dpkg-reconfigure locales to set en_US UTF-8 so that I could use accented characters on my system. # locale -a C en_US.utf8 POSIX pt_BR.utf8 However, when I create a new...

LEARN ABOUT DEBIAN

nkf

nkf(1)																	    nkf(1)

NAME

       nkf - Network Kanji Filter

SYNOPSIS

       nkf [-butjnesliohrTVvwWJESZxXFfmMBOcdILg] [file ...]

DESCRIPTION

       nkf is yet another kanji code converter for use among networks, hosts and terminals.  It converts input kanji code to a designated kanji
       code such as ISO-2022-JP, Shift_JIS, EUC-JP, UTF-8, UTF-16 or UTF-32.

       One of the most distinctive features of nkf is that it guesses the input kanji encoding.  It currently recognizes ISO-2022-JP, Shift_JIS,
       EUC-JP, UTF-8, UTF-16 and UTF-32, so users needn't set the input kanji code explicitly.

       By default, X0201 kana is converted into X0208 kana.  For X0201 kana, SO/SI, SSO and ESC-(-I methods are supported.  For automatic code
       detection, nkf assumes no X0201 kana in Shift_JIS.  To accept X0201 in Shift_JIS, use -X, -x or -S.

OPTIONS

       -J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32
	   Specify input and output encodings. Upper case is input.  cf. --ic and --oc.

	   -J  ISO-2022-JP (JIS code).

	   -S  Shift_JIS and JIS X 0201 kana.  EUC-JP is recognized as X0201 kana. Without the -x flag, JIS X 0201 Katakana (a.k.a. halfwidth kana)
	       is converted into JIS X 0208.  If you use Windows, see Windows-31J (CP932).

	   -E  EUC-JP.

	   -W  UTF-8N.

	   -W16[BL][0]
	       UTF-16.	B or L selects Big Endian or Little Endian.  0 specifies whether a BOM is put or not.

	   -W32[BL][0]
	       UTF-32.	B or L selects Big Endian or Little Endian.  0 specifies whether a BOM is put or not.
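
The difference between the Big Endian, Little Endian and BOM variants is easiest to see at the byte level. The following is a minimal Python sketch, not part of nkf; it uses only the standard codecs module, and the helper name strip_bom is hypothetical.

```python
import codecs

text = "A"  # U+0041, chosen so the byte order is easy to see

# Explicit-endian codecs never write a BOM.
utf16_be = text.encode("utf-16-be")   # bytes 00 41
utf16_le = text.encode("utf-16-le")   # bytes 41 00
utf32_be = text.encode("utf-32-be")   # bytes 00 00 00 41
utf32_le = text.encode("utf-32-le")   # bytes 41 00 00 00

# Prepending the matching BOM gives the "with BOM" form.
utf16_be_bom = codecs.BOM_UTF16_BE + utf16_be        # FE FF 00 41
utf16_le_bom = codecs.BOM_UTF16_LE + utf16_le        # FF FE 41 00
utf8_bom = codecs.BOM_UTF8 + text.encode("utf-8")    # EF BB BF 41

# The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
# so the longer marks are checked first. UTF-16LE data whose first
# character is U+0000 would still be misread; this is only a sketch.
_BOMS = (
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8/16/32 BOM, if one is present."""
    for bom in _BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data
```

A stray UTF-8 BOM at the start of a file is a common cause of "junk" characters when another tool expects plain UTF-8; strip_bom(utf8_bom) returns just the payload byte.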

       -b -u
	   Output is buffered (DEFAULT), Output is unbuffered.

       -t  No conversion.

       -i[@B]
	   Specify the escape sequence for JIS X 0208.

	   -i@ Use ESC ( @. (JIS X 0208-1978)

	   -iB Use ESC ( B. (JIS X 0208-1983/1990 DEFAULT)

       -o[BJ]
	   Specify the escape sequence for US-ASCII/JIS X 0201 Roman. (DEFAULT B)

       -r  {de/en}crypt ROT13/47
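
As an illustration of why one option covers both encrypting and decrypting, here is a small Python sketch of ROT13 and ROT47 on plain ASCII text. Each is its own inverse. It is an assumption-laden sketch rather than nkf's implementation: which characters nkf applies each rotation to is not described above, and the function names are hypothetical.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


def rot47(text: str) -> str:
    """Rotate the 94 printable ASCII characters '!'..'~' by 47 places."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            out.append(ch)  # spaces, newlines and non-ASCII pass through
    return "".join(out)
```

Applying either function twice returns the original text, because 13 is half of 26 and 47 is half of 94.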

       -h[123] --hiragana --katakana --katakana-hiragana
	   -h1 --hiragana
	       Katakana to Hiragana conversion.

	   -h2 --katakana
	       Hiragana to Katakana conversion.

	   -h3 --katakana-hiragana
	       Katakana to Hiragana and Hiragana to Katakana conversion.
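
The three -h modes can be pictured with a short Python sketch that works on Unicode text. It relies on the fact that the hiragana block (U+3041 to U+3096) and the katakana block (U+30A1 to U+30F6) are laid out in parallel, 0x60 code points apart. nkf itself works on JIS codes, so this is only an illustration; the function names are hypothetical.

```python
KANA_OFFSET = 0x60  # distance between matching hiragana and katakana


def katakana_to_hiragana(text: str) -> str:
    """Like -h1: katakana letters become hiragana, everything else is kept."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def hiragana_to_katakana(text: str) -> str:
    """Like -h2: hiragana letters become katakana, everything else is kept."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def swap_kana(text: str) -> str:
    """Like -h3: swap both scripts in a single pass."""
    out = []
    for ch in text:
        if "\u30a1" <= ch <= "\u30f6":
            out.append(chr(ord(ch) - KANA_OFFSET))
        elif "\u3041" <= ch <= "\u3096":
            out.append(chr(ord(ch) + KANA_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

The swap has to be done in one pass; applying the two one-way conversions in sequence would leave everything in a single script.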

       -T  Text mode output (MS-DOS)

       -f[m [- n]]
	   Fold lines at length m with margin n.  Without this option, fold length is 60 and fold margin is 10.

       -F  New line preserving line folding.

       -Z[0-3]
	   Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.

	   -Z -Z0
	       Convert X0208 alphabet to ASCII.

	   -Z1 Convert X0208 kankaku to single ASCII space.

	   -Z2 Convert X0208 kankaku to double ASCII spaces.

	   -Z3 Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;', '&amp;' as in HTML.

       -X -x
	   With -X or without this option, X0201 is converted into X0208 Kana.	With -x, try to preserve X0208 kana and do not convert X0201 kana
	   to X0208.  In JIS output, ESC-(-I is used. In EUC output, SS2 is used.

       -B[0-2]
	   Assume broken JIS-Kanji input that has lost its ESC characters.  Useful when your site is using the old B-News Nihongo patch.

	   -B1 allows any chars after ESC-( or ESC-$.

	   -B2 force ASCII after NL.

       -I  Replace non-ISO-2022-JP characters with a geta character (the substitute character in Japanese).

       -m[BQN0]
	   MIME ISO-2022-JP/ISO8859-1 decode. (DEFAULT) To see ISO8859-1 (Latin-1) -l is necessary.

	   -mB Decode MIME base64 encoded stream. Remove header or other part before conversion.

	   -mQ Decode MIME quoted stream. '_' in quoted stream is converted to space.

	   -mN Non-strict decoding.  It allows line break in the middle of the base64 encoding.

	   -m0 No MIME decode.

       -M  MIME encode. Header style. All ASCII code and control characters are intact.

	   -MB MIME encode Base64 stream.  Kanji conversion is performed before encoding, so this cannot be used as a picture encoder.

	   -MQ Perform quoted encoding.
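
To show what MIME header encoding and decoding look like, the sketch below round-trips a Japanese string through an ISO-2022-JP Base64 encoded-word using Python's standard email.header module. It stands in for the idea behind -M and -m and does not reproduce nkf's exact output; the function names are hypothetical.

```python
from email.header import Header, decode_header


def mime_encode(text: str, charset: str = "iso-2022-jp") -> str:
    """Encode text as a MIME header value (an encoded-word)."""
    return Header(text, charset).encode()


def mime_decode(encoded: str) -> str:
    """Decode MIME encoded-words back to text."""
    parts = []
    for data, charset in decode_header(encoded):
        if isinstance(data, bytes):
            # Unlabelled byte chunks are assumed to be plain ASCII here.
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)  # undecoded plain string
    return "".join(parts)
```

The kanji conversion to ISO-2022-JP happens before the Base64 step, which matches the remark above that the Base64 mode cannot serve as a general binary encoder.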

       -l  Input and output code is ISO8859-1 (Latin-1) and ISO-2022-JP.  -s, -e and -x are not compatible with this option.

       -L[uwm] -d -c
	   Convert line breaks.

	   -Lu -d
	       unix (LF)

	   -Lw -c
	       windows (CRLF)

	   -Lm mac (CR)

	       Without this option, nkf doesn't convert line breaks.
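
The three line-break targets can be sketched in Python as a single substitution over bytes. The mode letters u, w and m follow the -L option above; the function name is hypothetical, and this is an illustration rather than nkf's code.

```python
import re

# Target line break for each -L mode letter.
NEWLINES = {"u": b"\n", "w": b"\r\n", "m": b"\r"}

# CRLF must be tried before the single characters so that it is
# treated as one line break, not two.
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def convert_newlines(data: bytes, mode: str) -> bytes:
    """Rewrite every line break in data to the style selected by mode."""
    target = NEWLINES[mode]
    return _ANY_NEWLINE.sub(lambda match: target, data)
```

Working on bytes keeps the sketch independent of the text encoding, as long as the encoding represents CR and LF as single bytes (true for ASCII-compatible encodings, not for UTF-16 or UTF-32).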

       --fj --unix --mac --msdos --windows
	   Convert for these systems.

       --jis --euc --sjis --mime --base64
	   Convert to named code.

       --jis-input --euc-input --sjis-input --mime-input --base64-input
	   Assume input system.

       --ic=input codeset --oc=output codeset
	   Set the input or output codeset.  nkf supports the following codesets; codeset names are case insensitive.

	   ISO-2022-JP
	       a.k.a. RFC1468, 7bit JIS, JUNET

	   EUC-JP (eucJP-nkf)
	       a.k.a. AT&T JIS, Japanese EUC, UJIS

	   eucJP-ascii
	   eucJP-ms
	   CP51932
	       Microsoft Version of EUC-JP.

	   Shift_JIS
	       a.k.a. SJIS, MS_Kanji

	   Windows-31J
	       a.k.a. CP932

	   UTF-8
	       same as UTF-8N

	   UTF-8N
	       UTF-8 without BOM

	   UTF-8-BOM
	       UTF-8 with BOM

	   UTF8-MAC (input only)
	       decomposed UTF-8

	   UTF-16
	       same as UTF-16BE

	   UTF-16BE
	       UTF-16 Big Endian without BOM

	   UTF-16BE-BOM
	       UTF-16 Big Endian with BOM

	   UTF-16LE
	       UTF-16 Little Endian without BOM

	   UTF-16LE-BOM
	       UTF-16 Little Endian with BOM

	   UTF-32
	       same as UTF-32BE

	   UTF-32BE
	       UTF-32 Big Endian without BOM

	   UTF-32BE-BOM
	       UTF-32 Big Endian with BOM

	   UTF-32LE
	       UTF-32 Little Endian without BOM

	   UTF-32LE-BOM
	       UTF-32 Little Endian with BOM
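
To get a feel for how the same text differs across these codesets, the Python sketch below encodes one string with the standard-library codecs that most closely correspond to some of the names above. The mapping from nkf codeset names to Python codec names is an assumption made for illustration, and the two may differ for some characters. For UTF8-MAC, the sketch shows only generic Unicode composition (NFC), which is not identical to Apple's variant.

```python
import unicodedata

SAMPLE = "\u65e5\u672c\u8a9e"  # the three kanji for "Japanese language"

# Assumed correspondence between nkf codeset names and Python codecs.
PYTHON_CODECS = {
    "ISO-2022-JP": "iso2022_jp",
    "EUC-JP": "euc_jp",
    "Shift_JIS": "shift_jis",
    "Windows-31J": "cp932",
    "UTF-8N": "utf-8",
    "UTF-8-BOM": "utf-8-sig",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
}


def encode_all(text: str) -> dict[str, bytes]:
    """Encode text once per codeset and return the raw bytes."""
    return {name: text.encode(codec) for name, codec in PYTHON_CODECS.items()}


def compose(text: str) -> str:
    """Recombine decomposed characters, e.g. KA + combining dakuten -> GA."""
    return unicodedata.normalize("NFC", text)


for name, data in encode_all(SAMPLE).items():
    print(f"{name:12} {len(data):2} bytes  {data.hex()}")
```

The printed lengths make the trade-offs visible: ISO-2022-JP pays for its escape sequences, the BOM variant of UTF-8 is three bytes longer than UTF-8N, and UTF-32 uses four bytes per character.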

       --fb-{skip, html, xml, perl, java, subchar}
	   Specify the way that nkf handles unassigned characters.  Without this option, --fb-skip is assumed.
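
Python's codec error handlers offer a loose analogy to these fallback modes. The sketch below encodes a character that has no EUC-JP mapping under four handlers. The output formats are Python's, not nkf's; the exact form each --fb mode produces is not described above, and the dictionary labels are only descriptive.

```python
# U+1F600 is an emoji with no mapping in EUC-JP.
TEXT = "a\U0001F600b"


def encode_with_fallbacks(text: str, codec: str = "euc_jp") -> dict[str, bytes]:
    """Encode text under several error handlers and return each result."""
    return {
        # Drop the character entirely (comparable in spirit to skipping).
        "skip": text.encode(codec, errors="ignore"),
        # Decimal numeric character reference, usable in HTML or XML.
        "numeric reference": text.encode(codec, errors="xmlcharrefreplace"),
        # Backslash escape in Python's own notation.
        "backslash escape": text.encode(codec, errors="backslashreplace"),
        # Substitute character chosen by the codec.
        "substitute": text.encode(codec, errors="replace"),
    }
```

Silently skipping is the lossy choice; the reference and escape forms keep enough information to restore the character later.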

       --prefix=escape charactertarget character..
	   When nkf converts to Shift_JIS, it adds the specified escape character to the specified 2nd bytes of Shift_JIS characters.  The 1st
	   byte of the argument is the escape character and the following bytes are the target characters.

       --no-cp932ext
	   Handle the characters extended in CP932 as unassigned characters.

       --no-best-fit-chars
	   When converting from Unicode to an encoded byte sequence, don't convert characters that are not round-trip safe.  When converting from
	   Unicode to Unicode with this option and -x, nkf can be used as a UTF converter.  (In other words, without this option and -x, nkf does
	   not preserve some characters.)

	   When nkf converts strings that are related to paths, you should use this option.

       --cap-input
	   Decode hex encoded characters.

       --url-input
	   Unescape percent escaped characters.

       --numchar-input
	   Decode character references, such as "&#....;".
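
The percent-unescaping and character-reference decoding steps can be sketched in Python as below. This is an illustration of the two transformations, not nkf's implementation; the function names are hypothetical, and the reference decoder accepts both decimal and hexadecimal forms as an assumption.

```python
import re
from urllib.parse import unquote_to_bytes

_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def url_unescape(data: bytes) -> bytes:
    """Turn %XX sequences back into raw bytes (the encoding is untouched)."""
    return unquote_to_bytes(data)


def decode_numeric_refs(text: str) -> str:
    """Replace &#NNNN; and &#xHHHH; with the characters they name."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code > 0x10FFFF:
            return match.group(0)  # not a valid code point, leave as is
        return chr(code)

    return _NUMERIC_REF.sub(replace, text)
```

Percent-unescaping yields bytes, which still have to be interpreted in some encoding afterwards; that is why such a step is applied to the input before code conversion.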

       --in-place[=SUFFIX]  --overwrite[=SUFFIX]
	   Overwrite the listed original files with the filtered result.

	   Note --overwrite preserves timestamps of original files.

       --guess=[12]
	   Print guessed encoding and newline. (2 is default, 1 is only encoding)
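
The idea of guessing an encoding can be illustrated with a deliberately naive Python sketch: look for a BOM, then an escape byte, then try strict decoders in a fixed order. nkf's real detection logic is not described above and is certainly more refined, so treat the order, the returned labels and the function name as assumptions.

```python
import codecs

# Longer BOMs first: the UTF-32LE BOM begins with the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Strict decoders tried in order; the first that succeeds wins.
_CANDIDATES = [
    ("ASCII", "ascii"),
    ("UTF-8", "utf-8"),
    ("EUC-JP", "euc_jp"),
    ("Shift_JIS", "shift_jis"),
]


def guess_encoding(data: bytes) -> str:
    """Return a label for the first encoding that fits data, else 'unknown'."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # ISO-2022-JP is 7-bit, so it must be checked before plain ASCII.
    if b"\x1b" in data:
        try:
            data.decode("iso2022_jp")
            return "ISO-2022-JP"
        except UnicodeDecodeError:
            pass
    for name, codec in _CANDIDATES:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return name
    return "unknown"
```

Short inputs can be valid in several encodings at once, so any guess of this kind can be wrong; stating the input encoding explicitly remains the safer choice when it is known.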

       --help
	   Print nkf's help.

       --version
	   Print nkf's version.

       --  Ignore rest of -option.

AUTHOR

       Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).

       Copyright (c) 1996-2010, The nkf Project.

nkf 2.1.2							    2011-09-08								    nkf(1)
