How to remove Unicode <feff> from top of file? Post 302739745 by jim mcnamara on Tuesday 4th of December 2012 11:15:14 PM
Old 12-05-2012
Exactly how the BOM is encoded in the file depends on whether the file is UTF-8, UTF-16 or UTF-32, and on whether the text is big-endian or little-endian.

The BOM is supposed to be at the very beginning of the text, hence bipinajith used the ^ to indicate that. What you show as a BOM denotes UTF-16 big-endian. Is that in fact what you have? Because what bipinajith gave you should have worked. That tells me something is not right. Not every BOM shows up in the file as the bytes FE FF; the byte sequence depends on the encoding.


Code:
Bytes          Encoding Form
00 00 FE FF    UTF-32, big-endian
FF FE 00 00    UTF-32, little-endian
FE FF          UTF-16, big-endian
FF FE          UTF-16, little-endian
EF BB BF       UTF-8
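
One way to find out which of these you actually have is to look at the first few bytes of the file. Below is a minimal sketch in Python, not anything posted earlier in this thread; the function names detect_bom and strip_bom are made up for illustration. It compares the start of the data against the byte sequences in the table, using the BOM constants from the standard codecs module. The 4-byte patterns are checked first because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian one.

```python
import codecs

# Byte sequences from the table above, longest first: the UTF-32 LE BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE), so order matters.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
]


def detect_bom(data: bytes):
    """Return (bom_bytes, encoding_form), or (b"", None) if no BOM matches.

    Hypothetical helper for illustration only.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return bom, name
    return b"", None


def strip_bom(data: bytes) -> bytes:
    """Return data with a leading BOM removed; data without a BOM is unchanged."""
    bom, _ = detect_bom(data)
    return data[len(bom):]
```

To use it, read the file in binary mode (open(path, "rb")) and pass the bytes to detect_bom to see what is there, or to strip_bom to get the content without the BOM. One caveat: a UTF-16 little-endian file whose first character is U+0000 would be reported as UTF-32 little-endian, so treat the result as a strong hint rather than proof.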

Please enlighten us.
 
