How to remove Unicode <feff> from top of file?


 
# 1  
Old 12-04-2012

Experts,

This has been dumped on me at the last minute. I am receiving files from a source system with a BOM (byte order mark) at the top of every file, and I need to check for its existence and remove it.

Code:
<feff>
header
Column1|column2......n

I know I can simply run sed on it like this to get rid of the first line:
Code:
sed 1d FilewidFEFF.csv > other.txt

and it works great and removes the first line.
But my goal is to first check whether the first line has a BOM, and only then delete it. Since it's Unicode, I have NOT been able to grep for it successfully.
Any ideas, please?

Thanks so much for your input.
Truly appreciate it.

Moderator's Comments:
Mod Comment Please use code tags for code and data

Last edited by Scrutinizer; 12-04-2012 at 08:11 PM.. Reason: mis-spel; mod: code tags
# 2  
Old 12-04-2012
Try using this command:
Code:
grep -v $(echo -ne '^\0376\0377') input_file

# 3  
Old 12-04-2012
I tried it in many forms, but it's no help at all:

Code:
 cat xyz.csv | grep -v $(echo -ne '^\0376\0377')
 
or
 
 grep -v $(echo -ne '^\0376\0377') xyz.csv > fixed.txt

What was your thought process?

Thanks for the post, though.
# 4  
Old 12-04-2012
Not sure why it didn't work!

\0376\0377 - octal escape sequences that generate the byte order mark (hex FE FF)
Code:
echo '\0376\0377'
þÿ

# 5  
Old 12-05-2012
Exactly how the BOM is encoded in the file depends on whether it is UTF-8, UTF-16 or UTF-32, plus whether the text is big endian or little endian.

The BOM is supposed to be at the very beginning of the text, hence bipinajith used the ^ to indicate that. What you show as a BOM denotes UTF-16 big endian. Is that in fact what you have? Because what you were given by bipinajith should have worked. That tells me something is not right. Not every BOM is stored in the file as the bytes FE FF.


Code:
Bytes          Encoding Form
00 00 FE FF    UTF-32, big-endian
FF FE 00 00    UTF-32, little-endian
FE FF          UTF-16, big-endian
FF FE          UTF-16, little-endian
EF BB BF       UTF-8
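
If it helps, the table above can be turned into a quick check on the first few bytes of a file. This is only a sketch: the function name detect_bom is made up for this example, and it assumes an od that supports -An, -tx1 and -N (the POSIX options).

```shell
#!/bin/sh
# detect_bom FILE (hypothetical helper): print the encoding implied by a
# leading BOM, or "none" if the file does not start with one.
detect_bom() {
    # First four bytes as lowercase hex with whitespace removed, e.g. "efbbbf68".
    bytes=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    case $bytes in
        0000feff) echo "UTF-32BE" ;;
        fffe0000) echo "UTF-32LE" ;;  # tested before UTF-16LE, which shares FF FE
        efbbbf*)  echo "UTF-8" ;;
        feff*)    echo "UTF-16BE" ;;
        fffe*)    echo "UTF-16LE" ;;
        *)        echo "none" ;;
    esac
}
```

Running detect_bom xyz.csv would then show which of the byte patterns, if any, the file starts with.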

Please enlighten us.
# 6  
Old 12-05-2012
When I check the encoding, it's UTF-8:

Code:
file xyz.csv
xyz.csv: UTF-8 Unicode text, with very long lines

I tried piconv from UTF-8 to ASCII, and it does convert <feff> to ?.
Then I can grep for ? and delete the first line.
Is that an ideal solution?
I wanted something robust. What if the file has a ? somewhere else?
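
One way to avoid the ? problem is to test the raw bytes instead of converting to ASCII. The sketch below assumes the file really is UTF-8, as file reports above, so the BOM would be stored as the three bytes EF BB BF. The function names has_utf8_bom and strip_utf8_bom are made up for this example.

```shell
#!/bin/sh
# has_utf8_bom FILE (hypothetical helper): succeed only if FILE starts
# with the UTF-8 encoded BOM, bytes EF BB BF.
has_utf8_bom() {
    [ "$(od -An -tx1 -N3 "$1" | tr -d '[:space:]')" = "efbbbf" ]
}

# strip_utf8_bom FILE (hypothetical helper): write FILE to stdout,
# dropping the three BOM bytes when they are present.
strip_utf8_bom() {
    if has_utf8_bom "$1"; then
        tail -c +4 "$1"   # start output at byte 4, right after the BOM
    else
        cat "$1"          # no BOM, pass the file through unchanged
    fi
}
```

Usage would be something like strip_utf8_bom FilewidFEFF.csv > other.txt. This only removes the BOM bytes themselves, so a ? elsewhere in the data is never touched. If the BOM really does sit on a line of its own, the empty line left behind would still need sed 1d.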