| UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !! |
#1
Issue with UTF-8 BOM character in text file
Sometimes we receive Excel files containing French/Japanese characters by mail, and these files are manually transferred to the server using SFTP (security is not a huge concern here). Before the transfer, the data is converted to text format using Notepad.

The problem: when saving the files on our Windows machine in UTF-8 format, Notepad inserts a BOM (byte order mark, the three bytes EF BB BF) at the beginning of the text. ETL tools such as Informatica have no problem reading files with this character, but unfortunately we validate the data before loading it, and this validation is performed by a shell script. Since the first field of the first line is no longer a valid field, the shell script fails.

One solution we tried was removing the BOM from the text file in Unix before processing it. This worked fine as far as the shell script was concerned, but then the ETL tool failed to read the UTF characters in the file.

My questions:
1. Is there a way to fix this at the root, i.e. can I stop Notepad from writing the BOM when saving in UTF-8 format?
2. Can some other tool make this conversion instead of Notepad... DOS maybe?
3. What are my options in Unix? Is there a way to remove the BOM without "breaking" the file? There must be, because I have seen a lot of UTF files without a BOM being processed just fine earlier. I just don't know how to do it.
#2
Make a copy of the file with the BOM character removed. Use that to validate the file in UNIX. You don't have to actually save it permanently.
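
A minimal sketch of that idea, assuming a POSIX shell and sed. The helper name strip_bom is hypothetical, and the script only writes the BOM-free data to stdout, so the original file is left untouched for the ETL tool.

```shell
#!/bin/sh
# Sketch only: strip_bom is a hypothetical helper name, not an existing tool.

# The UTF-8 BOM is the three bytes EF BB BF (octal 357 273 277).
BOM=$(printf '\357\273\277')

# Print the file named in $1 to stdout without a leading BOM.
# LC_ALL=C makes sed match raw bytes, so the current locale does not matter.
# Only line 1 is touched, and only if it starts with the BOM.
strip_bom() {
    LC_ALL=C sed "1s/^$BOM//" "$1"
}
```

Redirect the output of strip_bom to a temporary file, run the validation against that file, and delete it afterwards. A file that has no BOM passes through unchanged.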
#3
Apologies for not mentioning this earlier.
Validation is not the only thing the script does. It also rearranges the columns into an order the ETL tool will understand.
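
One possible way to handle a script that rewrites the file is to strip the BOM before processing and write it back in front of the output. This is a sketch under assumptions: the function names has_bom and process_keeping_bom are hypothetical, and it assumes the ETL tool reads the result correctly once the BOM is back, which the thread does not confirm.

```shell
#!/bin/sh
# Sketch only: has_bom and process_keeping_bom are hypothetical names.

# The UTF-8 BOM is the three bytes EF BB BF (octal 357 273 277).
BOM=$(printf '\357\273\277')

# Succeed if the file named in $1 starts with a UTF-8 BOM.
has_bom() {
    [ "$(od -An -tx1 -N3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Usage: process_keeping_bom FILE COMMAND [ARGS...]
# Feeds FILE without its BOM to COMMAND on stdin and writes the result
# to stdout, re-adding the BOM first if the input had one.
process_keeping_bom() {
    src=$1
    shift
    if has_bom "$src"; then
        printf '%s' "$BOM"
        LC_ALL=C sed "1s/^$BOM//" "$src" | "$@"
    else
        "$@" < "$src"
    fi
}
```

For example, process_keeping_bom data.txt awk -F, -v OFS=, '{print $2, $1}' would swap two comma-separated columns (data.txt and the column layout are made up for illustration) while the output still begins with the BOM.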
#4
Can you not transfer it using FTP in ASCII mode instead of using a Windows app?
Unix tools for Windows do exist, and they are free. You can install Cygwin on your PC or simply download UnxUtils (Unix tools for Windows) from SourceForge.net.
#5
Unfortunately, downloading and installing tools is not an option (a controlled environment at work means I would have to go through at least half a dozen people to get something as basic as PuTTY installed on my system).
Question regarding your first point: wouldn't transferring the file in ASCII mode incorrectly transmit the UTF (Japanese/Spanish) characters? Also, are you suggesting skipping the "copy data from Excel - paste to Notepad - save in UTF-8 format" step? That might again not be possible in my current situation, unless I find a way to convert the data to a proper UTF-8 text file without a BOM using a pre-installed application.