Change encoding without removing special chars (iconv)


 
# 1  
01-13-2018

Hi all,

I'm using the iconv command to change file encodings to UTF-8.

If my input file contains characters such as í and ó, they are removed, and the output file is created without them.

I tried using iconv -c, but the characters are still removed.

Is there a way to keep those special characters and change only the encoding?

The final goal is a script that converts files to UTF-8 when they are not already UTF-8.

Thank you all!!

Last edited by mrreds; 01-13-2018 at 04:34 PM. Reason: Adding details
# 2  
01-13-2018
Characters that don't exist in the target character set are difficult to convert. The -c option won't necessarily help, as it just silently deletes characters that cannot be converted.
I'm not sure which OS, shell, and iconv versions you have. Does your iconv offer this option (see man iconv)?
Quote:
-t to-encoding, --to-code=to-encoding
Use to-encoding for output characters.
. . .
If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.
Would this come close to what you need?

Last edited by RudiC; 01-14-2018 at 09:55 AM.
# 3  
01-13-2018
Hi,

My guess is that there may be no direct or equivalent character in your destination character set to translate these characters to, and that's why they're being dropped.

Here is some testing of my own. First, I simply copied and pasted the string you provided into a file:

Code:
$ cat test.txt
í, ó,
$ file test.txt
test.txt: UTF-8 Unicode text
$

As you can see, it was detected as UTF-8. Full disclosure: this was on a Slackware Linux 14.2 system.

Here's what happens when I try to convert this to ASCII. As mentioned, I think it fails because these characters simply don't exist in plain ASCII:

Code:
$ iconv -f UTF-8 -t ASCII -o new.txt test.txt
iconv: illegal input sequence at position 0
$

However, if I tell iconv to transliterate what it can and substitute what it can't, the conversion succeeds, although I end up with question marks in the output (since there's nothing to transliterate to):

Code:
$ iconv -f UTF-8 -t ASCII//TRANSLIT -o new.txt test.txt
$ cat new.txt
?, ?,
$

So I think that's the issue: they're being dropped or giving errors because there isn't anything in your destination character set that iconv regards as an acceptable replacement.

Hope this helps.
# 4  
01-13-2018
Thank you RudiC, drysdalk!

OS: SunOS 5.11

The file command just displays:
Quote:
XML document
I need to convert files in any encoding to UTF-8.

A customer is sending me files that are not UTF-8 (they seem to be ANSI); I just need every file coming into my system to be UTF-8 encoded.
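Because file only reports "XML document", one thing worth checking is the encoding each file declares in its XML prolog, for example <?xml version="1.0" encoding="ISO-8859-1"?>. The sketch below is an illustration, not a tested Solaris script: it assumes the declaration sits on the first line, and declared_encoding is a hypothetical helper name. Keep in mind that the declaration is only what the producer claims; under the XML 1.0 specification, a file with no declaration (and no other encoding information) is expected to be UTF-8 or UTF-16.

```shell
#!/bin/sh
# declared_encoding is a hypothetical helper, not a standard command.
# It prints the value of encoding="..." (or encoding='...') found on
# line 1 of the given file, and prints nothing if there is none.
declared_encoding() {
    sed -n "1s/.*encoding=[\"']\([^\"']*\)[\"'].*/\1/p" "$1"
}

# Report the declared encoding for every file named on the command line.
for f in "$@"; do
    enc=$(declared_encoding "$f")
    echo "$f: ${enc:-no encoding declaration on line 1}"
done
```

Run as, for example, sh show_encoding.sh *.xml (show_encoding.sh being a hypothetical file name); it prints one line per file.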
# 5  
01-14-2018
I don't know of an "ANSI" character set, but I would be surprised if it contained codes that UTF-8 cannot represent. If you mean "ASCII", the characters í and ó do NOT exist in that source character set; they may exist in what is called "extended ASCII". In any case, your problem now seems a bit strange to me...
# 6  
01-14-2018
You need to figure out whether the file you are converting is encoded in ISO 8859-1, ISO 8859-15, Windows-1252, or some other codeset. All three of these encode the lower 128 characters the same way as US-ASCII, and all of them contain the í and ó characters, but I'm not sure whether those two are encoded the same way in all three. iconv can only work correctly if you tell it both which codeset the input file is encoded in and which codeset the output file should be written in.
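For í and ó in particular, ISO 8859-1, ISO 8859-15 and Windows-1252 all use the same single bytes, 0xED and 0xF3; in UTF-8 the same characters are the two-byte sequences 0xC3 0xAD and 0xC3 0xB3. A minimal sketch of why the source codeset matters, assuming the customer files are in one of those single-byte codesets and were converted as if they were already UTF-8 (one plausible cause of the original symptom, not a confirmed one): the single bytes are then invalid UTF-8, so -c discards them, while naming the right source codeset keeps them. The file name latin1.txt is hypothetical, and the codeset names are GNU iconv spellings; on a GNU system iconv -l lists the accepted names, and Solaris may use different ones.

```shell
#!/bin/sh
# Hypothetical test file: "í, ó," as ISO 8859-1 / Windows-1252 bytes
# (0xED is octal 355, 0xF3 is octal 363).
printf '\355, \363,\n' > latin1.txt

# Wrong source codeset: 0xED and 0xF3 are not valid UTF-8 on their own,
# so -c silently discards them and only ", ," is left.
iconv -c -f UTF-8 -t UTF-8 latin1.txt

# Right source codeset: each accented letter becomes a two-byte UTF-8
# sequence; expected bytes: c3 ad 2c 20 c3 b3 2c 0a
iconv -f ISO-8859-1 -t UTF-8 latin1.txt | od -An -tx1
```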
# 7  
01-15-2018
Quote:
Originally Posted by mrreds
A customer is sending me files not having UTF8 (seems ANSI)
Problem solved then, as ANSI can be used in UTF-8 directly without conversion.
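The claim above holds only for 7-bit ASCII text: a file whose bytes are all below 0x80 is already valid UTF-8. In Windows-1252 (the code page usually meant by "ANSI" on Western-European Windows systems) or ISO 8859-1, accented letters such as í are single bytes above 0x7F that are not valid UTF-8 on their own, so such files still need converting. Toward the original goal of converting only files that are not already UTF-8, here is a minimal sketch. Assumptions: a UTF-8 to UTF-8 pass through iconv fails on invalid input (true for GNU iconv; check on Solaris), every non-UTF-8 file is Windows-1252, the codeset names are GNU spellings, and convert_to_utf8 is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: convert files to UTF-8 unless they are already valid UTF-8.
# ASSUMPTION: every file that is not valid UTF-8 is Windows-1252 ("ANSI").
# Override with e.g. SRC_CODESET=ISO-8859-15; codeset names are GNU iconv
# spellings and may differ on Solaris.
SRC_CODESET=${SRC_CODESET:-WINDOWS-1252}

# convert_to_utf8 is a hypothetical helper, not a standard command.
convert_to_utf8() {
    file=$1
    # A UTF-8 -> UTF-8 pass exits non-zero on invalid UTF-8 (GNU iconv).
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        echo "$file: already UTF-8, unchanged"
        return 0
    fi
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if iconv -f "$SRC_CODESET" -t UTF-8 "$file" >"$tmp"; then
        # Copy back with cat so the original keeps its owner and permissions
        cat "$tmp" >"$file"
        rm -f "$tmp"
        echo "$file: converted from $SRC_CODESET"
    else
        rm -f "$tmp"
        echo "$file: conversion failed, unchanged" >&2
        return 1
    fi
}

for f in "$@"; do
    convert_to_utf8 "$f"
done
```

Run it as, for example, SRC_CODESET=ISO-8859-15 sh to_utf8.sh *.xml (to_utf8.sh being a hypothetical file name). Running it twice is harmless, because converted files pass the UTF-8 check. For XML files, the encoding named in the <?xml ... ?> declaration would also need to be changed to UTF-8; the sketch does not touch it.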