Change encoding without removing special chars: iconv

UNIX for Beginners Questions & Answers


Tags
ansi, file, iconv, special chars, utf-8 character replacement

    #1  
mrreds, 1 Week Ago

Hi all,

I'm using the iconv command to change the encoding of files to UTF-8.

If my input file has chars such as
Quote:
, ,
those are removed, and the output file is created without them.

I tried using

Code:
iconv -c

but the characters are still removed.

Is there a way to keep those special chars and change only the encoding?

The final goal is to write a script that changes the encoding of files that are not already UTF-8.

Thank you all!!

    #2  
RudiC (Moderator), 1 Week Ago
Characters that don't exist in the target char set are difficult to convert. The -c option will not help here, as it just silently deletes inconvertible chars.
Not sure what your OS / shell / iconv versions are. Does your iconv offer this option (from man iconv)?
Quote:
-t to-encoding, --to-code=to-encoding
Use to-encoding for output characters.
. . .
If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.
Would this come close to what you need?
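To make the difference between -c and //TRANSLIT concrete, here is a minimal sketch. The two function names are made up for illustration, and //TRANSLIT is assumed to be available (it is a GNU iconv feature and may be missing elsewhere). Both functions read UTF-8 on standard input and write ASCII.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# -c: characters with no equivalent in the target set are silently deleted.
drop_unconvertible() {
    iconv -c -f UTF-8 -t ASCII
}

# //TRANSLIT: characters are approximated where possible; per the man page
# text quoted above, anything that cannot be approximated becomes "?".
approximate() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}
```

For example, printf 'a\342\202\254b\n' | drop_unconvertible feeds "a", a euro sign and "b" through the -c variant, and the euro sign is simply dropped. What the //TRANSLIT variant prints for the same input depends on the iconv implementation and locale.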

    #3  
drysdalk, 1 Week Ago
Hi,

I suspect there is no direct or equivalent character for these in your destination character set, which would be why they're being dropped.

Some testing of my own. Firstly, all I did here was copy and paste the string you provided:



Code:
$ cat test.txt
, ,
$ file test.txt
test.txt: UTF-8 Unicode text
$

and it was picked up as UTF-8, as you can see. Full disclosure: this was on a Slackware Linux 14.2 system.

So here's what happens when I try converting this to ASCII, and as mentioned I think it fails since these characters simply don't exist in any way in normal ASCII:



Code:
$ iconv -f UTF-8 -t ASCII -o new.txt test.txt
iconv: illegal input sequence at position 0
$

However, if I tell iconv to transliterate what it can and substitute a placeholder for what it can't, the conversion completes, although I end up with question marks in the output (since there's nothing to transliterate to):



Code:
$ iconv -f UTF-8 -t ASCII//TRANSLIT -o new.txt test.txt
$ cat new.txt
?, ?,
$

So I think that's the issue: they're being dropped or giving errors because there isn't anything in your destination character set that iconv regards as an acceptable replacement.

Hope this helps.
    #4  
mrreds, 1 Week Ago
Thank you RudiC, drysdalk!

My OS is SunOS 5.11. The file command just displays:
Quote:
XML document
I need to convert any encoding to UTF-8.

A customer is sending me files that are not UTF-8 (they seem to be ANSI). I just need all files coming into my system to end up encoded as UTF-8.
    #5  
RudiC (Moderator), 1 Week Ago
I don't know an ANSI char set but would be surprised if it contained codes that UTF-8 could not represent. Should you mean "ASCII", those chars will NOT exist in that source char set; mayhap in what is called "extended ASCII". However, your problem now seems a bit strange to me...
    #6  
Don Cragun (Administrator), 1 Week Ago
You need to figure out whether the file you are trying to convert from is encoded in ISO 8859-1, ISO 8859-15, Windows 1252, or some other codeset. All three of the ones listed here encode the lower 128 characters the same way as US ASCII, and all of them contain the characters you quoted, but I'm not sure if those characters are encoded the same way in all three codesets. iconv can only work correctly if you tell it both the codeset the input file is actually encoded in and the codeset you want the output file written in.
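As a rough sketch of the script asked for in post #1, something along these lines could work. The function name to_utf8 and the SRC_ENC variable are made up for illustration, and the ISO-8859-1 default is only an assumption: it has to be replaced by whatever codeset the sender really uses (for example Windows 1252). Encoding names also differ between iconv implementations, so check the spellings your system accepts.

```shell
#!/bin/sh
# Sketch only: convert a file to UTF-8 unless it is already valid UTF-8.
# SRC_ENC is an ASSUMPTION about the sender's codeset; override it as needed.
SRC_ENC="${SRC_ENC:-ISO-8859-1}"

# to_utf8 INPUT OUTPUT  (hypothetical helper name)
to_utf8() {
    in=$1
    out=$2
    # iconv exits non-zero when the input is not valid in the -f codeset,
    # so a UTF-8 to UTF-8 pass serves as a validity check.
    if iconv -f UTF-8 -t UTF-8 "$in" >/dev/null 2>&1; then
        cp "$in" "$out"                             # already UTF-8: copy as is
    else
        iconv -f "$SRC_ENC" -t UTF-8 "$in" >"$out"  # re-encode from SRC_ENC
    fi
}
```

Usage would be something like to_utf8 incoming.xml converted.xml. No -c or //TRANSLIT is needed in this direction, because UTF-8 can represent every character of the single-byte codesets listed above. The weak point is the guess in SRC_ENC: a wrong value still produces valid UTF-8, just with the wrong characters.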
    #7  
Corona688, 1 Week Ago
Quote:
Originally Posted by mrreds View Post
A customer is sending me files not having UTF8 (seems ANSI)
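Problem solved then, provided "ANSI" here really means plain 7-bit ASCII, which is valid UTF-8 as-is and needs no conversion. Files containing bytes above 0x7F (ISO 8859-1, Windows 1252 and the like) still have to be converted.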
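Whether a given file can be used as UTF-8 without conversion comes down to whether it contains only 7-bit bytes. A minimal sketch of such a check, with is_ascii as a made-up helper name:

```shell
#!/bin/sh
# is_ascii FILE  (hypothetical helper name)
# Succeeds if FILE contains only 7-bit bytes and is therefore already valid UTF-8.
is_ascii() {
    # Delete every byte in the range 0x00-0x7F and count what is left over;
    # any remaining byte is above 0x7F, so the file is not plain ASCII.
    high=$(LC_ALL=C tr -d '\000-\177' <"$1" | wc -c)
    [ "$high" -eq 0 ]
}
```

A file that passes this check can be left alone. One that fails it is either already UTF-8 or in some 8-bit codeset that still has to be identified before iconv can convert it.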