05-29-2013
The output of the locale charmap command tells you (by convention) that the character mapping defining the characters in your current locale is related to ISO standard 8859-1. It says absolutely nothing about what codeset was used to encode the text found in any particular file.
If a file contains only ASCII text, its ISO 8859-1 and UTF-8 encodings are byte-for-byte identical. If any byte in the file has its high-order bit set, there are various heuristics you could try in order to decide whether the file was encoded using a particular codeset, but heuristics that could distinguish between the various ISO 8859-* encodings would require more knowledge than just the contents of the file. Even determining that a file was encoded using UTF-8 is impossible unless you know that the file contains only text (i.e., no binary data such as an integer or floating point value has been written into the file without first being converted to text).
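One such heuristic can be sketched with iconv itself: ask it to decode the file as UTF-8 and see whether it complains. This is only a sketch; the helper name looks_like_utf8 and the sample file names are made up for illustration. Passing the check only means the bytes happen to be well-formed UTF-8, not that the file was written as UTF-8, and failing it does not tell you which 8859-* codeset (if any) was used.

```shell
#!/bin/sh
# looks_like_utf8 FILE   (hypothetical helper name)
# Succeeds if every byte sequence in FILE is well-formed UTF-8.
# Pure ASCII files always pass, because ASCII is a subset of UTF-8.
looks_like_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Sample data: the same word encoded two ways, plus an ASCII-only file.
dir=$(mktemp -d)
printf 'caf\303\251\n' > "$dir/utf8.txt"     # e-acute as the two bytes C3 A9
printf 'caf\351\n'     > "$dir/latin1.txt"   # e-acute as the single byte E9
printf 'cafe\n'        > "$dir/ascii.txt"    # no high-order bits set

for f in utf8 latin1 ascii; do
    if looks_like_utf8 "$dir/$f.txt"; then
        echo "$f: well-formed UTF-8"
    else
        echo "$f: not UTF-8"
    fi
done
rm -rf "$dir"
```

With the sample data above, the ISO 8859-1 file should be the only one reported as not UTF-8; the ASCII file passes because its bytes are the same in both encodings.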
The only way to use iconv to reliably convert a file from one codeset to another is to know (independently) what codeset was used when the file was created and what transformations have been applied to that file since then. If the file being converted contains some binary values and some text, you will have to know where the binary data is and convert only the text surrounding it. (You can't do this with iconv alone, but you could use something like dd to extract the text and binary data into separate files, use iconv to convert the text files, and then create the converted output by putting the converted text files and the binary files back together. Of course, converting between 8859-* and UTF-8 can also significantly change the number of bytes needed to represent a string of text. If the file contains binary data specifying the length of some of the text in the file, you would also have to be aware of that and modify the binary portions of the file as you reconstruct the output file.)
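The dd approach described above might look roughly like this. It is a sketch under strong assumptions: the helper name convert_mixed is hypothetical, and the layout (one run of ISO 8859-1 text, one run of binary bytes, then text to the end of the file) and the offsets are invented for the example. A real file format would need its own offsets and, if it stores text lengths in binary, extra work to patch those values.

```shell
#!/bin/sh
# convert_mixed SRC DST TEXT1_LEN BIN_LEN   (hypothetical helper)
# Assumed layout of SRC: TEXT1_LEN bytes of ISO 8859-1 text, then BIN_LEN
# bytes of binary data, then ISO 8859-1 text up to the end of the file.
# dd copies raw bytes at the given offsets without interpreting them;
# only the two text pieces go through iconv before reassembly.
convert_mixed() {
    src=$1 dst=$2 t1=$3 blen=$4
    tmp=$(mktemp -d) || return 1
    dd if="$src" of="$tmp/text1" bs=1 count="$t1" 2>/dev/null &&
    dd if="$src" of="$tmp/bin" bs=1 skip="$t1" count="$blen" 2>/dev/null &&
    dd if="$src" of="$tmp/text2" bs=1 skip=$((t1 + blen)) 2>/dev/null &&
    iconv -f ISO-8859-1 -t UTF-8 "$tmp/text1" > "$tmp/text1.utf8" &&
    iconv -f ISO-8859-1 -t UTF-8 "$tmp/text2" > "$tmp/text2.utf8" &&
    cat "$tmp/text1.utf8" "$tmp/bin" "$tmp/text2.utf8" > "$dst"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Demo: 5 bytes of text, 4 binary bytes (01 02 FE FF), then 6 bytes of text.
work=$(mktemp -d)
printf 'caf\351 \001\002\376\377na\357ve\n' > "$work/mixed.dat"
convert_mixed "$work/mixed.dat" "$work/mixed.utf8.dat" 5 4
wc -c < "$work/mixed.dat"        # 15 bytes in
wc -c < "$work/mixed.utf8.dat"   # 17 bytes out: each accented letter grew to 2 bytes
rm -rf "$work"
```

Running the whole file through iconv -f ISO-8859-1 -t UTF-8 instead would not fail, but it would silently rewrite the FE and FF bytes as two-byte sequences and corrupt the binary portion, which is why the pieces are split first.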
This User Gave Thanks to Don Cragun For This Post:
ICONV(1) Linux User Manual ICONV(1)
NAME
iconv - convert text from one character encoding to another
SYNOPSIS
iconv [options] [-f from-encoding] [-t to-encoding] [inputfile]...
DESCRIPTION
The iconv program reads in text in one encoding and outputs the text in another encoding. If no input file is given, or if it is given
as a dash (-), iconv reads from standard input. If no output file is given, iconv writes to standard output.
If no from-encoding is given, the default is derived from the current locale's character encoding. If no to-encoding is given, the default
is derived from the current locale's character encoding.
OPTIONS
-f from-encoding, --from-code=from-encoding
Use from-encoding for input characters.
-t to-encoding, --to-code=to-encoding
Use to-encoding for output characters.
If the string //IGNORE is appended to to-encoding, characters that cannot be converted are discarded and an error is printed after
conversion.
If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This
means that when a character cannot be represented in the target character set, it can be approximated through one or several similar
looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a
question mark (?) in the output.
-l, --list
List all known character set encodings.
-c Silently discard characters that cannot be converted instead of terminating when encountering such characters.
-o outputfile, --output=outputfile
Use outputfile for output.
-s, --silent
This option is ignored; it is provided only for compatibility.
--verbose
Print progress information on standard error when processing multiple files.
-?, --help
Print a usage summary and exit.
--usage
Print a short usage summary and exit.
-V, --version
Print the version number, license, and disclaimer of warranty for iconv.
EXIT STATUS
Zero on success, nonzero on errors.
ENVIRONMENT
Internally, the iconv program uses the iconv(3) function which in turn uses gconv modules (dynamically loaded shared libraries) to convert
to and from a character set. Before calling iconv(3), the iconv program must first allocate a conversion descriptor using iconv_open(3).
The operation of the latter function is influenced by the setting of the GCONV_PATH environment variable:
* If GCONV_PATH is not set, iconv_open(3) loads the system gconv module configuration cache file created by iconvconfig(8) and then, based
on the configuration, loads the gconv modules needed to perform the conversion. If the system gconv module configuration cache file is
not available then the system gconv module configuration file is used.
* If GCONV_PATH is defined (as a colon-separated list of pathnames), the system gconv module configuration cache is not used. Instead,
iconv_open(3) first tries to load the configuration files by searching the directories in GCONV_PATH in order, followed by the system
default gconv module configuration file. If a directory does not contain a gconv module configuration file, any gconv modules that it
may contain are ignored. If a directory contains a gconv module configuration file and it is determined that a module needed for this
conversion is available in the directory, then the needed module is loaded from that directory, the order being such that the first
suitable module found in GCONV_PATH is used. This allows users to use custom modules and even replace system-provided modules by
providing such modules in GCONV_PATH directories.
FILES
/usr/lib/gconv
Usual default gconv module path.
/usr/lib/gconv/gconv-modules
Usual system default gconv module configuration file.
/usr/lib/gconv/gconv-modules.cache
Usual system gconv module configuration cache.
CONFORMING TO
POSIX.1-2001.
EXAMPLE
Convert text from the ISO 8859-15 character encoding to UTF-8:
$ iconv -f ISO-8859-15 -t UTF-8 < input.txt > output.txt
The next example converts from UTF-8 to ASCII, transliterating when possible:
$ echo abc ß α € àḃç | iconv -f UTF-8 -t ASCII//TRANSLIT
abc ss ? EUR abc
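For contrast with transliteration, the -c option simply drops whatever the target encoding cannot represent. The following is a small sketch; the helper name to_ascii_lossy is hypothetical, and the exit status is deliberately ignored because some iconv implementations may report a nonzero status when characters were discarded.

```shell
#!/bin/sh
# to_ascii_lossy   (hypothetical helper name)
# Reads UTF-8 on standard input and writes ASCII on standard output,
# silently discarding every character that ASCII cannot represent (-c).
to_ascii_lossy() {
    # Assumption: a nonzero status here only signals that characters were
    # dropped, so it is ignored in this sketch.
    iconv -c -f UTF-8 -t ASCII || true
}

# The input bytes are UTF-8 for "na<i-diaeresis>ve caf<e-acute>".
printf 'na\303\257ve caf\303\251\n' | to_ascii_lossy
```

The two accented letters are removed rather than approximated, so the line comes out as "nave caf". Unlike //TRANSLIT, nothing is substituted for the dropped characters.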
SEE ALSO
locale(1), iconv(3), nl_langinfo(3), charsets(7), iconvconfig(8)
COLOPHON
This page is part of release 4.15 of the Linux man-pages project. A description of the project, information about reporting bugs, and the
latest version of this page, can be found at https://www.kernel.org/doc/man-pages/.
GNU 2018-02-02 ICONV(1)