Operating Systems: AIX: File encoding and conversion
Post 302814517 by Don Cragun on Wednesday 29th of May 2013 03:15:59 PM
The command locale charmap is telling you (by convention) that the character mapping defining the characters in your current locale is related to ISO standard 8859-1. It says absolutely nothing about what codeset was used to encode text found in any particular file.

If a file only contains ASCII text, the ISO8859-1 and the UTF-8 encoding will be identical. If there are characters in a file with the high order bit set on one or more bytes, there are various heuristics you could try to use to determine if a given file was encoded using a particular codeset, but heuristics that could distinguish between various ISO 8859-* standard encodings would require more knowledge than just the contents of the file. Even determining that a file was encoded using UTF-8 would be impossible unless you know that the file only contains text (i.e., no binary data such as an integer or floating point value has been written into the file without converting it to text first).
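
As a rough illustration of the heuristics mentioned above, here is a minimal Python sketch. The function name classify and the sample strings are made up for this example; the checks are only evidence about the bytes, not proof of which codeset the author used.

```python
import unicodedata


def classify(data: bytes) -> str:
    """Roughly classify a byte string; a heuristic, not proof of encoding."""
    if all(b < 0x80 for b in data):
        # No high-order bits set: same bytes in ASCII, ISO 8859-*, and UTF-8.
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "not-utf-8"  # some single-byte codeset, or binary data
    return "utf-8-compatible"  # well-formed UTF-8, which is still only evidence


ascii_text = "plain text".encode("ascii")
utf8_text = "caf\u00e9".encode("utf-8")
latin1_text = "caf\u00e9".encode("iso-8859-1")

print(classify(ascii_text))
print(classify(utf8_text))
print(classify(latin1_text))

# The same byte is a different character in two ISO 8859 parts, and nothing
# in the byte itself says which one was intended.
ambiguous = b"\xa4"
print(unicodedata.name(ambiguous.decode("iso-8859-1")))   # CURRENCY SIGN
print(unicodedata.name(ambiguous.decode("iso-8859-15")))  # EURO SIGN
```

The last two lines show why telling ISO 8859-1 from ISO 8859-15 needs knowledge from outside the file: both decodings succeed, and only context can say whether a currency sign or a euro sign was meant.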

The only way to use iconv to reliably convert a file from one codeset to another is to know (independently) what codeset was used when the file was created and what transformations have occurred to that file since then. If the file being converted contains some binary values and some text, you will have to know where the binary data is and convert only the text surrounding the binary data. You can't do this with iconv alone, but you could use something like dd to extract the text and binary data into separate files, use iconv to convert the text files, and then create the converted output by putting the converted text files and the binary files back together.

Of course, converting between 8859-* and UTF-8 can also significantly change the number of bytes needed to represent a string of text. If the file contains binary data specifying the length of some of the text in the file, you would also have to be aware of that and modify the binary portions of the file as you reconstruct the output file.
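
To make the length problem concrete, here is a small Python sketch. The record layout (a 2-byte big-endian length followed by that many bytes of text) and the function name convert_record are assumptions invented for this example; a real file format would need its own parsing, and the source codeset still has to be known independently.

```python
import struct


def convert_record(record: bytes, src: str, dst: str) -> bytes:
    """Re-encode the text of one record and rewrite its length prefix.

    Hypothetical layout: 2-byte big-endian length, then that many text bytes.
    """
    (length,) = struct.unpack(">H", record[:2])
    text = record[2:2 + length].decode(src)  # only the text part is decoded
    converted = text.encode(dst)
    # The binary length field must be rebuilt, not copied through unchanged.
    return struct.pack(">H", len(converted)) + converted


payload = "na\u00efve caf\u00e9".encode("iso-8859-1")
record = struct.pack(">H", len(payload)) + payload

out = convert_record(record, "iso-8859-1", "utf-8")
print(len(payload), len(out) - 2)  # prints "10 12"
```

Each of the two accented characters is one byte in ISO 8859-1 and two bytes in UTF-8, so the text grows from 10 to 12 bytes. Passing the whole record through a plain text conversion would leave a stale length of 10 in the output, and could also mangle the length bytes themselves.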
ICONV(3)                                   1                                   ICONV(3)

iconv - Convert string to requested character encoding

SYNOPSIS
       string iconv (string $in_charset, string $out_charset, string $str)

DESCRIPTION
       Performs a character set conversion on the string $str from $in_charset to $out_charset.

PARAMETERS
       o $in_charset - The input charset.

       o $out_charset - The output charset. If you append the string //TRANSLIT to $out_charset, transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similar-looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, $str is cut from the first illegal character and an E_NOTICE is generated.

       o $str - The string to be converted.

RETURN VALUES
       Returns the converted string or FALSE on failure.

EXAMPLES
       Example #1 iconv(3) example

       <?php
       $text = "This is the Euro symbol '€'.";

       echo 'Original : ', $text, PHP_EOL;
       echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
       echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
       echo 'Plain    : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;
       ?>

       The above example will output something similar to:

       Original : This is the Euro symbol '€'.
       TRANSLIT : This is the Euro symbol 'EUR'.
       IGNORE   : This is the Euro symbol ''.
       Plain    :
       Notice: iconv(): Detected an illegal character in input string in .iconv-example.php on line 7
       This is the Euro symbol '

PHP Documentation Group                                                        ICONV(3)