Full Discussion: How to know file encoding?
Post 303038018 by rdrtx1 on Wednesday 21st of August 2019 11:30:04 AM
ASCII is a subset of UTF-8, so a file that contains only ASCII characters is already valid UTF-8. If you need to transliterate non-ASCII characters to ASCII, try something like:
Code:
iconv -f UTF-8 -t ASCII//TRANSLIT < input_file
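Before transliterating, it can help to check what the file actually is. The sketch below is not from the original post: `guess_encoding` is a hypothetical helper name, and it assumes an `iconv` that exits non-zero on bytes that are invalid in the `-f` encoding (glibc and GNU libiconv behave this way; other implementations may be more lenient). It only distinguishes ASCII, UTF-8, and "something else".

```shell
#!/bin/sh
# Hypothetical helper: classify a file as ASCII, UTF-8, or unknown.
# Assumption: iconv returns a non-zero status on invalid input bytes.
guess_encoding() {
    if iconv -f ASCII -t ASCII < "$1" > /dev/null 2>&1; then
        # Every byte is in the 7-bit ASCII range.
        echo ASCII
    elif iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1; then
        # Not pure ASCII, but all multibyte sequences are well formed.
        echo UTF-8
    else
        # Neither check passed; the real encoding has to be found another way.
        echo unknown
    fi
}
```

If the result is `UTF-8`, the `iconv -f UTF-8 -t ASCII//TRANSLIT` command above is the next step; note that `//TRANSLIT` is an extension and its output depends on the iconv implementation and locale.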


Last edited by rdrtx1; 02-18-2020 at 08:03 PM..

UTF(6)								   Games Manual 							    UTF(6)

NAME
UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an 8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.

In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream encoding called UTF. UTF is designed so that the 7-bit ASCII set (values hexadecimal 00 to 7F) appears only as itself in the encoding. Runes with values above 7F appear as sequences of two or more bytes with values only from 80 to FF.

The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).

Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:

    01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
    10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
    11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb

Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent higher-valued characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4, 5, and 6 byte sequences proposed by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.

In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
FILES
/lib/unicode    table of characters and descriptions, suitable for look(1).

SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.

UTF(6)
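The three conversion rules in the DESCRIPTION can be tried out with plain shell arithmetic. This is only an illustrative sketch, not part of the manual page: `rune_to_utf` is a hypothetical name, and it assumes a POSIX shell whose `$(( ))` arithmetic accepts hexadecimal constants such as `0xE9`. It covers 16-bit runes only, matching the page.

```shell
#!/bin/sh
# Hypothetical helper: print the UTF byte sequence (in hex) for a 16-bit rune.
rune_to_utf() {
    x=$(( $1 ))
    if [ "$x" -lt 0 ] || [ "$x" -gt 65535 ]; then
        # The page describes runes as 16-bit quantities; reject anything else.
        echo "rune out of 16-bit range" >&2
        return 1
    elif [ "$x" -lt 128 ]; then
        # Conversion 01: 0bbbbbbb
        printf '%02X\n' "$x"
    elif [ "$x" -lt 2048 ]; then
        # Conversion 10: 110bbbbb, 10bbbbbb
        printf '%02X %02X\n' \
            $(( 0xC0 | ($x >> 6) )) \
            $(( 0x80 | ($x & 0x3F) ))
    else
        # Conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
        printf '%02X %02X %02X\n' \
            $(( 0xE0 | ($x >> 12) )) \
            $(( 0x80 | (($x >> 6) & 0x3F) )) \
            $(( 0x80 | ($x & 0x3F) ))
    fi
}

rune_to_utf 0x41     # prints 41
rune_to_utf 0xE9     # prints C3 A9
rune_to_utf 0x20AC   # prints E2 82 AC
```

Because each range test picks the first rule that fits, the function always emits the shortest encoding, which is the behaviour the page requires.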