Full Discussion: Writing umlauts to a file
Post 303023689 by API in Shell Programming and Scripting, Friday 21st of September 2018, 03:18:24 AM
Writing umlauts to a file

Hello all,


I have a strange problem writing umlauts such as "ä" and "ü" to a file that should be ISO-8859-1 encoded.


My shell script reads a file whose encoding varies: sometimes US-ASCII, sometimes UTF-8, sometimes ISO-8859-1. I then have to replace every "{" with an "ä".
I read the file line by line, run sed on each line, and write the corrected line to a new file with echo.


When the file is finished, a hex editor shows that the "ä" is stored as "c3 a4", which is the UTF-8 encoding. What I need is the ISO-8859-1 encoding, the single byte "e4".
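
To see the two byte sequences side by side, here is a minimal sketch. It assumes bash, od, and an iconv that accepts the names UTF-8 and ISO-8859-1, as GNU iconv does. It prints the bytes of the same "ä" before and after conversion; the helper name hexbytes is hypothetical.

```shell
#!/bin/bash
# hexbytes (hypothetical helper): dump stdin as hex pairs without spaces or newlines.
hexbytes() {
  od -An -tx1 | tr -d ' \n'
}

# The "ä" in this script is U+00E4 as saved by a UTF-8 editor: two bytes.
utf8=$(printf 'ä' | hexbytes)
# The same character re-encoded as ISO-8859-1: one byte.
latin1=$(printf 'ä' | iconv -f UTF-8 -t ISO-8859-1 | hexbytes)

echo "UTF-8:      $utf8"
echo "ISO-8859-1: $latin1"
```

Saved as UTF-8, this prints c3a4 for the first line and e4 for the second, the same two values seen in the hex editor.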


This is my code:


Code:
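#!/bin/bash

# Output file: the input file name plus ".out"
ConvTmpFile=$1.out
rm -f "$ConvTmpFile"
# IFS= and -r keep leading blanks and backslashes; the -n test
# also processes a last line that has no trailing newline
while IFS= read -r line || [ -n "$line" ]
do
  # Replace every "{" with "ä"; if this script is saved as UTF-8,
  # that "ä" is the two bytes c3 a4
  echo "$line" | sed 's/{/ä/g' >> "$ConvTmpFile"
done < "$1"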
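
Because the literal "ä" in the script is UTF-8, one way to get an ISO-8859-1 file is to keep the loop and send its entire output through iconv once, instead of appending each line to the file. This is a sketch that assumes the input contains only ASCII or UTF-8 text and that GNU iconv is available. The demo file in.txt is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical demo input, written to a throwaway directory.
tmp=$(mktemp -d)
printf 'K{se\nM{rchen\n' > "$tmp/in.txt"

src=$tmp/in.txt
ConvTmpFile=$src.out

# Same line-by-line loop as in the post, but the loop's whole output goes
# through iconv, which re-encodes the UTF-8 "ä" (c3 a4) as ISO-8859-1 (e4).
while IFS= read -r line || [ -n "$line" ]
do
  echo "$line" | sed 's/{/ä/g'
done < "$src" | iconv -f UTF-8 -t ISO-8859-1 > "$ConvTmpFile"

od -An -tx1 "$ConvTmpFile"
```

The single > redirection creates the output file fresh, so the separate rm -f is no longer needed. If the input can already contain ISO-8859-1 bytes, iconv -f UTF-8 rejects them with an error, so such files have to be brought to UTF-8 first.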


My environment variables are set as follows:


LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8
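
These locale variables only control how tools such as sed interpret bytes. They do not change the bytes of the "ä" typed into the script, which, as the hex dump shows, were saved as UTF-8. When the input is already ISO-8859-1, another option is to put the raw byte e4 into the sed command and run sed in the C locale, where every byte is one character. This is a sketch, assuming bash's printf octal escapes (\344 is 0xe4) and a hypothetical demo file.

```shell
#!/bin/bash
set -euo pipefail

# 0xE4 (octal 344) is "ä" in ISO-8859-1; printf's octal escapes produce raw bytes.
ae_latin1=$(printf '\344')

# Hypothetical ISO-8859-1 demo input: "K{se über" with "ü" as byte 0xFC (octal 374).
tmp=$(mktemp -d)
printf 'K{se \374ber\n' > "$tmp/in.txt"

# LC_ALL=C makes sed treat the data as single bytes, so it does not
# try to interpret the ISO-8859-1 input as UTF-8.
LC_ALL=C sed "s/{/$ae_latin1/g" "$tmp/in.txt" > "$tmp/in.txt.out"

od -An -tx1 "$tmp/in.txt.out"
```

This keeps existing ISO-8859-1 bytes such as the "ü" (fc) untouched and only adds e4 bytes, so no conversion step is needed afterwards.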


  • Is it possible to force the script to write an ISO-8859-1 encoded file?
  • How would you handle input files in different encodings? Should I first convert them to ISO-8859-1 with "iconv"?
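
For inputs that arrive as US-ASCII, UTF-8 or ISO-8859-1, one approach (a sketch, not the only one) is to normalize every input to UTF-8 first, do the replacement there, and convert to ISO-8859-1 only at the end. US-ASCII is a subset of UTF-8, so the sketch only has to tell valid UTF-8 apart from everything else. iconv can check this by converting UTF-8 to UTF-8, which fails on invalid sequences. Anything that fails is assumed to be ISO-8859-1; that assumption comes from the list of encodings in the post, and a file in any other encoding would be silently mis-converted. The helper names to_utf8 and convert_file are hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# to_utf8 (hypothetical): write FILE to stdout as UTF-8.
# Assumes FILE is US-ASCII, UTF-8 or ISO-8859-1, as in the post.
to_utf8() {
  local f=$1
  # A UTF-8 -> UTF-8 conversion fails on byte sequences that are not valid UTF-8.
  if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
    cat "$f"                              # US-ASCII or UTF-8: pass through
  else
    iconv -f ISO-8859-1 -t UTF-8 "$f"     # otherwise assume ISO-8859-1
  fi
}

# convert_file (hypothetical): replace "{" with "ä" and write ISO-8859-1.
convert_file() {
  local src=$1 dst=$2
  to_utf8 "$src" | sed 's/{/ä/g' | iconv -f UTF-8 -t ISO-8859-1 > "$dst"
}

# Demo with one file per input encoding (hypothetical names).
tmp=$(mktemp -d)
printf 'K{se \374ber\n' > "$tmp/latin1.txt"      # ISO-8859-1: "ü" = fc
printf 'K{se \303\274ber\n' > "$tmp/utf8.txt"    # UTF-8: "ü" = c3 bc
printf 'K{se\n' > "$tmp/ascii.txt"               # US-ASCII

for name in latin1 utf8 ascii; do
  convert_file "$tmp/$name.txt" "$tmp/$name.out"
  printf '%s:' "$name"
  od -An -tx1 "$tmp/$name.out"
done
```

The final iconv step fails if a UTF-8 input contains characters that have no ISO-8859-1 equivalent (for example the euro sign), which at least makes such files visible instead of corrupting them.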
CU,
API

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

writing to a file

Hi All. I have the following simple shell program. It reads a number from the "/user/amit/bldno"; for example: file "bldno" contains value "100" After execution of the program the content should change to 101. --------- #!/usr/bin/tcsh V= `cat /user/amit/bldno` echo $V `rm -rf ... (1 Reply)
Discussion started by: amitrajvarma

2. UNIX for Dummies Questions & Answers

Problem writing file path to txt file

if test -z "$1" then echo "you must give a filename or filepath" else path=`dirname $1` f_name =`basename $1` if path="." then path=`pwd` fi fi cat $f_name $path >> index.txt The only problem I am encountering with this is writing $path to index.txt Keeps going gaga: cat:... (1 Reply)
Discussion started by: Vintage_hegoog

3. Programming

Writing a file in C

Hi All I am new to C and trying to write a code to get a file as an output. My text file should look like: <var1>tab<var2>tab<var3>...upto the elements in an array <varb1>tab<varb2>tab<varb3>...upto the elements in an array Can someone please guide me how to write the code or a sample... (3 Replies)
Discussion started by: amitsinha

4. Shell Programming and Scripting

Searching for Log / Bad file and Reading and writing to a flat file

Need to develop a unix shell script for the below requirement and I need your assistance: 1) search for file.log and file.bad file in a directory and read them 2) pull out "Load_Start_Time", "Data_File_Name", "Error_Type" from log file 4) concatenate each row from bad file as... (3 Replies)
Discussion started by: mlpathir

5. Shell Programming and Scripting

Comparing rows in same file and writing the result in new file

Help needed... Can you tell me how to compare the last two couple entries in a file and print their result in new file.. I have one file Check1.txt \abc1 12345 \abc2 12327 \abc1 12345 \abc2 12330 I want to compare the entries in Check1 and write to... (1 Reply)
Discussion started by: kichu

6. Shell Programming and Scripting

Writing file name and date from LS command into a file to be imported into mysql

I am looking to do a ls on a folder and have the output of the ls be structured so that it is modification date, file name, with the date in a format that is compatible with mysql. I am trying to build a table that stores the last modification date of certain files so I can display it on some web... (4 Replies)
Discussion started by: personalt

7. Shell Programming and Scripting

reading a file extracting information writing to a file

Hi I am trying to extract information out of a file but keep getting grep can't open errors the code is below: #bash #extract orders with blank address details # # obtain the current date # set today to the current date ccyymmdd format today=`date +%c%m%d | cut -c24-31` echo... (8 Replies)
Discussion started by: Bruble

8. UNIX for Dummies Questions & Answers

Need help in not fetching a file while file writing operation is not completed

Hi All, We have a Unix program in oracle; when we run the program it connects to the specified ftp and gets the file onto the local server. We are facing a problem: when the file writing operation is not yet completed, this program fetches the incomplete file. Could anyone please help me... (2 Replies)
Discussion started by: world.apps

9. UNIX for Dummies Questions & Answers

Writing a script that will take the first line from each file and store it in an output file

Hi, I have 1000 files names data1.txt through data1000.txt inside a folder. I want to write a script that will take each first line from the files and write them as output into a new file. How do I go about doing that? Thanks! (2 Replies)
Discussion started by: evelibertine

10. Shell Programming and Scripting

awk - writing matching pattern to a new file and deleting it from the current file

Hello, I have a comma-delimited file with over 20 fields that I need to do some validations on. I have to check if certain fields are null and then write the line containing the null field into a new file and then delete the line from the current file. Can someone tell me how I could go... (2 Replies)
Discussion started by: goddevil
iconv_unicode(5)					Standards, Environments, and Macros					  iconv_unicode(5)

NAME
iconv_unicode - code set conversion tables for Unicode

DESCRIPTION
The following code set conversions are supported:

CODE SET CONVERSIONS SUPPORTED
------------------------------
Code Set                             FROM Filename    Target Code                          TO Filename
                                     Element                                               Element
ISO 8859-1 (Latin 1)                 8859-1           UTF-8                                UTF-8
ISO 8859-2 (Latin 2)                 8859-2           UTF-8                                UTF-8
ISO 8859-3 (Latin 3)                 8859-3           UTF-8                                UTF-8
ISO 8859-4 (Latin 4)                 8859-4           UTF-8                                UTF-8
ISO 8859-5 (Cyrillic)                8859-5           UTF-8                                UTF-8
ISO 8859-6 (Arabic)                  8859-6           UTF-8                                UTF-8
ISO 8859-7 (Greek)                   8859-7           UTF-8                                UTF-8
ISO 8859-8 (Hebrew)                  8859-8           UTF-8                                UTF-8
ISO 8859-9 (Latin 5)                 8859-9           UTF-8                                UTF-8
ISO 8859-10 (Latin 6)                8859-10          UTF-8                                UTF-8
Japanese EUC                         eucJP            UTF-8                                UTF-8
Chinese/PRC EUC (GB 2312-1980)       gb2312           UTF-8                                UTF-8
ISO-2022                             iso2022          UTF-8                                UTF-8
Korean EUC                           ko_KR-euc        Korean UTF-8                         ko_KR-UTF-8
ISO-2022-KR                          ko_KR-iso2022-7  Korean UTF-8                         ko_KR_UTF-8
Korean Johap (KS C 5601-1987)        ko_KR-johap      Korean UTF-8                         ko_KR-UTF-8
Korean Johap (KS C 5601-1992)        ko_KR-johap92    Korean UTF-8                         ko_KR-UTF-8
Korean UTF-8                         ko_KR-UTF-8      Korean EUC                           ko_KR-euc
Korean UTF-8                         ko_KR-UTF-8      Korean Johap (KS C 5601-1987)        ko_KR-johap
Korean UTF-8                         ko_KR-UTF-8      Korean Johap (KS C 5601-1992)        ko_KR-johap92
KOI8-R (Cyrillic)                    KOI8-R           UCS-2                                UCS-2
KOI8-R (Cyrillic)                    KOI8-R           UTF-8                                UTF-8
PC Kanji (SJIS)                      PCK              UTF-8                                UTF-8
PC Kanji (SJIS)                      SJIS             UTF-8                                UTF-8
UCS-2                                UCS-2            KOI8-R (Cyrillic)                    KOI8-R
UCS-2                                UCS-2            UCS-4                                UCS-4
UCS-2                                UCS-2            UTF-7                                UTF-7
UCS-2                                UCS-2            UTF-8                                UTF-8
UCS-4                                UCS-4            UCS-2                                UCS-2
UCS-4                                UCS-4            UTF-16                               UTF-16
UCS-4                                UCS-4            UTF-7                                UTF-7
UCS-4                                UCS-4            UTF-8                                UTF-8
UTF-16                               UTF-16           UCS-4                                UCS-4
UTF-16                               UTF-16           UTF-8                                UTF-8
UTF-7                                UTF-7            UCS-2                                UCS-2
UTF-7                                UTF-7            UCS-4                                UCS-4
UTF-7                                UTF-7            UTF-8                                UTF-8
UTF-8                                UTF-8            ISO 8859-1 (Latin 1)                 8859-1
UTF-8                                UTF-8            ISO 8859-2 (Latin 2)                 8859-2
UTF-8                                UTF-8            ISO 8859-3 (Latin 3)                 8859-3
UTF-8                                UTF-8            ISO 8859-4 (Latin 4)                 8859-4
UTF-8                                UTF-8            ISO 8859-5 (Cyrillic)                8859-5
UTF-8                                UTF-8            ISO 8859-6 (Arabic)                  8859-6
UTF-8                                UTF-8            ISO 8859-7 (Greek)                   8859-7
UTF-8                                UTF-8            ISO 8859-8 (Hebrew)                  8859-8
UTF-8                                UTF-8            ISO 8859-9 (Latin 5)                 8859-9
UTF-8                                UTF-8            ISO 8859-10 (Latin 6)                8859-10
UTF-8                                UTF-8            Japanese EUC                         eucJP
UTF-8                                UTF-8            Chinese/PRC EUC (GB 2312-1980)       gb2312
UTF-8                                UTF-8            ISO-2022                             iso2022
UTF-8                                UTF-8            KOI8-R (Cyrillic)                    KOI8-R
UTF-8                                UTF-8            PC Kanji (SJIS)                      PCK
UTF-8                                UTF-8            PC Kanji (SJIS)                      SJIS
UTF-8                                UTF-8            UCS-2                                UCS-2
UTF-8                                UTF-8            UCS-4                                UCS-4
UTF-8                                UTF-8            UTF-16                               UTF-16
UTF-8                                UTF-8            UTF-7                                UTF-7
UTF-8                                UTF-8            Chinese/PRC EUC (GB 2312-1980)       zh_CN.euc
UTF-8                                UTF-8            ISO 2022-CN                          zh_CN.iso2022-7
UTF-8                                UTF-8            Chinese/Taiwan Big5                  zh_TW-big5
UTF-8                                UTF-8            Chinese/Taiwan EUC (CNS 11643-1992)  zh_TW-euc
UTF-8                                UTF-8            ISO 2022-TW                          zh_TW-iso2022-7
Chinese/PRC EUC (GB 2312-1980)       zh_CN.euc        UTF-8                                UTF-8
ISO 2022-CN                          zh_CN.iso2022-7  UTF-8                                UTF-8
Chinese/Taiwan Big5                  zh_TW-big5       UTF-8                                UTF-8
Chinese/Taiwan EUC (CNS 11643-1992)  zh_TW-euc        UTF-8                                UTF-8
ISO 2022-TW                          zh_TW-iso2022-7  UTF-8                                UTF-8

EXAMPLES
Example 1 The library module filename

In the conversion library, /usr/lib/iconv (see iconv(3C)), the library module filename is composed of two symbolic elements separated by the percent sign (%). The first symbol specifies the code set that is being converted; the second symbol specifies the target code, that is, the code set to which the first one is being converted. In the conversion table above, the first symbol is termed the "FROM Filename Element". The second symbol, representing the target code set, is the "TO Filename Element". For example, the library module filename to convert from the Korean EUC code set to the Korean UTF-8 code set is:

    ko_KR-euc%ko_KR-UTF-8

FILES
/usr/lib/iconv/*.so    conversion modules

SEE ALSO
iconv(1), iconv(3C), iconv(5)

Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM Development Team, July 1993.
Chon, K., H. Je Park, and U. Choi, Korean Character Encoding for Internet Messages, RFC 1557, Solvit Chosun Media, December 1993.
Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format of Unicode, RFC 1642, Taligent, Inc., July 1994.
Lee, F., HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters, RFC 1843, Stanford University, August 1995.
Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding for Internet Messages, RFC 1468, Keio University, Panda Programming, June 1993.
Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet Messages, RFC 1555, Israeli Inter-University, Hebrew University, December 1993.
Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo Institute of Technology, July 1995.
Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.
Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of Southern California/Information Sciences Institute, October 1994.
Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel Almen Planlaegning, June 1992.
Spinellis, D., Greek Character Encoding for Electronic Mail Messages, RFC 1947, SENA S.A., May 1996.
The Unicode Consortium, The Unicode Standard, Version 2.0, Addison Wesley Developers Press, July 1996.
Wei, Y., Y. Zhang, J. Li, J. Ding, and Y. Jiang, ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages, RFC 1842, AsiaInfo Services Inc., Harvard University, Rice University, University of Maryland, August 1995.
Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646, RFC 2044, Alis Technologies, October 1996.
Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin, Chinese Character Encoding for Internet Messages, RFC 1922, Tsinghua University, China Information Technology Standardization Technical Committee (CITS), Institute for Information Industry (III), University of Washington, March 1996.

NOTES
ISO 8859 character sets using Latin alphabetic characters are distinguished as follows:

ISO 8859-1 (Latin 1)
    For most West European languages, including: Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.

ISO 8859-2 (Latin 2)
    For most Latin-written Slavic and Central European languages: Croatian, Czech, German, Hungarian, Polish, Rumanian, Slovak, and Slovene.

ISO 8859-3 (Latin 3)
    Popularly used for Esperanto, Galician, Maltese, and Turkish.

ISO 8859-4 (Latin 4)
    Introduces letters for Estonian, Latvian, and Lithuanian. It is an incomplete predecessor of ISO 8859-10 (Latin 6).

ISO 8859-9 (Latin 5)
    Replaces the rarely needed Icelandic letters in ISO 8859-1 (Latin 1) with the Turkish ones.

ISO 8859-10 (Latin 6)
    Adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were not included in ISO 8859-4 (Latin 4) to complete coverage of the Nordic area.

SunOS 5.11                        18 Apr 1997                        iconv_unicode(5)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.