Post 302530283 by peeyushgehlot on Monday 13th of June 2011 10:38:56 PM
extra character with iconv encoding

hey,

I am trying to convert a sample file in a Russian encoding to an English encoding using the iconv utility.

It's almost done, but with each converted character I am getting one extra character that should not be there.

My sample Russian text is:

test.txt
Code:
А Б В Г Д Е Ж З И Й К ~

and the script I am using for the conversion is:

script
Code:
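# start with an empty output file
>out
# try every target encoding that iconv lists
for i in $(iconv -l)
do
    # convert from cp866 to the candidate encoding
    o=$(iconv -f cp866 -t "$i" test.txt)
    len=${#o}
    # keep only conversions that produced more than 2 characters
    if [ "$len" -gt 2 ]
    then
        echo "$o#$i" >>out
    fi
done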

and sample output for a few of the almost successfully converted encodings is:

out
Code:
ト@ トA トB トC トD トE トG トH トI トJ トK ~	CP932
ト@ トA トB トC トD トE トG トH トI トJ トK ~	CSIBM932
ト@ トA トB トC トD トE トG トH トI トJ トK ~	CSIBM943
ト@ トA トB トC トD トE トG トH トI トJ トK ~	CSSHIFTJIS
ト@ トA トB トC トD トE トG トH トI トJ トK ~	CSWINDOWS31J
ト@ トA トB トC トD トE トG トH トI トJ トK ~	IBM-932
ト@ トA トB トC トD トE トG トH トI トJ トK ~	IBM-943
ト@ トA トB トC トD トE トG トH トI トJ トK ~	IBM932
ト@ トA トB トC トD トE トG トH トI トJ トK ~	IBM943
ト@ トA トB トC トD トE トG トH トI トJ トK ~	MS932
ト@ トA トB トC トD トE トG トH トI トJ トK ~	MS_KANJI
ト@ トA トB トC トD トE トG トH トI トJ トK ~	SHIFT-JIS
ト@ トA トB トC トD トE トG トH トI トJ トK ~	SHIFT_JIS
ト@ トA トB トC トD トE トG トH トI トJ トK ~	SHIFT_JISX0213
ト@ トA トB トC トD トE トG トH トI トJ トK ~	SJIS-OPEN
ト@ トA トB トC トD トE トG トH トI トJ トK ~	SJIS-WIN
ト@ トA トB トC トD トE トG トH トI トJ トK ~	SJIS
ト@ トA トB トC トD トE トG トH トI トJ トK ~	WINDOWS-31J
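
To see what the extra character actually is, it may help to look at the raw bytes instead of what the terminal draws. The lines below are only a sketch, not a confirmed diagnosis. They assume that test.txt really is CP866 (where А, Б, В are the single bytes 0x80, 0x81, 0x82) and that the local iconv accepts the SHIFT_JIS name from the list above. The three sample bytes are made up for illustration.

```shell
# Feed three CP866 bytes (assumed to be А, Б, В) separated by spaces
# through iconv, then dump the result as hex, one byte per pair.
hex=$(printf '\200 \201 \202\n' | iconv -f CP866 -t SHIFT_JIS | od -An -tx1 | tr -d ' \n')

# Print the hex string so the bytes per letter can be counted.
echo "$hex"
```

If every letter comes out as two bytes, the extra character in the output may simply be the first byte of a two-byte code, shown by a terminal that is not set to that encoding.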

Please suggest where I am going wrong in this encoding process.

Any help with that would be greatly appreciated.

---------- Post updated 06-14-11 at 08:08 AM ---------- Previous update was 06-13-11 at 09:20 PM ----------

Hey guys, can anyone help me with this?
 
