iconv -l and ANSEL character set


 
# 1  
Old 01-03-2007

I have to use the ANSEL character set for some GEDCOM documents, but I must convert them to a more modern encoding for another app that doesn't recognize ANSEL. I could not find an ISO code for ANSEL in a search of the web. Would someone please identify the ANSEL character set in the list given by
Code:
iconv -l

Thanks.
# 2  
Old 01-04-2007
It's ISO 5426 (probably -2). But iconv has to know how to deal with it, and mine apparently does not.

This file: /usr/lib/nls/iconv/config.iconv
lists the character sets for your iconv and shows aliases for many of them.
(It may live in another directory tree on your system; find will locate it.)

This is not much help, but it's all I know.
# 3  
Old 01-04-2007
There is no nls directory or config.iconv file on my Mac. Running "iconv -l" lists the available aliases, but 5426 is not among them. Disturbing. Thanks for the help.
# 4  
Old 01-05-2007
I am curious about this: is there any way to identify the character set a file uses?

For example, something that reports that file <A> is encoded in UTF-7, SJIS, or the like.

Or do we have to write our own custom parsing code?
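
As a rough illustration of what such custom code could look like, here is a minimal sketch in C that sorts a buffer into three broad families: pure 7-bit, well-formed UTF-8, or some other 8-bit encoding. The enum and the function name guess_encoding are made up for this example. It is only a heuristic: it cannot tell 7-bit encodings such as UTF-7 apart from plain ASCII, it cannot name a specific 8-bit set such as SJIS or ANSEL, and it checks only lead and continuation bytes, not every rule of UTF-8.

```c
#include <stddef.h>

enum guess { GUESS_ASCII, GUESS_UTF8, GUESS_OTHER_8BIT };

/* Hypothetical helper: rough guess at the encoding family of a buffer. */
enum guess guess_encoding(const unsigned char *s, size_t len)
{
    size_t i = 0, k, need;
    int saw_high = 0;

    while (i < len) {
        unsigned char c = s[i];

        if (c < 0x80) {              /* 7-bit byte: nothing to check */
            i++;
            continue;
        }
        saw_high = 1;

        /* How many continuation bytes should follow this lead byte? */
        if (c >= 0xC2 && c <= 0xDF)
            need = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            need = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            need = 3;
        else
            return GUESS_OTHER_8BIT; /* not a valid UTF-8 lead byte */

        if (len - i <= need)
            return GUESS_OTHER_8BIT; /* sequence runs past the buffer */

        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return GUESS_OTHER_8BIT; /* not a continuation byte */

        i += need + 1;
    }
    return saw_high ? GUESS_UTF8 : GUESS_ASCII;
}
```

A result of GUESS_OTHER_8BIT only says the data is neither plain ASCII nor UTF-8; deciding which 8-bit set it is still needs outside knowledge about where the file came from.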
# 5  
Old 01-05-2007
ISO 5426-2 is a specification for bibliographic (library) work. It has the Latin ANSI characters plus Latin ANSI characters to which a diacritical mark is added; the extra characters use bit 8. It (normally) takes special hardware to process it.
This is OLD: the original specification for this dates back to 1980.

We had a GIS system that could read input in ISO 5426, so that's the only reason I know about it. And because it was more of a hardware thing, iconv, yaz-iconv, etc. don't support conversion.

What the OP will have to do is create a table in C that maps each character carrying a diacritical mark to the plain 7-bit character, then read through the original file, converting each character that is above 127. This is an example for ASCII/EBCDIC conversion, since I don't know the table for ANSEL:

Code:
/* conversion tables */
#include <stddef.h>   /* size_t */
static  unsigned char
        ASCII_translate_EBCDIC [ 256 ] =
            {
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
            0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
            0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
            0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
            0x40, 0x5A, 0x7F, 0x7B, 0x5B, 0x6C, 0x50, 0x7D, 0x4D,
            0x5D, 0x5C, 0x4E, 0x6B, 0x60, 0x4B, 0x61,
            0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7, 0xF8,
            0xF9, 0x7A, 0x5E, 0x4C, 0x7E, 0x6E, 0x6F,
            0x7C, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7, 0xC8,
            0xC9, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6,
            0xD7, 0xD8, 0xD9, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
            0xE8, 0xE9, 0xAD, 0xE0, 0xBD, 0x5F, 0x6D,
            0x7D, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88,
            0x89, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96,
            0x97, 0x98, 0x99, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7,
            0xA8, 0xA9, 0xC0, 0x6A, 0xD0, 0xA1, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B,
            0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B, 0x4B 
            } ;

static  unsigned char
        EBCDIC_translate_ASCII [ 256 ] =
            {
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
            0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
            0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
            0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
            0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28,
            0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
            0x2E, 0x2E, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
            0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x2E, 0x3F,
            0x20, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x2E, 0x2E, 0x3C, 0x28, 0x2B, 0x7C,
            0x26, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x21, 0x24, 0x2A, 0x29, 0x3B, 0x5E,
            0x2D, 0x2F, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x7C, 0x2C, 0x25, 0x5F, 0x3E, 0x3F,
            0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x3A, 0x23, 0x40, 0x27, 0x3D, 0x22,
            0x2E, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68,
            0x69, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F, 0x70, 0x71,
            0x72, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x7E, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79,
            0x7A, 0x2E, 0x2E, 0x2E, 0x5B, 0x2E, 0x2E,
            0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x2E, 0x2E, 0x2E, 0x2E, 0x5D, 0x2E, 0x2E,
            0x7B, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
            0x49, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x7D, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F, 0x50, 0x51,
            0x52, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x5C, 0x2E, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,
            0x5A, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E,
            0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
            0x39, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E, 0x2E 
            } ;
/* EBCDIC to ASCII, in place, for a NUL-terminated string */
unsigned char *to_ASCII(unsigned char *s)
{
       unsigned char *buf = s;
       for (; *buf; buf++) *buf = EBCDIC_translate_ASCII[*buf];
       return s;
}
/* ASCII to EBCDIC, in place, for a NUL-terminated string */
unsigned char *to_EBCDIC(unsigned char *s)
{
       unsigned char *buf = s;
       for (; *buf; buf++) *buf = ASCII_translate_EBCDIC[*buf];
       return s;
}
/* length-defined conversion, EBCDIC to ASCII, in place */
void EBCDIC_to_ASCII(const size_t len, unsigned char *s)
{
       size_t j;
       /* index instead of *buf++ = table[*buf], which reads and
          modifies buf in one expression (undefined behavior) */
       for (j = 0; j < len; j++) s[j] = EBCDIC_translate_ASCII[s[j]];
}
/* length-defined conversion, ASCII to EBCDIC, in place */
void ASCII_to_EBCDIC(const size_t len, unsigned char *s)
{
       size_t j;
       for (j = 0; j < len; j++) s[j] = ASCII_translate_EBCDIC[s[j]];
}
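
For ANSEL itself, the same read-and-map idea might look like the sketch below. It is not a real ANSEL table. It assumes, as ANSEL is commonly described, that a diacritic is stored as a separate non-spacing byte in the range 0xE0-0xFE placed before its base letter, so dropping those bytes leaves the plain 7-bit letter. The function names and the '?' placeholder for other high bytes are made up for illustration; check the byte ranges against the ANSEL (ANSI/NISO Z39.47) code table before relying on this.

```c
#include <stddef.h>

/* ASSUMPTION: bytes 0xE0..0xFE are ANSEL non-spacing diacritics that
   precede the base letter. Verify against the ANSEL code table. */
static int is_ansel_combining(unsigned char c)
{
    return c >= 0xE0 && c <= 0xFE;
}

/* Hypothetical helper: reduce ANSEL bytes to 7-bit ASCII in place.
   Returns the new length, which is never larger than len. */
size_t ansel_strip_to_ascii(unsigned char *s, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        unsigned char c = s[in];

        if (c < 0x80)
            s[out++] = c;    /* plain ASCII: copy through */
        else if (is_ansel_combining(c))
            continue;        /* drop the diacritic, keep the letter after it */
        else
            s[out++] = '?';  /* other high byte: no mapping in this sketch */
    }
    return out;
}
```

Run over each buffer read from the GEDCOM file, this loses the accents rather than converting them; a fuller converter would replace the '?' branch with a lookup table like the ones above, filled in from the ANSEL chart.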
