char problem ?


 
# 1  
Old 02-09-2010

Here is a C function that replaces certain characters (mostly non-ASCII) with HTML decimal entities. The char "ç" does not get replaced correctly, but the rest do. Any idea why this is happening?

(Please note that I had to place a space before each ; or they would not post correctly in this forum because the browser would convert them automatically.)

The function outputs:
Bar&#38 ;#38 ;#38 ;#38 ;#231 ;a
but it should output:
Bar&#231 ;a

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *conversionpointer = NULL;

char *chartodecimal(const char *STRING) // replaces all occurrences of char to html decimals
{
    conversionpointer = (char *)realloc(conversionpointer, strlen(STRING) + 1);

    strcpy(conversionpointer, STRING);

    char original[15][3] = {        "ç",   "\"",    "&",     "ñ",     "ä",     "é",     "ë",     "ü",     "ã",     "º",     "ª",     "á",     "ó",     "ø",     "ß"};
    char replacement[15][7] = {"&#231 ;","&#34 ;","&#38 ;","&#241 ;","&#228 ;","&#233 ;","&#235 ;","&#252 ;","&#227 ;","&#186 ;","&#170 ;","&#225 ;","&#243 ;","&#248 ;","&#223 ;"};

    char *conversionpointeroriginalp = NULL;
    char *buffer = NULL;
    int count = 0;

    while(count < 15)
    {
        conversionpointeroriginalp = strstr(conversionpointer, original[count]);

        while(conversionpointeroriginalp != NULL)
        {
            buffer = (char *)realloc(buffer, strlen(conversionpointer) + 1 + (strlen(replacement[count]) - strlen(original[count])));
            strncpy(buffer, conversionpointer, (size_t)(conversionpointeroriginalp - conversionpointer));

            sprintf(buffer + (conversionpointeroriginalp - conversionpointer), "%s%s", replacement[count], conversionpointeroriginalp + strlen(original[count]));

            conversionpointer = (char *)realloc(conversionpointer, strlen(buffer) + 1);

            strcpy(conversionpointer, buffer);

            conversionpointeroriginalp = strstr(conversionpointeroriginalp, original[count]);
        }

        count++;
    }

    if (buffer != NULL)
    {
        free(buffer);
    }

    return conversionpointer;
}

int main(void)
{
    printf("%s\n", chartodecimal("Barça"));

    return 0;
}

# 2  
Old 02-09-2010
What codepage is this in? You should know that C doesn't technically guarantee that bytes >= 128 (i.e. anything outside ASCII) may appear in the source. Many code sets use those bytes to encode extended characters, and UTF-8 uses them to encode multibyte sequences. It often works, especially with gcc, but the behavior depends on the compiler and may change in the future. In any case you need to be very, very aware of what code set you're editing the source with, as well as what code set your program is supposed to read and write, if you want it to work as you expect.

If they're not UTF-8, which needs more complicated handling, you might refer to these characters by their codes instead of typing the characters themselves. You can embed arbitrary hex in a string; "\x20" is a space, for example. This would protect you from mistakes that come from editing the source code in the wrong code page.
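To make the hex-escape idea concrete, here is a minimal sketch. It assumes the program's strings are UTF-8, where ç (U+00E7) is the two bytes 0xC3 0xA7; in a single-byte code set such as ISO-8859-1 it would be the one byte 0xE7 instead. The names `C_CEDILLA`, `BARCA` and `byte_len` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Assumption: strings are UTF-8, so U+00E7 (c with cedilla)
   is the two bytes 0xC3 0xA7. */
static const char C_CEDILLA[] = "\xC3\xA7";

/* A hex escape swallows every hex digit that follows it, so "\xA7a"
   would be read as one oversized escape. Splitting the literal in two
   ends the escape before the 'a'. */
static const char BARCA[] = "Bar\xC3\xA7" "a";

/* Number of bytes (not characters) before the terminating NUL. */
size_t byte_len(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

With these definitions, `byte_len(C_CEDILLA)` counts two bytes for what displays as one character, which is why a buffer for a single accented letter needs room for more than one byte plus the NUL.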

Last edited by Corona688; 02-09-2010 at 04:39 PM..
# 3  
Old 02-10-2010
I tried to compile it explicitly stating UTF-8:
-std=c99 -finput-charset=UTF-8 -fexec-charset=UTF-8 -fwide-exec-charset=UTF-8

I tried using Hex codes as you suggested but only Unicode codes are recognized (example: \u00E7).

But those are not the problems as the function I posted works except for the first string in both arrays... try this:

Modify original[0] to "B"
Modify replacement[0] to "&#000 ;" (I placed space before ; so browser can show it.)
I get this: &#38 ;#38;#38;#000;arça (I placed space before ; in &#38 ; so browser can show it.)

Modify original[0] to "B"
Modify replacement[0] to "#000;"
I get a segfault.

Any idea why this is happening ?

Last edited by cyler; 02-10-2010 at 01:47 PM..
# 4  
Old 02-10-2010
Quote:
Originally Posted by cyler
I tried to compile it explicitly stating UTF-8:
-std=c99 -finput-charset=UTF-8 -fexec-charset=UTF-8 -fwide-exec-charset=UTF-8

I tried using Hex codes as you suggested but only Unicode codes are recognized (example: \u00E7).
Ah, so it's all UTF-8. Good to know. Like I said, Unicode characters require more complex handling since they're sometimes multibyte. If the \u syntax works I'd suggest it over putting raw UTF-8 in your source code, since \u is probably less illegal than raw UTF-8 and it stops the text getting garbled when you edit or post it anywhere UTF-8 might not be understood.
Quote:
But those are not the problems as the function I posted works except for the first string in both arrays... try this:
I can't. The raw UTF8 you won't stop embedding gets scrambled when I copy/paste.

I may have a more efficient solution for you later... I've already made some UTF-8 I/O routines which would let you check the string character by character instead of scanning it completely through for each different kind of entity.
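For illustration, a single-pass version could look roughly like the sketch below. This is not Corona688's I/O routines (they are not shown in the thread); it simply matches raw byte sequences at each position instead of decoding code points. The table `ENTITIES`, the struct `entity` and the function `escape_entities` are hypothetical names, the table is a shortened subset, and the byte strings assume UTF-8 input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup table; byte strings assume UTF-8 input. */
struct entity { const char *from; const char *to; };

static const struct entity ENTITIES[] = {
    { "&",        "&#38;"  },
    { "\"",       "&#34;"  },
    { "\xC3\xA7", "&#231;" },  /* c with cedilla */
    { "\xC3\xB1", "&#241;" },  /* n with tilde */
    { "\xC3\xA9", "&#233;" },  /* e with acute */
};
#define ENTITY_COUNT (sizeof ENTITIES / sizeof ENTITIES[0])
#define MAX_ENTITY_LEN 6  /* longest "to" string in the table */

/* Returns a newly malloc'd string (caller frees), or NULL if malloc fails. */
char *escape_entities(const char *src)
{
    assert(src != NULL);
    /* Worst case: every input byte turns into one full entity. */
    char *out = malloc(strlen(src) * MAX_ENTITY_LEN + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    while (*src != '\0') {
        size_t i;
        for (i = 0; i < ENTITY_COUNT; i++) {
            size_t n = strlen(ENTITIES[i].from);
            if (strncmp(src, ENTITIES[i].from, n) == 0) {
                size_t m = strlen(ENTITIES[i].to);
                memcpy(dst, ENTITIES[i].to, m);
                dst += m;
                src += n;
                break;
            }
        }
        if (i == ENTITY_COUNT)  /* no entry matched: copy the byte as-is */
            *dst++ = *src++;
    }
    *dst = '\0';
    return out;
}
```

Because the output goes into a separate buffer and the input is only read once, text that has already been written is never scanned again, and the order of the table entries does not matter.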
# 5  
Old 02-10-2010
Quote:
Originally Posted by Corona688
I can't. The raw UTF8 you won't stop embedding gets scrambled when I copy/paste.
Code:
    char original[15][7] = {   "\u00E7"/*ç*/,   "\""     ,    "&"     ,"\u00F1"/*ñ*/,"\u00E4"/*ä*/,"\u00E9"/*é*/,"\u00EB"/*ë*/,"\u00FC"/*ü*/,"\u00E3"/*ã*/,"\u00BA"/*º*/,"\u00AA"/*ª*/,"\u00E1"/*á*/,"\u00F3"/*ó*/,"\u00F8"/*ø*/,"\u00DF"/*ß*/}; // Unicode is used instead of the char for portability
    char replacement[15][7] = {"&#231 ;"/*ç*/,"&#34 ;"/*"*/,"&#38 ;"/*&*/,"&#241 ;"/*ñ*/,"&#228 ;"/*ä*/,"&#233 ;"/*é*/,"&#235 ;"/*ë*/,"&#252 ;"/*ü*/,"&#227 ;"/*ã*/,"&#186 ;"/*º*/,"&#170 ;"/*ª*/,"&#225 ;"/*á*/,"&#243 ;"/*ó*/,"&#248 ;"/*ø*/,"&#223 ;"/*ß*/};

Ok, posted the relevant part of the function in Unicode code and html-decimal-entities with a space before the ;

Last edited by cyler; 02-10-2010 at 06:41 PM..
# 6  
Old 02-10-2010
[edit] my mistake.
# 7  
Old 02-11-2010
Ok, I finally got it.

original[2] is an ampersand but replacement[2] also has an ampersand. So what was happening was that the while loop was replacing the replacement as well.

Solution:
Move the ampersand to position zero in both arrays, then add a break statement in the inner while loop so that the ampersand can only be replaced once.
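A possible alternative to the break, sketched under a few assumptions: keep the ampersand first in the table, but have each search resume after the text that was just inserted, so a replacement is never rescanned within the same pass. The position is tracked as an offset instead of a pointer, because a pointer into the string may dangle once realloc moves the block. `replace_all` is a hypothetical helper, not code from the original post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replaces every occurrence of `from` in the heap string `s` with `to`.
   Takes ownership of `s`; returns the (possibly moved) string, or NULL
   on allocation failure, in which case `s` has been freed. */
char *replace_all(char *s, const char *from, const char *to)
{
    assert(s != NULL && from != NULL && to != NULL && from[0] != '\0');
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t pos = 0;  /* an offset, not a pointer: offsets survive realloc */
    char *hit;

    while ((hit = strstr(s + pos, from)) != NULL) {
        size_t at = (size_t)(hit - s);
        size_t tail = strlen(hit + from_len) + 1;  /* rest of string + NUL */

        if (to_len > from_len) {
            char *grown = realloc(s, at + to_len + tail);
            if (grown == NULL) {
                free(s);
                return NULL;
            }
            s = grown;
        }
        /* Shift the tail into place, then drop the replacement in. */
        memmove(s + at + to_len, s + at + from_len, tail);
        memcpy(s + at, to, to_len);

        pos = at + to_len;  /* resume AFTER the inserted text */
    }
    return s;
}
```

Called once per table entry with "&" handled first, this replaces every ampersand in the input, not only the first one. The ordering still matters: if "&" were processed after the other entries, it would match the ampersands inside the entities already inserted.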

Last edited by cyler; 02-11-2010 at 11:01 AM..