Perl: encoding changes and odd symbols


 
# 1  
Old 04-12-2020

*** FIXED ISSUE - SOLUTION BELOW ***



This is a much simplified version of a script that I'm using. The program finds the number 1 in double parentheses, ((1)), and replaces it with a sentence. The text is French because the program translates into French and I want to make sure it works properly with accents.

replace.sh (can just be pasted into shell):
Code:
#!/bin/sh
num=1
rm -rf temp.tmp
touch temp.tmp
iconv -f utf-8 temp.tmp
echo '((1)) ((2))' >> temp.tmp
text='Il reçoit 5 000 $ à la livraison. 5 000 $?'
perl -i -CS -pne 's/\(\('"${num}"'\)\)/'"${text}"'/' temp.tmp
cat temp.tmp

The result is:


In an editor it displays as:

Code:
Il reçoit 5 000   la livraison. 5 000 0 ((2))

In shell it displays as:
Code:
$ cat temp.tmp
Il reçoit 5 000 � la livraison. 5 000 0 ((2))

The file was UTF-8 before perl wrote to it, and now it is reported as iso-8859-1:
Code:
$ file -i temp.tmp
temp.tmp: text/plain; charset=iso-8859-1
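
A side note on the check above: `iconv -f utf-8 temp.tmp` in the script only writes a converted copy to standard output; it does not change the file itself. As a rough alternative to `file -i`, one can ask iconv to decode a file as UTF-8 and look at its exit status. This is only a sketch: the helper name `is_utf8` and the file names `good.tmp` and `bad.tmp` are made up for illustration, and the stray 0xA0 byte is just an example of invalid input.

```shell
#!/bin/sh
# is_utf8 is a hypothetical helper: it succeeds only if iconv can decode the file as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

workdir=$(mktemp -d)
# "reçoit" with ç written as the two UTF-8 bytes C3 A7 (octal 303 247)
printf 'Il re\303\247oit\n' > "$workdir/good.tmp"
# a lone A0 byte (octal 240), which is not a valid UTF-8 sequence
printf '5 000 \240 la livraison\n' > "$workdir/bad.tmp"

if is_utf8 "$workdir/good.tmp"; then good=valid; else good=invalid; fi
if is_utf8 "$workdir/bad.tmp"; then bad=valid; else bad=invalid; fi

echo "good.tmp: $good"
echo "bad.tmp: $bad"
rm -rf "$workdir"
```

A file that fails this check after an edit has had some of its multi-byte characters damaged, which is usually what a sudden iso-8859-1 guess from `file` points to.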

I would like the result to be:
Code:
Il reçoit 5 000 $ à la livraison. 5 000 $? ((2))

It seems that after using echo the file format changes.




*** SOLUTION ***



I used sed instead and did this:


Code:
#!/bin/sh
num=1
rm -rf temp.tmp
touch temp.tmp
iconv -f utf-8 temp.tmp
echo '((1)) ((2))' >> temp.tmp
text='Il reçoit 5 000 $ à la livraison. 5 000 $?'

LC_ALL=C sed -i 's/(('"${num}"'))/'"${text}"'/g' temp.tmp
cat temp.tmp



The sed command, second line from the bottom, works fine.
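
One caveat with the sed approach: in a sed replacement, `&`, `\` and the delimiter `/` are special, so a text containing them would need escaping first. A minimal sketch of that, assuming a made-up sample text and using a pipe instead of `-i` so that it does not depend on GNU sed:

```shell
#!/bin/sh
num=1
# Sample text (an assumption for this sketch) with characters that are special to sed.
text='5 000 $ & 50/50 $?'

# Put a backslash in front of &, / and \ so sed inserts them literally.
esc=$(printf '%s\n' "$text" | sed 's|[&/\\]|\\&|g')

# Same substitution as in the script above, but with the escaped text.
result=$(echo '((1)) ((2))' | LC_ALL=C sed 's/(('"${num}"'))/'"${esc}"'/g')
printf '%s\n' "$result"
```

The `$` characters need no escaping here, because `$` has no special meaning in a sed replacement.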

Last edited by bedtime; 04-12-2020 at 12:47 PM..
# 2  
Old 04-12-2020
I found the problem with the perl code: $ is expanded by perl inside the s/// expression, so sequences such as $? in the text are treated as perl variables. To avoid this, I put the string in a perl variable with single quotes around it, like this:

Code:
$ text='QIl reçoit 5 000 $ à la livraison. 5 000 $?'
$ echo '((1)) ((2))' > temp.tmp
$ perl -i -CA -pne 'my $val='\'"${text}"\''; s/\(\('"${num}"'\)\)/$val/' temp.tmp
$ file -i temp.tmp
temp.tmp: text/plain; charset=utf-8
$ cat temp.tmp
QIl reçoit 5 000 $ à la livraison. 5 000 $? ((2))

# 3  
Old 04-13-2020

Thank you. This solution didn't quite work for me as it didn't end up replacing the text:


Code:
((1)) ((2))


I did however use this method of defining the variable with sed and it worked fine.
# 4  
Old 04-13-2020
Apologies, I forgot to set $num in my example:

Code:
$ text='QIl reçoit 5 000 $ à la livraison. 5 000 $?'
$ num=1
$ echo '((1)) ((2))' > temp.tmp
$ perl -i -CA -pne 'my $val='\'"${text}"\''; s/\(\('"${num}"'\)\)/$val/' temp.tmp
$ file -i temp.tmp
temp.tmp: text/plain; charset=utf-8
$ cat temp.tmp
QIl reçoit 5 000 $ à la livraison. 5 000 $? ((2))

# 5  
Old 04-14-2020
Thank you. I can confirm that this works. I can't believe that I missed that variable too.