Spanish accent symbol removed by sed


 
# 1  
Old 06-04-2010

Hello all,
In a text file I have to replace some numeric codes with strings.

This is an example of the file:

Code:
000000001 LDR   L ^^^^^nam^^2200169Ia^45e0
000000001 008   L 100604s9999^^^^xx^^^^^^^^^^^^000^0^und^d
000000001 022   L $$a0365-6675
000000001 090   L $$aBMA 1934-1937.
000000001 245   L $$aANALES DE LA REAL SOCIEDAD ESPAÑOLA DE FISICA Y QUIMICA$$h2
000000001 260   L $$aMADRID (ESPAÑA)$$bREAL SOCIEDAD ESPAÑOLA DE FISICA Y QUIMICA$$c1902
000000001 310   L $$aMENSUAL
000000001 500   L $$aCONTINUADA POR: "ANALES DE FISICA Y QUIMICA", ISSN: 0365-2351.
000000001 650   L $$a400000
000000001 650   L $$a660000
000000001 666   L $$aFISICA;QUIMICA

In this file I must replace lines like this one (only the first two digits of the code are relevant):

Code:
000000001 650   L $$a400000

with the corresponding subject name.

To do so, I chose sed, like this:

Code:
sed -i '/[0-9]\{9\} 650   L $$a40[0-9]\{4\}/{G;s/^\(.* $$a\)[0-9]\{5\}\(.*\)\(\n\)$/\1Química\2\3\1Chemistry\2/g}' revistas1.mrk_aleph_sec.dat

This code works, but it changes the accented characters from this:
Code:
MADRID (ESPAÑA)

to this:

Code:
MADRID (ESPAÃ<91>A)


With this new code, the accents are lost...
Any ideas?
# 2  
Old 06-04-2010
Maybe try setting the LANG environment variable?
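
As a rough sketch of what that could look like (not a confirmed fix for this file): the script below builds a small hypothetical sample file, runs a simplified version of the substitution under an explicit locale, and then compares the raw bytes of a line that sed should not have touched. The file names, the temporary directory and the shortened sed expression are assumptions made for illustration. LC_ALL=C is used because it exists everywhere; a UTF-8 locale name taken from `locale -a` could be substituted.

```shell
#!/bin/sh
# Hypothetical sample data; \303\221 are the two UTF-8 bytes of "Ñ".
tmp=$(mktemp -d)
printf '000000001 260   L $$aMADRID (ESPA\303\221A)\n000000001 650   L $$a400000\n' > "$tmp/sample.dat"

# Show the locale settings that sed will inherit from this shell.
locale

# Simplified substitution, run with an explicit locale.
# LC_ALL=C makes sed treat the input as plain bytes.
LC_ALL=C sed 's/\(650   L \$\$a\)40[0-9]\{4\}$/\1Chemistry/' "$tmp/sample.dat" > "$tmp/out.dat"

# Dump the untouched 260 line from both files, byte by byte.
before=$(sed -n '1p' "$tmp/sample.dat" | od -An -c)
after=$(sed -n '1p' "$tmp/out.dat" | od -An -c)

# Identical dumps mean sed left the accented bytes alone.
[ "$before" = "$after" ] && echo "bytes unchanged" || echo "bytes differ"
```

If the bytes turn out to be unchanged, that would suggest the file itself is intact and the odd characters come from how the terminal or viewer decodes it. Note also that a replacement string containing accented letters, such as Química, is written into the file in whatever encoding the command line was typed in, so that encoding needs to match the file's.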

Quote:
8.2 Internationalization Variables

This section describes environment variables that are relevant to the operation of internationalized interfaces described in IEEE Std 1003.1-2001.

Users may use the following environment variables to announce specific localization requirements to applications. Applications can retrieve this information using the setlocale() function to initialize the correct behavior of the internationalized interfaces. The descriptions of the internationalization environment variables describe the resulting behavior only when the application locale is initialized in this way. The use of the internationalization variables by utilities described in the Shell and Utilities volume of IEEE Std 1003.1-2001 is described in the ENVIRONMENT VARIABLES section for those utilities in addition to the global effects described in this section.

LANG
This variable shall determine the locale category for native language, local customs, and coded character set in the absence of the LC_ALL and other LC_* ( LC_COLLATE , LC_CTYPE , LC_MESSAGES , LC_MONETARY , LC_NUMERIC , LC_TIME ) environment variables. This can be used by applications to determine the language to use for error messages and instructions, collating sequences, date formats, and so on.

LC_ALL
This variable shall determine the values for all locale categories. The value of the LC_ALL environment variable has precedence over any of the other environment variables starting with LC_ ( LC_COLLATE , LC_CTYPE , LC_MESSAGES , LC_MONETARY , LC_NUMERIC , LC_TIME ) and the LANG environment variable.

LC_COLLATE
This variable shall determine the locale category for character collation. It determines collation information for regular expressions and sorting, including equivalence classes and multi-character collating elements, in various utilities and the strcoll() and strxfrm() functions. Additional semantics of this variable, if any, are implementation-defined.

LC_CTYPE
This variable shall determine the locale category for character handling functions, such as tolower(), toupper(), and isalpha(). This environment variable determines the interpretation of sequences of bytes of text data as characters (for example, single as opposed to multi-byte characters), the classification of characters (for example, alpha, digit, graph), and the behavior of character classes. Additional semantics of this variable, if any, are implementation-defined.

LC_MESSAGES
This variable shall determine the locale category for processing affirmative and negative responses and the language and cultural conventions in which messages should be written. [XSI] It also affects the behavior of the catopen() function in determining the message catalog. Additional semantics of this variable, if any, are implementation-defined. The language and cultural conventions of diagnostic and informative messages whose format is unspecified by IEEE Std 1003.1-2001 should be affected by the setting of LC_MESSAGES .

LC_MONETARY
This variable shall determine the locale category for monetary-related numeric formatting information. Additional semantics of this variable, if any, are implementation-defined.

LC_NUMERIC
This variable shall determine the locale category for numeric formatting (for example, thousands separator and radix character) information in various utilities as well as the formatted I/O operations in printf() and scanf() and the string conversion functions in strtod(). Additional semantics of this variable, if any, are implementation-defined.

LC_TIME
This variable shall determine the locale category for date and time formatting information. It affects the behavior of the time functions in strftime(). Additional semantics of this variable, if any, are implementation-defined.

NLSPATH
[XSI] This variable shall contain a sequence of templates that the catopen() function uses when attempting to locate message catalogs. Each template consists of an optional prefix, one or more conversion specifications, a filename, and an optional suffix.

Reference: The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright © 2001-2004 The IEEE and The Open Group, All Rights reserved.


8. Environment Variables
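
The precedence rules quoted above can be checked from the shell. This is a minimal sketch that assumes the standard `locale` utility is installed; it uses only the C and POSIX locales, which every system provides, and the variable names `without_all` and `with_all` are just illustrative.

```shell
#!/bin/sh
# Case 1: LC_ALL is unset, so an explicit LC_CTYPE keeps its own value
# even though LANG names a different locale.
unset LC_ALL
without_all=$(LANG=C LC_CTYPE=POSIX locale | grep '^LC_CTYPE=')

# Case 2: LC_ALL is set, so it overrides both LC_CTYPE and LANG.
with_all=$(LANG=POSIX LC_CTYPE=POSIX LC_ALL=C locale | grep '^LC_CTYPE=')

echo "LC_ALL unset: $without_all"
echo "LC_ALL=C:     $with_all"
```

Because LC_ALL wins over everything else, prefixing a single command with it (for example `LC_ALL=C sed ...`) is a common way to try a different locale for one run without changing the rest of the session.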