Remove Unicode/special chars from XML
Post 302597546 by Corona688 on Friday 10th of February 2012 03:25:25 PM
Some programs display out-of-range characters as ^Z or similar, but that doesn't mean the file literally contains the character ^ followed by the character Z. That's just the program's way of showing you characters it can't represent any other way.
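
One way to check what is really in the file is to dump the raw bytes instead of trusting an editor's display. This is only a sketch, assuming a POSIX shell with od available; the sample strings are made up for illustration.

```shell
# A real control byte (octal 032, i.e. Ctrl-Z) between 'a' and 'b'.
ctrl_z=$(printf 'a\032b' | od -An -to1)
# The two ordinary characters '^' (octal 136) and 'Z' (octal 132) instead.
caret_z=$(printf 'a^Zb' | od -An -to1)

# od -An -to1 prints every byte as an octal number, so the two cases
# look different even though some viewers would show both as "a^Zb".
echo "control byte:  $ctrl_z"
echo "caret plus Z:  $caret_z"
```

If the dump of your own file shows values of 200 or higher, the original multi-byte characters are still there.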

So I think deleting the UTF-8 characters themselves would be a good thing to try first; they're probably still there, unconverted. Since every byte of a non-ASCII UTF-8 character is >= 128 (octal 200 through 377), we can use tr to strip out that entire range.

Code:
tr -d '\200-\377' < input > output
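
To see the effect on a small sample before running it on a real file, something like the following should work. It is a sketch that assumes a POSIX-style tr which accepts ranges without brackets; the sample XML line is invented, and LC_ALL=C is added on the assumption that some tr implementations are stricter about byte sequences in UTF-8 locales.

```shell
# "é" is stored in UTF-8 as the two bytes octal 303 251.
sample=$(printf '<name>caf\303\251</name>')

# LC_ALL=C asks tr to treat the input as plain bytes.
# Every byte in the range 200-377 (128-255 decimal) is deleted.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C tr -d '\200-\377')

echo "$cleaned"   # prints: <name>caf</name>
```

Note that this deletes the characters outright rather than converting them, so "café" ends up as "caf".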

 

tr(1)							      General Commands Manual							     tr(1)

Name
       tr - translate characters

Syntax
       tr [-cds] [string1[string2]]

Description
       The  tr command copies the standard input to the standard output with substitution or deletion of selected characters.  Input characters found
       in string1 are mapped into the corresponding characters of string2.  When string2 is short it is padded to the length of string1 by  dupli-
       cating  its  last character.  Any combination of the options -cds may be used: -c complements the set of characters in string1 with respect
       to the universe of characters whose ASCII codes are 0 through 0377 octal; -d deletes all input  characters  in  string1;  -s  squeezes  all
       strings of repeated output characters that are in string2 to single characters.
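
       The three options can be tried out on short strings.  The sketch below assumes a POSIX-style tr (for example GNU or BSD); the input strings are made up.

```shell
# Use the C locale so ranges such as A-Z follow ASCII order.
LC_ALL=C
export LC_ALL

# -d deletes every character listed in string1.
no_digits=$(printf 'a1b22c333\n' | tr -d '0-9')

# -s squeezes runs of a repeated character down to a single one.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')

# -c complements string1; together with -d it keeps only letters and newlines.
letters=$(printf 'x-1,y=2\n' | tr -cd 'A-Za-z\n')

echo "$no_digits"   # prints: abc
echo "$squeezed"    # prints: a b c
echo "$letters"     # prints: xy
```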

       In  either string the notation a-b means a range of characters from a to b in increasing ASCII order.  The backslash character (\) followed
       by 1, 2 or 3 octal digits stands for the character whose ASCII code is given by those digits.  A \ followed by any other  character  stands
       for that character.
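
       As an illustration of the octal notation, the sketch below turns tabs into spaces.  It assumes a POSIX-style tr; the sample line is invented.

```shell
# '\011' is the octal code for a tab and '\040' for a space. The single
# quotes keep the backslashes away from the shell so tr interprets them.
spaced=$(printf 'col1\tcol2\tcol3\n' | tr '\011' '\040')

echo "$spaced"   # prints: col1 col2 col3
```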

       The  following  example creates a list of all the words in `file1' one per line in `file2', where a word is taken to be a maximal string of
       alphabetics.  The second string is quoted to protect \ from the Shell.  012 is the ASCII code for newline.
       tr -cs A-Za-z '\012' <file1 >file2
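
       The same command can be tried on a single line of text instead of a file.  This is a sketch assuming a POSIX-style tr; the sentence is made up.

```shell
text='The quick brown fox; the lazy dog.'

# -c selects everything that is NOT a letter, each such character is mapped
# to a newline (octal 012), and -s squeezes runs of newlines into one.
words=$(printf '%s\n' "$text" | LC_ALL=C tr -cs 'A-Za-z' '\012')

# One word per line, seven lines in total.
printf '%s\n' "$words"
```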

Options
       -c   Complements string1:  the set becomes all characters not in string1.

       -d   Deletes all characters in string1 from output.

       -s   Squeezes succession of a character in string1 to one in output.

Restrictions
       `\0', `\00', and `\000' are equivalent for NUL character.

       `\012' is treated as octal 12 and not a NUL followed by characters 1 and 2.

See Also
       ed(1), ascii(7), expand(1)
