10-28-2011
Remove invalid database characters from a file
Hi All -
I'm building a script designed to remove characters that a non-Unicode database does not accept, for example ï, ¿, ½, Â and é.
I can easily sed those characters out one by one, but that breaks down as soon as other Unicode characters turn up. Is there a way to remove all of them at once? As far as I can tell, none of them can be typed on a standard keyboard.
Please help. Thanks.
Also, I can't sed or grep for accented characters such as ù.
Last edited by Jin_; 10-28-2011 at 02:29 AM.
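Rather than listing every bad character, it is usually easier to whitelist the bytes you want to keep and delete everything else. A minimal sketch, assuming the database accepts only 7-bit printable ASCII plus tab, LF and CR (adjust the ranges if your database's code page allows more). The function names are hypothetical. Setting `LC_ALL=C` makes `tr` and `grep` work on raw bytes, so they behave the same whatever encoding the file is actually in, and a multibyte character such as ù is caught as well.

```shell
# Hypothetical helpers; assumption: the target database accepts plain ASCII only.

# Print "lineno:line" for every line that contains a byte which is neither
# printable ASCII nor whitespace. Reads the named files, or stdin if none.
find_non_ascii() {
    # In the C locale every byte >= 0x80 (e.g. the two bytes of UTF-8 ù)
    # falls outside [:print:] and [:space:], so it matches.
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}

# Delete every byte except tab (\011), LF (\012), CR (\015) and
# space through tilde (\040-\176). Reads stdin, writes stdout.
strip_non_ascii() {
    LC_ALL=C tr -cd '\011\012\015\040-\176'
}
```

Typical use would be `find_non_ascii input.txt` to see what will be affected, then `strip_non_ascii < input.txt > clean.txt` (file names are placeholders). Note that this deletes é outright rather than turning it into e; if transliteration is preferable, iconv's `//TRANSLIT` target suffix (a glibc/libiconv extension whose results depend on the locale) may be worth trying instead.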
UNICODE(1) General Commands Manual UNICODE(1)
NAME
unicode - command line unicode database query tool
SYNOPSIS
unicode [options] string
DESCRIPTION
This manual page documents the unicode command.
unicode is a command line unicode database query tool.
OPTIONS
-h --help
Show help and exit.
-x --hexadecimal
Assume string to be a hexadecimal number
-d --decimal
Assume string to be a decimal number
-r --regexp
Assume string to be a regular expression
-s --string
Assume string to be a sequence of characters
-a --auto
Try to guess the type of string from one of the above (default)
-mMAXCOUNT
--max=MAXCOUNT
Maximum number of code points to display (default: 20; use 0 for unlimited)
-iCHARSET
--io=IOCHARSET
I/O character set. For best results, run unicode on a UTF-8 capable terminal and specify IOCHARSET as UTF-8. unicode tries to
guess this value from your locale, so with a properly set up locale you should not need to specify it.
-cADDCHARSET
--charset-add=ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.
-CUSE_COLOUR
--colour=USE_COLOUR
USE_COLOUR is one of on, off or auto:
--colour=on will use ANSI colour codes to colourise the output
--colour=off won't use colours.
--colour=auto will test if standard output is a tty, and use colours only when it is.
--color is a synonym of --colour
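The on/off/auto rule is the usual convention for command-line tools. As an illustration only (not taken from unicode's own source), the same decision can be written in shell with the standard `test -t 1` check, which is true when file descriptor 1 is a terminal. `want_colour` is a hypothetical name.

```shell
# Hypothetical helper mirroring the three --colour values.
# Exit status 0 means "colourise", 1 means "plain text", 2 means bad argument.
want_colour() {
    case "$1" in
        on)   return 0 ;;
        off)  return 1 ;;
        auto) [ -t 1 ] ;;   # true only when stdout is a terminal
        *)    echo "want_colour: expected on, off or auto" >&2; return 2 ;;
    esac
}
```

Because `auto` tests the caller's stdout, `want_colour auto` succeeds in an interactive terminal but fails when output is piped or redirected, which is what keeps escape codes out of files.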
-v --verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
-w --wikipedia
Spawn a browser pointing to the Wikipedia entry about the character.
USAGE
unicode tries to guess the type of an argument. For example, you can use any of the following to display information about U+00E1 LATIN
SMALL LETTER A WITH ACUTE (á):
unicode 00E1
unicode U+00E1
unicode á
unicode 'latin small letter a with acute'
You can specify a range of characters as arguments; unicode will show these characters in a nice tabular format, aligned to 256-codepoint
boundaries. Use two dots ".." to indicate the range, e.g.
unicode 0450..0520
will display the whole Cyrillic, Cyrillic Supplement, Armenian and Hebrew blocks (characters from U+0400 to U+05FF)
unicode 0400..
will display just characters from U+0400 up to U+04FF
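Aligning to 256-wide rows is simple bit arithmetic: round the start down to a multiple of 0x100 and the end up to the last code point of its row. A sketch in shell, assuming this is how the table rows are chosen; it reproduces both examples above but is not unicode's own code, and `row_range` is a hypothetical helper.

```shell
# Hypothetical helper: widen a hex code point range to whole 256-code-point rows.
# Usage: row_range START [END]   (hex digits without the U+ prefix)
row_range() {
    lo=$(( 0x$1 & ~0xFF ))          # round start down to a multiple of 0x100
    if [ -n "$2" ]; then
        hi=$(( 0x$2 | 0xFF ))       # round end up to the last code point in its row
    else
        hi=$(( lo | 0xFF ))         # open range "XXXX..": a single row
    fi
    printf 'U+%04X..U+%04X\n' "$lo" "$hi"
}
```

`row_range 0450 0520` prints `U+0400..U+05FF`, and `row_range 0400` prints `U+0400..U+04FF`, matching the two ranges described above.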
BUGS
Tabular format does not deal well with full-width, combining, control and RTL characters.
SEE ALSO
ascii(1)
AUTHOR
Radovan Garabik <garabik @ kassiopeia.juls.savba.sk>
2003-01-31 UNICODE(1)