UNIX for Dummies Questions & Answers
If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome!
#1
Invalid Characters in the file.
I am working on AIX. We FTP flat files to a database. Each file has thousands of records, and each record is 50 to 60 characters long (the fields have fixed character lengths). In addition to valid ASCII characters, some invalid characters such as Å, å, Ä, ä or pipes creep in, and the data warehouse refuses to load them.
Example: AcuM-^?a 051706
The above is a field in a record that contains special characters such as -, ^ and ?, which should not be there. The record separator is a newline and there is no field separator. How can I remove these invalid or special characters from the records? Please help me work out the logic in shell scripting.
#2
Code:
$ echo "invalid characters like Å, å, Ä, ä or"
invalid characters like Å, å, Ä, ä or
$ echo "invalid characters like Å, å, Ä, ä or"| tr -dc " a-zA-Z0-9,\n"
invalid characters like , , , or
$
#3
|
|||
|
|||
|
Thanks for the reply, Perderabo. I have some more doubts:
=> There can be many more invalid characters like these, so should I list all of them in the 'like' command?
=> If I want to replace each invalid character with a space, how can I do that?
=> How do I run the whole process for the thousands of records in the file?
Attached is an example in this regard...
#4
|
||||
|
||||
|
The
echo "invalid characters like Å, å, Ä, ä or"
is only providing input data with illegal characters that need removal. I need some test data, and this is one way to demo a command. The command I am showing is
tr -dc " a-zA-Z0-9,\n"
and that is what removes the garbage. The tr command, in this form, lists the valid characters, not the invalid ones. You may need to add characters to the list. To replace invalid characters with a space, use
Code:
$ echo "invalid characters like Å, å, Ä, ä or"| tr -c ' a-zA-Z0-9,\n' ' '
invalid characters like , , , or
$
To process a whole file, redirect the input and output:
Code:
tr -c ' a-zA-Z0-9,\n' ' ' < inputfile > outputfile
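For readers following along, here is a minimal sketch of the same whitelist idea wrapped in a function. The function name clean_records and the sample data are hypothetical, and forcing LC_ALL=C is an assumption added here (it is not part of the reply above) so that tr treats the data as single bytes whatever the locale.

```shell
#!/bin/sh
# Hypothetical helper: replace every byte that is NOT in the allowed list
# (space, letters, digits, comma, newline) with a single space.
# LC_ALL=C is an assumption, used so tr works byte by byte.
clean_records() {
    LC_ALL=C tr -c ' a-zA-Z0-9,\n' ' '
}

# Demo on made-up sample data: \377 is one invalid byte, | is an unwanted pipe.
printf 'Acu\377a 051706\nsome|record,42\n' | clean_records
```

Because the replacement is one space per rejected byte, fixed-width records keep their length in a single-byte encoding. For a whole file it could be run as clean_records < inputfile > outputfile.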
#5
|
|||
|
|||
|
I used the translate commands as follows:
1) tr -c ' a-zA-Z0-9,\n' ' ' < inputfile > outputfile
==> No result
2) tr -c '[:print:][:cntrl:]' ' ' < inputfile > outputfile
==> The second command replaced one of the invalid characters with a space but retained all the others. When I again ran the command on the resulting file, the complete files (input as well as output).
Please suggest some other combinations.
===================================================
Among the following characters, ~æ£ÇÄ, only Ç was replaced.
===================================================
Thanks in advance.
Kanu
#6
|
|||
|
|||
|
It is "Print". I don't know why this emoticon (invalid characters in my life?) came up.
Thanks
#7
|
||||
|
||||
|
You can click the "Disable smilies in text" option when you post. I edited your post to do that. I don't know why you're having trouble, and I can't test in your environment, but [:alnum:] is all letters and numbers. Try that. [:print:] is anything you can see, which is not what you want here: you want to discard some visible characters but keep others. Another thing to try is [a-z][A-Z][0-9]. It's supposed to work without the brackets, but some very old versions of tr required brackets for a range.
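As a rough sketch of the [:alnum:] suggestion, the snippet below pairs it with a byte count so you can see whether anything outside the list is actually present. The function names are hypothetical, and LC_ALL=C is an assumption: in the C locale [:alnum:] covers only ASCII letters and digits, which may or may not be what makes the difference on the poster's AIX system.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes that fall outside the allowed list
# by deleting every allowed byte and counting what is left.
count_invalid() {
    LC_ALL=C tr -d '[:alnum:] ,\n' | wc -c
}

# Hypothetical helper: replace each byte outside the allowed list with a space.
clean_alnum() {
    LC_ALL=C tr -c '[:alnum:] ,\n' ' '
}

# Demo on made-up data: \307 is the single-byte (ISO 8859-1) code for Ç.
printf 'a\307b,1\n' | count_invalid
printf 'a\307b,1\n' | clean_alnum
```

Running count_invalid on the file before and after cleaning gives a quick check: a non-zero count afterwards would suggest the allowed list or the locale is not behaving as expected.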