Invalid Characters in the file.


 
# 1  
Old 01-31-2008
Invalid Characters in the file.

I am working on AIX. We ftp files to a database. The flat files have thousands of records, and each record is some 50 to 60 characters long (the fields have fixed character lengths). Besides valid ASCII characters, some invalid characters such as Å, å, Ä, ä or pipes creep in, and the data warehouse refuses to load them.
Example: AcuM-^?a 051706 ;
The above is a field from a record. It contains special characters like -, ^ and ?, which should not be there.

The record separator is a newline and there is no field separator.

How can I remove the invalid or special characters that creep into the records?
Please help me work out the logic in shell scripting.
# 2  
Old 01-31-2008
Code:
$ echo "invalid characters like Å, å, Ä, ä or"
invalid characters like Å, å, Ä, ä or
$ echo "invalid characters like Å, å, Ä, ä or"| tr -dc " a-zA-Z0-9,\n"
invalid characters like , , ,  or
$

# 3  
Old 01-31-2008
Thanks for the reply, Perderabo. I have some more questions:

=> There can be many more invalid characters like these, so do I have to list them all in the 'like' command?

=> If I want to replace each invalid character with a space, how can I do that?

=> How do I run the whole process over the thousands of records in the file?

Attached is an example in this regard...
# 4  
Old 01-31-2008
The
echo "invalid characters like Å, å, Ä, ä or"
only supplies input data containing illegal characters that need removal. I needed some test data, and this is one way to demo a command. The command I am actually showing is
tr -dc " a-zA-Z0-9,\n"
and that is what removes the garbage. In this form, tr is given the list of valid characters, not the invalid ones: -c complements the list and -d deletes everything in that complement. You may need to add characters to the list. To replace invalid characters with a space instead, use
Code:
$ echo "invalid characters like Å, å, Ä, ä or"| tr -c ' a-zA-Z0-9,\n'  ' '
invalid characters like  ,  ,  ,   or
$

I have switched to single quotes, which may be better if you need certain special characters to be accepted. In your case you may want to just do
Code:
tr -c ' a-zA-Z0-9,\n'  ' ' < inputfile > outputfile

Read the tr man page for more info.
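
For fixed-length records it matters that the replace form of tr swaps each bad byte for exactly one space, so field positions do not shift, whereas the -d form shortens the record. Below is a minimal sketch of the whole-file run with a byte-count check. The sample records, the temporary directory and the use of mktemp are assumptions for illustration, and LC_ALL=C is added on the assumption that the data is single-byte and should be treated as raw bytes.

```shell
#!/bin/sh
# Hypothetical sample data: two fixed-width records, the first containing
# a stray high byte (octal 377) similar to the one in the example field.
workdir=$(mktemp -d)
printf 'Acu\377a 051706\nAcuxa 051706\n' > "$workdir/inputfile"

# LC_ALL=C makes tr work on plain bytes, so every byte that is not in the
# allowed list is replaced by exactly one space.
LC_ALL=C tr -c ' a-zA-Z0-9,\n' ' ' < "$workdir/inputfile" > "$workdir/outputfile"

# Replacing (rather than deleting) keeps the file size unchanged, which is
# a quick sanity check that no field has moved.
in_bytes=$(wc -c < "$workdir/inputfile" | tr -d ' ')
out_bytes=$(wc -c < "$workdir/outputfile" | tr -d ' ')
echo "input: $in_bytes bytes, output: $out_bytes bytes"

# Show the cleaned records, then remove the scratch directory.
cleaned=$(cat "$workdir/outputfile")
printf '%s\n' "$cleaned"
rm -rf "$workdir"
```

If the two byte counts differ, something other than a one-for-one replacement happened and the field offsets should be rechecked before loading.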
# 5  
Old 02-01-2008
Question

I used the translate commands as follows:


1) tr -c ' a-zA-Z0-9,\n' ' ' < inputfile > outputfile ==> No result

2) tr -c '[:print:][:cntrl:]' ' ' < inputfile > outputfile ==>

The second command replaced one of the invalid characters with a space but retained all the others. When I again ran the command on the resulting file, the complete files.(input as well as output.)

Please suggest some other combinations.

===================================================
Of the characters ~æ£ÇÄ, only Ç was replaced.
===================================================

Thanks in advance.
Kanu
# 6  
Old 02-01-2008
It is "Print". I don't know why this emoticon (invalid characters in my life?) came up.

Thanks
# 7  
Old 02-01-2008
You can tick the "Disable smilies in text" option when you post. I edited your post to do that. I don't know why you're having trouble, and I can't test in your environment, but [:alnum:] is all letters and numbers. Try that. [:print:] is anything you can see, which is not what you want here: you want to discard some visible characters but keep others. Another thing to try is [a-z][A-Z][0-9]. It's supposed to work without the brackets, but some very old versions of tr required brackets for a range.
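
One possible cause, and this is only an assumption since it was never confirmed here, is the locale: where accented letters count as alphabetic or printable, classes such as [:alnum:] and [:print:] can match them, so tr leaves them alone. In the C locale [:alnum:] covers only ASCII letters and digits. A minimal sketch of the [:alnum:] suggestion with the locale forced, using a made-up test record (the real file's encoding is not known, so ISO 8859-1 bytes are assumed):

```shell
#!/bin/sh
# Hypothetical test record: octal 305 and 307 are the ISO 8859-1 bytes
# for the letters A-ring and C-cedilla.
record=$(printf 'Acu\305\307a 051706')

# With LC_ALL=C, [:alnum:] means ASCII letters and digits only, so the two
# accented bytes fall outside the allowed set and each becomes a space.
cleaned=$(printf '%s\n' "$record" | LC_ALL=C tr -c ' [:alnum:],\n' ' ')
printf '%s\n' "$cleaned"

# cat -v prints non-ASCII bytes in M- notation, which helps to see what
# was really in the record before cleaning.
printf '%s\n' "$record" | cat -v
```

The M-^? in the original example is this same cat -v style notation for a single byte with the high bit set, not the three separate characters -, ^ and ?.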
 