Invalid Characters in the file.


 
# 8  
Old 02-01-2008
Thanks Perderabo...

I am using AIX, version 4.

I want to retain special characters such as double quotes ("), spaces, +, - etc. When I used:
tr -c '[:print:][:cntrl:]' ' ' < input_file > output_file

all these characters were removed as well. (Is it anything to do with [:print:]?)


Also, using [:alnum:] did not change the file at all:

tr -c '[:alnum:]' ' ' < subs.bak > sub.err


I am tired of trying so many options and have to resolve this issue as soon as possible. The clients are pressing me.

Thanks
Kanu
# 9  
Old 02-04-2008
Data

Are there any more options? Thanks in advance.
-Kanu
# 10  
Old 02-04-2008
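If the list of characters to be removed is finite, put their ordinal values (one decimal number per line, since the script looks each byte up by its ord() value) in a file named "unwanted_chars" and try this code: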

Code:
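#! /opt/third-party/bin/perl
use strict;
use warnings;

# Load the byte values to drop: one decimal ordinal per line (e.g. 197).
my %unwanted;
open(my $list, "<", "unwanted_chars") || die("unable to open unwanted_chars <$!>\n");
while (<$list>) {
  chomp;
  $unwanted{$_} = 1 if length;
}
close($list);

open(my $in, "<", $ARGV[0]) || die("unable to open <$!>\n");
binmode($in);
binmode(STDOUT);

# Copy the input one byte at a time, skipping the listed values.
my $data;
while (read($in, $data, 1) == 1) {
  print $data unless exists $unwanted{ord($data)};
}
close($in);

exit 0;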

But if you are sure that only ASCII characters belong in the data file, it is much simpler: a range can be specified and there is no need for an external lookup file.
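A minimal sketch of the range approach, assuming the file should contain only tab, newline, carriage return and printable ASCII (space through tilde). The function name ascii_only is made up for illustration. Every other byte is replaced by a space, so fixed-length records keep their length.

```shell
#!/bin/sh
# Replace every byte outside the allowed set with a space.
# Allowed: tab (\011), newline (\012), carriage return (\015),
# and printable ASCII from space (\040) to tilde (\176).
# LC_ALL=C makes tr treat the input as single bytes.
# '[ *]' pads the replacement set with spaces to match the first set.
ascii_only() {
    LC_ALL=C tr -c '\011\012\015\040-\176' '[ *]'
}

# Demo: \305 is a single byte outside ASCII; quotes, +, - and spaces survive.
printf 'A"B +-\305C\n' | ascii_only
# prints: A"B +- C
```

Typical use would be ascii_only < input_file > output_file. To delete the bytes instead of blanking them, tr -cd with the same first set (and no second set) should do it, at the cost of shortening the records.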
# 11  
Old 02-04-2008
Thanks Madhan! The list of characters is not finite. These are some dump characters which creep into the records. I believe it would help if you looked at my attached example.

Anyway, the logic is good. At least I can remove the characters that I know about.

Thanks
Kanu
# 12  
Old 02-04-2008
Quote:
These are some dump characters which creep into the records.
I doubt that. Could you please double-check which character encoding is set as the default and which character encoding the input file uses?

These are actually valid characters, but not in every character set.

They may well be valid, and only show up as junk or unwanted characters because of the terminal's character encoding.

Better to take a hex dump of the file and verify whether they are needed or not.
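One way to take such a hex dump without wading through the whole file is to dump only the bytes outside printable ASCII and count them. This is a sketch rather than a definitive tool; the function name suspect_bytes is hypothetical.

```shell
#!/bin/sh
# Read data on stdin and list each distinct byte outside printable ASCII
# (tab, newline and carriage return are treated as normal), with a count.
suspect_bytes() {
    LC_ALL=C tr -d '\011\012\015\040-\176' |   # drop the expected bytes
        od -An -v -tx1 |                       # remaining bytes as hex, no offsets
        tr -s ' ' '\n' |                       # one hex value per line
        grep . |                               # discard empty lines
        sort | uniq -c                         # count each distinct value
}

# Demo: two copies of byte c5 and one of e4 hidden in ordinary text.
printf 'ab\305c\305\344\n' | suspect_bytes
```

Run as suspect_bytes < input_file. The hex values can then be looked up in the code set the file was written in, to decide whether each byte is real data or junk.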
# 13  
Old 02-04-2008
I don't know why, but I always wanted to get this answer.
I have always suspected this. I think some dump characters are appearing in place of spaces.
How can I check the terminal's character encoding?
Can I change it too?
I want to fix the root cause so that we can avoid writing more scripts.

Thanks
Kanu
# 14  
Old 02-04-2008
Question

This is what I got:

#echo $LANG
En_US

Helpful?
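LANG alone names the locale but not the code set behind it. A small sketch of how the session's code set could be printed, using the POSIX locale utility (the function name show_encoding is made up). On AIX the capitalisation of the locale name is conventionally tied to the code set, with En_US usually meaning IBM-850 and en_US meaning ISO8859-1, but that is worth confirming on the system itself.

```shell
#!/bin/sh
# Print the locale variables and the code set the current session uses.
show_encoding() {
    echo "LANG=${LANG:-unset} LC_ALL=${LC_ALL:-unset}"
    # "locale charmap" names the code set of the active locale.
    echo "charmap=$(locale charmap)"
}

show_encoding
```

If the code set reported here differs from the one the files were produced in, bytes that are perfectly valid in the source encoding can display as junk on the terminal.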
 