The UNIX and Linux Forums  






#1, 01-31-2008
Registered User (Join Date: Jan 2008, Location: India, Posts: 14)
Invalid Characters in the file.

I am working on AIX. We FTP flat files to a database. The files contain thousands of records, and each record is some 50 to 60 characters long (each field has a fixed character length). Along with the valid ASCII characters, some invalid characters such as Å, å, Ä, ä or pipes creep in, and the data warehouse refuses to load them.
Example: AcuM-^?a 051706 ;
Above is a field in a record that contains special characters like -, ^ and ?, which should not be there.
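The M-^? in that example may not be three separate characters. Assuming the field was displayed with cat -v (or a viewer using the same notation), M-^? is how a single non-ASCII byte, 0xFF (octal 377), is shown: "M-" marks the high bit and "^?" is the DEL character left over. A quick way to check is to dump the raw bytes with od. A sketch, with the field rebuilt by hand and the 0xFF byte value assumed:

```shell
# Rebuild the example field with an assumed raw byte 0xFF (octal 377)
# in the position where the post shows "M-^?".
field=$(printf 'Acu\377a 051706 ;')

# cat -v shows bytes >= 0x80 as "M-" plus the low 7 bits,
# so 0xFF (high bit + DEL) prints as "M-^?".
printf '%s\n' "$field" | cat -v

# od -c shows the same byte as the octal value 377.
printf '%s\n' "$field" | od -c
```

If od reports one 377 byte at that spot, the field contains one stray byte, not the characters -, ^ and ?.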

The record separator is a newline and there is no field separator.

How can I remove these invalid or special characters from the records?
Please help me work out the logic in a shell script.
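Before cleaning anything, it may help to see which records are affected. A minimal sketch, assuming the C locale so that the bracket range ' -~' means exactly the printable ASCII characters (space through tilde): grep -n prints the record (line) number of every line containing a byte outside that range. The file name records.txt and its contents are hypothetical.

```shell
# Hypothetical sample: three records, the second with a stray 0xFF byte.
printf 'Abcd 051706 ;\nAcu\377a 051706 ;\nXyz1 051707 ;\n' > records.txt

# In the C locale, [^ -~] matches any byte that is not printable ASCII.
# -n prefixes each matching record with its line number; cat -v makes
# the bad byte visible on the terminal.
LC_ALL=C grep -n '[^ -~]' records.txt | cat -v

# -c only counts the affected records.
LC_ALL=C grep -c '[^ -~]' records.txt
```

Pipes are printable ASCII, so this pattern does not report them; grep -n '|' records.txt would find those separately.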
#2, 01-31-2008
Perderabo, Unix Daemon (Join Date: Aug 2001, Location: Washington DC Area, Posts: 8,207)
Code:
$ echo "invalid characters like Å, å, Ä, ä or"
invalid characters like Å, å, Ä, ä or
$ echo "invalid characters like Å, å, Ä, ä or"| tr -dc " a-zA-Z0-9,\n"
invalid characters like , , ,  or
$
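In a UTF-8 terminal each of Å, å, Ä and ä is stored as two bytes. GNU tr compares single bytes, and setting LC_ALL=C makes that behaviour explicit on any system, so the -dc form drops both bytes of each accented letter. A sketch of the same command on a hand-built UTF-8 string (the octal escapes \303\205 are the two bytes of Å):

```shell
# "like Å, or" with Å written as its two UTF-8 bytes (0xC3 0x85).
printf 'like \303\205, or\n' | od -c

# -c complements the set and -d deletes: every byte NOT in
# " a-zA-Z0-9,\n" is removed. LC_ALL=C keeps the ranges plain ASCII.
printf 'like \303\205, or\n' | LC_ALL=C tr -dc ' a-zA-Z0-9,\n'
```

Because bytes are deleted, the record gets shorter (here by two bytes), which matters for the fixed-width fields described in the question.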
#3, 01-31-2008
Registered User (Join Date: Jan 2008, Location: India, Posts: 14)
Thanks for the reply, Perderabo. I have a few more questions:

=> There can be many more invalid characters like these. Do I have to list all of them in the 'like' command?

=> If I want to replace each invalid character with a space, how can I do that?

=> How do I run the whole process over the thousands of records in the file?

I have attached an example.
Attached file: Example.txt (1.4 KB)
#4, 01-31-2008
Perderabo, Unix Daemon (Join Date: Aug 2001, Location: Washington DC Area, Posts: 8,207)
The
echo "invalid characters like Å, å, Ä, ä or"
part only provides input data containing illegal characters that need removing. I needed some test data, and this is one way to demo a command. There is no 'like' command; "like" is just a word in the sample text. The command I am actually showing is
tr -dc " a-zA-Z0-9,\n"
and that is what removes the garbage. In this form (-c complements the set, -d deletes), tr takes a list of the valid characters, not the invalid ones, so you never have to list every bad character. You may need to add characters to the list. To replace invalid characters with a space, use
Code:
$ echo "invalid characters like Å, å, Ä, ä or"| tr -c ' a-zA-Z0-9,\n'  ' '
invalid characters like  ,  ,  ,   or
$
I have switched to single quotes, which may be better if you need certain special characters to be accepted. In your case, you may just want to do
Code:
tr -c ' a-zA-Z0-9,\n'  ' ' < inputfile > outputfile
Because tr processes its input as a stream, this one command handles every record in the file. Read the tr man page for more info.
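A sketch of that one-liner wrapped in a small helper, under the assumption that every cleaned record should keep its original length. Replacing (rather than deleting) keeps the byte count, so fixed-width fields stay in the same columns, and wc -c can confirm it. The names clean_records, records.txt and records.clean are hypothetical; the valid set is the one from the command above.

```shell
# Hypothetical helper: replace every byte outside the valid set with a space.
clean_records() {
    src=$1
    dst=$2
    # C locale so that a-z, A-Z and 0-9 are plain ASCII byte ranges.
    LC_ALL=C tr -c ' a-zA-Z0-9,\n' ' ' < "$src" > "$dst" || return 1
    # Replacing keeps one byte per byte, so sizes must match.
    if [ "$(wc -c < "$src")" -ne "$(wc -c < "$dst")" ]; then
        echo "size mismatch: $src vs $dst" >&2
        return 1
    fi
}

# Sample input: the second record carries a stray 0xFF byte.
printf 'Abcd 051706\nAcu\377a 051706\n' > records.txt
clean_records records.txt records.clean && cat records.clean
```

Writing to a separate output file and only then replacing the original (for example with mv) avoids truncating the input while tr is still reading it.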
#5, 02-01-2008
Registered User (Join Date: Jan 2008, Location: India, Posts: 14)
Question

I tried the tr (translate) commands as follows:


1) tr -c ' a-zA-Z0-9,\n' ' ' < inputfile > outputfile ==> No result

2) tr -c '[:print:][:cntrl:]' ' ' < inputfile > outputfile ==>

The second command replaced one of the invalid characters with a space but kept all the others. When I ran the command again on the resulting file, the complete files (input as well as output)...

Please suggest some other combinations.

===================================================
Of the characters ~æ£ÇÄ, only Ç was replaced.
===================================================

Thanks in advance.
Kanu
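One likely explanation, though it depends on the locale the command ran in: [:print:] and [:cntrl:] are locale-dependent classes. In a Latin-1 or similar locale, accented letters and symbols such as æ, £ and Ä count as printable, so tr keeps them. Forcing the C locale limits [:print:] to printable ASCII (space through ~), so every byte with the high bit set falls outside the set. Note that ~ and | are printable ASCII and are still kept; the second tr below, which is not one of the commands tried above, also turns pipes into spaces. The sample bytes are Latin-1 codes assumed for illustration (\346 is æ, \243 is £, \307 is Ç, \304 is Ä), and sample.txt / sample.clean are hypothetical names.

```shell
# The sample "~æ£ÇÄ" as Latin-1 bytes, plus a pipe and a normal letter.
printf '~\346\243\307\304|x\n' > sample.txt

# In the C locale [:print:] is only 0x20-0x7E and [:cntrl:] only
# 0x00-0x1F and 0x7F, so every byte >= 0x80 is complemented into a space.
# The second tr then maps pipes to spaces as well.
LC_ALL=C tr -c '[:print:][:cntrl:]' ' ' < sample.txt | tr '|' ' ' > sample.clean

cat sample.clean
```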
The UNIX and Linux Forums Content Copyright ©1993-2008 The CEP Blog All Rights Reserved