ksh check for non printable characters in a string


 
# 1  
Old 02-10-2015

Hi All,

I am trying to find non-printable characters in a string. The string may contain alphanumerics, punctuation, and characters like (*&%$#.'), but not non-printable characters (or that is what I think they are called), which get introduced when you copy text from DOS to a unix box.

Input string1:
Code:
 TEXT="This is a sample text with supposedly non-printable character^Y."

Input string2:
Code:
TEXT="This is a sample text with supposedly non-printable character"

I found the following code for bash on a website, but it is not working.

Code:
if ! [[ "$TEXT" =~ ^[a-zA-Z0-9]+$ ]];then echo "invalid"; fi

Note: I also removed the dot "." from "Input string2", just to check whether the command works.

This prints "invalid" in both cases. Is there an equivalent command that works in ksh? Please suggest.

-dips

Last edited by rbatte1; 02-10-2015 at 06:12 AM.. Reason: Tightened CODE tags
# 2  
Old 02-10-2015
Your pattern accepts only alphanumeric characters, so it treats <space> as if it were a non-printable character; since spaces are present in both strings, you are getting invalid for both.

For a more portable test that should work with any POSIX conforming shell, try:
Code:
if [ "${TEXT#*[![:print:]]}" = "$TEXT" ];then echo 'no non-printables found';else echo 'non-printable found';fi
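
The same test can also be written as a case pattern, which some find easier to read. This is only a sketch: the function name has_nonprintable is made up for illustration, and what counts as printable depends on the current locale.

```shell
# Hypothetical helper: returns 0 if $1 contains a non-printable character.
has_nonprintable() {
    case $1 in
        *[![:print:]]*) return 0 ;;   # at least one character outside [:print:]
        *)              return 1 ;;   # every character is printable
    esac
}

ctrl_y=$(printf '\031')   # <CTRL>Y is octal 031 (0x19)
if has_nonprintable "sample text${ctrl_y}."; then echo 'non-printable found'; fi
if ! has_nonprintable "sample text with spaces."; then echo 'no non-printables found'; fi
```

Note that <space> is a member of [:print:], so the second string is not flagged.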

# 3  
Old 02-10-2015
Hi Don,

Thanks for telling me that <space> is being treated like a non-printable character. What I actually want is to search for all non-UTF-8 characters, and I have no idea how to check for those.

The application's XML file accepts only UTF-8 characters; anything else stops the jobs from running through the application. Is there any way to check that a string contains only UTF-8 characters? Can you please suggest?

For e.g.
Code:
TEXT="This is a sample text with supposedly non-printable character^Y."

The highlighted character shown in the file is what I have in my application; when viewed in unix it appears as ^Y. How can I identify such characters?

-dips
# 4  
Old 02-10-2015
If what you show as ^Y represents <CTRL> Y (0x19, "EM"), it is a member of the ASCII character set, which in turn is a subset of UTF-8. Although there are byte sequences that are not valid UTF-8 characters, they should not show up in texts or HTML files unless created by a failed transmission or conversion.
Please show us a hexdump (od -ctx1 file) of your problematic file.
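
If the goal really is to reject byte sequences that are not valid UTF-8, one possible sketch is to pass the string through iconv and look at its exit status. This assumes an iconv utility is installed and recognizes the codeset name UTF-8; the function name is_valid_utf8 is made up for illustration.

```shell
# Hypothetical helper: returns 0 if $1 is a valid UTF-8 byte sequence.
# Assumes iconv exits non-zero when it meets an invalid input sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

TEXT="sample text with <CTRL>Y: $(printf '\031')."
if is_valid_utf8 "$TEXT"; then echo 'valid UTF-8'; else echo 'not valid UTF-8'; fi
```

A string containing <CTRL> Y passes this check because 0x19 is plain ASCII, so this test does not replace the check for non-printable characters.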
# 5  
Old 02-10-2015
This works in ksh93 or bash3:
Code:
if [[ $TEXT =~ [^[:print:]] ]] ; then echo invalid; fi

# 6  
Old 02-11-2015
Hi,

I won't be able to convert it to a hex file.

But to keep it simple, can I check for only alphanumeric characters plus a few punctuation characters that I know will be accepted?

Code:
if [ "${TEXT#*[![:alnum:]][.,;:'"/\()-_+=~@&*]}" = "$TEXT" ];then echo 'no non-printables found';else echo 'non-printable found';fi

This is clearly not working. Can you please help me?
-dips
# 7  
Old 02-11-2015
You have the syntax off a little bit, but that is close (and I assume you don't want <space> to cause a "non-printable found" either). Try:
Code:
if [ "${TEXT#*[![:alnum:] .,;:'"/\()_+=~@&*-]}" = "$TEXT" ];then echo 'no non-printables found';else echo 'non-printable found';fi

Note that a space was added, one pair of square brackets was removed, and the minus sign was moved to the end of the non-matching bracket expression element list.
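
As an alternative that sidesteps shell quoting inside the bracket expression, the same whitelist idea can be sketched with tr instead of parameter expansion: delete every allowed character and see whether anything is left. The function name only_allowed and the variable names are made up for illustration; LC_ALL=C is an assumption that restricts [:alnum:] to ASCII letters and digits.

```shell
# Hypothetical helper: returns 0 if $1 consists solely of allowed characters.
only_allowed() {
    # Apostrophe first, minus sign last so that tr takes it literally;
    # \\ is how tr spells one literal backslash.
    allowed="'"'[:alnum:] .,;:"/\\()_+=~@&*-'
    # Count the bytes that survive after all allowed characters are deleted.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d "$allowed" | wc -c)
    [ "$leftover" -eq 0 ]
}

if only_allowed "$TEXT"; then echo 'only allowed characters'; else echo 'disallowed character found'; fi
```

Counting bytes with wc -c, rather than capturing the leftover text, avoids losing a trailing newline to command substitution.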

And, yes you can convert your string to hex:
Code:
printf '%s' "$TEXT" | od -t co1x1

will display your string as characters, octal bytes, and hex bytes. (If your version of od doesn't have a -t option, just use:
Code:
printf '%s' "$TEXT" | od -cb

to get character and octal byte output.)
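
To see only the offending bytes rather than the whole string, a possible sketch is to delete the printable characters with tr before dumping what remains. The function name show_nonprintables is made up for illustration, LC_ALL=C is an assumption that makes tr work byte by byte, and od must support the -A and -t options.

```shell
# Hypothetical helper: prints the hex value of each non-printable byte in $1.
show_nonprintables() {
    printf '%s' "$1" | LC_ALL=C tr -d '[:print:]' | od -An -t x1
}

show_nonprintables "sample text$(printf '\031')."
```

With LC_ALL=C every byte outside printable ASCII is reported, including the bytes of multibyte UTF-8 characters.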