The only language I have ever encountered with a built-in test for this is perl. I dislike perl and seldom use it, but I did give this feature a try. It seemed broken because it allowed many non-ASCII characters before it finally declared a file to be binary. Since I then had to code my own test, I returned to ksh. But I do prefer perl's terminology: it calls these "text files" and "binary files".
Unless you inspect every byte of the file, you are not going to get this 100% right, and inspecting every byte carries a big performance hit. After some experiments, I settled on an algorithm that works for me: I examine only the first line and declare the file to be binary if I encounter even one non-text byte. It seems a little slack, I know, but I seem to get away with it.
Here is a little script that demonstrates this. Note that I have used (TAB) to indicate a place where you must actually type the tab character. In the pattern, a text byte is a tab or any character from space through tilde.
Code:
#! /usr/bin/ksh
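# Left-justify the file name in a 30-character field
typeset -L30 fmtfile
for file in * ; do
    # Only the first line is examined; -r keeps backslashes literal.
    # read also fails on an empty file, so that is reported as unreadable.
    if read -r line < "$file" ; then
        # Binary if any byte is not a tab or in the range space..tilde
        if [[ "$line" = *[!\(TAB)\ -\~]* ]] ; then
            type=binary
        else
            type=text
        fi
    else
        type=unreadable
    fi 2> /dev/null
    fmtfile=$file
    echo "$fmtfile is a $type file"
done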
exit 0
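
For comparison, here is a rough sketch of the slower, exhaustive test that inspects every byte. It is not part of the script above. The function name is_text_file is made up for illustration, and the set of text bytes is an assumption: tab, newline, carriage return, and space through tilde, which is slightly wider than the first-line test. It uses tr and wc, so it should run under ksh or any POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (name and byte set are assumptions, see above).
# Returns 0 for text, 1 for binary, 2 if the argument is not a
# readable regular file.
is_text_file() {
    [ -f "$1" ] && [ -r "$1" ] || return 2
    # Delete every text byte, then count whatever is left over.
    leftover=$(LC_ALL=C tr -d '\011\012\015\040-\176' < "$1" | wc -c | tr -d ' ')
    [ "$leftover" -eq 0 ]
}

for file in "$@" ; do
    is_text_file "$file"
    case $? in
        0) echo "$file is a text file" ;;
        1) echo "$file is a binary file" ;;
        *) echo "$file is an unreadable file" ;;
    esac
done
```

Because tr has to read the whole file, this costs far more on large files than looking at the first line. Note also that it treats an empty file as text, where the first-line script reports it as unreadable.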