The UNIX and Linux Forums  

02-05-2002
Perderabo

The only language I have ever encountered with a built-in test for this is Perl (its -T and -B file tests). I dislike Perl and seldom use it, but I did give this feature a try. It seemed broken because it allowed many non-ASCII characters before it finally declared a file to be binary. Since I then had to code my own test, I returned to ksh. But I do prefer Perl's terminology: it calls these "text files" and "binary files".

Unless you inspect every byte of the file, you are not going to get this 100% right, and inspecting every byte carries a big performance hit. After some experiments, I settled on an algorithm that works for me: I examine only the first line and declare the file to be binary if I encounter even one non-text byte, meaning anything other than a tab or a printable ASCII character. It seems a little slack, I know, but I seem to get away with it.
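For comparison, here is a rough sketch of what the "inspect every byte" approach could look like. This is not the method I settled on, just an illustration of the trade-off. The function name count_nontext is made up for this example, and it assumes "text" means tab, newline, and the printable ASCII range (space through tilde).

```shell
# count_nontext FILE (hypothetical helper, not part of the original script):
# print how many bytes in FILE are NOT tab, newline, or printable ASCII.
# It reads the whole file, which is exactly the cost the first-line test avoids.
count_nontext() {
    # LC_ALL=C makes tr work on single bytes; the octal escapes are
    # tab (011), newline (012), and space through tilde (040-176).
    # The $(( )) wrapper strips any padding that wc adds on some systems.
    echo $(( $(LC_ALL=C tr -d '\011\012\040-\176' < "$1" | wc -c) ))
}
```

A result of 0 means every byte passed the test; anything else means the file would be called binary under this definition.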

Here is a little script that demonstrates the first-line test. Note that I have used (TAB) to indicate a place where you must actually type the tab character.
Code:
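#! /usr/bin/ksh
# Classify each file in the current directory by its first line.
typeset -L30 fmtfile            # left-justify file names in a 30-character field
for file in * ; do
      # read fails if the file cannot be opened or holds no complete line
      if read -r line < "$file" ; then
           # binary if any byte is not a tab or printable ASCII (space through ~)
           if [[ "$line" = *[!\(TAB)\ -\~]* ]] ; then
                 type=binary
           else
                 type=text
           fi
      else
           type=unreadable
      fi 2> /dev/null
      fmtfile=$file
      echo "$fmtfile is a $type file"
done
exit 0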
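
If typing a literal tab is awkward, the same first-line test can be sketched as a plain sh function that builds the tab with printf. This is only a variation on the script above, under a few assumptions of mine: the name classify_first_line is invented for the example, a first line with no trailing newline is still tested, and an empty regular file comes out as "text" rather than "unreadable".

```shell
# classify_first_line FILE (hypothetical name): print "text", "binary",
# or "unreadable" based on the first line only.
classify_first_line() (
    LC_ALL=C; export LC_ALL        # make the " -~" range mean ASCII bytes
    tab=$(printf '\t')             # a real tab without typing one
    # Anything that is not a readable regular file is reported as unreadable.
    [ -f "$1" ] && [ -r "$1" ] || { echo unreadable; exit 0; }
    line=
    # read returns non-zero when the first line has no newline, but it
    # still fills $line, so the status is deliberately ignored.
    IFS= read -r line < "$1" || :
    case $line in
        *[!"$tab"\ -~]*) echo binary ;;   # some byte outside tab/space..tilde
        *)               echo text ;;
    esac
)
```

Calling classify_first_line on a file name prints one of the three words, so it can replace the inner if/else of the loop. The body runs in a subshell, so the LC_ALL change does not leak into the caller.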