First, convert it to a text file. The antiword command extracts the text from a Microsoft Word .doc file and prints it to standard output.
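
As a rough sketch of that conversion step, the Python snippet below runs antiword and saves whatever it prints into a text file. It assumes antiword is installed and on your PATH. The function names and the file names report.doc and report.txt are placeholders made up for illustration, not anything from the original question.

```python
import shutil
import subprocess
from pathlib import Path


def build_antiword_command(doc_path, executable="antiword"):
    """Return the argument list for a plain antiword call (no extra options)."""
    return [executable, str(doc_path)]


def doc_to_text(doc_path, txt_path, executable="antiword"):
    """Run antiword on doc_path and save its standard output to txt_path.

    Assumption: antiword is installed; it writes the extracted text to stdout.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    result = subprocess.run(
        build_antiword_command(doc_path, executable),
        capture_output=True,
        check=True,  # raise CalledProcessError if antiword reports a failure
    )
    # Write raw bytes so the output encoding chosen by antiword is preserved.
    out = Path(txt_path)
    out.write_bytes(result.stdout)
    return out


if __name__ == "__main__":
    # Hypothetical file names, used only as an example.
    doc_to_text("report.doc", "report.txt")
```

From a shell, the same thing is usually done by redirecting the output, for example antiword report.doc > report.txt.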