I'd say you need (at least) 3 components:
- A crawler that downloads pages and follows the links on those pages.
- An indexer that builds a list of the words used on each page (maybe in relation to other words nearby) and saves it to a database (see the sketch after this list).
- A front-end to query the database.
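To make the indexer and query steps concrete, here is a minimal sketch in Python. It is purely illustrative: the in-memory dict, the function names, and the naive AND search are my own simplifications, and a real indexer would persist to a database and handle stemming, stop words, and ranking.

```
import re
from collections import defaultdict

# Inverted index: word -> set of URLs whose pages contain that word.
# Illustrative only; a real indexer would write this to a database.
index = defaultdict(set)

def index_page(url, text):
    """Record every word on the page against the page's URL."""
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        index[word].add(url)

def search(query):
    """Return the URLs containing every query word (naive AND search, no ranking)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Calling index_page(url, text) for every page the crawler fetches, then search("some query") from the front-end, is the whole pipeline in miniature.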
For the crawler you can use just about any language, since the main limitation is network speed rather than processing power. For the indexer I'd recommend either C/C++ (for speed) or a language geared towards natural language processing (like Perl). For the front-end you can again choose whatever language you're comfortable with.
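To show how little the crawler depends on the language, here is a rough breadth-first crawler sketch using only Python's standard library. The seed URL, page limit, and error handling are placeholders; a production crawler would also respect robots.txt, throttle requests, and download pages in parallel.

```
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=100):
    """Breadth-first crawl from seed_url, yielding (url, html) for each page fetched."""
    queue = deque([seed_url])
    seen = {seed_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to download or decode
        fetched += 1
        yield url, html
        # Extract links and add unseen ones to the queue.
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
```

Each (url, html) pair it yields can be handed straight to the indexer (e.g. the index_page sketch above); that hand-off is where language choice starts to matter for speed.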