Combine is an open and extensible system for crawling Internet resources, including harvesting and indexing. It can be used as both a general and a focused crawler. Integration with database systems is provided to make it possible to build a complete vertical search engine.
License: GNU General Public License (GPL)
Changes:
A full-text index was added to the MySQL table search, along with a configuration variable to enable or disable it. Integration with the Zebra database system was fixed. Support for SVM classifiers was added (this depends on SVMLight). Country determination was added (this adds a dependency on GeoIP). Two new PlugIn types were added: "relevant text extraction" and "extra analysis". General updates, fixes, and code cleanup were also made.