Tim Bass
11-24-2008 03:14 PM
MapReduce is a software framework, introduced by Google, for parallel computation over large (multi-petabyte) data sets on clusters of computers; Google's implementation is written in C++ with interfaces in Python and Java. The Apache Hadoop project is a free, open-source Java implementation of MapReduce.
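To make the paradigm concrete, here is a minimal sketch of the map/shuffle/reduce pattern in plain Python, using the classic word-count example. This is an illustration of the programming model only, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are my own.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each input record, collecting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word-count: the mapper emits (word, 1); the reducer sums the ones.
def word_mapper(line):
    return [(word, 1) for word in line.split()]

def count_reducer(word, counts):
    return sum(counts)

lines = ["big data on big clusters", "big petabyte data sets"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(counts["big"])  # prints 3
```

In a real deployment, the map and reduce calls run in parallel across the cluster and the shuffle moves data over the network; the sequential version above only shows the shape of the computation.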
Mahout is an Apache project, based on Hadoop, with an objective to build scalable, Apache-licensed machine learning libraries.
The Mahout team is initially focused on building the ten machine learning algorithms detailed in Map-Reduce for Machine Learning on Multicore, a paper by seven members of Stanford's computer science department. These algorithms, some, if not all, critical for "real" complex event processing, include:
- Locally Weighted Linear Regression (LWLR),
- Naive Bayes (NB),
- Gaussian Discriminative Analysis (GDA),
- k-means,
- Logistic Regression (LR),
- Neural Network (NN),
- Principal Components Analysis (PCA),
- Independent Component Analysis (ICA),
- Expectation Maximization (EM), and
- Support Vector Machine (SVM)
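Several of these algorithms fit MapReduce naturally because each iteration decomposes into independent per-record work followed by an aggregation. As a toy illustration (not the Mahout implementation), here is one-dimensional k-means expressed as a map step (assign each point to its nearest centroid) and a reduce step (recompute each centroid as the mean of its assigned points); the data and function names are made up for the example.

```python
def assign_step(points, centroids):
    """'Map': emit (nearest-centroid-index, point) for each point."""
    pairs = []
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        pairs.append((idx, p))
    return pairs

def update_step(pairs, k):
    """'Reduce': each new centroid is the mean of the points assigned to it."""
    sums = [0.0] * k
    counts = [0] * k
    for idx, p in pairs:
        sums[idx] += p
        counts[idx] += 1
    return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]

# Two well-separated clusters around 1.0 and 9.0
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids = [0.0, 10.0]
for _ in range(5):
    centroids = update_step(assign_step(points, centroids), 2)
print(centroids)  # converges to [1.0, 9.0]
```

On a cluster, the assignment step parallelizes over data partitions and the update step aggregates partial sums per centroid, which is exactly the structure the Stanford paper exploits for its near-linear speedups.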
Ready to move beyond rules and rule-based systems to process complex events? Interested folks should visit the Apache Mahout Wiki.
Note: See also,
Map-Reduce: A Relevance to Analytics?