HPL-2008-91R1
Extremely Fast Text Feature Extraction for Classification and Indexing - Forman, George; Kirshenbaum, Evan
Keyword(s): text mining, text indexing, bag-of-words, feature engineering, feature extraction, document categorization, text tokenization
Abstract: Most research in speeding up text mining involves algorithmic improvements to induction algorithms, and yet for many large scale applications, such as classifying or indexing large document repositories, the time spent extracting word features from texts can itself greatly exceed the initial trainin ...