HPL-2007-32 (R.1)
BNS Scaling: An Improved Representation over TF.IDF for SVM Text Classification - Forman, George
Keyword(s): text classification; topic identification; machine learning; feature selection; Support Vector Machine; TF*IDF text representation
Abstract: In the realm of machine learning for text classification, TF.IDF is the most widely used representation for real-valued feature vectors. Unfortunately, it is oblivious to the training class labels, and naturally scales some features inappropriately. We replace IDF with Bi-Normal Separation (BNS), wh ...
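Illustrative sketch (not taken from the report): the abstract is cut off before it defines BNS, so the code below assumes the commonly cited definition, the absolute difference between the inverse standard normal CDF of a feature's true positive rate and that of its false positive rate. The function names, the count-based arguments, and the clipping constant eps are hypothetical choices for illustration only.

```python
from statistics import NormalDist


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation for one feature (assumed definition).

    tp / fp: positive / negative training documents containing the feature.
    pos / neg: total positive / negative training documents (both > 0).
    eps: assumed clipping bound that keeps the rates away from 0 and 1,
         where the inverse normal CDF is undefined.
    """
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of the standard normal
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv_cdf(tpr) - inv_cdf(fpr))


def scale_tf(tf_vector, bns_weights):
    """Multiply each term frequency by its BNS weight, in place of IDF."""
    return [tf * w for tf, w in zip(tf_vector, bns_weights)]
```

Unlike IDF, this weight depends on the class labels: a feature that occurs equally often in both classes gets a weight of zero, while a feature concentrated in one class gets a large weight.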