An evaluation study on text categorization using automatically generated labeled dataset
Journal article, Open access, Peer reviewed

D. Zhu and K.W. Wong
Neurocomputing, Vol. 249, pp. 321-336
2017
PDF: evaluation-study-on-text-categorization.pdf (Author’s Version, Open Access)
URL: Link to Published Version (subscription may be required)

Abstract

Naïve Bayes, k-nearest neighbors, AdaBoost, support vector machines and neural networks are, among others, five commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, the inconsistency of human labeling and the high dimensionality of feature spaces remain two issues to be addressed in text categorization. This paper evaluates the five commonly used text classifiers on an automatically generated text document collection that is labeled by a group of experts, in order to alleviate the subjectivity of human category assignments, and at the same time examines the influence of the number of features on the performance of the algorithms.
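The evaluation described above compares classifiers while varying the number of features kept. The following is a minimal sketch of that kind of evaluation loop, not the authors' experimental setup. It assumes a hypothetical six-document toy corpus (not data from the paper), uses document-frequency ranking as a simple stand-in for feature selection, and implements only two of the five classifiers (multinomial Naïve Bayes and 1-nearest-neighbor with cosine similarity) in plain Python. All function names (`top_k_features`, `evaluate`, etc.) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (text, label) pairs; NOT data from the paper.
TRAIN = [
    ("stock market shares rise", "finance"),
    ("bank interest rates fall", "finance"),
    ("investors buy shares", "finance"),
    ("team wins football match", "sport"),
    ("player scores goal in match", "sport"),
    ("coach praises team", "sport"),
]
TEST = [("shares and rates", "finance"), ("football team goal", "sport")]

def tokenize(text):
    return text.lower().split()

def top_k_features(docs, k):
    """Keep the k terms with the highest document frequency (assumed stand-in for feature selection)."""
    df = Counter()
    for text, _ in docs:
        df.update(set(tokenize(text)))
    # Descending document frequency, alphabetical tie-break for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:k])

def train_nb(docs, vocab):
    """Multinomial Naive Bayes: class priors plus per-class term counts over the kept vocabulary."""
    priors = Counter(label for _, label in docs)
    counts = {c: Counter() for c in priors}
    for text, label in docs:
        counts[label].update(t for t in tokenize(text) if t in vocab)
    return priors, counts

def predict_nb(model, vocab, text):
    priors, counts = model
    n_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for c in sorted(priors):
        total = sum(counts[c].values())
        score = math.log(priors[c] / n_docs)
        for t in tokenize(text):
            if t in vocab:
                # Laplace (add-one) smoothing over the kept vocabulary.
                score += math.log((counts[c][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

def predict_knn(docs, vocab, text, k=1):
    """k-NN with cosine similarity on binary bag-of-words vectors restricted to vocab."""
    q = set(tokenize(text)) & vocab
    sims = []
    for doc, label in docs:
        d = set(tokenize(doc)) & vocab
        denom = math.sqrt(len(q) * len(d))
        sims.append((len(q & d) / denom if denom else 0.0, label))
    sims.sort(key=lambda s: -s[0])  # stable sort keeps corpus order on ties
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]

def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def evaluate(n_features):
    """Accuracy of each classifier when only n_features terms are kept."""
    vocab = top_k_features(TRAIN, n_features)
    gold = [label for _, label in TEST]
    nb = train_nb(TRAIN, vocab)
    return {
        "naive_bayes": accuracy([predict_nb(nb, vocab, t) for t, _ in TEST], gold),
        "knn": accuracy([predict_knn(TRAIN, vocab, t) for t, _ in TEST], gold),
    }

if __name__ == "__main__":
    for n in (2, 5, 20):
        print(n, evaluate(n))
```

On this toy data, keeping only two features leaves none of the words of the sport test document in the vocabulary, so both classifiers fall back to a default guess; with more features both classify the two test documents correctly. A real study would use a much larger labeled collection, proper feature-selection criteria, and more evaluation measures than plain accuracy.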

InCites Highlights

These are selected metrics from InCites Benchmarking & Analytics tool, related to this output

Collaboration types
Domestic collaboration
Citation topics
4 Electrical Engineering, Electronics & Computer Science
4.61 Artificial Intelligence & Machine Learning
4.61.145 Classification Algorithms
Web Of Science research areas
Computer Science, Artificial Intelligence
ESI research areas
Computer Science