An evaluation study on text categorization using automatically generated labeled dataset

D. Zhu; K.W. Wong

doi:10.1016/j.neucom.2016.04.072

Journal article

Open access

Peer reviewed

Neurocomputing, Vol.249, pp.321-336

2017

Abstract

Naïve Bayes, k-nearest neighbors, AdaBoost, support vector machines and neural networks are five commonly used text classifiers, among others. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, inconsistency of human labeling and high dimensionality of feature spaces are two issues to be addressed in text categorization. This paper evaluates the five classifiers using an automatically generated text document collection that is labeled by a group of experts, both to alleviate the subjectivity of human category assignments and to examine the influence of the number of features on the performance of the algorithms.

Details

Title: An evaluation study on text categorization using automatically generated labeled dataset
Authors/Creators: D. Zhu (Author/Creator) - Curtin University; K.W. Wong (Author/Creator) - Murdoch University
Publication Details: Neurocomputing, Vol.249, pp.321-336
Publisher: Elsevier B.V.
Identifiers: 991005545345307891
Murdoch Affiliation: School of Engineering and Information Technology
Language: English
Resource Type: Journal article

Metrics

6 Times Cited - Web of Science

InCites Highlights

These are selected metrics from the InCites Benchmarking & Analytics tool related to this output.

Collaboration types: Domestic collaboration
Citation topics: 4 Electrical Engineering, Electronics & Computer Science; 4.61 Artificial Intelligence & Machine Learning; 4.61.145 Classification Algorithms
Web of Science research areas: Computer Science, Artificial Intelligence
ESI research areas: Computer Science