Conference paper
Non-segmented document clustering using self-organizing map and frequent max substring technique
Springer Verlag
16th International Conference on Neural Information Processing, ICONIP 2009 (Bangkok, 01/12/2009–05/12/2009)
2009
Abstract
This paper proposes a non-segmented document clustering method that uses a self-organizing map (SOM) and the frequent max substring mining technique to improve the efficiency of information retrieval. The proposed technique appears to be a promising alternative for clustering non-segmented text documents. To illustrate the technique, an experiment on clustering Thai text documents is presented. The frequent max substring mining technique is first applied to discover the patterns of interest, called Frequent Max substrings (FM), from the non-segmented Thai text documents. These discovered patterns are then used as indexing terms, together with their numbers of occurrences, to form the document vectors. The SOM is then applied to the document vectors to generate the document cluster map. As a result, the generated document cluster map can be used to find the documents relevant to a user's query more efficiently.
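The three stages described in the abstract (pattern discovery, document vectors, SOM clustering) can be illustrated with a small sketch. This is not the authors' implementation. The function names, parameters, and toy documents are hypothetical. The mining step is a naive stand-in that assumes a frequent max substring is one that occurs at least `min_freq` times and is not contained in a longer frequent substring with the same count. The abstract does not give the actual definition or algorithm. Space-free Latin strings stand in for non-segmented Thai text.

```python
import math
import random
from collections import Counter


def frequent_substrings(docs, min_freq=2, min_len=2, max_len=6):
    """Naive stand-in for FM mining (assumed definition, see lead-in)."""
    counts = Counter()
    for text in docs:
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_freq}
    # Drop a substring if a longer frequent substring with the same count contains it.
    maximal = [
        s for s, c in frequent.items()
        if not any(s != t and s in t and c == frequent[t] for t in frequent)
    ]
    return sorted(maximal)


def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    return sum(
        1 for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )


def document_vectors(docs, terms):
    """One vector per document: occurrence count of each indexing term."""
    return [[float(count_occurrences(d, t)) for t in terms] for d in docs]


def best_matching_unit(weights, vector):
    """Grid node whose weight vector is closest (squared Euclidean) to vector."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], vector)),
    )


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Minimal SOM with a Gaussian neighbourhood and linearly decaying rates."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


# Toy, space-free documents (hypothetical data, not from the paper).
docs = ["ricefarmricefield", "ricefieldfarm", "bankloanbankrate", "loanratebank"]
terms = frequent_substrings(docs)
vectors = document_vectors(docs, terms)
som = train_som(vectors)
for doc, vec in zip(docs, vectors):
    print(doc, "->", best_matching_unit(som, vec))
```

Each document is assigned to the map node nearest to its vector. A query could be handled the same way: build its vector over the same terms and look up the documents mapped to its best matching node.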
Details
- Title
- Non-segmented document clustering using self-organizing map and frequent max substring technique
- Authors/Creators
- T. Chumwatana (Author/Creator); K.W. Wong (Author/Creator); H. Xie (Author/Creator)
- Conference
- 16th International Conference on Neural Information Processing, ICONIP 2009 (Bangkok, 01/12/2009–05/12/2009)
- Publisher
- Springer Verlag
- Identifiers
- 991005542240207891
- Copyright
- © 2009 Springer-Verlag
- Murdoch Affiliation
- School of Information Technology
- Language
- English
- Resource Type
- Conference paper
- Note
- Appears in "Neural Information Processing", Lecture Notes in Computer Science, Volume 5864/2009, pp. 691-698