Automatic web content extraction for generating tag clouds from Thai web sites

W. Thanadechteemapat; C.C. Fung

doi:10.1109/ICEBE.2011.34

Conference paper

Open access

2011 IEEE 8th International Conference on e-Business Engineering, pp.85-89

2011 8th IEEE International Conference on e-Business Engineering, ICEBE 2011 (Beijing, China, 19/10/2011–21/10/2011)

2011

DOI: https://doi.org/10.1109/ICEBE.2011.34

Abstract

This paper proposes a novel Web content extraction approach based on heuristic rules and the XPath utility in XML. The main objective is to address the problem of Web visualization by generating tag clouds from Thai Web sites, providing an overview of the keywords in the Web pages. The paper also proposes a detailed method for assessing the Web content extraction technique on a single Web page using the length of the extracted content. The proposed technique has three main steps: Web page element and feature extraction, block detection, and content extraction selection. Empirical results show that the technique achieves high accuracy.
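
The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sample page, the choice of `div` elements as candidate blocks, the "most non-link text" heuristic, and the function names `block_features`, `extract_content`, and `tag_weights` are all assumptions made for illustration. The sketch also splits words on whitespace, which would not work for Thai text; a real system would need a Thai word segmenter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed XHTML page used only for illustration.
PAGE = """<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main"><p>Tag clouds give an overview of page content.</p>
<p>Tag clouds weight frequent words more heavily.</p></div>
<div id="footer">Copyright 2011</div>
</body></html>"""


def normalized_text(element):
    """Return all text under an element with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def block_features(block):
    """Step 1 (assumed form): collect simple features for one block."""
    text = normalized_text(block)
    link_text = " ".join(normalized_text(a) for a in block.findall(".//a"))
    return {
        "id": block.get("id"),
        "text": text,
        "text_len": len(text),
        "link_len": len(link_text),
    }


def extract_content(page):
    root = ET.fromstring(page)
    # Step 2 (assumed form): detect candidate blocks with an XPath query.
    blocks = [block_features(b) for b in root.findall(".//div")]
    # Step 3 (assumed heuristic): select the block with the most non-link text.
    return max(blocks, key=lambda f: f["text_len"] - f["link_len"])


def tag_weights(text, top=3):
    """Count word frequencies as tag-cloud weights (whitespace tokens only)."""
    words = [w.strip(".,").lower() for w in text.split()]
    return Counter(w for w in words if len(w) > 3).most_common(top)


content = extract_content(PAGE)
weights = tag_weights(content["text"])
```

Here `content["id"]` identifies the selected block and `weights` holds the most frequent words with their counts, which a renderer could map to font sizes. The standard-library `xml.etree.ElementTree` module supports only a subset of XPath and requires well-formed input, so real Web pages would need a tolerant HTML parser first.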

Details

Title: Automatic web content extraction for generating tag clouds from Thai web sites
Authors/Creators: W. Thanadechteemapat (Author/Creator) - Murdoch University
C.C. Fung (Author/Creator) - Murdoch University
Publication Details: 2011 IEEE 8th International Conference on e-Business Engineering, pp.85-89
Conference: 2011 8th IEEE International Conference on e-Business Engineering, ICEBE 2011 (Beijing, China, 19/10/2011–21/10/2011)
Identifiers: 991005540077407891
Murdoch Affiliation: School of Information Technology
Language: English
Resource Type: Conference paper
Note: Appears in Proceedings - 2011 8th IEEE International Conference on e-Business Engineering, ICEBE 2011 (2011), Article number 6104601, pp. 85-89
