Journal article
Efficient finer-grained incremental processing with MapReduce for big data
Future Generation Computer Systems, Vol.80, pp.102-111
2018
Abstract
With the continuous development of the Internet and information technology, a growing number of mobile terminals, wearable devices, and similar equipment contribute to an enormous volume of data. Thanks to distributed computing, we can analyze big data at high speed. However, many kinds of big data share an obvious characteristic: the datasets grow incrementally over time, which means that distributed computing should focus on incremental processing. A number of systems for incremental data processing are available, such as Google’s Percolator and Yahoo’s CBP. However, to use these mature frameworks, developers must make troublesome changes to their programs to meet the requirements of each environment.
In this paper, we introduce a MapReduce framework, named HadInc, for efficient incremental computations. HadInc is designed for offline scenarios, in which real-time processing is unnecessary and in-memory cluster computing is unsuitable. HadInc takes advantage of finer-grained computing and Content-Defined Chunking (CDC) to ensure that the system can still reuse previously computed results, even if the split data has changed substantially. Instead of recomputing the changed data entirely, HadInc quickly finds the difference between the new split and the old one, and then merges the delta and the old results into the latest result for the new dataset. The stability of the dataset division is a key factor in reusing results. To guarantee this stability, we propose a series of novel algorithms based on CDC.
We implemented HadInc by extending the Hadoop framework and evaluated it in a series of experiments, including three specific cases and a practical case. The comparative results show that the proposed HadInc is very efficient.
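The chunk-level reuse described in the abstract can be illustrated with a small sketch. This is not HadInc's algorithm or API: it is a simplified, record-level stand-in for CDC, assuming that a chunk ends after any line whose hash is divisible by a chosen `divisor`, and that the per-chunk "map" work is a word count. All function names and parameters here are hypothetical. Because boundaries depend on content rather than position, inserting a record disturbs only the chunk around it, and cached results for every other chunk can be reused.

```python
import hashlib
from collections import Counter


def is_boundary(line, divisor=8):
    """Hypothetical rule: cut after a line whose hash is divisible by `divisor`."""
    digest = hashlib.sha1(line.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % divisor == 0


def chunk_lines(lines, divisor=8):
    """Group lines into chunks that end at content-defined boundaries."""
    chunks, current = [], []
    for line in lines:
        current.append(line)
        if is_boundary(line, divisor):
            chunks.append(current)
            current = []
    if current:  # trailing lines that never hit a boundary
        chunks.append(current)
    return chunks


def map_chunk(chunk):
    """Per-chunk 'map' work, here a word count (an illustrative choice)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def incremental_word_count(lines, cache, divisor=8):
    """Return (total counts, chunks recomputed); `cache` maps chunk digest -> Counter.

    Assumes individual lines contain no newline characters.
    """
    total, recomputed = Counter(), 0
    for chunk in chunk_lines(lines, divisor):
        key = hashlib.sha1("\n".join(chunk).encode("utf-8")).hexdigest()
        if key not in cache:  # only unseen chunks are processed again
            cache[key] = map_chunk(chunk)
            recomputed += 1
        total.update(cache[key])  # merge cached or fresh partial result
    return total, recomputed


# Usage: process an old dataset, then the same data with one inserted record.
old = [f"user{i % 7} clicked item{i % 11}" for i in range(200)]
new = old[:100] + ["user3 clicked item42"] + old[100:]
cache = {}
incremental_word_count(old, cache)
total, recomputed = incremental_word_count(new, cache)
# A single inserted line changes at most two chunks, so recomputed <= 2.
```

With fixed-size splits, the same insertion would shift every later split boundary and invalidate their cached results, which is the instability that content-defined boundaries are meant to avoid.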
Details
- Title
- Efficient finer-grained incremental processing with MapReduce for big data
- Authors/Creators
- L. Zhang (Author/Creator) - Xidian University
- Y. Feng (Author/Creator) - Xidian University
- P. Shen (Author/Creator) - Xidian University
- G. Zhu (Author/Creator) - Xidian University
- W. Wei (Author/Creator) - School of Computer Science and Engineering, Xian University of Technology, Xian 710048, China
- J. Song (Author/Creator) - Xidian University
- S.A.A. Shah (Author/Creator) - The University of Western Australia
- M. Bennamoun (Author/Creator) - The University of Western Australia
- Publication Details
- Future Generation Computer Systems, Vol.80, pp.102-111
- Publisher
- Elsevier
- Identifiers
- 991005541752707891
- Copyright
- © 2017 Elsevier B.V.
- Murdoch Affiliation
- Murdoch University
- Language
- English
- Resource Type
- Journal article
Metrics
InCites Highlights
These are selected metrics from the InCites Benchmarking & Analytics tool related to this output.
- Collaboration types
- Domestic collaboration
- International collaboration
- Citation topics
- 4 Electrical Engineering, Electronics & Computer Science
- 4.48 Knowledge Engineering & Representation
- 4.48.1522 Big Data
- Web Of Science research areas
- Computer Science, Theory & Methods
- ESI research areas
- Computer Science