This site is deprecated and no longer maintained. Please visit the new site for up-to-date information.

SPARQL Basic Graph Pattern Processing with Iterative MapReduce

Title: SPARQL Basic Graph Pattern Processing with Iterative MapReduce
Authors: Jaeseok Myung, Jongheum Yeon, Sang-goo Lee
Date: 2010-04
Keywords: MapReduce, Data Warehouse, Cloud Computing, RDF, SPARQL, Basic Graph Pattern, Query Processing
Acknowledgement: ITRC
Publication Type: International Conference
Publication Info: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010)
Conference Info: International Workshop on Massive Data Analytics over the Cloud (MDAC 2010), in conjunction with WWW 2010, Raleigh, USA, April 26, 2010 (WWW: April 26-30, 2010)
Publisher: ACM
Other Information: ISBN 978-1-60558-991-6
Download: Media:MDAC2010-jsmyung.pdf


Abstract (English)
There have been a number of approaches that adopt the RDF data model and the MapReduce framework for data warehousing, since the data model is well suited to data integration and the processing framework is good for large-scale, fault-tolerant data analysis. Nevertheless, most approaches consider the data model and the framework separately, and it has been difficult to create synergy because only a few algorithms connect the two. In this paper, we offer a general and efficient MapReduce algorithm for the SPARQL Basic Graph Pattern, which is a set of triple patterns to be joined. In MapReduce, join operations are known to require computationally expensive MapReduce iterations. For this reason, we minimize the number of iterations in two ways. First, we adapt the traditional multi-way join to MapReduce instead of running multiple individual joins. Second, by analyzing a given query, we select a good join key to avoid unnecessary iterations. As a result, the algorithm shows good performance and scalability in terms of both time and data size.
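
The core idea described in the abstract, joining all triple patterns that share a variable in a single MapReduce iteration rather than cascading binary joins, can be illustrated with a small sketch. The following Python snippet is only an in-memory illustration under assumed toy data and pattern names; it is not the authors' implementation and does not run on Hadoop. The map phase keys every matching triple by the shared variable ?x (the join key), and the reduce phase combines the bindings of all patterns for that key at once.

from collections import defaultdict

# Toy RDF triples (subject, predicate, object) -- illustrative data only.
triples = [
    ("alice", "knows",   "bob"),
    ("alice", "worksAt", "acme"),
    ("bob",   "worksAt", "acme"),
    ("carol", "knows",   "bob"),
    ("carol", "worksAt", "initech"),
]

# Basic Graph Pattern with two triple patterns sharing the variable ?x:
#   ?x knows ?y .   ?x worksAt ?z .
# The shared variable ?x is chosen as the join key.
PATTERNS = [("knows", "?y"), ("worksAt", "?z")]

def map_phase(triples):
    """Emit (join_key, (pattern_index, binding)) for every matching triple."""
    for s, p, o in triples:
        for i, (pred, obj_var) in enumerate(PATTERNS):
            if p == pred:
                yield s, (i, {"?x": s, obj_var: o})

def reduce_phase(key, values):
    """Combine the bindings of all patterns for one join key at once,
    i.e. a multi-way join completed in a single MapReduce iteration."""
    by_pattern = defaultdict(list)
    for i, binding in values:
        by_pattern[i].append(binding)
    if len(by_pattern) < len(PATTERNS):
        return []                      # some pattern has no match for this key
    results = [{}]
    for i in range(len(PATTERNS)):     # cross-combine bindings pattern by pattern
        results = [{**r, **b} for r in results for b in by_pattern[i]]
    return results

# "Shuffle": group the mapper output by join key, as the framework would.
grouped = defaultdict(list)
for key, value in map_phase(triples):
    grouped[key].append(value)

for key, values in grouped.items():
    for solution in reduce_phase(key, values):
        print(solution)

Running the sketch prints complete variable bindings only for subjects matched by every pattern (here, alice and carol). Choosing a join key that is not shared by all triple patterns would leave some patterns unjoined and force additional MapReduce iterations, which is the cost that the query analysis described in the abstract is intended to avoid.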