Authors
- Yashvardhan Sharma
(Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India)
- Saurabh Verma
(Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India)
- Sumit Kumar
(Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India)
- Shivam U.
(Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani, India)
Abstract
To achieve high reliability and scalability, most large-scale data warehouse systems have adopted a cluster-based architecture. In this context, MapReduce has emerged as a promising architecture for large-scale data warehousing and data analytics on commodity clusters. The MapReduce framework offers several attractive features, such as high fault tolerance, scalability, and the ability to run on a variety of hardware, from low-end to high-end. However, these benefits come at a substantial cost in performance. In this paper, we propose the design of a novel cluster-based data warehouse system, Daenyrys, for data processing on Hadoop, an open-source implementation of the MapReduce framework maintained under the Apache umbrella. Daenyrys is a data management system that can decide on the optimum partitioning scheme for Hadoop's distributed file system (DFS). The optimum partitioning scheme improves the performance of the complete framework, and its choice depends on the query context. In Daenyrys, columns are formed into optimized groups that provide the basis for partitioning tables vertically. Daenyrys includes an algorithm that monitors the context of current queries and, based on these observations, re-partitions the DFS for better performance and resource utilization. In the proposed system, Hive, a MapReduce-based SQL-like query engine, is supported on top of the DFS.
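As a rough illustration of query-context-driven column grouping, the sketch below places columns in the same vertical partition when they are read by exactly the same set of queries. This is not the algorithm from the paper, whose details are not given in the abstract; the function name `group_columns`, the query-log format, and the grouping rule are all assumptions made for illustration.

```python
from collections import defaultdict

def group_columns(columns, query_log):
    """Hypothetical helper: group columns by their query-access signature.

    columns   -- list of column names in the table
    query_log -- list of sets, each holding the columns one query reads
    Columns read by exactly the same queries end up in the same group.
    """
    # Map each column to the set of query indices that read it.
    signature = defaultdict(set)
    for query_id, accessed in enumerate(query_log):
        for column in accessed:
            signature[column].add(query_id)

    # Columns sharing an identical signature form one vertical partition.
    groups = defaultdict(list)
    for column in columns:
        groups[frozenset(signature.get(column, ()))].append(column)

    return sorted(groups.values())

# Illustrative data only; not taken from the paper.
columns = ["id", "name", "price", "qty", "notes"]
query_log = [{"id", "price"}, {"id", "price", "qty"}, {"name"}]
partitions = group_columns(columns, query_log)
```

Re-running such a grouping as the query log changes is one simple way to picture re-partitioning in response to query context; a real system would also have to weigh the cost of rewriting data against the expected savings.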
Suggested Citation
Yashvardhan Sharma & Saurabh Verma & Sumit Kumar & Shivam U., 2013.
"A Context-Based Performance Enhancement Algorithm for Columnar Storage in MapReduce with Hive,"
International Journal of Cloud Applications and Computing (IJCAC), IGI Global, vol. 3(4), pages 38-50, October.
Handle:
RePEc:igg:jcac00:v:3:y:2013:i:4:p:38-50