Performance models and dynamic characteristics analysis for HDFS write and read operations: A systematic view

B. Dong, Qinghua Zheng, Feng Tian, Kuo-Ming Chao, Nick Godwin, Tian Ma, Haipeng Xu

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Hadoop has emerged as a successful framework for large-scale data-intensive computing applications. However, there is no research on performance models for the Hadoop Distributed File System (HDFS). Because HDFS is complex and its performance depends on multiple impact factors that are difficult to model, establishing HDFS performance models directly from these impact factors is very complicated. In this paper, the relationship between file size and HDFS Write/Read (W/R for short) throughput, i.e., the average flow rate of an HDFS W/R operation, is studied to build HDFS performance models from a systematic view. Based on data measured in specially designed experiments (in which HDFS W/R operations can be viewed as single-input single-output systems), a system identification-based approach is applied to construct performance models for HDFS W/R operations under different conditions. Furthermore, dynamic characteristics metrics for HDFS performance are defined. Based on the identified performance models and these metrics, the dynamic characteristics of HDFS W/R operations, such as steady state and overshoot, are studied, and the relationships between impact factors and dynamic characteristics are analyzed. These results can provide effective guidance for the design and configuration of HDFS and Hadoop-based applications.
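To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the paper's method. It assumes throughput is file size divided by elapsed time, takes the steady state as the mean of the last few samples of a throughput series ordered by increasing file size, and uses the usual control-theory definition of percent overshoot. The function names, the `tail` parameter, and the sample values are hypothetical; the paper's own metric definitions may differ.

```python
from statistics import mean


def throughput(file_size_mb, elapsed_s):
    """Average flow rate of one W/R operation in MB/s (assumed definition)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return file_size_mb / elapsed_s


def steady_state(values, tail=3):
    """Estimate the steady-state throughput as the mean of the last `tail` samples."""
    if not 1 <= tail <= len(values):
        raise ValueError("tail must be between 1 and len(values)")
    return mean(values[-tail:])


def overshoot_percent(values, tail=3):
    """Percent by which the peak exceeds the steady-state estimate (0.0 if it does not)."""
    ss = steady_state(values, tail)
    peak = max(values)
    return max(0.0, (peak - ss) / ss * 100.0)


if __name__ == "__main__":
    # Hypothetical throughput samples (MB/s), ordered by increasing file size.
    series = [20.0, 35.0, 48.0, 44.0, 40.0, 40.0, 40.0]
    print("steady state (MB/s):", steady_state(series))
    print("overshoot (%):", overshoot_percent(series))
```

Averaging the tail of the series is a deliberately simple steady-state estimate; a fitted model, as obtained through system identification, would give the steady-state value analytically instead.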
Original language: English
Pages (from-to): 132-151
Number of pages: 19
Journal: Journal of Systems and Software
Volume: 93
Early online date: 2 Mar 2014
Publication status: Published - Jul 2014
