Volume 299 - The 7th International Conference on Computer Engineering and Networks (CENet2017) - Session II - Wireless communication
A Method to Improve the Performance for Storing Massive Small Files in Hadoop
T. Zheng,* G. Fan, W. Guo
*corresponding author
Full text: pdf
Pre-published on: 2017-07-17 11:56:25
Published on: 2017-09-06 14:01:04
Abstract
As a relatively new open-source project, Hadoop provides a new way to store massive data. Because of its high scalability, low cost, flexibility, speed, and strong fault tolerance, it has been widely adopted by Internet companies. However, the performance of Hadoop degrades significantly when it is used to handle massive numbers of small files. To address this, this paper proposes a new scheme that merges small files, which occupy much memory in the NameNode, into large files, establishes the mapping relationship between small files and large files, and stores the mapping information in HBase. To improve read performance, the scheme provides a prefetching mechanism that analyzes access logs and caches the metadata of frequently accessed merged files in the client's memory. The experimental results show that this scheme can efficiently optimize small-file storage in HDFS, thereby reducing the load on the NameNode and improving the performance of file access.
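The core merging idea described in the abstract can be sketched as follows. This is only an illustration of the technique, not the paper's implementation: the actual scheme runs over HDFS and stores the small-file-to-merged-file mapping in HBase, whereas here an in-memory dict stands in for that mapping table, and the function and parameter names are hypothetical.

```python
import io

def merge_small_files(files):
    """Merge small files into a single blob and build an index.

    files: dict mapping file name -> bytes content.
    Returns (blob, index), where index maps each name to its
    (offset, length) inside the blob -- the analogue of the
    mapping information the paper stores in HBase.
    """
    blob = io.BytesIO()
    index = {}
    offset = 0
    for name, data in files.items():
        blob.write(data)
        index[name] = (offset, len(data))
        offset += len(data)
    return blob.getvalue(), index

def read_small_file(blob, index, name):
    """Recover one small file from the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Example: two small files become one blob plus an index,
# so the (Name)Node-side metadata covers one object, not many.
blob, index = merge_small_files({"a.txt": b"hello", "b.txt": b"world"})
print(read_small_file(blob, index, "b.txt"))  # b'world'
```

A prefetching layer, as in the paper, would additionally keep the index entries of frequently accessed merged files cached on the client so that hot reads avoid a round trip to the mapping store.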
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.