PoS - Proceedings of Science
Volume 299 - The 7th International Conference on Computer Engineering and Networks (CENet2017) - Session II - Wireless communication
A Method to Improve the Performance for Storing Massive Small Files in Hadoop
T. Zheng*, G. Fan and W. Guo
Pre-published on: July 17, 2017
Published on: September 06, 2017
Abstract
As a new open source project, Hadoop provides a new way to store massive data. Because of its high scalability, low cost, good flexibility, high speed and strong fault tolerance, it has been widely adopted by internet companies. However, the performance of Hadoop degrades significantly when it is used to handle massive numbers of small files. This paper therefore proposes a new scheme that merges small files, which occupy a large amount of memory in the NameNode, into large files, establishes the mapping relationship between small files and large files, and stores the mapping information in HBase. To improve read performance, the scheme provides a prefetching mechanism that analyzes the access logs and places the metadata of frequently accessed merged files in the client's memory. The experimental results show that this scheme efficiently optimizes small-file storage in HDFS, thereby reducing the load on the NameNode and improving the performance of file access.
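The following is a minimal sketch of the idea summarized in the abstract, not the authors' implementation. The abstract does not specify the merge format, the HBase schema, or the prefetching policy, so everything here is an assumption made for illustration: plain Python dictionaries stand in for HDFS and for the HBase mapping table, the mapping is assumed to be (merged file, offset, length), "frequently accessed" is assumed to mean the highest hit count in the access log, and all function names are hypothetical.

```python
import io
from collections import Counter


def merge_small_files(small_files, merged_name):
    """Concatenate small files into one blob and record where each one lands."""
    blob = io.BytesIO()
    index = {}  # stand-in for the HBase mapping table (assumption)
    for name, data in small_files.items():
        offset = blob.tell()
        blob.write(data)
        # small file name -> (merged file, byte offset, byte length)
        index[name] = (merged_name, offset, len(data))
    return blob.getvalue(), index


def read_small_file(name, index, merged_store):
    """Look up the mapping entry, then slice the small file out of its merged file."""
    merged_name, offset, length = index[name]
    return merged_store[merged_name][offset:offset + length]


def prefetch_hot_entries(access_log, index, top_n):
    """Return the mapping entries of the top_n most-accessed merged files.

    The returned dict plays the role of the client-side metadata cache.
    """
    hits = Counter(index[name][0] for name in access_log if name in index)
    hot = {merged for merged, _ in hits.most_common(top_n)}
    return {name: entry for name, entry in index.items() if entry[0] in hot}


# Two merged files built from three small files.
blob_a, idx_a = merge_small_files({"a.txt": b"alpha", "b.txt": b"bravo!"}, "merged-0")
blob_b, idx_b = merge_small_files({"c.txt": b"charlie"}, "merged-1")
merged_store = {"merged-0": blob_a, "merged-1": blob_b}  # stand-in for HDFS
index = {**idx_a, **idx_b}

# The access log shows merged-0 is hit most often, so its entries are cached.
cache = prefetch_hot_entries(["a.txt", "b.txt", "a.txt", "c.txt"], index, top_n=1)
```

In this sketch only one merged-file entry per group of small files would need to be tracked by the NameNode, and a client holding `cache` could resolve `a.txt` or `b.txt` without querying the mapping table.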
DOI: https://doi.org/10.22323/1.299.0022
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.