A REVIEW ON SMALL FILES IN HADOOP: A NOVEL APPROACH TO UNDERSTAND THE SMALL FILES PROBLEM IN HADOOP

Avanti Khare, Prof. Dr. B. Indira

Abstract


Hadoop is an open-source data management system designed for storing and processing large volumes of data, organized in blocks with a default size of 64 MB. Hadoop cannot efficiently store and process small files, that is, files smaller than the block size, because they cause a large number of seeks and a great deal of hopping between datanodes. A survey of the existing literature has been carried out to analyze the effects of the small files problem in Hadoop and the solutions proposed for it. This paper presents that survey, lists many effective solutions, and argues that considerable further research on the small files problem is needed to attain effective and efficient solutions.

Keywords


Hadoop; HDFS; Small Files; Datanode








Creative Commons License
International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.