Optimizing Big Data Processing: An Analysis of Map Reduce Scheduling Algorithms and Frameworks
Abstract
Google uses the MapReduce programming model to process enormous volumes of data efficiently in a distributed computing environment. It is typically employed for distributed computation on clusters of machines, operating on data stored in a file system or a database. The MapReduce framework exploits data locality, processing data close to where it is stored and thereby avoiding unnecessary data transfer. Data volumes have grown exponentially in recent years, motivating many researchers to pursue new lines of inquiry in big data across various domains. The widespread adoption of big data processing platforms built on the MapReduce architecture is a primary driver of the growing interest in improving their performance. Resource and job scheduling is critical, since it largely determines whether applications meet their performance objectives across usage scenarios; scheduling in big data focuses chiefly on optimizing execution time and processing cost. This research examines scheduling in MapReduce, with particular emphasis on two aspects: nomenclature and operational-efficiency analysis.
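The map/shuffle/reduce pattern the abstract describes can be illustrated with a minimal single-process word-count sketch. This is a hypothetical teaching example, not code from any of the frameworks surveyed; in a real cluster the map and reduce phases run in parallel on the nodes holding the data:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: turn one input record (a line of text) into
    # intermediate (key, value) pairs -- here, (word, 1).
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key, as the
    # framework does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for a key into the final result.
    return key, sum(values)

lines = ["big data map reduce", "map reduce big data", "data locality"]
intermediate = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # e.g. counts["data"] == 3
```

In a distributed deployment, `map_phase` tasks are scheduled on the machines that already hold the input splits (the data-locality principle noted above), and the shuffle moves only the much smaller intermediate pairs across the network.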
Article Details

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.