Hadoop is an open-source ecosystem used for storing, managing, and analyzing large volumes of data. It is a data management system that brings together massive amounts of structured and unstructured data, touching nearly every layer of the traditional enterprise data stack and positioned to occupy a central place within a data center. It fills a gap in the market by effectively storing, and providing computational capabilities over, substantial amounts of data. Hadoop differs from previous distributed approaches in the following ways:
The Hadoop ecosystem consists of several subprojects, two of which form its core: the distributed computation framework MapReduce and the distributed storage layer, the Hadoop Distributed File System (HDFS).
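As a brief illustration of how the storage layer is used, the following sketch reads a file from HDFS through the standard org.apache.hadoop.fs.FileSystem API. The path /user/demo/input.txt is a hypothetical example, and the cluster address is assumed to come from the default configuration; neither value is taken from the text.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        // Load cluster settings (fs.defaultFS, etc.) from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical HDFS path used only for illustration.
        Path file = new Path("/user/demo/input.txt");

        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // print each line of the HDFS file
            }
        }
    }
}

The same FileSystem abstraction also covers writing and listing files, which is how higher-level ecosystem components sit on top of HDFS without dealing with block placement or replication themselves.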
MapReduce is both a parallel computation framework and a scheduling framework; its programming model is illustrated by the word-count sketch at the end of this section. The Apache MapReduce framework consists of two components:
Until the Hadoop 2.x release, HDFS and MapReduce employed single-master models, resulting in single points of failure.
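To make the MapReduce programming model described above concrete, here is a minimal word-count sketch written against the standard org.apache.hadoop.mapreduce API. The input and output paths are taken from the command line, and the job name "word count" is an arbitrary choice for this example.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar (the name wordcount.jar below is only illustrative), such a job would typically be submitted with a command like "hadoop jar wordcount.jar WordCount /input /output"; the framework then splits the input across map tasks, shuffles the intermediate (word, count) pairs to the reducers, and schedules all of these tasks across the cluster, which is what makes MapReduce both a computation and a scheduling framework.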