Hadoop-Let us Admin

Apache HDFS

The Hadoop Distributed File System (HDFS) stores large files across clusters of machines and is the primary distributed storage used by Hadoop applications. Its design derives from the Google File System (GFS) paper published by Google. HDFS is portable across heterogeneous hardware and software platforms and is implemented as a block-structured filesystem: each file is split into blocks that are distributed across the cluster. It follows a simple write-once-read-many (WORM) coherency model for files and works on the principle of moving computation to the data rather than moving data to the computation.
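
To make the WORM model concrete, here is a minimal sketch using the Hadoop FileSystem Java API. The NameNode URI hdfs://namenode:8020 and the file path are hypothetical placeholders, not values taken from this text.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WormDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; substitute your cluster's URI.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            Path file = new Path("/tmp/worm-demo.txt");

            // Write once: with overwrite=false, create() fails if the file exists.
            try (FSDataOutputStream out = fs.create(file, false)) {
                out.writeUTF("written once");
            }

            // Read many: the closed file can be opened by any number of readers.
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
        }
    }

Once closed, an HDFS file is effectively immutable (appends aside), which is what lets the system replicate blocks cheaply and keep readers consistent without locking.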

Goal

The goals of the Hadoop Distributed File System are to tolerate hardware failure, to provide high-throughput data access, and to support applications that process large data sets. Its implementation addresses problems common to many other distributed filesystems.

The HDFS architecture diagram depicts the basic interactions among the NameNode, the DataNodes, and the clients. An HDFS cluster consists primarily of a single NameNode, which manages the file system metadata, and multiple DataNodes, which store the actual data.
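
As a sketch of that division of labor, the snippet below asks the NameNode where a file's blocks live. The path /data/example.log is a hypothetical placeholder, and the example assumes the cluster settings (fs.defaultFS) are available on the classpath via core-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsDemo {
        public static void main(String[] args) throws Exception {
            // Reads fs.defaultFS from the configuration on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical path; point this at any existing HDFS file.
            FileStatus status = fs.getFileStatus(new Path("/data/example.log"));

            // A pure metadata query: the NameNode answers it, and no
            // block data is transferred from the DataNodes.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(", ", block.getHosts()));
            }
        }
    }

Listing the hosts that hold each block is also how job schedulers place computation next to the data, following the principle described above.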

One of the requirements for such a block-structured filesystem is the capability to store, manage, and access file metadata (information about files and blocks) reliably, and to provide fast access to the metadata store.
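
In HDFS, the NameNode fills this role, holding the namespace in memory so that lookups like the one sketched below (again with a hypothetical path) are answered quickly:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MetadataDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Answered from the NameNode's in-memory metadata store.
            FileStatus st = fs.getFileStatus(new Path("/data/example.log"));
            System.out.println("owner:       " + st.getOwner());
            System.out.println("replication: " + st.getReplication());
            System.out.println("block size:  " + st.getBlockSize());
            System.out.println("modified at: " + st.getModificationTime());
        }
    }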