Hadoop-Let us Admin

Block Replication

The NameNode makes all decisions regarding block replication. Replica placement determines HDFS reliability, availability, and performance. Placing each replica on a unique rack prevents data loss when an entire rack fails and lets reads use bandwidth from multiple racks. This policy distributes replicas evenly across the cluster, which makes it easy to rebalance load when a component fails. However, it increases the cost of writes, because each write must transfer blocks to multiple racks. The NameNode continually checks the number of replicas of each block.
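The trade-off described above can be illustrated with a small sketch. This is not HDFS code; the function names and rack labels are hypothetical, and a replica layout is modelled simply as a list of rack names, one per replica.

```python
def surviving_replicas(replica_racks, failed_rack):
    """Return the racks that still hold a replica after one rack fails."""
    return [rack for rack in replica_racks if rack != failed_rack]


def racks_touched_by_write(replica_racks):
    """Count the distinct racks a write must send the block to."""
    return len(set(replica_racks))


# Hypothetical layouts for a block with three replicas.
unique_racks = ["rack1", "rack2", "rack3"]   # one replica per rack
single_rack = ["rack1", "rack1", "rack1"]    # all replicas on one rack

print(surviving_replicas(unique_racks, "rack1"))  # ['rack2', 'rack3']
print(surviving_replicas(single_rack, "rack1"))   # []
print(racks_touched_by_write(unique_racks))       # 3
print(racks_touched_by_write(single_rack))        # 1
```

With one replica per rack, losing rack1 still leaves two copies, but every write has to cross three racks. With all replicas on one rack the write stays local, and a single rack failure loses the block.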

If a block is under-replicated, it is put in the replication priority queue. The highest priority is given to the blocks with the fewest replicas. Where the new replica is placed depends on the number and location of the existing replicas. If only one replica exists, a different rack is chosen for the next replica. If two replicas of the block exist on the same rack, the third replica is placed on a different rack. Otherwise, the third replica is placed on a different node in the same rack as an existing replica. The NameNode also checks that the replicas of a block are not all on one rack. If they are, the NameNode treats the block as under-replicated, replicates it to a different rack, and then deletes an old replica.
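The queueing and placement rules above can be sketched as follows. This is a simplified model, not the NameNode's actual implementation. The names build_queue and choose_target, the block and node identifiers, and the target of 3 replicas are assumptions made for illustration. The sketch also assumes at least one replica of the block already exists.

```python
import heapq


def build_queue(replica_counts, target=3):
    """Queue under-replicated blocks so the fewest-replica block pops first.

    replica_counts maps a block id to its current replica count.
    target is the assumed desired replica count.
    """
    heap = [(count, block) for block, count in replica_counts.items()
            if count < target]
    heapq.heapify(heap)
    return heap


def choose_target(existing, nodes_by_rack):
    """Pick (rack, node) for the next replica, or None if nothing fits.

    existing is a list of (rack, node) pairs that already hold a replica.
    nodes_by_rack maps each rack to the list of nodes in it.
    """
    used_racks = {rack for rack, _ in existing}
    used_nodes = {node for _, node in existing}
    if len(used_racks) == 1:
        # One replica, or all replicas on one rack: use a different rack.
        candidates = [r for r in sorted(nodes_by_rack) if r not in used_racks]
    else:
        # Replicas already span racks: reuse a rack, but on a new node.
        candidates = sorted(used_racks)
    for rack in candidates:
        for node in nodes_by_rack[rack]:
            if node not in used_nodes:
                return rack, node
    return None


# Hypothetical cluster with two racks of two nodes each.
topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}

queue = build_queue({"blk_1": 2, "blk_2": 1, "blk_3": 3})
print(heapq.heappop(queue))                                          # (1, 'blk_2')
print(choose_target([("rack1", "n1")], topology))                    # ('rack2', 'n3')
print(choose_target([("rack1", "n1"), ("rack1", "n2")], topology))   # ('rack2', 'n3')
print(choose_target([("rack1", "n1"), ("rack2", "n3")], topology))   # ('rack1', 'n2')
```

The sketch stops at choosing a target. Deleting an old replica once the new copy exists on a different rack is left out.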