hdfs-default.xml Default HDFS properties. The file is located in the following JAR file: hadoop-hdfs-2.2.0.jar (assuming version 2.2.0).
hdfs-site.xml Site-specific HDFS properties. Properties configured in this file override the properties in the hdfs-default.xml file.
The hdfs-default.xml and hdfs-site.xml files configure the properties for the HDFS. Together with the core-site.xml file described next, they configure the HDFS for the cluster. As you learned in Chapter 2, the NameNode and Secondary NameNode are responsible for managing the HDFS. The hdfs-*.xml files configure the NameNode and the Secondary NameNode components of the Hadoop system. The hdfs-*.xml set of files is also used for configuring the runtime properties of the HDFS as well as the properties associated with the physical storage of files in HDFS on the individual DataNodes. Although the list of properties covered in this section is not exhaustive, it provides a deeper understanding of the HDFS design at the physical and operational level. This section explores the key properties of the hdfs-*.xml files. Some of the important properties of the hdfs-site.xml file include the following:
dfs.namenode.checkpoint.dir: Determines where, on the local/network file system accessible to the Secondary NameNode, the Secondary NameNode should store the temporary images to merge. Recall from Chapter 2 that this is the location into which the fsimage file from the NameNode is copied for merging with the edits file from the NameNode. If this is a comma-delimited list of directories, the image is replicated in all the directories for redundancy. The default value is file://${hadoop.tmp.dir}/dfs/namesecondary.
dfs.namenode.checkpoint.edits.dir: Determines where, on the local/network file system accessible to the Secondary NameNode, the Secondary NameNode should store the edits file copied from the NameNode. This edits file is merged with the fsimage file copied into the folder defined by the dfs.namenode.checkpoint.dir property. If it is a comma-delimited list of directories, the edits files are replicated in all the directories for redundancy. The default value is the same as dfs.namenode.checkpoint.dir.
dfs.namenode.checkpoint.period: The number of seconds between two checkpoints. Each time this interval elapses, the checkpoint process begins, which merges the edits file with the fsimage file from the NameNode. The default value is 3600 (one hour).
dfs.namenode.handler.count: Represents the number of server threads the NameNode uses to communicate with the DataNodes. The default is 10, but the recommendation is about 10% of the number of nodes, with a minimum value of 10. If this value is too low, you might notice messages in the DataNode logs indicating that the connection was refused by the NameNode when the DataNode tried to communicate with the NameNode through heartbeat messages.
dfs.datanode.du.reserved: The amount of space, in bytes per volume, to be reserved for non-HDFS use. The default value is 0, but it should be at least 10 GB or 25% of the total disk space, whichever is lower.
dfs.hosts: The fully qualified path to a file that contains a list of hosts that are permitted to connect with the NameNode. If the property is not set, all nodes are permitted to connect with the NameNode.
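To make the properties above concrete, the following is a minimal sketch of an hdfs-site.xml file that sets each of them. The directory paths, the dfs.hosts file location, and the cluster size (200 DataNodes with 2 TB volumes) are hypothetical values chosen only for illustration; they are not defaults and should be replaced with values appropriate to your cluster.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical paths: two directories so the checkpoint image is replicated -->
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data/1/dfs/namesecondary,file:///data/2/dfs/namesecondary</value>
  </property>
  <!-- Hypothetical paths: set explicitly here; if omitted, this uses the same
       value as dfs.namenode.checkpoint.dir -->
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>file:///data/1/dfs/namesecondary,file:///data/2/dfs/namesecondary</value>
  </property>
  <!-- Seconds between two checkpoints (3600 = one hour) -->
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value>
  </property>
  <!-- Assumed cluster of 200 DataNodes: 10% of 200 = 20, above the minimum of 10 -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value>
  </property>
  <!-- Assumed 2 TB volumes: 25% would be 500 GB, so the lower figure of
       10 GB applies (10 * 1024 * 1024 * 1024 = 10737418240 bytes) -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>
  <!-- Hypothetical file listing the hosts permitted to connect with the NameNode -->
  <property>
    <name>dfs.hosts</name>
    <value>/etc/hadoop/conf/dfs.hosts</value>
  </property>
</configuration>
```

Because hdfs-site.xml overrides hdfs-default.xml, any property left out of a file like this one keeps its default value.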