Hadoop: Let Us Admin

Core-*.xml

core-default.xml: Default core Hadoop properties. The file is packaged inside the following JAR file: hadoop-common-2.2.0.jar (assuming version 2.2.0).

core-site.xml: Site-specific common Hadoop properties. Properties configured in this file override the corresponding properties in the core-default.xml file.

Example: fs.defaultFS = hdfs://localhost:9000
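As a minimal sketch, a core-site.xml that overrides fs.defaultFS for a pseudo-distributed, single-node setup might look like the following (the localhost host and port 9000 are illustrative values for that mode):

    <?xml version="1.0"?>
    <!-- core-site.xml: site-specific overrides of core-default.xml -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
        <description>NameNode URI used by HDFS clients when no
        scheme is provided.</description>
      </property>
    </configuration>

Any property not listed here retains its value from core-default.xml.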

The core-default.xml and core-site.xml files configure the common properties for the Hadoop system. This section explores the key properties of core-site.xml, including the following:

• hadoop.tmp.dir: Base for other temporary directories. The default value is /tmp/hadoop-${user.name}. We referenced this property on a few occasions as the root directory for several properties in the hdfs-site.xml file.

• fs.defaultFS: Name of the default path prefix used by HDFS clients when none is provided. In Chapter 3, "Getting Started with the Hadoop Framework," you configured it to be hdfs://localhost:9000 for the pseudo-cluster mode. This property specifies the host and port of the NameNode (for example, hdfs://<namenode-host>:9000). In local (standalone) mode, the value of this property is file:///. When the High Availability (HA) feature of HDFS is used, this property should be set to the logical HA URI. (See the Hadoop documentation for configuration of HDFS HA.)

• io.file.buffer.size: Chapter 2, "Hadoop Concepts," described the mechanics of the HDFS file create and HDFS file read processes. This property is relevant to those processes: it specifies the size of the buffer used to stream files. The size of this buffer should be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations. The default is 4096.

• io.bytes.per.checksum: Hadoop transparently applies checksums to all data written to it and verifies them at read time. This parameter defines the number of bytes over which each checksum is computed. The default value is 512 bytes, and the CRC-32 checksum itself is 4 bytes long; thus, with the default settings, the storage overhead is approximately 1 percent (4 bytes per 512 bytes stored). Note that this parameter must not be larger than io.file.buffer.size, because the checksum is calculated on the data while it is buffered in memory during the HDFS read/write streaming process. It must be recalculated during the read process to verify the checksums stored during the write process.

• io.compression.codecs: A comma-separated list of the available compression codec classes that can be used for compression/decompression. The default setting lists the following codecs: org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec, org.apache.hadoop.io.compress.DeflateCodec, and org.apache.hadoop.io.compress.SnappyCodec. A sample core-site.xml sketch that sets these I/O properties follows this list.
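As a minimal sketch, the I/O properties discussed above could be set together in core-site.xml as follows; the values shown (a 128 KB buffer, the default 512-byte checksum chunk, and a reduced codec list) are illustrative assumptions, not tuning recommendations:

    <configuration>
      <!-- Buffer size used when streaming files; a multiple of
           the 4096-byte hardware page size (assumed: 128 KB). -->
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
      </property>
      <!-- Bytes per CRC-32 checksum chunk; must not exceed
           io.file.buffer.size (default shown). -->
      <property>
        <name>io.bytes.per.checksum</name>
        <value>512</value>
      </property>
      <!-- Illustrative assumption: restrict the available codecs
           to Gzip and Snappy. -->
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
    </configuration>

Keeping io.bytes.per.checksum well below io.file.buffer.size ensures that each checksum chunk fits within the in-memory stream buffer, as described in the bullet above.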