Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis. Users of MapReduce quickly realized that developing MapReduce programs is a very programming-intensive task, which makes it error-prone and hard to test. Using Hadoop was not easy for end users, especially for those who were not familiar with the MapReduce framework. End users had to write map/reduce programs even for simple tasks like computing raw counts or averages. There was a need for a more expressive language such as SQL, so that users could focus on the problem instead of on low-level implementations of typical SQL operations.
Hive was created to make it possible for analysts with strong SQL skills (but meager Java programming skills) to run queries against huge volumes of data and extract patterns and meaningful information. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. In short, a Hive query is converted into MapReduce tasks.
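To make the idea concrete, the following is a minimal HiveQL sketch; the table `page_views` and its columns are hypothetical examples, not taken from the text. The aggregation at the end is exactly the kind of "raw counts" task that would otherwise require a hand-written map/reduce program, and the Hive engine compiles it into MapReduce jobs transparently.

```sql
-- Hypothetical table of web page views (names are illustrative only).
CREATE TABLE page_views (
  user_id   STRING,
  url       STRING,
  view_time TIMESTAMP
);

-- A simple aggregation expressed in familiar SQL terms;
-- Hive converts this statement into one or more MapReduce jobs.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```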
Apache Hive evolved to provide data warehouse (DW) capabilities over large datasets. Users express their queries in Hive Query Language, which is very similar to SQL, and the Hive engine converts these queries into low-level MapReduce jobs transparently. More advanced users can develop user-defined functions (UDFs) in Java. Hive also supports standard drivers such as ODBC and JDBC. This makes Hive an appropriate platform for developing Business Intelligence (BI) applications over data stored in Hadoop.
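As a sketch of how a Java UDF is typically made available to queries, the HiveQL below registers a function and invokes it; the jar path, class name (`my.company.udf.Normalize`), function name, and the `page_views` table are all hypothetical placeholders.

```sql
-- Register a hypothetical Java UDF packaged in a jar
-- (path and class name are illustrative, not from the text).
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_url AS 'my.company.udf.Normalize';

-- Use the UDF like any built-in function in a query.
SELECT normalize_url(url), COUNT(*)
FROM page_views
GROUP BY normalize_url(url);
```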