Hive is one of the members in the Hadoop ecosystem. Hadoop is written in java. So initially hadoop was limited to only the subset of engineers who know java. Later, some smart guys in facebook designed a layer on top of hadoop which can act as a mediator between the SQL experts and hadoop. They made an application that will accept SQL standard queries and talks to hadoop by parsing the queries. This application is called hive. This is internally accessing the HDFS and data processing is happening through mapreduce. The main advantage is that the user doesn’t need to worry about the complexity of writing lengthier mapreduce programs. After the invention of this application, hadoop became popular among SQL experts through hive. Another advantage of hive is that the development time for some solutions are very faster compared to writing java programs. Sometimes a few lines of queries may work well instead of writing several hundred lines of code.
In hive the data is represented as tables. A table is a representation
of data with a schema. In hadoop data is stored in hdfs. So if we look at the
data through a schema, we will be able to visualize the data in tabular format.
In hive the schema is stored in metastore. A metastore is a lightweight
database where the hive stores the metadata of tables. By default hive uses
derby database, which is not suitable for production or multi-user environments.
So usually people use mysql, postgresql etc as metastore.
No comments:
Post a Comment