Installing Apache Hive with a MySQL Metastore in CentOS

Hive is a pretty nifty data warehousing extension of Hadoop that lets you dump structured data into HDFS and query it using a SQL-like language called HiveQL which runs all the map/reduce junk for you.

It’s pretty darn simple to install, but if you want to really free it up you need to do some tweaking.

Out of the box Hive keeps the information about what data lives where in HDFS in an embedded Apache Derby database (a.k.a. the metastore).  This can be restrictive if you want to have more than one connection to Hive at a time.  If you want to have two sessions to Hive CLI, or run the Hive Web Interface (HWI) simultaneously with a CLI session, you’re SOL.

This is pretty easy to fix, though, by using MySQL (which can handle multiple connections) to hold the metastore.

Earlier we installed Hadoop using Cloudera’s testing Yum repository. We want a little newer version of Hive so we’re going to grab the RPM from thier unstable repository.

rpm -ivh  http://archive.cloudera.com/redhat/cdh/unstable/RPMS/noarch/hadoop-hive-0.5.0+20-1.noarch.rpm

Now lets modify our /etc/hive/conf/hive-site.xml to use MySQL instead of Derby:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hivemeta.yourcompany.com:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hadoop</value>
</property>

Now let’s give Hive the MySQL connector:

curl http://mysql.he.net/Downloads/Connector-J/mysql-connector-java-5.1.10.tar.gz | tar zxv
cp mysql-connector-java-5.1.10/mysql-connector-java-5.1.10-bin.jar /usr/lib/hive/lib/

Install the MySQL server (and set a root password) wherever you want to host it.

yum -y install mysql-server
/usr/bin/mysqladmin -u root password '******'

Login to MySQL and setup the hadoop user. Not the most secure setup, but you get the idea. Hive makes the database automatically so you can restrict after it’s created.

mysql -u root -p

CREATE USER 'hadoop'@'hivemeta.yourcompany.com' IDENTIFIED BY 'hadoop';
GRANT ALL PRIVILEGES ON *.* TO 'hivemeta.yourcompany.com' WITH GRANT OPTION;

And now we can give it a go:

(nathan@test1:~)$ hive
Hive history file=/tmp/nathan/hive_job_log_nathan_201006211927_2002309384.txt
hive> select * from dwcl_click limit 10;
OK

a bunch of junk here

Time taken: 1.245 seconds

As long as it works, we know we can have multiple sessions. I’ll write soon about setting up the Hive Web Interface (HWI) as well as daemonizing Hive’s thrift server so we can really get the Hive humming.

Share
  • Pingback: Getting the Hive Web Interface (HWI) to work on CentOS | blog.milford.io

  • Pingback: Daemonizing the Apache Hive Thrift server on CentOS | blog.milford.io

  • http://twitter.com/marcellodesales Marcello de Sales

    There were 2 problems while starting with this post:

    * Don’t forget to create a brand new hive-site.xml using the hive-default.xml’s “header” (xml declaration, documention tag).

    * Just to add that I couldn’t grant the permissions to the user because of the syntax used…

    [mdesales@cu093 clusternet]$ hive
    Hive history file=/tmp/mdesales/hive_job_log_mdesales_201011092051_775400090.txt
    hive> show tables;
    FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Access denied for user ‘hadoop’@'localhost’ to database ‘hive’
    NestedThrowables:
    com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Access denied for user ‘hadoop’@'localhost’ to database ‘hive’
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
    hive> exit

    So I could fix it by using the syntax username @ host:

    GRANT ALL PRIVILEGES ON *.* TO ‘hadoop’@'localhost’ WITH GRANT OPTION;

    Thanks for the post!
    Marcello

  • Pingback: hadoop and hive « 阿喵就像家

  • lauraine 2012

    Thanks for sharing your info. I really appreciate your efforts and I will be waiting for your further write ups thanks once again.
    html5 player| html5 video player

  • Kaihoku

    Do you know if there is a way to set up Hive with a MySQL master & slave?