Productionizing the Hive Thrift Server

Ha! First day of my long-awaited vacation and what do I do? Write a blog post about stuff I do at work, of course!

A good portion of our team prefers to interface with Hive programmatically using the Hive Thrift Server.

The more we rely on it, the more we need to harden it.

It is not really set up or packaged for this, so we need to go to town on it.

Daemonizing the Apache Hive Thrift server on CentOS

Earlier I showed you how to set up Hadoop, then how to set up Hive to use a MySQL-backed Metastore.

These notes presume that you have set up your Hive metastore to use MySQL. If you haven’t, you’ll only be able to have one Hive instance running at a time (so no CLI while the HWI or Thrift server is a-runnin’).

Got carried away, I daemonized myself :P

Building Scribe from Silas Sewell’s source RPMs for CentOS / RHEL 5

Before, I wrote about how to build Scribe 2.1 and its dependencies on CentOS from source by hand.

I’m not a lifelong Linux admin, only been at it for a few years, so the process of discovery involved in getting it built made my hairline recede a bit…

The author just after running ‘make install’

However, if you want to avoid all that and keep your hair, you’re in luck, provided you’re willing to make a few compromises.

Silas Sewell was awesome enough to share his own source RPMs with the world!

Building Facebook Scribe 2.1 on CentOS 5.5

At Outbrain, I’ve recently been tasked with setting up and testing Facebook’s Scribe log aggregation server for collecting clicks, impressions and other data for eventual loading into our data warehouse.

From the README:

Scribe is a server for aggregating log data that’s streamed in realtime from clients. It is designed to be scalable and reliable.

Facebook Scribe can be found here.

Here is an ancient access_log from a Sumerian web server dating from the 26th century BC. If you read cuneiform, you’d clearly see the entries from when Enki thought it was a cool idea to release his code red nam shub on the world.

Ancient Apache access_logs

Obviously, if they had proper, scalable log aggregation and analytics they might have nipped that in the bud before it turned into the great pre-biblical DDoS.

From reading around the web, I have gathered that building Scribe is notoriously difficult. I’ve found a few installation guides, but mostly for less package-conservative Linux distributions than CentOS. The steps I outline below for building and installation are what worked for me, and assume you have the EPEL repository installed.

NOTE: Silas Sewell has posted some wonderful source RPMs for CentOS 5 / RHEL 5 on his blog, which I will elaborate on in a future post. My only problem with them is that Thrift is built --without-java --without-perl --without-ruby --without-csharp for various reasons. I did steal his Boost 1.36 to 1.33 hack and init scripts for this build though.

So, without further ado let’s get started.
