So far, I’ve shown two different ways (from source or from RPM) of getting Scribe installed on your CentOS / RHEL 5 server, and soon I’ll write more on configuration.
For now, let’s just get it running with our JVM-based app. In this scenario we’re going to set up a Cassandra instance to log to a Scribe instance on the same server. That local instance forwards to a remote Scribe master and is set up to write to a local buffer store if the master becomes unavailable.
So we’ll assume you’ve got git installed (you can get it from EPEL), you have log4j properly installed in your JVM-based application, and you have local and remote Scribe instances set up.
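To make that fallback behaviour a bit more concrete, here’s a toy Python model of the idea: try the remote master first, park messages locally when it’s down, and replay them in order once it comes back. This is only a sketch of the concept, not Scribe’s code or its config format. The names `BufferedForwarder` and `FakePrimary` are made up for illustration, and an in-memory list stands in for the on-disk buffer.

```python
class FakePrimary:
    """Hypothetical stand-in for the remote master; flip `up` to simulate an outage."""

    def __init__(self):
        self.up = True
        self.received = []

    def send(self, entry):
        if not self.up:
            raise ConnectionError("primary unreachable")
        self.received.append(entry)


class BufferedForwarder:
    """Toy model: forward to a primary, fall back to a local buffer on failure."""

    def __init__(self, send_primary):
        # send_primary is any callable that raises ConnectionError when the
        # primary cannot be reached (an assumption made for this sketch).
        self.send_primary = send_primary
        self.buffer = []  # stands in for the local buffer store

    def flush(self):
        # Replay buffered entries oldest-first; an entry is only dropped from
        # the buffer after the primary has accepted it.
        while self.buffer:
            self.send_primary(self.buffer[0])
            self.buffer.pop(0)

    def log(self, category, message):
        entry = (category, message)
        try:
            self.flush()              # drain the backlog first to keep ordering
            self.send_primary(entry)
        except ConnectionError:
            self.buffer.append(entry)  # primary is down: keep the entry locally


if __name__ == "__main__":
    primary = FakePrimary()
    forwarder = BufferedForwarder(primary.send)

    primary.up = False
    forwarder.log("cassandra", "message written during the outage")
    print(len(forwarder.buffer), len(primary.received))

    primary.up = True
    forwarder.log("cassandra", "message written after recovery")
    print(len(forwarder.buffer), len(primary.received))
```

The one design choice worth noticing is that the backlog is drained before the new message is sent, so the master sees entries in the order they were logged.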
Here is the config (based on the example1 & 2 configs that come with scribe) for the central server:
Previously, I wrote about how to build Scribe 2.1 and its dependencies on CentOS from source by hand.
I’m not a lifelong Linux admin, and I’ve only been at it for a few years, so the process of discovery involved in getting it built made my hairline recede a bit…
The author just after running ‘make install’
However, if you want to avoid all that and keep your hair, you’re in luck, provided you’re willing to make a few compromises.
Silas Sewell was awesome enough to share his own source RPMs with the world!
At Outbrain, I’ve recently been tasked with setting up and testing Facebook’s Scribe log aggregation server for collecting clicks, impressions and other data for eventual loading into our data warehouse.
From the README:
Scribe is a server for aggregating log data that’s streamed in realtime from clients. It is designed to be scalable and reliable.
Facebook Scribe can be found here.
Here is an ancient access_log from a Sumerian web server dating from the 26th century BC. If you could read cuneiform, you’d clearly see the entries from when Enki thought it was a cool idea to release his code red nam shub on the world.
Obviously, if they had proper, scalable log aggregation and analytics they might have nipped that in the bud before it turned into the great pre-biblical DDoS.
From reading around the web, I have gathered that building Scribe is notoriously difficult. I’ve found a few installation guides, but mostly for less package-conservative Linux distributions than CentOS. The steps I outline below for building and installation are what worked for me, and they assume you have the EPEL repository enabled.
NOTE: Silas Sewell has posted some wonderful source RPMs for CentOS 5 / RHEL 5 on his blog, which I will elaborate on in a future post. My only problem with them is that thrift is built with --without-java --without-perl --without-ruby --without-csharp for various reasons. I did steal his Boost 1.36 to 1.33 hack and init scripts for this build, though.
So, without further ado, let’s get started.