Building Facebook Scribe 2.1 on CentOS 5.5

At Outbrain, I’ve recently been tasked with setting up and testing Facebook’s Scribe log aggregation server for collecting clicks, impressions and other data for eventual loading into our data warehouse.

From the README:

Scribe is a server for aggregating log data that’s streamed in realtime from clients. It is designed to be scalable and reliable.

Facebook Scribe can be found here.

Here is an ancient access_log from a Sumerian web server dating from the 26th century BC. If you read cuneiform, you’d clearly see the entries from when Enki thought it was a cool idea to release his code red nam shub on the world.

[Image: ancient Apache access_logs]

Obviously, if they had proper, scalable log aggregation and analytics they might have nipped that in the bud before it turned into the great pre-biblical DDoS.

From reading around the web, I have gathered that building Scribe is notoriously difficult. I’ve found a few installation guides, but mostly for less package-conservative Linux distributions than CentOS. The steps I outline below for building and installing are what worked for me, and they assume you have the EPEL repository installed.

NOTE: Silas Sewell has posted some wonderful source RPMs for CentOS 5 / RHEL 5 on his blog, which I will elaborate on in a future post. My only problem with them is that thrift is built --without-java --without-perl --without-ruby --without-csharp for various reasons. I did steal his Boost 1.36 to 1.33 hack and init scripts for this build, though.

So, without further ado let’s get started.

# Create important directories:

mkdir -p ~/build/
cd ~/build/
mkdir -p /etc/scribed/

# Install whatever dependencies we can from yum.

yum -y install gcc-c++ boost boost-devel libevent libevent-devel \
                    automake autoconf m4 bison zlib zlib-devel bzip2 \
                    bzip2-devel flex pkgconfig python-devel ruby-devel

# Thrift requires a newer version of libtool to build, so remove CentOS’s and build anew:

rpm -qa | grep libtool | xargs rpm -e --nodeps
curl http://ftp.gnu.org/gnu/libtool/libtool-2.2.8.tar.gz | tar zxv
cd libtool-2.2.8/ && ./configure && make && make install && cd ..
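
To confirm the shell now picks up the freshly built libtool rather than a leftover copy, a quick version check along these lines may help. This is only a sketch: `version_ge` is a hypothetical helper (not part of libtool), and it assumes `libtool --version` prints the version as the last word of its first line, as GNU libtool does.

```shell
# Hypothetical helper: succeed when dotted version $1 is >= dotted version $2.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Only run the check when a libtool is actually on the PATH.
if command -v libtool >/dev/null 2>&1; then
    found=$(libtool --version 2>/dev/null | head -n 1 | awk '{ print $NF }')
    if version_ge "$found" 2.2.8; then
        echo "libtool $found looks new enough"
    else
        echo "libtool reports '$found'; expected 2.2.8 or later" >&2
    fi
fi
```

Plain numeric field comparison is used here instead of `sort -V` because the older coreutils on CentOS 5 may not support that flag.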

# Get a modern version of Java (gotta get it from Oracle directly… that feels weird to type)

rpm -ivh /mnt/temp/rpms/jdk-6u20-linux-amd64.rpm
echo 'export JAVA_HOME="/usr/java/jdk1.6.0_20"' > /etc/profile.d/java.sh
echo 'export PATH=${JAVA_HOME}/bin:${PATH}' >> /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk1.6.0_20
export PATH=${JAVA_HOME}/bin:${PATH}

# Thrift requires a new version of ant

curl http://www.fightrice.com/mirrors/apache/ant/binaries/apache-ant-1.8.1-bin.tar.gz | tar zxv
mv apache-ant-1.8.1/ /opt/ant
echo 'export ANT_HOME=/opt/ant' > /etc/profile.d/ant.sh
echo 'export PATH=/opt/ant/bin:$PATH' >> /etc/profile.d/ant.sh
export ANT_HOME=/opt/ant
export PATH=/opt/ant/bin:$PATH
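
Before moving on to Thrift, it can be worth checking that the tools installed so far actually resolve in the current shell. The sketch below is one way to do that; `require_cmds` is a hypothetical helper name, and the list of tools is simply the ones set up in the steps above.

```shell
# Hypothetical helper: report whether each named tool resolves on the current PATH.
# Returns 0 only when every tool was found.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found   $cmd -> $(command -v "$cmd")"
        else
            echo "missing $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools the Thrift build in the next step leans on.
require_cmds java ant libtool || echo "fix PATH before building Thrift" >&2
```

If `java` or `ant` shows up as missing, re-check the `export` lines above, since the files in /etc/profile.d/ are only read by new login shells.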

# Build & install thrift 0.2.0

curl http://mirror.atlanticmetro.net/apache/incubator/thrift/0.2.0-incubating/thrift-0.2.0-incubating.tar.gz | tar zxv
cd thrift-0.2.0
cp /usr/share/aclocal/pkg.m4 aclocal/
./bootstrap.sh && ./configure --with-csharp=no --with-erlang=no --with-ruby=no
make && make install

# Build & install Facebook Bassline (fb303)

cd contrib/fb303/
./bootstrap.sh && ./configure
make && make install && cd ../../../

# Finally, build & install scribe after modifying it to use boost 1.33 (Thanks Silas!):

curl http://cloud.github.com/downloads/facebook/scribe/scribe-2.1.tar.gz | tar zxv
cd scribe-2.1
export LD_LIBRARY_PATH="/usr/local/lib"
sed -i 's/1.36/1.33/' configure.ac
sed -i 's/dir_iter->filename/dir_iter->leaf/' src/file.cpp
./bootstrap.sh && ./configure && make && make install
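
If configure or the compile step still complains about Boost, it may help to confirm that both sed edits above actually landed. The following is a minimal sketch; `check_boost_patch` is a hypothetical helper, and it only looks for the two strings the sed commands are meant to replace.

```shell
# Hypothetical helper: verify the two Boost 1.33 edits in a Scribe source tree.
# $1 is the top of the tree (the directory holding configure.ac and src/).
check_boost_patch() {
    src_dir="$1"
    for f in "$src_dir/configure.ac" "$src_dir/src/file.cpp"; do
        if [ ! -f "$f" ]; then
            echo "cannot find $f" >&2
            return 2
        fi
    done
    # configure.ac should no longer ask for Boost 1.36.
    if grep -q '1\.36' "$src_dir/configure.ac"; then
        echo "configure.ac still mentions 1.36" >&2
        return 1
    fi
    # file.cpp should call leaf() rather than filename() on the iterator.
    if grep -q 'dir_iter->filename' "$src_dir/src/file.cpp"; then
        echo "src/file.cpp still uses dir_iter->filename" >&2
        return 1
    fi
    echo "Boost 1.33 edits look applied"
}
```

With the layout used in this post, it would be called as `check_boost_patch ~/build/scribe-2.1`.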

Depending on the version of thrift installed, you may need to run the following against the scribe_cat script:

sed -i 's/log_entry = scribe.LogEntry(dict(category=category, message=sys.stdin.read()))/log_entry = scribe.LogEntry(category=category, message=sys.stdin.read())/' ./examples/scribe_cat

# Copy the example stuff out:

cp ./examples/scribe_cat /usr/local/bin
cp ./examples/scribe_ctrl /usr/local/bin
cp ./examples/example1.conf /etc/scribed/default.conf

# Here is the init script, /etc/init.d/scribed

#!/bin/sh
#
# scribed - this script starts and stops the scribed daemon
#
# chkconfig:   - 84 16
# description:  Scribe is a server for aggregating log data \
#               streamed in real time from a large number of \
#               servers.
# processname: scribed
# config:      /etc/scribed/default.conf
# config:      /etc/sysconfig/scribed
# pidfile:     /var/run/scribed.pid

# Source function library
. /etc/rc.d/init.d/functions

run="/usr/local/bin/scribed"
run_ctrl="/usr/local/bin/scribe_ctrl"
prog=$(basename $run)

[ -e /etc/sysconfig/$prog ] && . /etc/sysconfig/$prog

port=$(egrep "^port=" $SCRIBED_CONFIG | awk -F"=" '{ print $2 }')

lockfile=/var/lock/subsys/scribed

start() {
    echo -n $"Starting $prog: "
    daemon nohup $run -c $SCRIBED_CONFIG &> /dev/null &
    retval=$?
    echo
    [ $retval -eq 0 ] && touch $lockfile
    return $retval
}

stop() {
    echo -n $"Stopping $prog: "
    $run_ctrl stop $port
    retval=$?
    echo
    [ $retval -eq 0 ] && rm -f $lockfile
    return $retval
}

status() {
    $run_ctrl status $port
}

restart() {
    stop
    start
}

reload() {
    echo "Probably not implemented."
    $run_ctrl reload $port
}

case "$1" in
    start|stop|restart|status|reload)
        $1
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|reload}"
        exit 2
esac
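
The init script finds the port for `scribe_ctrl` by pulling the `port=` line out of the file named in `$SCRIBED_CONFIG`. The sketch below shows that lookup in isolation, collapsed into a single awk call. `scribe_port` is a hypothetical helper, and the sample config contents are made up for illustration.

```shell
# Hypothetical helper mirroring the init script's lookup: print the value of
# the first line that starts with "port=" in the given config file.
scribe_port() {
    awk -F'=' '/^port=/ { print $2; exit }' "$1"
}

# Sample config written to a temp file purely for illustration.
conf=$(mktemp)
printf 'port=1463\nmax_msg_per_second=2000000\n' > "$conf"
echo "port is: $(scribe_port "$conf")"
rm -f "$conf"
```

Note that the script relies on `$SCRIBED_CONFIG` being set by /etc/sysconfig/scribed. If that file is missing, the `egrep` call has no file argument and will most likely sit waiting on standard input.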

# Set it up:

chmod +x /etc/init.d/scribed
chkconfig --add scribed

# Create the /etc/sysconfig file

echo "SCRIBED_CONFIG=/etc/scribed/default.conf" >> /etc/sysconfig/scribed
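
Once the daemon has been started (for example with `/etc/init.d/scribed start`), a quick smoke test is to push a single message through `scribe_cat`. This is a sketch rather than a required step: `scribe_smoke_test` is a hypothetical helper, and the `test` category is just a placeholder name.

```shell
# Hypothetical helper: send one message to the local Scribe instance.
# $1 is the category to log under (defaults to "test").
scribe_smoke_test() {
    category="${1:-test}"
    if ! command -v scribe_cat >/dev/null 2>&1; then
        echo "scribe_cat not found on PATH" >&2
        return 127
    fi
    # scribe_cat reads the message body from standard input.
    echo "hello world" | scribe_cat "$category"
}
```

Calling `scribe_smoke_test test` should return quietly if the message was accepted. Where the message ends up depends on the store configured in /etc/scribed/default.conf.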

I’ll post more about Silas’ RPMs, Digg’s scribe-log4j-appender and actual server configuration so stay tuned!

  • gaojinbo

    Good.
    Thank you for everything.
    Howto install scribe client or web admin?
    Email:admin@gaojinbo.com

    • http://blog.milford.io Nathan Milford

      I’m not aware of any admin web interface for Scribe. And for implementation I was simply going to pop the scribe log4j appender into Tomcat and let it hit the local Scribe instance.

      I’ve sort of moved away from Scribe and I am experimenting more with Cloudera’s Flume project which seems a little bit more supported at the moment and has a nice web interface and a pretty decent community. We’re about to add another data center and hire another ops engineer so I should have time to implement and write about it more in the coming months.

      More about Flume here: http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-flume/

      • Sudhir

        We tested Flume recently and hit a whole bunch of blockers with flow isolations and interrupted exceptions and are hence looking at Scribe. I think in a couple of months Flume will be more usable

  • Mark

    Bummer that it runs as root. Is this by design?

    • http://blog.milford.io Nathan Milford

      Possibly. It can probably be run as another user. I’m kinda looking at Flume these days though :P

  • ciccio

    Oh my dog!
    This command is wrong:
    [user@host ]$ sed -i 's/dir_iter->filename/dir_iter->;leaf/' src/file.cpp
    The correct one is (without ';'):
    [user@host ]$ sed -i 's/dir_iter->filename/dir_iter->leaf/' src/file.cpp

    o_O’/

    • http://blog.milford.io Nathan Milford

      Thanks! I’ll fix that right away :)