Building and Installing Python 2.7 RPMs on CentOS 5.7


I was asked today to install Python 2.7 on a CentOS-based node and I thought I’d take this opportunity to add a companion article to my Python 2.6 article.

We’re all well aware that CentOS is pretty backwards when it comes to having the latest and greatest software packages, and it is particularly finicky when it comes to Python since so much of RHEL depends on it.

As a rule, I refuse to rush in and install anything in production that isn’t in a manageable package format such as RPM. I need to be able to predictably reproduce software installs across a large number of nodes.

The following steps will not clobber your default Python 2.4 install and will keep both CentOS and your developers happy.

So, here we go.


Getting Brisk going on CentOS and rocking a Terasort.

So, I started playing with a beta of Brisk this weekend.

The Datastax guys are industrious, energetic and very open to hearing from both the Cassandra and Hadoop communities.  You should hit them up in #Datastax-Brisk on Freenode IRC.

I’ll post more on my benchmarks and tests later. I’m still getting comfortable with it, but as a Hadoop and Cassandra user I already find it very familiar.

I need to set up the OpsCenter stuff, which looks pretty cool, and put some real data in it.

So far, my favorite thing:

INFO 23:36:22,093 Chose seed 192.168.x.x as jobtracker

Magic!

My current concern is how to deal with deletes in CFS (CassandraFS) as Hive (and Terasort for that matter) kicks up a lot of ephemeral data.  Cassandra doesn’t delete stuff instantly, so I imagine I’ll need to do some tweaking with GCGraceSeconds to find an optimal setting.
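To make the GCGraceSeconds trade-off concrete, here is a minimal sketch of the arithmetic, not a Cassandra API call. It assumes the documented default of 864000 seconds (10 days); the function name and the sample timestamps are hypothetical. A tombstone only becomes eligible for removal at compaction once the grace period has passed, so a shorter value frees space from ephemeral data sooner, at the risk of deleted data reappearing if a node is down for longer than the grace period and repair has not been run.

```python
from datetime import datetime, timedelta

# Cassandra's documented default for gc_grace_seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864000


def purge_eligible_at(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the earliest time a tombstone written at deleted_at
    could be dropped by a compaction (hypothetical helper)."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


# Compare the default against a hypothetical one-hour setting.
deleted = datetime(2011, 5, 1, 12, 0, 0)
for grace in (DEFAULT_GC_GRACE_SECONDS, 3600):
    print("%d seconds: eligible at %s" % (grace, purge_eligible_at(deleted, grace)))
```

The tombstone is still only removed when a compaction actually touches it, so this gives a lower bound, not a guarantee.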

So, this is my quick 5 minute setup to get going and running benchmarks.


Apache Cassandra 0.7 CentOS Quick Install (with Cassandra-Stress, MX4J & JNA)

I’m such a sad bastard.

I got stuck fixing a production issue and had to miss the inaugural NYC Cassandra Meetup group :(

To atone, I figured I’d write a quickie Cassandra post.


First Github Post: Hadoop Chef Cookbook

Over the last few months we’ve been migrating our infrastructure over to the Chef platform for infrastructure automation.  It is analogous to Puppet, which I’ve tinkered with in the past.

I’ll skip the debate over which is the better tool.  There has been lots of discussion all over about it.  Suffice it to say, we chose Chef for a myriad of reasons and this post isn’t a case study.

My first big Chef project was migrating our Hadoop cluster onto it.


New method for installing Python 2.6.4 (with mysql-python) on CentOS 5.5

So I wrote in an earlier post about alt-installing Python 2.6 from source on CentOS, which was easy enough.  But, this made it more difficult to maintain and deploy as well as add modules.  So, I was lucky enough to come across a nice little yum repository hosted by Rizwan Kassim (Geekymedia.com) that contained an RPM that would do the alt-install work for me :)

I’m aware that EPEL has a Python 2.6 package, but the Geekymedia RPMs have a whole flurry of modules you can add, as well as an RPM for setuptools, which will make your life immeasurably easier when installing Python packages with Python 2.4 and 2.6 running side-by-side.

The only problems with the Geekymedia RPMs are that the binary packages are all 32-bit (I’m running servers here folks!) and I was unable to get the MySQL-python26 one to work right for me.

So, let’s get down to business.


Daemonizing the Apache Hive Thrift server on CentOS

Earlier I showed you how to set up Hadoop, then how to set up Hive to use a MySQL-backed Metastore.

These notes presume that you have set up your Hive metastore to use MySQL. If you haven’t, you’ll only be able to have one Hive instance running at a time (so no CLI while the HWI or Thrift server is a-runnin’).

Got carried away, I daemonized myself :P


Installing Apache Hive with a MySQL Metastore in CentOS

Hive is a pretty nifty data warehousing extension of Hadoop that lets you dump structured data into HDFS and query it using a SQL-like language called HiveQL which runs all the map/reduce junk for you.

It’s pretty darn simple to install, but if you want to really free it up you need to do some tweaking.


Alt-Installing Python 2.6 from source in CentOS

I work with some smart folks who need the latest stuff to get the job done. However, this can sometimes be a problem when you use a more conservative (read: stable) Linux distribution like CentOS.

You see, CentOS 5 comes bundled with Python 2.4, whereas Fedora or Ubuntu (or Linux Mint) ship with Python 2.6.

(nathan@citadel:~)$ cat /etc/issue
Linux Mint 9 Gloria - Main Edition \n \l
(nathan@citadel:~)$ python
Python 2.6.4 (r264:75706, Dec  7 2009, 18:45:15)
(nathan@test1:~)$ cat /etc/issue
CentOS release 5.5 (Final)
(nathan@test1:~)$ python
Python 2.4.3 (#1, Sep  3 2009, 15:37:37)
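When both interpreters live on the same box, a script can check which one it is actually running under before relying on newer features. The following is a minimal sketch using only the standard `sys` module; the helper names `version_string` and `at_least` are made up for illustration and are not from the original post. It sticks to syntax that old interpreters also accept.

```python
import sys


def version_string(info=None):
    """Format a version tuple such as (2, 4, 3, 'final', 0) as '2.4.3'."""
    if info is None:
        info = sys.version_info
    return "%d.%d.%d" % tuple(info[:3])


def at_least(required, info=None):
    """Return True if the major.minor version is >= required, e.g. (2, 6)."""
    if info is None:
        info = sys.version_info
    return tuple(info[:2]) >= tuple(required)


print("Running Python %s" % version_string())
if not at_least((2, 6)):
    print("This interpreter is older than 2.6")
```

Run it once with each interpreter to confirm which Python a given command name resolves to.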


Installing Apache Cassandra on CentOS

From Cassandra’s site:

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.

It’s one of the more popular NoSQL data stores out there and at Outbrain we’ve been moving some parts of our service from MySQL to it.

I love it as an Ops guy because it just sorta works… I set it up, fire it up and it goes.  Mind you, it needs to be set up right in the beginning, but that is another thing altogether and I won’t get into configuration and implementation, just deployment.


Building Scribe from Silas Sewell’s source RPMs for CentOS / RHEL 5

Before, I wrote about how to build Scribe 2.1 and its dependencies on CentOS from source by hand.

I’m not a lifelong Linux admin, I’ve only been at it for a few years, so the process of discovery involved in getting it built made my hairline recede a bit…

The author just after running ‘make install’

However, you’re in luck if you want to avoid all that and keep your hair, as long as you’re willing to make a few compromises.

Silas Sewell was awesome enough to share his own source RPMs with the world!
