Building and Installing Python 2.7 RPMs on CentOS 5.7


I was asked today to install Python 2.7 on a CentOS-based node and I thought I’d take this opportunity to add a companion article to my Python 2.6 article.

We’re all well aware that CentOS is pretty backwards when it comes to having the latest and greatest software packages, and it is particularly finicky when it comes to Python since so much of RHEL depends on it.

As a rule, I refuse to rush in and install anything in production that isn’t in a manageable package format such as RPM. I need to be able to predictably reproduce software installs across a large number of nodes.

The following steps will not clobber your default Python 2.4 install and will keep both CentOS and your developers happy.

So, here we go.

Rolling Upgrades for Cassandra

From 0.7 on up you can do rolling upgrades of your cluster.

A few weeks back I went from 0.7 to 0.8. Upgrade went as smooth as silk. It is sofa king awesome.

I will upgrade to 1.0 after the holidays so as to bask in the glory of Snappy compression, read performance gains and leveled compaction.

Most of my process was semi-automated via Chef, but the steps below expand on what I did.

Before you start, please make sure to check for changes in the cassandra.yaml. From 0.7 to 0.8, seed strategy became pluggable, along with two or three other changes. I haven’t looked at 1.0 yet, but I presume there will be other changes related to pluggable compaction and compression.

Building RPMs for and setting up StatsD and Graphite on CentOS.

A while back Etsy open sourced a little node.js daemon called StatsD that makes it easy for you to ‘Measure All the Things.’

In my current environment, setting up graphs for the folks on the business team and on the dev team is difficult and time consuming, as everything has to funnel through ops. We’re a bottleneck :(

I’m hoping to implement StatsD to make graphing a service that most anyone can directly interact with and remove me and my team as the bottleneck.

Below are my notes for setting it up.

In reference to Hadoop Appliances; or, how I’m an Open Source snob.

Let me preface this rant/post by stating that I come from the more scrappy, ‘build it out from OSS’ sort of shop, so I am highly biased toward the approach of:

  • Thinking first about, and then building, your infrastructure/solution to fit your needs.
  • Knowing the software inside and out.
  • Relying on the community for the rest.

Over the converse ‘Enterprise’ approach of:

  • Building your infrastructure based on someone’s white paper on how you should build an infrastructure to do X.
  • Getting your sysadmins a set of meaningless certifications.
  • Ultimately relying on commercial support as your last point of escalation.

Yes, yes, yes… this is very snobby and I am in danger of sounding as irreverent as Ted Dziuba.  I am also wholly conscious that the OSS approach can be taken in the same extreme direction as the enterprise approach, insofar as everyone blindly follows the same design choices that Twitter or Facebook make (albeit better than anyone else) or implements everything in node.js or Ruby on Rails because that is what the hot-as-shit hipster developers are doing.

For me it comes down to having operational responsibility for your infrastructure, rather than a support contract.  But still, I’m young and work at a hot startup; when I’m CTO of a bank maybe my view will change :P

Getting Brisk going on CentOS and rocking a Terasort.

So, I started playing with a beta of Brisk this weekend.

The DataStax guys are industrious, energetic and very open to hearing from both the Cassandra and Hadoop communities.  You should hit them up in #Datastax-Brisk on Freenode IRC.

I’ll post more on my benchmarks and tests later. I’m still getting comfortable with it, but as I’m already a Hadoop and Cassandra user it feels very familiar.

I need to set up the OpsCenter stuff, which looks pretty cool, and put some real data in it.

So far, my favorite thing:

INFO 23:36:22,093 Chose seed 192.168.x.x as jobtracker

Magic!

My current concern is how to deal with deletes in CFS (CassandraFS) as Hive (and Terasort for that matter) kicks up a lot of ephemeral data.  Cassandra doesn’t delete stuff instantly, so I imagine I’ll need to do some tweaking with GCGraceSeconds to find an optimal setting.

So, this is my quick 5 minute setup to get going and running benchmarks.

In which I discourse on Java bloat and Cassandra Node Balancing.

So, I was hoping to write a little snippet of code to embed on my blog to allow people to get the token ranges for load balancing their cluster.

In Cassandra, when using the random partitioner, all keys are given a token (essentially an MD5 of the key) that is between 0 and 2^127 (0 through 170141183460469231731687303715884105728 for non-nerds). That range is known as the ring.

Each member node of the Cassandra cluster owns a range of those keys on the ring, in much the same way you’d divide up a pie.
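
To make the pie analogy concrete, here is a minimal Python sketch of the usual way to slice the ring evenly: give node i the token i * 2^127 / N. The function name `balanced_tokens` and the four-node cluster are my own assumptions for illustration, and this is not necessarily the snippet I had in mind for the blog.

```python
# Size of the random partitioner ring, as described above.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one evenly spaced token per node (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the large token values exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


# Example: print a token for each node of an assumed four-node cluster.
for node, token in enumerate(balanced_tokens(4)):
    print("node %d: token %d" % (node, token))
```

Each node would then be assigned one of these values as its token, so that every node owns an equally sized slice of the ring.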

Apache Cassandra 0.7 CentOS Quick Install (with Cassandra-Stress, MX4J & JNA)

I’m such a sad bastard.

I got stuck fixing a production issue and had to miss the inaugural NYC Cassandra Meetup group :(

To atone, I figured I’d write a quickie Cassandra post.

Advanced Hadoop NameNode and Hive Metastore Backup Scripts

World Backup Day was last Thursday and in its honor I uploaded a few of my backup scripts to my GitHub repository.

I thought I’d start off with modified versions of the scripts I use in production at Outbrain to back up my Hadoop NameNode and Hive Metastore.

First:  OMFG WTF ARE YOU NOT BACKING UP YOUR NAMENODE AND HIVE METASTORE?

Second:  No really, WTF IS WRONG WITH YOU!?!

First GitHub Post: Hadoop Chef Cookbook

Over the last few months we’ve been migrating our infrastructure over to the Chef platform for infrastructure automation.  It is analogous to Puppet, which I’ve tinkered with in the past.

I’ll skip the debate over which is the better tool.  There has been lots of discussion all over about it.  Suffice it to say, we chose Chef for a myriad of reasons and this post isn’t a case study.

My first big Chef project was migrating our Hadoop cluster onto it.

CheckGMail — Fixing 401 error with Google Apps accounts in Ubuntu/Linux Mint.

I’m a pretty heavy Gmail user and I’ve recently gotten everything moved over to my account (finally), but getting some of my ancillary non-Google stuff to play along has been quite difficult.

I just bought a Samsung/Google Nexus S (which I LOVE! and Gingerbread is fantastic) and paired my Google Apps account to it, only to find I cannot use my Google Apps Google Checkout account to buy apps from the Android Market (!!!!@&@&), but that is a whole other article…

Anyhoo, I’ve found that the stock CheckGMail you get from the repos doesn’t seem to work, and if you Google around you’ll find complaints going back to 2007 about various issues related to GMail or the project itself.  The fixes put forth range from grabbing the latest Subversion snapshot to replacing certain Perl libraries and disabling this or that.

These are the fixes I used to get the program working again with my Google Apps account…
