Kicking the tires on Hadoop 0.23: Pseudo-Distributed mode.


Thought I’d play a little with Hadoop 0.23 (a.k.a. YARN, MR2, NextGen Hadoop) and dump my notes here.

Gotta keep my skillz sharp y’all so I don’t become irrelephant. (Yes, that just happened.)

Below I just set up a pseudo-distributed cluster and run some examples on it, nothing crazy.

I’m hoping to test and write more on how 0.23 differs from the mainline 0.20.x, 1.0 and CDH3 releases, as well as play with NameNode federation and some other paradigms like MPI, Hama and Spark.

Continue reading

Code Example: Linux + PyUSB & the Dream Cheeky Thunder/Storm USB Missile Launcher


Went to Staples the other day to grab some assorted accessories for work and saw they had some Brookstone USB Desktop Missile Launchers in the clearance section, so I grabbed one.

What fun, I thought. Plugged it into my work desktop (running Linux Mint Debian Edition) only to find there were no Linux drivers for this particular device.

This turned into a nice little weekend project :)

Continue reading

Building and Installing Python 2.7 RPMs on CentOS 5.7


I was asked today to install Python 2.7 on a CentOS-based node and I thought I’d take this opportunity to add a companion article to my Python 2.6 article.

We’re all well aware that CentOS is pretty backwards when it comes to having the latest and greatest software packages, and it is particularly finicky when it comes to Python since so much of RHEL depends on it.

As a rule, I refuse to rush in and install anything in production that isn’t in a manageable package format such as RPM. I need to be able to predictably reproduce software installs across a large number of nodes.

The following steps will not clobber your default Python 2.4 install and will keep both CentOS and your developers happy.

So, here we go.

Continue reading

Cassandra NYC 2011 Talk on YouTube


Datastax posted my talk (see below)!

Continue reading

SysDrink in 2012!

SysDrink is now rocking its own site and Twitter account, as well as sponsorship from Outbrain!

At the last SysDrink I was able to chat aimlessly with ops engineers at NYC’s top startups.

  • I got some good leads on scaling Graphite
  • Learned how others approach their orchestration tools
  • Learned NOT to use RVM in production.

It is always good when you put a bunch of enthusiastic engineers (who usually end up at the more fascinating infrastructures solving the harder problems) in a room together outside of a meetup/talk.

Meetups/Talks set agendas for conversations. People come to the SysDrink to socialize, network, vent, compare notes and swap war stories about diverse subjects that don’t always fit into a talk or meetup.

This is the value of a SysDrink. If you’re not in NYC and would like to run a SysDrink in your area, ping me and I can set you up on the sysdrink.info calendar and Twitter account to post events.

Sign up for the next NYC SysDrink here

I’ll be talking Tuesday, 12/6/11, at the Cassandra NYC Conference.

Heads up, I’m going to be giving my “Cassandra for Sysadmins” talk at Cassandra NYC on Tuesday, December 6th.

Come by and say hello!

Rolling Upgrades for Cassandra

From 0.7 on up you can do rolling upgrades of your cluster.

A few weeks back I went from 0.7 to 0.8. Upgrade went as smooth as silk. It is sofa king awesome.

Will upgrade to 1.0 after the holidays so as to bask in the glory of snappy compression, read performance gains and leveled compaction.

Most of my process was semi-automated via Chef, but the steps below expand on what I did.

Before you start, please make sure to check for changes in the cassandra.yaml. From 0.7 to 0.8, the seed strategy became pluggable, along with two or three other changes. I haven’t looked at 1.0 yet, but I presume there will be other changes related to the pluggable compaction and compression.
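One quick way to spot that kind of change is to compare the top-level option names of the old and new config files. This is only a minimal sketch: it assumes every top-level option is an unindented `key:` line, it ignores comments and nested values, and the helper names `top_level_keys` and `compare_configs` are made up for illustration. It is no substitute for reading the release notes.

```python
def top_level_keys(yaml_text):
    """Collect unindented 'key:' names from a YAML document.

    Assumption: top-level options start in column 0; indented lines,
    list items and comments are skipped.
    """
    keys = set()
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys


def compare_configs(old_text, new_text):
    """Return (added, removed) option names between two config texts."""
    old_keys = top_level_keys(old_text)
    new_keys = top_level_keys(new_text)
    return sorted(new_keys - old_keys), sorted(old_keys - new_keys)


if __name__ == "__main__":
    # Abbreviated stand-ins for real config files, for illustration only.
    old = "cluster_name: 'Test'\nseeds:\n    - 127.0.0.1\n"
    new = "cluster_name: 'Test'\nseed_provider:\n    - class_name: Example\n"
    added, removed = compare_configs(old, new)
    print("added:", added)
    print("removed:", removed)
```

In practice you would read both files from disk and pass their contents in; any option that shows up as added or removed is worth looking up before you restart a node.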

Continue reading

Code Example: Using Python Suds to Access the Bronto API.

For some internal notification system I was attempting to write a script that would occasionally clear a list in Bronto, our email delivery platform.

They have a lovely little SOAP API, but almost all of the examples were in PHP or Java. Since I am running this as a cronjob, and being more of a Pythonist, I thought Python was a better place for me to implement this.

The Bronto team, while not terribly proficient in Python, were as helpful as they could be. My major stumbling blocks were getting the authentication mechanism to work correctly and then discovering how to properly pass arguments to the API.

Eventually I’ll do more stuff with it, but for now, I thought I’d publish this in case it might be useful for someone else using Python with their API.

Code starts below. Mind you, I am not a developer, be kind :)

Continue reading

Building RPMs for and setting up StatsD and Graphite on CentOS.

A while back Etsy open-sourced a little node.js daemon called StatsD that makes it easy for you to ‘Measure All the Things.’

In my current environment, setting up graphs for the folks on the business team and on the dev team is difficult and time-consuming, as it has to funnel through ops. We’re a bottleneck :(

I’m hoping to implement StatsD to make graphing a service that most anyone can directly interact with and remove me and my team as the bottleneck.
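To give a feel for why this removes the bottleneck, here is a minimal sketch of what a client does: it formats a short text packet and fires it at the daemon over UDP. The metric names are hypothetical, and the host and port assume a StatsD daemon listening locally on its usual default of 8125; adjust both for your own setup.

```python
import socket


def format_counter(name, value=1, sample_rate=1.0):
    """Build a StatsD counter packet, e.g. 'signups:1|c'."""
    packet = "%s:%d|c" % (name, value)
    if sample_rate < 1.0:
        # Sampled counters carry the rate so the server can scale them up.
        packet += "|@%s" % sample_rate
    return packet


def format_timer(name, millis):
    """Build a StatsD timer packet, e.g. 'page.render:320|ms'."""
    return "%s:%d|ms" % (name, millis)


def send_metric(packet, host="127.0.0.1", port=8125):
    """Send one packet over UDP; fire-and-forget, no reply is expected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet.encode("ascii"), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric(format_counter("site.signups"))
    send_metric(format_timer("site.page_render", 320))
```

Because it is UDP, the application never blocks on the metrics pipeline, which is what makes it cheap for anyone to sprinkle counters and timers through their code.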

Below are my notes for setting it up.

Continue reading

In reference to Hadoop Appliances; or, how I’m an Open Source snob.

Let me preface this rant/post by stating that I come from the more scrappy, ‘build it out from OSS’ sort of shop, so I am highly biased toward the approach of:

  • Thinking first about, and then building, your infrastructure/solution to fit your needs.
  • Knowing the software inside and out.
  • Relying on the community for the rest.

Over the converse ‘Enterprise’ approach of:

  • Building your infrastructure based on someone’s white paper on how you should build an infrastructure to do X.
  • Getting your sysadmins a set of meaningless certifications.
  • Ultimately relying on commercial support as your last point of escalation.

Yes, yes, yes… this is very snobby and I am in danger of sounding as irreverent as Ted Dziuba. I am also wholly conscious that the OSS approach can be taken in the same extreme direction as the enterprise approach, insomuch that everyone blindly follows the same design choices that Twitter or Facebook are making (albeit better than anyone else) or implements everything in node.js or Ruby on Rails because that is what the hot-as-shit hipster developers are doing.

For me it comes down to having operational responsibility for your infrastructure, rather than a support contract. But still, I’m young and work at a hot startup; when I’m CTO of a bank maybe my view will change :P

Continue reading
