World Backup Day was last Thursday, and in its honor I uploaded a few of my backup scripts to my GitHub repository.
I thought I’d start off with modified versions of the scripts I use in production at Outbrain to back up my Hadoop NameNode and Hive Metastore.
First: OMFG WTF ARE YOU NOT BACKING UP YOUR NAMENODE AND HIVE METASTORE?
Second: No really, WTF IS WRONG WITH YOU!?!
Last week I attended both Hadoop World 2010 and Cloudera’s Hadoop Admin Training. Together they gave me a long list of items to append to my already long list of things to do to make my Hadoop cluster add even more value for my employer.
The training was fantastic! I cannot recommend it enough; it was well worth the money. I was very lucky to have Eric Sammer teach it. That guy really knows his stuff inside and out. Absolute rockstar.
(Note: Cloudera’s certification exam is pretty subtle and tricky and, boy, do you really need to understand Hadoop to pass it, not just memorize facts.)
I’ve had a cluster for about a year and I know my way around it, but the training solidified what I knew, taught me the whys behind the tips I’ve picked up around the web, and filled in all the holes in between. That was especially valuable since we built our cluster with no notion of how we would use it. It was an unplanned beast.
Like I said, I’ve got a long list of tweaks to apply. As I work through them I hope to post about them here, just as I’ve been meaning to post more about most things. I’ve got a huge backlog of junk to post. BTW, Outbrain is hiring me a co-pilot here in the NYC office.