I thought I’d start off with modified versions of the scripts I use in production at Outbrain to back up my Hadoop NameNode and Hive Metastore.
First: OMFG WTF ARE YOU NOT BACKING UP YOUR NAMENODE AND HIVE METASTORE?
Second: No really, WTF IS WRONG WITH YOU!?!
If you lose your NameNode metadata or your Hive Metastore data, there is no hope of recovering your data.
Without the NameNode metadata, you basically have a bunch of servers holding opaque blocks of meaningless data.
Without the Hive Metastore, you only have a bunch of files on HDFS with no schema overlay.
You’d be wise to back up this data hourly and keep the snapshots for a week or more.
Put these scripts into cron to run hourly and they’ll take care of the rest. I’ll do a post on how to recover somewhere down the line.
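To make the "hourly, keep a week" idea concrete, here is a rough sketch of the scheduling and retention side. It is not taken from my production setup: the script path, log path, and backup directory are made-up placeholders, and `prune_snapshots` is a hypothetical helper name. It assumes the snapshots are plain files sitting in one directory.

```shell
#!/usr/bin/env bash
# Sketch only: every path and name here is a hypothetical placeholder.
set -euo pipefail

# Hypothetical crontab entry that runs a backup script at the top of every hour:
#   0 * * * * /usr/local/bin/namenode_backup.sh >> /var/log/namenode_backup.log 2>&1

# Delete snapshot files in a directory that are older than a number of days.
#   $1 = directory holding the snapshot files
#   $2 = retention in days
prune_snapshots() {
  local dir="$1" days="$2"
  # find rounds file age down to whole days, so "-mtime +7" only matches
  # files that are at least 8 days old. That errs on the side of keeping data.
  find "$dir" -type f -mtime +"$days" -delete
}
```

A backup script could finish with something like `prune_snapshots /backups/namenode 7` so that old snapshots get cleaned up on every hourly run.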
The scripts below have been sanitized and modified a bit from what I use in production. I might have broken them a little in the process, so you may need to play with them a bit: