Rolling Upgrades for Cassandra

From 0.7 on up you can do rolling upgrades of your cluster.

A few weeks back I went from 0.7 to 0.8. Upgrade went as smooth as silk. It is sofa king awesome.

I'll upgrade to 1.0 after the holidays so as to bask in the glory of Snappy compression, read performance gains, and leveled compaction.

Most of my process was semi-automated via Chef, but the steps below spell out what I did.

Before you start, check for changes in cassandra.yaml. From 0.7 to 0.8, the seed strategy became pluggable, along with two or three other changes. I haven’t looked at 1.0 yet, but I presume there will be changes related to pluggable compaction and compression.
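One way to spot those config changes is to diff the yaml you are running against the one shipped with the new release. This is a minimal sketch: yaml_changes is a hypothetical helper name, and the paths in the usage comment are assumptions that depend on how your packages lay things out.

```shell
#!/bin/sh
# Show what changed between the running cassandra.yaml and the new default.
# yaml_changes is a hypothetical helper, not part of Cassandra.
yaml_changes() {
    old_yaml="$1"
    new_yaml="$2"
    rc=0
    # diff exits 1 when the files differ, which is the expected case here.
    diff -u "$old_yaml" "$new_yaml" || rc=$?
    # Only exit codes above 1 (e.g. a missing file) count as failure.
    [ "$rc" -le 1 ]
}

# Example (both paths are assumptions):
#   yaml_changes /etc/cassandra/conf/cassandra.yaml /tmp/new-release/conf/cassandra.yaml
```

Lines starting with + are settings the new version adds or changes; carry your own values (cluster name, seeds, tokens, data directories) over into the new file rather than reusing the old one wholesale.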

So, per node, one at a time:

Make other nodes think this one is down, wait 10 seconds, then move on.

nodetool -h $(hostname) -p 8080 disablegossip

Cut clients off from writing to this node.

nodetool -h $(hostname) -p 8080 disablethrift

Flush all memtables to disk.

nodetool -h $(hostname) -p 8080 drain

For safety, make a snapshot.

nodetool -h $(hostname) -p 8080 snapshot

Stop cassandra.

/etc/init.d/cassandra stop

Protip: Snapshot aside, this is the best method of shutting down a Cassandra node. While Cassandra has a crash-only design (i.e. it is safe to pull the plug), the preceding steps stop all the other nodes and clients from writing to it and flush the memtables to disk, making for a faster startup (it doesn’t have to cruise through the CommitLog).
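The shutdown steps above, from disablegossip through stopping the service, could be wrapped in one function along these lines. This is a minimal sketch and not the Chef recipe mentioned earlier: NODETOOL, JMX_PORT, INIT_SCRIPT and GOSSIP_WAIT are variables introduced here for illustration, with defaults that mirror the commands in this post.

```shell
#!/bin/sh
# Cleanly take one Cassandra node out of service before upgrading it.
# All four variables are illustrative and can be overridden from the environment.
: "${NODETOOL:=nodetool}"
: "${JMX_PORT:=8080}"
: "${INIT_SCRIPT:=/etc/init.d/cassandra}"
: "${GOSSIP_WAIT:=10}"

nt() {
    # Run a nodetool command against the local node.
    $NODETOOL -h "$(hostname)" -p "$JMX_PORT" "$@"
}

clean_shutdown() {
    nt disablegossip || return 1   # other nodes mark this one as down
    sleep "$GOSSIP_WAIT"           # give gossip time to spread the news
    nt disablethrift || return 1   # stop accepting client requests
    nt drain || return 1           # flush memtables to disk
    nt snapshot || return 1        # safety snapshot of the flushed sstables
    $INIT_SCRIPT stop
}
```

Setting NODETOOL="echo nodetool" and INIT_SCRIPT="echo init" turns the function into a dry run that only prints the commands, which is a cheap way to check the sequence before pointing it at a live node.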

Now that Cassandra is down, remove the old jars, rpms, or debs. Your data will not be touched.

yum erase apache-cassandra

Add new jars, rpms, debs.

yum install apache-cassandra08

Drop your new cassandra.yaml into /etc/cassandra/conf.

Fire it up and watch the log.

/etc/init.d/cassandra start ; tail -f /var/log/cassandra/cassandra.log

Wait a bit for the node to come back up and for the other nodes to see it.
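Rather than eyeballing it, you could poll the ring from another node until the restarted one shows as Up. This is a rough sketch with assumptions: wait_for_up and the three variables are names made up for illustration, and the parsing assumes each line of `nodetool ring` output starts with the node's address followed by Up or Down, so check what your version actually prints.

```shell
#!/bin/sh
# Poll the ring until a restarted node is reported as Up.
# NODETOOL, JMX_PORT and POLL_SECONDS are illustrative, overridable variables.
: "${NODETOOL:=nodetool}"
: "${JMX_PORT:=8080}"      # port used in this post; change it if yours differs
: "${POLL_SECONDS:=5}"

# wait_for_up QUERY_HOST ADDRESS [TRIES]
# Asks QUERY_HOST for its view of the ring and succeeds once ADDRESS is Up.
# Assumes ring lines look like "<address> Up ..." (an assumption, see above).
wait_for_up() {
    query_host="$1"
    addr="$2"
    tries="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        if $NODETOOL -h "$query_host" -p "$JMX_PORT" ring 2>/dev/null |
            awk -v a="$addr" '$1 == a && $2 == "Up" { up = 1 } END { exit !up }'
        then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$POLL_SECONDS"
    done
    return 1   # gave up: the node never showed as Up
}
```

Asking a different node (QUERY_HOST) rather than the one you just restarted is deliberate: the point is that the rest of the cluster sees it again before you move on.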

Now, repeat through your nodes.
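The important part of repeating is that it happens strictly one node at a time and stops at the first failure. A minimal sketch of such a loop follows; UPGRADE_CMD is a placeholder for however you run the per-node steps (a Chef run, an ssh call to a script, and so on), and it defaults to a harmless echo so nothing happens until you set it.

```shell
#!/bin/sh
# Walk the cluster one node at a time, stopping at the first failure.
# UPGRADE_CMD is a placeholder; the default only prints what it would do.
: "${UPGRADE_CMD:=echo would-upgrade}"

rolling_upgrade() {
    for node in "$@"; do
        echo "== upgrading $node"
        if ! $UPGRADE_CMD "$node"; then
            echo "upgrade failed on $node; stopping" >&2
            return 1
        fi
    done
}

# Example (host names are made up):
#   rolling_upgrade cass1 cass2 cass3
```

Stopping on the first failure means at most one node is ever out of service because of the upgrade, which is what keeps the cluster serving traffic throughout.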

When every node has been upgraded, and before you run repair, move, or add nodes, run this on each node:

nodetool -h $(hostname) -p 8080 scrub

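Scrub rebuilds the sstables to bring them up to date with the new version. It is essentially a major compaction without the compacting: each sstable is read and rewritten on its own rather than merged with the others, so it is a bit expensive.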

Run repair on your nodes to clean up the data.

nodetool -h $(hostname) -p 8080 repair

Drop your old snapshot when you’re through.

nodetool -h $(hostname) -p 8080 clearsnapshot
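The post-upgrade steps run in passes: scrub everywhere first, then repair, then drop the snapshots. A sketch of that ordering is below, assuming nodetool can reach each node's JMX port remotely; on_each_node, post_upgrade and the two variables are names made up for this example.

```shell
#!/bin/sh
# Post-upgrade cleanup in three passes over the cluster.
# NODETOOL and JMX_PORT are illustrative, overridable variables.
: "${NODETOOL:=nodetool}"
: "${JMX_PORT:=8080}"      # port used in this post; change it if yours differs

# on_each_node COMMAND NODE...
# Runs one nodetool command on each node in turn, stopping at the first failure.
on_each_node() {
    cmd="$1"
    shift
    for node in "$@"; do
        $NODETOOL -h "$node" -p "$JMX_PORT" "$cmd" || return 1
    done
}

# post_upgrade NODE...
# Scrub every node before any repair starts, and clear snapshots last.
post_upgrade() {
    on_each_node scrub "$@" &&
    on_each_node repair "$@" &&
    on_each_node clearsnapshot "$@"
}
```

Keeping clearsnapshot as the final pass means the safety copies stay around until scrub and repair have finished on every node.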

Now you’re done. Go forth and be merry :)

  • Anonymous

    Most excellent. 

    You may want to add -t to the snapshot and give it a name that can also be used with clearsnapshot. Otherwise clearsnapshot will remove all snapshots for the keyspace.

    Any interest in putting this in a script somewhere?