Getting Brisk going on CentOS and rocking a Terasort.

So, I started playing with a beta of Brisk this weekend.

The DataStax guys are industrious, energetic and very open to hearing from both the Cassandra and Hadoop communities.  You should hit them up in #Datastax-Brisk on Freenode IRC.

I’ll post more on my benchmarks and tests later. I’m still getting comfortable with it, but as someone who already uses Hadoop and Cassandra, it feels very familiar.

I need to set up the OpsCenter stuff, which looks pretty cool, and put some real data in it.

So far, my favorite thing:

INFO 23:36:22,093 Chose seed 192.168.x.x as jobtracker

Magic!

My current concern is how to deal with deletes in CFS (CassandraFS) as Hive (and Terasort for that matter) kicks up a lot of ephemeral data.  Cassandra doesn’t delete stuff instantly, so I imagine I’ll need to do some tweaking with GCGraceSeconds to find an optimal setting.
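
To make the GCGraceSeconds trade-off concrete, here is a rough sketch of the arithmetic in Python. It is a simplified model, not Brisk or Cassandra code: the function names are made up for illustration, and it assumes a tombstone only becomes eligible for removal once the grace period has passed (the actual removal still waits for a compaction). The 864000-second default is Cassandra's stock value of 10 days.

```python
from datetime import datetime, timedelta

# Cassandra's default grace period: 10 days, expressed in seconds.
DEFAULT_GC_GRACE_SECONDS = 864000


def purgeable_at(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest time a compaction could drop a tombstone written at deleted_at.

    Hypothetical helper for illustration; not part of any Cassandra API.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once the grace period has fully elapsed since the delete."""
    return now >= purgeable_at(deleted_at, gc_grace_seconds)


if __name__ == "__main__":
    deleted = datetime(2011, 4, 1, 0, 0, 0)
    # With the default, scratch data deleted on April 1 lingers until April 11.
    print(purgeable_at(deleted))
    # With a one-hour grace period it is eligible much sooner.
    print(purgeable_at(deleted, gc_grace_seconds=3600))
```

The catch with going low: if a node is down for longer than the grace period and isn't repaired, deleted data can come back, so a short setting only makes sense for data you truly consider throwaway.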

So, this is my quick 5-minute setup to get going and running benchmarks.


Why I am very excited about DataStax’s Brisk.

DataStax (née Riptano) is to Cassandra as Cloudera is to Hadoop (or Red Hat is to Linux).

Brisk is DataStax’s upcoming Cassandra/Hadoop hybrid distribution.  From their site:

DataStax’ Brisk is an enhanced open-source Apache Hadoop and Hive distribution that utilizes Apache Cassandra for many of its core services. Brisk provides integrated Hadoop MapReduce, Hive and job and task tracking capabilities, while providing an HDFS-compatible storage layer powered by the Cassandra DB.

They added Cassandra as an option for the Hadoop storage layer, allowing you to bypass HDFS; however, the implications of that go a whole lot further.  You get the strengths of both systems here and lose some of the problems.

I’m pretty jazzed about this and I hope to convince my co-workers to give it a go. I’d like to tell you why.
