Showing posts with label Cassandra. Show all posts

Tuesday, December 17, 2013

When a Delete is a Write!

So Cassandra -- a NoSQL database -- has a few peculiarities that might take newbies by surprise. One of them is that deletion involves a write!

Before we take a brief look at that, do you know the story of Cassandra from Greek mythology? Cassandra was so beautiful that Apollo wanted to have carnal knowledge of her. She refused. Consequently, Apollo cursed her to prophesy the truth, yet with no one believing it. Personally, TheHackerCIO knows how she felt. All the time I tell the truth, but it seems that very few actually believe it. It's enough to drive one crazy.

I wonder at this choice of mascot for the NoSQL database, which trades off consistency for availability per Brewer's Conjecture (a.k.a. "The CAP Theorem"). Is it that Cassandra will always return the truth, but we the DBAs won't believe it?

Well, leaving off the speculation, let's return to the peculiarity mentioned before: how can a delete be a write operation?

Remember, Cassandra uses an immutable data model. Data just continues to be written out to represent all changes. One consequence of this is that updates and inserts really are interchangeable. They call this the Cassandra "UpSert," because if you insert and a row with that primary key already exists, then it simply becomes an update. Conversely, if you update a row and the primary key involved doesn't exist, Cassandra will simply insert it. That is, either way, you will "UpSert" a row.
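To make the "UpSert" idea concrete, here is a minimal toy model in Python. It is only a sketch of the semantics described above, not Cassandra's actual storage engine: the ToyTable class, its log, and its counter-based clock are all invented for illustration.

```python
# Toy model of Cassandra-style "upsert" semantics; NOT the real storage engine.
# Every write is appended with a timestamp, and a read keeps the newest value
# seen for each column.

class ToyTable:
    def __init__(self):
        self.log = []    # append-only list of (timestamp, key, columns)
        self.clock = 0   # hypothetical stand-in for the write timestamp

    def _write(self, key, columns):
        self.clock += 1
        self.log.append((self.clock, key, dict(columns)))

    # INSERT and UPDATE take the same path: both simply record a new write.
    insert = _write
    update = _write

    def read(self, key):
        row = {}
        for _, k, columns in sorted(self.log, key=lambda w: w[0]):
            if k == key:
                row.update(columns)   # later timestamps overwrite earlier ones
        return row or None


table = ToyTable()
table.update("rex", {"breed": "beagle"})   # no such row yet: acts like an insert
table.insert("rex", {"breed": "boxer"})    # row already exists: acts like an update
result = table.read("rex")
print(result)
```

Because both statements funnel into the same append, neither one needs to check whether the row exists first, which is the whole point of the nickname.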

Another consequence of the immutable data model is that delete operations are really just "marking for deletion." Cassandra writes a marker, called a tombstone, that shadows the older data. We're all familiar with this from the file-system, but to have a database that does this adds a few wrinkles. For instance, you now have to deal with "compaction," the background process that rewrites the data files and eventually drops the tombstones along with the data they mark.
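A small Python sketch may help show why the delete is a write. This is a toy model under my own assumptions, not Cassandra's code: the TOMBSTONE marker, the writes log, and the put/delete/get/compact helpers are hypothetical names, and the grace period that real Cassandra observes before purging deletion markers is left out.

```python
# Toy model: a delete appends a marker instead of erasing anything.
TOMBSTONE = object()   # hypothetical stand-in for a deletion marker

writes = []            # append-only log of (timestamp, key, value)

def put(ts, key, value):
    writes.append((ts, key, value))

def delete(ts, key):
    # The "delete" is itself a write: it records a marker that shadows older data.
    writes.append((ts, key, TOMBSTONE))

def get(key):
    # The newest write for the key wins; a marker means "no such row".
    matching = [w for w in writes if w[1] == key]
    if not matching:
        return None
    newest = max(matching, key=lambda w: w[0])
    return None if newest[2] is TOMBSTONE else newest[2]

def compact():
    # Keep only the newest write per key, then drop keys whose newest write
    # is a deletion marker. (The real grace period is omitted in this sketch.)
    newest = {}
    for ts, key, value in writes:
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, key, value)
    return [w for w in newest.values() if w[2] is not TOMBSTONE]

put(1, "rex", "beagle")
put(2, "fido", "pug")
size_before = len(writes)
delete(3, "rex")
size_after = len(writes)          # the log grew: the delete was a write
value_after_delete = get("rex")
compacted = compact()
```

Note that the data only shrinks when compact() runs; until then the delete has made the log longer, not shorter.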

So, if you come from the relational database world -- and don't we all -- you need to spend a little time wrapping your head around the world of NoSQL in general, and Cassandra in particular.

As you do so,

I Remain,


Wednesday, November 13, 2013

An Evening's Evangelism

Last night was spent playing hooky from the Geeky Book club. But only because a particularly special speaker was in town. Patrick McFadin, chief Evangelist for Apache Cassandra, was speaking at DreamWorks in Glendale.

So, TheHackerCIO slogged through an hour and a half of LA traffic to get out to Glendale in time to see the talk. Not to mention hearing it.

Patrick is a good presenter, so the talk was well organized and interesting. His purpose was to convince us that C* [the semi-official abbreviation for Cassandra] was the best persistence tier for your application.

He predicated this on the tunable consistency available in C*, pointing out that if you were willing to specify ALL, and take the performance hit, you could construct the most consistent distributed database system possible: one where every replica had to acknowledge before an operation completed.
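The trade-off behind that tunable consistency can be sketched with a little arithmetic. The snippet below is my own illustration, not something from the talk: the function names and the replication factor of 3 are assumptions, while ONE, QUORUM, and ALL are real Cassandra consistency levels.

```python
def required_acks(level, replication_factor):
    # Number of replicas that must acknowledge an operation at this level.
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1   # a strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")

def tolerated_failures(level, replication_factor):
    # Replicas that can be down while the operation can still succeed.
    return replication_factor - required_acks(level, replication_factor)

RF = 3  # assumed replication factor, chosen only for illustration
summary = {
    level: (required_acks(level, RF), tolerated_failures(level, RF))
    for level in ("ONE", "QUORUM", "ALL")
}
print(summary)
```

With ALL, the tolerated-failure count drops to zero, which is the hit you take for maximum consistency: one unavailable replica is enough to fail the operation.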

The talk was too long to go too in-depth, but I was particularly interested in the architecture of writing all files out immutably. Even compaction is accomplished by reading in the fragmented files and writing a new compacted one. So, in theory, you could always recover -- even from programmatic database corruption. Ideally, you use a snapshot to do point-in-time recovery, followed by writing a script to extract "post-point-in-time" updates from the files and apply them where required.
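Here is a rough Python sketch of that recovery idea, under heavy assumptions. The log format, the state_as_of and writes_after helpers, and the "CORRUPTED" value are all invented for illustration; real recovery would work with Cassandra's snapshot tooling and data files rather than a Python list.

```python
# Toy sketch of point-in-time recovery over an immutable write log.
# Each entry is (timestamp, key, value); nothing is ever modified in place.
log = [
    (1, "rex", "beagle"),
    (2, "fido", "pug"),
    (3, "rex", "CORRUPTED"),   # a buggy program wrote bad data at t=3
    (4, "spot", "collie"),     # a legitimate write made after the bad one
]

def state_as_of(entries, cutoff):
    # Replay writes up to and including the cutoff; newest write per key wins.
    state = {}
    for ts, key, value in sorted(entries):
        if ts <= cutoff:
            state[key] = value
    return state

def writes_after(entries, cutoff, keep):
    # Extract the "post-point-in-time" writes that pass a caller-supplied filter.
    return [e for e in sorted(entries) if e[0] > cutoff and keep(e)]

snapshot = state_as_of(log, cutoff=2)                        # state before the damage
good_later = writes_after(log, 2, lambda e: e[2] != "CORRUPTED")

recovered = dict(snapshot)
for _, key, value in good_later:
    recovered[key] = value                                   # re-apply only the good writes
print(recovered)
```

The recovery works only because the bad write at t=3 was appended rather than applied in place, so the earlier value for "rex" is still there to be replayed.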

He mentioned that the joke among C* cognoscenti is that CQL has an UPSERT statement, because update and insert are so very similar. If a row doesn't exist, update will insert it, and if it exists, insert will replace the data in it! UPSERT is a fun way to remember this similarity of statements.

Patrick also pointed out that Netflix -- the poster boy for C* -- has just released the Chaos Monkey for C*! He challenged the Mainframe person attending to introduce the Chaos Monkey to the Mainframe systems, and see how they compare in terms of failover and availability. If you don't know about the Chaos Monkey, tomorrow I'll fill you in on it. Because it's important.

To summarize his talk, I liked his zinger the best: Use Oracle to count your money; Use Cassandra to make it.

I Remain,


Tuesday, November 5, 2013

Cassandra Last Night at the TRG

Cassandra was the topic at TRG last night.

That is to say, Apache Cassandra. I'm not clear why the project chose to name itself after a Greek prophetess who was doomed always to prophesy correctly, but never to be believed.

Perhaps the eventual consistency model?

Still, it doesn't seem like the greatest PR approach: Big Data initiatives would hardly like to think of their correct insights always being disregarded.

But such is the name of the product.

Our presenter, Adrian Rodriguez, did a nice hands-on tutorial where he built up a data model for a social web application centered around dog photos. He provided a GitHub account where the full-blown application can be browsed.

He also pointed us to a very helpful consistency calculator website, where the implications of your consistency level choice are clearly shown: Cassandra Parameters for Dummies.

Adrian recommended the very sound policy of making calls at QUORUM consistency by default and then relaxing this only where necessary, in keeping with the dictum: "don't prematurely optimize."
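The reasoning behind starting at quorum can be checked with one inequality: if the replicas read plus the replicas written exceed the replication factor, every read overlaps the latest write on at least one replica. The sketch below is my own illustration of that rule, not Adrian's material, and the function names and replication factor are assumptions.

```python
def quorum(replication_factor):
    # A strict majority of the replicas.
    return replication_factor // 2 + 1

def reads_see_latest_write(read_acks, write_acks, replication_factor):
    # If the read set and the write set must share at least one replica,
    # a read is guaranteed to contact a replica holding the newest write.
    return read_acks + write_acks > replication_factor

RF = 3  # assumed replication factor, chosen only for illustration
quorum_both = reads_see_latest_write(quorum(RF), quorum(RF), RF)   # 2 + 2 > 3
one_both = reads_see_latest_write(1, 1, RF)                        # 1 + 1 > 3 fails
write_all_read_one = reads_see_latest_write(1, RF, RF)             # 1 + 3 > 3
print(quorum_both, one_both, write_all_read_one)
```

Relaxing a call from quorum to a single replica buys speed, but the overlap guarantee is the thing you give up, so it makes sense to do it only where stale reads are acceptable.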

I also liked his way of explaining that Cassandra databases grow out left to right: everything attaches to the primary key as a new column, and all the join overhead is paid upfront, at update time, in all the other relevant rows. In the Relational Model, by contrast, databases grow top to bottom as new rows are added. This is an excellent way for beginners to start wrapping their heads around this NoSQL database.
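A toy version of that left-to-right growth, loosely in the spirit of the dog-photo application, might look like the following. Everything here is hypothetical: the followers map, the timelines structure, and post_photo are my own names, and this is not the data model from Adrian's repository.

```python
from collections import defaultdict

# Hypothetical data: who follows whom.
followers = {"alice": ["bob", "carol"]}

# One wide row per user: partition key -> {column name: value}.
# Each new photo adds a column, so a row grows "left to right".
timelines = defaultdict(dict)

def post_photo(author, photo_id, caption):
    # The "join" is paid at write time: copy the photo into the author's row
    # and into the row of every follower.
    for user in [author] + followers.get(author, []):
        timelines[user][photo_id] = f"{author}: {caption}"

post_photo("alice", "photo-001", "Rex at the beach")
post_photo("alice", "photo-002", "Rex asleep")

bob_timeline = timelines["bob"]   # a single-row read, no join needed
print(bob_timeline)
```

Two posts produced six column writes across three rows, which is the upfront cost; in exchange, reading anyone's timeline touches exactly one row.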

Tonight is the Java Users Group, so a report will be in order tomorrow on Groovy.

Full details of the presentation may be read here.

I Remain,