Comments on: Migrating from MongoDB to Cassandra https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/ Mon, 30 Jan 2023 10:49:57 +0000

By: Tom Samplonius https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1234 Fri, 14 Feb 2014 07:26:00 +0000 https://www.fullcontact.com/?p=7615#comment-1234 In reply to benmccann.

(1) Adding shards to an existing non-sharded Mongo cluster is not trivial; (2) you still have the write lock issue, which can only be solved by over-provisioning disk IO (Mongo serializes writes, limiting you to the speed of a single disk); and (3) Mongo has no built-in data center awareness in data placement (you have to manually place replicas on servers that you know are in different data centers).

But Mongo is simple otherwise.

By: Xorlev https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1233 Fri, 14 Feb 2014 04:44:00 +0000 https://www.fullcontact.com/?p=7615#comment-1233 In reply to roycehaynes.

Thanks for commenting.

It really depends on the project. For our usage, Cassandra works admirably. Our “search target cache” (previously, HBase) utilizes CQL3 tables to ensure all the data about a particular unique key (e.g. “michael@fullcontact.com”) ends up in the same logical row for locality. Our profile cache (the one previously MongoDB) is basically key/value lookup. It works quite well for that.
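
To illustrate the locality point, here is a minimal Python sketch of the idea behind a table with a compound key: the first key component selects the partition (the "logical row"), and the second orders entries inside it. This is a toy in-memory model, not FullContact's schema and not the Cassandra driver API; the class name `SearchTargetCache`, the `source` field, and the sample values are all hypothetical.

```python
from collections import defaultdict


class SearchTargetCache:
    """Toy in-memory model of a partitioned table (illustration only)."""

    def __init__(self):
        # partition key -> {clustering key -> value}
        self._partitions = defaultdict(dict)

    def put(self, search_key, source, value):
        # Every write for the same search_key lands in the same partition.
        self._partitions[search_key][source] = value

    def get_all(self, search_key):
        # A single partition lookup returns every entry for the key,
        # ordered by clustering key, as (source, value) pairs.
        partition = self._partitions.get(search_key, {})
        return sorted(partition.items())


cache = SearchTargetCache()
cache.put("michael@fullcontact.com", "twitter", "profile-a")
cache.put("michael@fullcontact.com", "github", "profile-b")
cache.put("someone@example.com", "twitter", "profile-c")
rows = cache.get_all("michael@fullcontact.com")
```

In this model, reading everything known about one key touches a single partition, which is the locality property described above.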

There are likewise dozens of use cases where Cassandra is wholly inappropriate. I’m unsure which impedance mismatch you’re referring to, but I’m happy to answer any questions if I can.

Regarding MongoDB, it’s certainly possible that if we’d started sharding earlier we’d still be using it. MongoDB had a lot of other issues too over time, which led us to look elsewhere when considering options. And again, availability is our primary concern, and Cassandra wins in terms of operational simplicity (especially on AWS). Cassandra was designed with resilience in mind from the get-go.

By: Xorlev https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1232 Fri, 14 Feb 2014 04:23:00 +0000 https://www.fullcontact.com/?p=7615#comment-1232 In reply to benmccann.

Hi Ben,

Thanks for your comments. You’re right: without any other reasoning, it would have made more sense to shard MongoDB and do the same kind of migration; the incidental complexity of the config servers wouldn’t have been the deciding factor. As I’d said in the article, availability is our #1 priority. Cassandra runs extremely well in fixed-size autoscaling groups. We run 1 ASG for each of the 3 zones we operate in.

During our shakedown testing, we would terminate an entire zone of servers and see very little disruption. ASGs automatically replace terminated machines. Paired with Priam, they automatically re-replicate and rejoin the cluster.
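
A rough way to see why losing a whole zone can cause so little disruption is to count replicas. The sketch below is a back-of-the-envelope model, not a description of the actual cluster: it assumes a replication factor of 3 with one replica per zone and majority (quorum) operations, none of which is stated above, and the zone names are hypothetical.

```python
ZONES = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names


def place_replicas(replication_factor, zones):
    """Assign replica slots to zones round-robin (assumed placement)."""
    return [zones[i % len(zones)] for i in range(replication_factor)]


def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def survives_zone_loss(replication_factor, zones, lost_zone):
    """True if a majority of replicas remains after one zone is terminated."""
    replicas = place_replicas(replication_factor, zones)
    alive = [zone for zone in replicas if zone != lost_zone]
    return len(alive) >= quorum(replication_factor)


# Terminate each zone in turn and check whether a quorum is still reachable.
results = {zone: survives_zone_loss(3, ZONES, zone) for zone in ZONES}
```

Under these assumptions, any single zone can disappear and two of the three replicas remain, which still satisfies a quorum of two. With all replicas in one zone, the same check fails.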

Cassandra grants us operational simplicity. As a side-benefit, we’re a JVM shop and find it much easier to look under the hood than we ever did with MongoDB.

Thanks again for your comments.

By: roycehaynes https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1231 Fri, 14 Feb 2014 04:20:00 +0000 https://www.fullcontact.com/?p=7615#comment-1231 It sounds like MongoDB would still be in use if you’d started sharding sooner rather than later. If sharding had been in place early on, I’d imagine the “switch” wouldn’t have been necessary, especially if your aggregated model is well designed. The problems you’d likely run into after sharding early are how best to shard the data and getting your write quorum correct (i.e., how many nodes write vs how many nodes read and write).
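
The quorum trade-off mentioned here is often summarized with a general rule for quorum-replicated stores: with N replicas, W write acknowledgements, and R read responses, the read set and the write set share at least one replica whenever R + W > N. The Python sketch below only does that arithmetic; it is not MongoDB or Cassandra configuration, and the function names are hypothetical.

```python
def quorums_overlap(n, w, r):
    """True if any read set of size r must intersect any write set of size w."""
    return r + w > n


def overlapping_settings(n):
    """List every (w, r) pair for n replicas whose read and write sets overlap."""
    return [
        (w, r)
        for w in range(1, n + 1)
        for r in range(1, n + 1)
        if quorums_overlap(n, w, r)
    ]


settings = overlapping_settings(3)
```

For three replicas, writing to two and reading from two overlaps, while writing to one and reading from one does not; lowering W speeds up writes but pushes the cost onto reads, and vice versa.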

By: benmccann https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1230 Fri, 14 Feb 2014 02:35:00 +0000 https://www.fullcontact.com/?p=7615#comment-1230 In reply to Tom Samplonius.

Right, but why not just shard MongoDB then as opposed to migrating to a completely new datastore? It sounds like it’s because running config servers and mongos was deemed too complicated, but migrating datastores seems even more complicated. I’m wondering if there were additional considerations not mentioned.

By: Tom Samplonius https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1229 Fri, 14 Feb 2014 02:30:00 +0000 https://www.fullcontact.com/?p=7615#comment-1229 In reply to benmccann.

Because the instances have a 2 TB max on SSD storage. See the article.

By: Leif Walsh https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1228 Fri, 14 Feb 2014 00:59:00 +0000 https://www.fullcontact.com/?p=7615#comment-1228 Too bad TokuMX (http://www.tokutek.com/products/tokumx-for-mongodb) wasn’t available when you were having MongoDB problems. Concurrency, compression, and general performance are exactly what it was built to solve, and you wouldn’t have had to change your MongoDB apps at all.

By: benmccann https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1227 Fri, 14 Feb 2014 00:45:00 +0000 https://www.fullcontact.com/?p=7615#comment-1227 I don’t fully understand the reasoning behind the migration. You said you had locking problems, but fixed that with SSDs. So what was it that you were trying to address?

By: Xorlev https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1226 Tue, 07 Jan 2014 03:29:00 +0000 https://www.fullcontact.com/?p=7615#comment-1226 In reply to Opsy.

I hadn’t seen this, apologies.

We’d started using LUKS well after we’d converted to SSDs. On SSDs, we didn’t have an issue with disk IO bottlenecks. I’d have to drag a node out of cold storage to check what readahead (RA) was set to.

We didn’t convert from Mongo due to a lack of read/write speed (the latencies were actually excellent: 7-9ms at the 99.5th percentile). It was a strategic decision about how we wanted to evolve our architecture, as well as allowing for near-linear scalability of our database infrastructure. I don’t doubt Mongo would have worked pretty well, but Cassandra on RAID disks fits our usage better and at a much nicer price point (with more RAM overall to boot).

As a side benefit, we’re a JVM shop and Cassandra is written in Java. Makes it a lot easier to become experts.

By: Opsy https://www.fullcontact.com/blog/engineering/mongo-to-cassandra-migration/#comment-1225 Tue, 24 Dec 2013 22:55:00 +0000 https://www.fullcontact.com/?p=7615#comment-1225 Wouldn’t high readahead have hurt your MongoDB deployment just as badly? Had you tried all the tuning with your MongoDB system that you did for C*?
