Running more than one ndbd on a machine

Personally, I'm not a fan of more than one ndbd per machine...

Diamond Notes » Fun with Running a Cluster on Two Servers
Others might argue with this, but I would never put the SQL nodes on the same servers as the ndbd nodes for production. Some say you can run multiple ndbd nodes on the same server and I am more comfortable with that since I can lock the ndbd daemon into memory and know it's not going to change (my ndbd nodes on those two servers have been at exactly 71.3% since I started them up). If I had servers for the ndbd nodes that had 16+ gigs of RAM I might start allocating 4 gigs of RAM to a ndbd daemon with 3+ daemons per node. My understanding is that this helps keep the transactional logs for the nodes under control. When you do a ndbd node restart it takes less time for a node to get up and running because of the smaller files to read. I might be mistaken and it's too late for me to look it up :) Anyone got other reasons or maybe (if I am right) someone can elaborate.


First of all, I'm very excited to see that Cluster is being used for MogileFS here. I've actually been thinking it might be fun to write a MogileFS::Store::NDB class to use NDB/Perl ... but that's another story.

One of the things that's nice about NDB is that it spreads across multiple machines. Now, granted, in development we can't always do this. I run NDB on my laptop all the time, so I certainly feel the pain there. But with a multi-node design, I say take advantage of it. People on other architectures are always asking how they can spread the load more easily across multiple machines, and here you can. If you need 8 data nodes, get 8 machines.

Practically, there is another reason. If you put 3 data nodes on a single physical machine, then if that machine crashes, you have not a single node failure, but 3 nodes failing at the same time. Although there is no specific reason that this won't work, it's also essentially an edge case and not really tested all that well. Other people may disagree with me here, especially as running multiple ndbds on a single box is currently the only way to take advantage of more than 2 CPU cores, but I'm just not a fan. For something like this, go get a bunch of 2 CPU boxes.
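To make the multi-node failure concern concrete, here is a rough sketch of how you might check a planned layout on paper. It assumes that the cluster stays up only while every node group keeps at least one live data node, and it ignores arbitration entirely. The node IDs, host names, node groups, and the `fatal_hosts` helper are all made up for illustration; none of this comes from NDB itself.

```python
from collections import defaultdict


def fatal_hosts(node_host, node_groups):
    """Return the hosts whose loss would leave some node group with no live node.

    node_host   -- hypothetical mapping of data node id -> physical host name
    node_groups -- hypothetical list of node groups, each a tuple of node ids
    """
    nodes_on = defaultdict(set)
    for node, host in node_host.items():
        nodes_on[host].add(node)
    return sorted(
        host
        for host, dead in nodes_on.items()
        # A host is fatal if some node group lives entirely on it.
        if any(set(group) <= dead for group in node_groups)
    )


# Four data nodes, two replicas, two physical hosts (illustrative only).
node_groups = [(1, 2), (3, 4)]
stacked = {1: "hostA", 2: "hostA", 3: "hostB", 4: "hostB"}
spread = {1: "hostA", 2: "hostB", 3: "hostA", 4: "hostB"}

print(fatal_hosts(stacked, node_groups))  # both replicas of a group share a host
print(fatal_hosts(spread, node_groups))   # each group is split across hosts
```

In the stacked layout either host going down takes a whole node group with it; in the spread layout neither does, at least under the simplified rule assumed here.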

To address the larger question, though, you can certainly spread the write load and the redo log burden by having more data nodes, and having the data distributed across more data nodes will certainly make the log recovery shorter on any one given node. However, there is a price to pay here in query latency. The more data nodes you have, the more nodes your data might be on. If you are not using the NDB API (or any of the NDB/Connectors) then there is no optimized node selection going on. (I've heard someone is working on an Optimized TC Selection patch for mysqld, but it's not in the mainline yet.) So, with two replicas, if you go from 2 nodes to 4 nodes, you go from a 100% chance that the TC will have your data and not have to ask someone else, to a 50% chance. At 8 nodes you are at a 25% chance that the TC selection will select a node with your data. That's for primary key operations. If you're doing a scan, then your data is going to (most likely) be on all of the nodes, which can be good or bad. But if you're doing MogileFS queries, perhaps the extra milliseconds of latency aren't a concern in this case and you'd rather have faster node recovery. I'd test that hypothesis out and see how much better the recovery is.
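The 100% / 50% / 25% figures are just replicas divided by data nodes. A minimal sketch of that arithmetic, assuming two replicas (NoOfReplicas=2), evenly distributed rows, and a TC picked uniformly at random with no optimized selection; the `local_tc_chance` function is my own name, not anything in NDB:

```python
from fractions import Fraction


def local_tc_chance(data_nodes: int, replicas: int = 2) -> Fraction:
    """Chance that a randomly chosen TC node holds a copy of a given row.

    Assumes rows are spread evenly and each row lives on `replicas` nodes.
    """
    if data_nodes < replicas or data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a positive multiple of replicas")
    return Fraction(replicas, data_nodes)


for n in (2, 4, 8):
    print(f"{n} data nodes: {float(local_tc_chance(n)):.0%}")
# 2 data nodes: 100%
# 4 data nodes: 50%
# 8 data nodes: 25%
```

This only covers primary key operations; scans touch most or all nodes regardless.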

Another potential reason to have more data nodes even with the higher latency costs (and extra network traffic as that many more nodes have to talk to that many more nodes) is scalability. Of course, more nodes mean more scaling. But as of now, adding a data node is not an online operation, while adding more memory can be done in a rolling fashion. So if you think you might need 8 data nodes' worth of CPU processing, go ahead and get 8 data nodes with 4G of RAM apiece. Then as you need to store more data, stick in more RAM. Before long, you'll have a nice 256G system... :)
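For a feel of the numbers, a back-of-the-envelope sketch. I'm reading "256G system" as total RAM across the 8 data nodes, and assuming two replicas, so the room for unique data is roughly half the total (ignoring indexes, buffers, and other overhead). The `cluster_ram_gb` helper is purely illustrative.

```python
def cluster_ram_gb(data_nodes, ram_per_node_gb, replicas=2):
    """Return (total RAM, rough upper bound on unique data), both in GB."""
    total = data_nodes * ram_per_node_gb
    return total, total / replicas


# Keep 8 data nodes fixed and grow the RAM in each one over time.
for per_node in (4, 8, 16, 32):
    total, unique = cluster_ram_gb(8, per_node)
    print(f"8 x {per_node}G = {total}G total, about {unique:.0f}G of unique data")
```

So getting from the starting 32G to 256G means taking each of the 8 nodes from 4G to 32G, one rolling restart at a time.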

So... as with everything Cluster, there is certainly room on both sides of the argument as to what's best here. And as always, testing is the best bet to see how it maps to your environment.

(Also, it's 4AM at the moment, so please forgive me if this rambles a little.)
