RAMCloud
ramcloud.stanford.edu

At a glance, this looks a lot like the underlying software running Texas Memory Systems' (now IBM) RamSan products, except built to run on off-the-shelf hardware of the user's choice, which is quite a nice development.
As a use-case example, those units have been in use for at least ten years by CCP Games [http://community.eveonline.com/news/news-channels/press-rele... and http://community.eveonline.com/news/dev-blogs/apocrypharrrrr...] as extreme-low-latency database storage for a fairly large cluster. At the time there were jokes about the staff needing special clearances just to see demo hardware before making their decision and eventual purchase, due to restrictions related to the hardware's use by government and military customers.
Maybe I'm oversimplifying it or looking at it wrong, but I wanted to share my 2c while I was thinking about it. I'll take a better look later.
Samsung recently announced 2nd-gen TSV (3D) DDR4 memory; it will be only two years before we see 256GB per DIMM or more. We could easily have 1TB per DIMM by 2020, meaning a single 1U server would have 4TB of memory (assuming Intel decides to support that much memory...)
In less than a decade we have, all of a sudden, removed most of our I/O bottleneck: NAND SSDs, Intel XPoint, and giant amounts of memory, all within 10 years.
> RAMCloud replicates all data on nonvolatile secondary storage such as disk or flash, so no data is lost if servers crash or the power fails
If I have a write-heavy application, for instance a time-series database, does that mean I'll eventually consume all the RAM, then the page cache, and my throughput will end up disk-IO bound?
The secondary store is more like a backup; it's not used for spilling.
Naming aside, is the write throughput into the RAMCloud constrained by the cumulative disk IO write speed? If my sustained write speed is greater than total disk throughput, I either have to drop writes or slow down producers, correct?
Usually yes; that is a limit you would mostly not want to exceed. It would slow down producers (newer writes are not acked until older backup copies have been flushed to disk).
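A back-of-envelope sketch of that limit (all numbers below are illustrative assumptions, not RAMCloud measurements): every byte ingested must eventually be flushed once per replica, spread across all the backup disks.

```python
# Back-of-envelope: sustained write throughput when bounded by backup disks.
# The figures here are made-up assumptions for illustration only.

backup_servers = 10        # servers holding backup copies
disk_write_mb_s = 100      # sequential write speed per backup disk (MB/s)
replication_factor = 3     # copies written per object

# Each ingested byte is flushed replication_factor times in total,
# spread across all backup disks, so sustained ingest is capped at:
sustained_ingest_mb_s = backup_servers * disk_write_mb_s / replication_factor
print(f"max sustained ingest ≈ {sustained_ingest_mb_s:.0f} MB/s")  # ≈ 333 MB/s
```

Short bursts above that rate can be absorbed by the backups' in-memory buffers; only sustained excess forces backpressure.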
The backup copies of the data are stored on a configurable number of _other_ systems in the cluster. They receive the data into memory, and then the write is acked to the client. The backups are then written to disk asynchronously on those systems (sketched in code below).
You can configure backups not to be written to disk at all, if you have enough memory in the cluster (and don't care about that additional level of durability). You can also configure the size of the buffer of unflushed data, which would absorb longer write spikes.
The point is more that the backup copies become "free" once they are written to disk. Otherwise the cluster could only store 25% of its memory size if you want 3 backup copies.
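To make that concrete, here's a minimal Python sketch of the write path as described above. The names (`Master`, `Backup`, `receive`) are mine for illustration, not RAMCloud's actual C++ API; the bounded buffer doubles as the backpressure mechanism mentioned upthread, and its `maxsize` corresponds to the configurable unflushed-data buffer.

```python
import queue, threading

class Backup:
    """Holds backup copies in memory and flushes them to disk asynchronously."""
    def __init__(self, path):
        self.buffer = queue.Queue(maxsize=1024)  # unflushed-data buffer (configurable)
        self.path = path
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def receive(self, record):
        # put() blocks when the buffer is full -- this is the backpressure
        # that slows producers down once the disks can't keep up.
        self.buffer.put(record)

    def _flush_loop(self):
        with open(self.path, "ab") as log:
            while True:
                log.write(self.buffer.get())  # the slow, durable part
                log.flush()

class Master:
    def __init__(self, backups):
        self.memory = {}        # the primary copy lives in DRAM
        self.backups = backups  # configurable number of backup servers

    def write(self, key, value):
        self.memory[key] = value
        record = key.encode() + b"=" + value + b"\n"
        for b in self.backups:  # copies land in backup *memory* first...
            b.receive(record)
        return "ACK"            # ...and only then does the client see an ack

master = Master([Backup(f"backup{i}.log") for i in range(3)])
print(master.write("user:42", b"hello"))
```

Note the ack goes out once all backups hold the record in memory; the disk flush happens later, which is exactly why backup copies become "free" after flushing.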
I'm not 100% familiar with the internals, but I think it's safe to say that the persistence part is asynchronous in order to reach the low latency goals of the system.
Here they compare RAMCloud to Redis https://ramcloud.atlassian.net/wiki/display/RAM/Redis+vs.+RA...
RAMdrives are back, baby! And maybe going mainstream too!