Amazon DynamoDB Update – Global Tables and On-Demand Backup
Looks like a direct competitor to Microsoft Azure Cosmos DB. Anyone care to point out the top 3 differences?
Maybe sorta kinda -- just a heads-up that CosmosDB can talk multiple API protocols:
- MongoDB
- Cassandra
- Graph API
- Azure Tables
- DocumentDB

I've used the MongoDB version, and it works OK for basic CRUD stuff. I can't comment on aggregations, etc. Not sure from a quick peek if DynamoDB does the same...
It seems this service is targeted at applications where write traffic is largely independent between two regions. However, you can shoot yourself in the foot if two regions try to update the same row, because of eventual consistency.
>> You can now create tables that are automatically replicated across two or more AWS Regions, with full support for multi-master writes, with a couple of clicks. <<
Hmm, if I understand it correctly, Regions are no longer heavily isolated.
Both features, global tables and automatic backups, seem like automation of features you could build yourself.
Global tables could use DynamoDB Streams to put items from Table A into Table B with a conditional put (A timestamp > B timestamp).
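Something like this rough sketch in Python/boto3, assuming a Lambda fed by Table A's stream; the table name `TableB` and the timestamp attribute `ts` are placeholders, not anything DynamoDB itself defines:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb", region_name="us-west-2")

def replicate(record):
    """Apply one stream record from Table A to Table B, keeping the newer write."""
    item = record["dynamodb"]["NewImage"]  # already in DynamoDB attribute-value format
    try:
        dynamodb.put_item(
            TableName="TableB",
            Item=item,
            # Only overwrite if the incoming write is newer (or the item is new).
            ConditionExpression="attribute_not_exists(ts) OR ts < :incoming",
            ExpressionAttributeValues={":incoming": item["ts"]},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise  # losing the race to a newer write is fine; anything else isn't
```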
Automated backup may be DynamoDB Streams, or an automated setup of a Data Pipeline export to S3.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuid...
I would say this offers global capabilities while preserving regional isolation. You're composing the global table out of regional tables (which replicate between each other), and you're still using regional endpoints. This means that a failure in one region's table does not impact the others.
This model does mean that certain things, like conditional expressions, are less useful (or outright useless), because you could very well execute a conditional expression against stale data.
If replication is synchronous it will cost you time.
If replication is asynchronous it will cost you freshness.
I'm not entirely sure what's implemented here; either way, you need to pick. It's not magic.
Jeff Barr says in the article exactly how it works: when you make a write in one region, it streams the update to the other regions. Presumably this is built on top of the DynamoDB Streams system. It tags the item update with a timestamp, and in the event of a conflict the most recent write wins.
So in the case of a partition where the update stream is delayed, other regions can read and write their local copies of the tables. They'll be reconciled when the partition heals. You should design around the possibility of conflicts, so that 'last writer wins' resolution does what you want and leaves the system in a consistent state.
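One concrete pitfall (a made-up example, not from the article): read-modify-write updates like counters don't survive last-writer-wins, since two regions can each read 5, write 6, and one increment gets dropped. A safer design gives each region its own item, so concurrent writes never touch the same key. A rough sketch, assuming a hypothetical `page-views` table with a composite key of `page_id` (partition) and `region` (sort):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="eu-west-1").Table("page-views")

def record_view(page_id, region="eu-west-1"):
    # Each region increments only its own item, so writes from different
    # regions hit different keys and can never conflict under last-writer-wins.
    table.update_item(
        Key={"page_id": page_id, "region": region},
        UpdateExpression="ADD #v :one",
        ExpressionAttributeNames={"#v": "views"},
        ExpressionAttributeValues={":one": 1},
    )

def total_views(page_id):
    # Combine the per-region counts at read time.
    resp = table.query(KeyConditionExpression=Key("page_id").eq(page_id))
    return sum(item["views"] for item in resp["Items"])
```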
It would be nice if it emitted an event when write-write conflicts occur, so you could monitor them and possibly add your own conflict-resolution logic to patch things up afterward. It doesn't appear that this exists.
Yes, you sacrifice freshness. When healthy, regions are behind by ~1s. If you need consistent writes, you can treat one region as the 'primary', do all your writes there, and use the others as read replicas. But it's better to design your system not to require full consistency, and to do reads and writes in the local region.
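To make the primary-region pattern concrete, a minimal sketch in Python/boto3; the region choices and table name are assumptions for illustration:

```python
import boto3

WRITE_REGION = "us-east-1"  # chosen 'primary' region (assumption)
LOCAL_REGION = "eu-west-1"  # region this service runs in (assumption)

writer = boto3.resource("dynamodb", region_name=WRITE_REGION).Table("my-global-table")
reader = boto3.resource("dynamodb", region_name=LOCAL_REGION).Table("my-global-table")

def put_profile(user_id, profile):
    # Consistent writes: always go through the primary region's replica.
    writer.put_item(Item={"user_id": user_id, **profile})

def get_profile(user_id):
    # Low-latency reads: local replica, possibly ~1s stale.
    return reader.get_item(Key={"user_id": user_id}).get("Item")
```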
(Disclaimer: I work for Amazon, but not in AWS and have nothing to do with this project.)
Thanks for the info. Custom resolution in Lambda with vector clocks would be fantastic. Maybe in the future?
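For anyone curious, here's a toy sketch of the comparison such a resolver might do. This is not a DynamoDB feature, just an illustration of how a hypothetical Lambda could detect concurrent writes, with vector clocks modeled as dicts of region -> counter:

```python
def compare(vc_a, vc_b):
    """Return 'a', 'b', 'equal', or 'concurrent' for two vector clocks."""
    a_ge = all(vc_a.get(r, 0) >= c for r, c in vc_b.items())
    b_ge = all(vc_b.get(r, 0) >= c for r, c in vc_a.items())
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a"       # a strictly dominates: keep a
    if b_ge:
        return "b"       # b strictly dominates: keep b
    return "concurrent"  # a write-write conflict needing custom resolution
```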
No need to be uncertain; the article outlines this pretty clearly:
"Behind the scenes, DynamoDB implements multi-master writes and ensures that the last write to a particular item prevails. When you use Global Tables, each item will include a timestamp attribute representing the time of the most recent write. Updates are propagated to other Regions asynchronously via DynamoDB Streams and are typically complete within one second"
In terms of access, you're still touching a regional table, and while you can make use of things like conditional expressions, they are limited to the regional table's view of the data.
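A hypothetical illustration of that limitation in Python/boto3 (table, key, and attribute names are made up):

```python
import boto3

# The condition is evaluated only against THIS region's replica.
table = boto3.resource("dynamodb", region_name="eu-west-1").Table("my-global-table")

# This succeeds if the local replica still shows status == 'pending', even if
# another region already flipped it to 'shipped' and that update hasn't
# replicated yet -- and last-writer-wins will then pick one of the two writes.
table.update_item(
    Key={"order_id": "o-123"},
    UpdateExpression="SET #s = :new",
    ConditionExpression="#s = :expected",
    ExpressionAttributeNames={"#s": "status"},  # 'status' is a DynamoDB reserved word
    ExpressionAttributeValues={":new": "cancelled", ":expected": "pending"},
)
```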