Ask HN: Going from Redis to MySQL. Good idea?
We are building a SaaS backend for restaurants using Rails. We integrate directly with POS, so each POS keeps sending customer orders that we store for later processing. we have this POS integration going at about 1,000 locations which send us about 3 Million individual customer orders on monthly basis.
for this write-heavy app, we store all orders in redis which is working beautifully. We are growing at incredible pace, we keep adding new restaurants with hundreds of locations that keep sending us crazy amount of data. Except there is one problem -- redis keeps running out of memory every month! As, everything which doesn't have to be in memory is in memory.
This is why we contemplating to switch to mysql. As we really don't need to keep all data in memory. here are we numbers of current redis database:
used_memory_human:39.83G
dbsize: 34706870
Here is what we store in redis as Hash: id - integer
location_id - integer
stored_at - timestamp
token - string
transaction_no - integer
menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty)
order_amount - decimal
order_subtotal_amount - decimal
order_amount_payable - decimal
order_datetime - timestamp
employee_id - integer
employee_name - string
pos_type - string
post_version - string
restaurant_id - integer
So, looking for some advice on:1. moving from redis to mysql is good idea? how will it effect us in long run as we will need to keep updating our indexes & partition scheme to cater to huge demand.
2. What other databases(relational or non-relational) would be suited for this use case than redis?
3. Or we are all wrong, as redis is made for storing this type of data. so, we just keep using redis & upgrading our machines every month? Yes, I think it's a great idea. First of all, 3 million rows a month is not that much from a database point of view - if evenly distributed (though I'm sure it's not), it would be about 1 per second. (It is impressive to me from a business point of view though). You don't need Redis's throughput, and you are hitting Redis's limits with respect to memory. 1) This is _exactly_ the use case which relational databases were built and optimized for (fixed schema, OLTP, presumably some analytic queries). You might want to normalize the data a little (e.g. the menu_items, employee_name, pos_type & pos_version), but you don't have to - though this would enable faster & richer querying. On your schema changes: I bet you'll be much happier using a relational database which takes care of this stuff for you, then you will be implementing it yourself in Redis. You probably won't need partitioning either. If you do end up needing partitioning, I would guess your DB is "trivial" to partition by restaurant. I suspect you'll also find that a relational database stores the data more efficiently, particularly if you normalize a few things. Your data will probably fit into RAM again, but a relational database can cope even when it doesn't. 2) Any relational database would be a great match for this, and would allow rich querying (e.g. by employee, by date, by restaurant). My personal bias is to prefer Postgres. 3) I don't see what Redis is getting you here. It doesn't support rich querying, is memory bound, and you don't need the throughput it promises. I'll try and answer your 3 questions. 1. As with most things it depends, but I'll make the mistake of assuming you don't need all of this data in memory and you won't be iterating over the entire set of data on a regular (<1m) basis. In that case, it most likely makes sense to move to a relational database. You should ask yourself what kind of queries you're running on the data and pick something appropriate. But for the data you're listing, then yeah - go with a relational DB. 2. Again "It depends". Postgresql is the most popular alternative to MySQL and I'd say at this point is more in vogue with the developer community. It's supported on AWS RDS and Heroku has a great postgres.app for OSX. Your data is so simple that I'd encourage you to start there. In the future when you have specific needs then you can research more appropriate database technologies. 3. From what you've shown us then you should most likely move to a relational database. The size should shrink dramatically, too. If you're iterating over the entire dataset regularly though, you should _definitely_ use a relational database over Redis, because a relational database lets you push processing onto the server. With redis you'll have to fetch every row over the network; with a SQL database you'll typically only retrieve the final summary data. The OP has 40GB of data, so to query that even once per minute with Redis would effectively max out 10Gb ethernet. Of course, the real win is that something that would be a table scan in Redis could be a indexed query in SQL. Doing a table scan over 40GB of data is always going to be painful. >With redis you'll have to fetch every row over the network; >with a SQL database you'll typically only retrieve the final >summary data. Redis supports filter features native, so it basically works as every remote service. Do you mean HSCAN with MATCH? Thanks for reminding me about that. I think it's so limited that the point still stands though. You can actually push all the processing to Redis with Lua without having to fetch a single "row" over the network... I'm not recommending this as a rule of thumb - on the contrary, you could easily tie down your (any type of) DB with application logic - but in some cases it makes sense. Any relational DB will work but I would look at Postgres before going with MySQL. Without going into any level of technical depth, I'll offer my two cents. "Write-heavy" means your application processes thousands of records per minute. Three million orders on a monthly basis doesn't look like too big of a deal. Any database should be able to manage that easily. Redis would have been my last pick to do that. Redis I only use when to keep high-volatile non-essential data in memory, not much else. Key in picking your database is how your data is being used. Are the most recent data records accessed most often and the rest almost never? MongoDB and an RDBMS will do quite nicely. Do you really expect _extremely_ heavy growth, a NoSQL datastore might be better. Are DBA's in short supply? Use a managed datastore. As a big fan of NoSQL, I'd say; be cautious when using NoSQL datastores. Any SQL database will do the majority of workloads quite nicely and offer you with plenty of tools to do any type of query you might need to. NoSQL databases do analytical queries usually quite poorly; separating OLTP and OLAP is painful and costly for smaller apps, only for that reason it is best to avoid them in most circumstances. Above all; use what you can run cheapest. The available skillset should play a role too, if most engineers are familiar with SQL Server, use SQL Server... I generally use MySQL or PostgreSQL on RDS (AWS) given DBA's are hard to find. I use DynamoDB when I have extremely high dataloads. I use MongoDB when developing a common app for NodeJS because NodeJS simply works very well with MongoDB… I consider other NoSQL datastores only when processing many millions of records per day. For all the rest (and majority of use cases); pick an RDBMS. // In production I've used Oracle DB, MySQL, PostgreSQL, MongoDB, DynamoDB, Cassandra, Google Cloud Datastore, Redis, RedShift and Elasticsearch. Hi Gary4gar,
Read the points that you stated in your question. If you focus on growth (clearly your wins are going to bring more data updates as your increase the customer base) you should look into a solution that does not tie you down with data model and also does not expect you to worry about sharding in the future. Having done sharding and when one keeps growing the data volume and having to do shards it becomes painful and hurts the development flow. Been there many time and done it quite a bit myself. (Am a technical guys myself, http://www.linkedin.com/in/sukantag) Look for a NoSQL solution which will help in the flexi-schema model. Helps you add different types of data models relatively easily. Check out Aerospike (http://www.aerospike.com), the product scales like scaling problems have disappeared from this planet. All of the characteristics that you would need based on what you have stated is in the product. Best of all it is free for startups. Just give it shot. Open Source server and client libraries in multiple programming languages. (Am a big "C" and Python guys so like them client libs a lot). Lots of startups and many large software companies use them. If you like it and works for your application tell others, if not tell us what did not work. I ain't sellin and you ain't buyin; Ciao 1,2) As so many have pointed out already, this type of data is what relational databases excel at manipulating. Unless you already have a MySQL server running, I'd recommend you try PostgreSQL. It offers great performance and scalability. 3) Redis is a great tool in its own right but it's more of a key-value data store than a full fledged database. It's fantastic for caching, tasks queuing or cross application communication. If you don't want to worry about database/server administration and scaling issues, I'd recommend looking at Cloudant. It's quite inexpensive, takes virtually no administration effort, scales automatically, and is fast and flexible. You could continue to use Redis as a write cache, although a message queue or something like Kafka might be better for that purpose, and of course as a read cache, while using Cloudant or some other "real" database for permanent storage. (On a side note, you described your application as "write-heavy" but you also said you get "about 3 Million individual customer orders on monthly basis" — which works out to just slightly more than one record to insert per second, on average, which doesn't seem "write-heavy" to me. I understand that you get orders submitted in batches, rather than evenly distributed, but it still doesn't seem especially write heavy. At any rate, a database like Cloudant can take as many writes as you want to throw at it.) (Disclaimer: I don't work for Cloudant; I am a satisfied customer; I don't use Cloudant for everything.) TL;DR: you don't have to use one or another, use both! Your use case, running a restaurant is certainly a mixed use case. You'll certainly want to query the latest orders if this is going from the waiter to the kitchen. But yesterday's orders aren't that much important. As others have mentioned, this data is exactly the kind of data you would store in a relational database. Just be sure to not query for the last 10 orders or something like that. I'm not a real database expert, but I've made this mistake before and as far as I know ordering by date usually means a full table scan. RDBMS shine when you just store your data and run interesting queries on that data whenever you want to answer a question about that data. Redis is the other way around. If you want to query your data, you'll have to think about your question ahead of time and store it in an appropriate data structure. What I was wondering is, do you shard your data in different databases? I would guess, every customer should have their on If you have an index on the date, then selecting the last 10 orders will be very cheap, and there's no reason not to do that in a relational database - it should be near-instantaneous. The trade-off here is that every index you add makes insertion a little bit slower, so you don't normally want to just add an index for every query. Ok, thank you. Now you got me thinking, what did I do wrong with this Oracle 9 DB several years ago? :/ Perhaps you selected the last ten orders for a particular customer, or something like that. In that case, the query planner would have to choose whether to scan the entire table, or to scan the timestamp index and then filter by customer. If you had a lot of customers, it is reasonable to choose the table scan. Of course, for that query, you could create an index on customer_id & timestamp, and then it would be near-instant again! Why did you choose Redis in the first place? Redis does have its own (official but not at a stable release yet) cluster implementation. (http://redis.io/topics/cluster-spec) I'm not sure how production ready it is - has anyone here tried it? Twitter probably runs the largest Redis cluster in the world[1]. It's certainly doable, but depends how much engineering resources you're willing to throw at the problem. 1 - http://youtu.be/rP9EKvWt0zo Edit: I should point out that this was before Redis Clustering was built into core. I'm trying to say the problem is solvable, but difficult. Regarding 3 - I don't know if you (or anyone for that matter) are wrong or right about this (or anything), but there are ways to use Redis w/o upgrading to bigger machines. Redis cluster, as mentioned in @bendmorris comment, is one way to go about it. Alternatively, if you want to continue using Redis - with the current or a slimmer version of the data model - without having to worry about machine upgrades or clusters, then I recommend that you look at http://redislabs.com/redis-cloud (disclaimer - I work at Redis Labs). Our managed service is built using production-proven Redis clusters that allow your database to scale well beyond the limits of any single node, without you having do anything or change a line of code. I wouldn't even consider using MySQL. Postgres (aka PostgreSQL) is an industrial-strength open-source RDBMS. I can't comment on the non-relational options as I've never used them, but for your data volume I would expect Postgres to be adequate. MySQL is also an industrial-strength open-source RDBMS. Postgres might a better option, but MySQL is worth considering as well. Would be helpful if you shared what your reads are like. Are they limited to a single customer or span the whole db? How fast do you need the "read-side" [or some portion of it] to be aware of the new writes?
Also remember that there's nothing stopping you from using multiple databases. I.e. if you run complex analytically queries that are just impossible to execute even in soft real time it's very reasonable to have a separate database for that does that very well. I don't have an opinion re redis to MySQL, but it seems like right now the cheapest options is to simply buy a server with 256+ GB of RAM. These are pretty common by now. I agree and perhaps see if you can shard between multiple servers. There's no easy answer i don't think. it depends. If you decide it makes sense to switch to an rdbms, you should probably consider postgres rather than MySQL. Why not MySQL? Just the small list of flaws that we hit trying to use mysql (there's dozens more, and we're not including all the "mysql silently inserts invalid data" gotchas): no check constraints views with aggregates are too slow to be used no expression indexes triggers don't fire on cascaded actions no window functions can't set default values to be the result of a function no transactional DDL doesn't have multiple databases, just schemas misnamed as databases rollbacks are orders of magnitude slower than other RDBMS, and if it is interrupted it can corrupt the database functions can't use prepare/execute, so no dynamic SQL in functions subqueries are broken: can't modify and select from the same table functions can't be called recursively (seriously? is this 1962?) triggers can't alter the table they are firing against stored procedures can't be invoked from prepare/execute "slow query" log has a completely useless resolution of seconds Agree with most of these issues but your last claim is only true for older versions of MySQL. Newer versions do support microsecond resolution[1]. Mysql does have a much better durability solution -galera cluster IMO... Does Postgres have something similar? Galera is not a durability solution. Yes, postgresql has multi-master synchronous replication. It is also not a durability solution. Because he heard it was trendy on the internet, and doesn't know that the world's largest SQL deployments are all MySQL and Oracle. Yes, and you should only use Windows desktops and program in PHP. Generally never improve! Riak[0] could also potentially be a good fit for your use case. Unlike Redis, Riak is distributed. I've heard of several companies in the POS space using it. Use a document store like mongoDB, use redis for frequently used data only, Mysql is also great, maria db would be better or percona. I think redis should have current data only which may be needed to process or crunch daily sales or so etc. I think Apache Cassandra DB is what you are looking for. Apache: The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. http://planetcassandra.org/getting-started-with-apache-cassa... Here's another vote. Definitely move away from Redis for the reasons already stated. Choosing MySQL or Postgres depends on your dev team's relative comfort with each. mysql, mariadb or postgresql would be ok. >> menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty) you may prefer to rework the menu_items as an independent table depending on your uses cases and if you don't have strong reason against. >> menu_items - string(comma seprated list of all menu items that customer ordered along with their price & Qty) > you may prefer to rework the menu_items as an independent table depending on your uses cases and if you don't have strong ganything against. I wonder if storing this in an array in postgres would be a good idea? I think redis is excellent choice for evil writes.
Redis has many kinds of configuration[1]
Do you use the base package configuration or you add additional config options? You could try FoundationDB, with or without the SQL layer. 2. MongoDB. >moving from redis Yes. It is a very poor choice for your needs. >to mysql No. It is a very poor choice in general. >What other databases Postgresql obviously. I'm honestly shocked that people are still considering mysql in 2014. Postgres has great developer ergonomics but can be extremely challenging operationally at scale. Also: finding good Postgres DBAs is difficult. Shocked for what?
Mysql is a good choice for many cases.
Please you should explain why mysql is better than postgres? >Mysql is a good choice for many cases Name one. Mysql is horribly broken. Mysql is fast and easy to scale and make durable, what features of database do you really use that is not met by, fast, durable and scalable... Sounds more like perhaps some sort of belief rather than data? That does not answer the question. Postgresql is faster than mysql, easier to scale than mysql, and is already durable without having to do anything to it. Saying "it is almost as good as postgresql" does not mean it has a use case. MySQL is a very good database. Especially if you use MariaDB. No, it isn't. No amount of saying that will make it true. It is a database that is missing vital functionality, and which considers data integrity not to be important. How is this correct, mysql has synchronous replication. does Postgres have something equivalent to galera? It is correct because mysql is missing tons of important functionality, like I said. You even responded to my other comment listing a small portion of it. No, postgresql does not have an equivilent to galera. Postgresql's synchronous multi-master replication doesn't break constantly and require manual intervention like galera's buggy patch does. I don't think you are to this point yet, but a RDBMS under heavy write load can run into a lot of headaches too, requiring a very carefully designed schema and proper disk sub-system (lots of IOPS), plus the need to have a solid DBA to maintain it. However, non of that is a vote against moving to the RDBMS, so I do think you should seriously consider Postgres or MySQL though as it seems like a good fit. What I would consider though is keeping your Redis up front for accepting/processing the records in real time coming in and then batching them into your RDBMS. This provides you the ability to do reporting and other analytics through the RDBMS, but utilize Redis as a buffer for the writes. It also gives you the ability to massage the data into the specific schema that best fits the usage of the data in the RDBMS, but allowing the most efficient schema coming into Redis from the POS systems. It adds complexity though so that would have to be weighed.