Ask HN: How do you architect daily digest emails?
Hi HN, I'm trying to figure out how to build a daily digest email system on top of an existing legacy database/application.
Does it make sense to send a daily digest to every user of your app, every day? I suspect iterating through each user will be very computationally expensive.
And if you need to aggregate data (perhaps with SQL JOINs) or apply other business logic to build the email content, and you repeat that for every user, your resource usage gets very high.
Beyond that there are more issues: users live all over the world in different timezones, so the "daily" digest can't go out at the same moment for everyone.
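One idea I had for the timezone part is to store an IANA timezone per user and run an hourly job that picks the users whose local clock just hit the digest hour. A rough sketch (the user records and the 8am digest hour here are made up for illustration):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DIGEST_HOUR = 8  # send the digest at 8am local time (arbitrary choice)

def users_due_now(users, now_utc=None):
    """Return IDs of users whose local wall-clock hour matches DIGEST_HOUR.

    `users` is a list of dicts like {"id": 1, "tz": "Europe/Lisbon"}
    (hypothetical schema). Run this once per hour, e.g. from cron, and
    each user gets exactly one digest per local day.
    """
    now_utc = now_utc or datetime.now(timezone.utc)
    due = []
    for user in users:
        local = now_utc.astimezone(ZoneInfo(user["tz"]))
        if local.hour == DIGEST_HOUR:
            due.append(user["id"])
    return due
```

But I'm not sure this is how people actually do it in practice.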
Do you have a strategy for daily digests? Are there any learning resources you could point me to?
Thank you very much!

> I suspect iterating through each user will be very computationally expensive.

You can preload the data you need in bulk. Let's say you have a query that gives you a mapping of User ID -> Product IDs. You run that query first (for all users) and cache the result in memory (this is where an ORM will probably be counter-productive; I suggest converting the result to primitive types like a dictionary to save memory). It's a huge query, but it's also a single query, so the database can optimize it internally and it shouldn't be too big of a problem. You repeat this for all the data you think you'll need (it's fine if you fetch a bit extra; the savings from fetching in bulk make up for it).

You can even reuse existing caches your app might have. If the data you need is already in Memcached/Redis as a result of another process, you can fetch it from there directly and avoid hitting the database at all.

Now that you have all that data in memory, you do the actual processing in code. Compute and memory capacity are relatively cheap compared to the engineering effort of optimizing further (especially if it involves rearchitecting your database layout or denormalizing certain data), and you can go even cheaper by outsourcing this job to a bare-metal server, which is more cost-effective for raw compute power than a cloud provider.

Honestly, I asked my question without any expectations. I thought most people would scoff at it because it makes me seem like a newbie. Instead you gave me an interesting and useful reply, and I now have many more leads to dig into on this topic. Thank you!!

I originally upvoted your question without replying because I expected there to be a right solution for this, and I was hoping some experts would chime in. Looking back at it, I'm not sure there is a "right" solution (or maybe there is and the experts are still laughing), but at least here's my take on it and how I would approach the problem.
Whether it's the "right" solution or not is up for debate, but at least it will get you started. When attempting to solve a problem, I recommend writing the worst, most hacky solution you can, as long as it gets the job done, and then seeing if you can optimize it incrementally (by fetching data in bulk, etc.). Throwing hardware at the problem is also a valid way to at least delay it (sometimes the "delay" can be measured in years, in which case the hacky and terrible solution will have made you tons of money in the meantime, so you still come out ahead).
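To make the bulk-preload idea concrete, here's a minimal sketch in Python with SQLite. The `users`/`purchases` schema and all the names are made up for illustration; the point is the shape of the approach: one big query up front, results kept as plain dicts, then each digest built purely in memory.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema with a couple of sample rows (assumptions, not
# taken from any real app).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE purchases (user_id INTEGER, product_name TEXT);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO purchases VALUES (1, 'keyboard'), (1, 'mouse'), (2, 'monitor');
""")

# One bulk query for everyone, instead of one query per user.
rows = conn.execute("""
    SELECT u.id, u.email, p.product_name
    FROM users u JOIN purchases p ON p.user_id = u.id
""").fetchall()

# Convert to primitive types (plain dicts) rather than ORM objects.
email_by_user = {}
products_by_user = defaultdict(list)
for user_id, email, product in rows:
    email_by_user[user_id] = email
    products_by_user[user_id].append(product)

# Build each digest entirely in memory -- no more database round-trips.
def build_digest(user_id):
    items = ", ".join(products_by_user[user_id])
    return f"To: {email_by_user[user_id]}\nYour daily digest: {items}"
```

A real version would page through users in chunks if the dataset doesn't fit in memory, but the per-user loop stays database-free either way.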