Durable Objects in Production

linc.sh

134 points by geelen 5 years ago · 57 comments

_ahs0 5 years ago

> I've come away from this experience with a fairly firm belief that the "Stateful Worker" model is genuinely inspired—I can now start to see how they could model virtually every data problem we face, and replacing almost every piece of infrastructure we currently use. That's potentially revolutionary, but only time and experience will tell whether it genuinely outperforms existing alternatives. But for our first foray, this was an unbridled success.

realmod 5 years ago

What a great write-up! Looks very promising. Right now I'm wondering how everything will be priced, especially compute time. If the price is low, you would be able to replace a lot of complicated servers with simpler and more scalable workers.

According to [0], they will (obviously) charge for both compute time and storage operations, with storage operations expected to be priced around the level of Workers KV. Assuming compute time is charged at or above the Workers Unbound rate, using Workers for chat rooms and other WebSocket stuff would be infeasible. Workers Unbound costs $12.50 per million GB-seconds. Given that a worker has 128 MB of memory (the current fixed size), the price would be at least $0.0000016 per second per connected worker. It could get expensive fast.

[0] https://news.ycombinator.com/item?id=24616775
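
The arithmetic above can be checked with a short sketch. It assumes, as the comment does, that a connected worker would be billed at the Workers Unbound duration rate for its whole lifetime, which has not been confirmed. It also treats 128 MB as 0.128 GB, which reproduces the figure in the comment. The function name and the 30-day month are illustrative assumptions.

```typescript
// Assumptions taken from the comment above, not from published WebSocket pricing.
const USD_PER_MILLION_GB_SECONDS = 12.5; // Workers Unbound duration rate
const WORKER_MEMORY_GB = 0.128;          // 128 MB, counted as decimal gigabytes

// Cost in USD of keeping one worker resident for the given number of seconds.
function costForDuration(seconds: number): number {
  const gbSeconds = WORKER_MEMORY_GB * seconds;
  return (gbSeconds / 1e6) * USD_PER_MILLION_GB_SECONDS;
}

const perSecond = costForDuration(1);
const perMonth = costForDuration(30 * 24 * 60 * 60); // hypothetical 30-day month

console.log(perSecond.toFixed(7)); // prints "0.0000016"
console.log(perMonth.toFixed(2));  // prints "4.15"
```

Under these assumptions, one connection held open for a 30-day month would come to roughly $4.15, which is why billing idle time at the full rate looks expensive for chat-style workloads.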

  • kentonv 5 years ago

    We're definitely going to figure out some sort of pricing for WebSockets that doesn't charge full price for idle time, but we haven't nailed it down yet.

  • a_imho 5 years ago

    It could get expensive fast.

    Also, with paid Workers and no stop-loss, what happens if someone decides to DDoS your app? I searched for this and came away with no clear answers.

    Also wondering where Workers KV will fit in. From what I gather, Durable Objects are strictly superior if pricing is comparable.

    • kentonv 5 years ago

      Workers KV is still better in many use cases. Durable Objects are the right choice when you need strong consistency. KV is the right choice when you want world-wide low latency access to the same data. Note that these two advantages are fundamentally opposed; it is physically impossible to simultaneously have strong consistency and worldwide low-latency access to a single piece of data. So, this will always be a trade-off.

      Note that you could build KV on top of Durable Objects, by implementing your own caching and replication in application logic running in Durable Objects. On the other hand, you can't implement Durable Objects on top of KV; once you've lost strong consistency, it's hard (impossible?) to get it back. So in that sense, Durable Objects are "strictly superior". But in a practical sense, you probably don't really want to do the work to implement your own KV store on top of Durable Objects; it's probably better to just use KV.

      • Scarbutt 5 years ago

        What's the size limit of a single 'Durable Object' ?

        • kentonv 5 years ago

          There's no hard limit, but given that a single durable object is single-threaded, storing a huge amount of data in a single object may make it hard to access that data. Also, the system may be less likely to migrate huge objects to move them closer to their users. So, we recommend aiming for small, fine-grained objects, kilobytes to megabytes in size. But there's nothing fundamentally preventing an object from growing to multiple gigabytes.

    • eastdakota 5 years ago

      We don’t charge for malicious traffic like DDoS. Compares favorably to other cloud providers who do.

      • a_imho 5 years ago

        It is reassuring to hear that, could this feature in the docs somewhere? For example when I'm researching this issue this is what comes up as the first result.

        https://community.cloudflare.com/t/how-to-protect-cloudflare...

        Workers Free Tier fail modes are straightforward and preferable for some of the scenarios I would use them for, but KV is only enabled by using Bundled.

      • nmjohn 5 years ago

        You don't charge for worker invocations on an L7 DDoS? How do you determine which requests to charge for and which not to charge for?

        Or is the claim your DDoS protection is good and accurate enough that there are 0 worker invocations to charge for because they all get blocked?

        • kentonv 5 years ago

          We aim to block attack traffic. If we fail to block an attack and you get charged for it, file a support request to ask for a credit.

    • brabel 5 years ago

      From what I understand after going through the docs, KV is currently the only way to store data in Workers, and Durable Objects are going to be the new alternative.

      KV is eventually consistent, appropriate for low-value data that is read a lot and written infrequently. Durable Objects will provide consistency, at the cost of not having the very low latency of KV, because each object has to run in a single location instead of on the edges. So there seems to be room for both solutions.

      About DDoS, I believe Cloudflare is a leader in DDoS protection, so I would hope they include protection in all their Workers (pls someone correct me if I'm wrong).

AnthOlei 5 years ago

I read your write up, but I’m not entirely clear on one part: how do cloudflare workers handle websocket connections? Are they automatically terminated after the worker spends too much time active? If so, doesn’t handling the WS handshake have a lot more overhead than a fetch call? Maybe I’m misunderstanding something here. One of my biggest issues with serverless is its inability to handle websockets in a sane way, so this would be huge.

  • jkarneges 5 years ago

    My understanding is a Durable Object is similar to a long running JS app in a respawnable/relocatable container (imagine a Kubernetes deployment of 1 pod). There is always exactly one instance, somewhere. And because the instance is long running, WebSockets become more practical.

    Note that because Durable Objects are long running computations, they are stateful and deployments are disruptive (clients disconnected). So even though you could potentially put them in the "serverless" category, the deployment experience isn't quite the same as short-lived serverless functions / lambdas.

    The real novelty of Durable Objects appears to be their intended usage of fine granularity (and the underlying tech that enables this). For example, if you were building a chat room service, you could have 1 Durable Object per room. Of course, you could conceivably build a chat room service running 1 Node.js process per chat room on traditional VMs or containers, but that probably wouldn't scale well.
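
    The one-object-per-room idea above can be sketched as a toy in-memory model. This is not the Durable Objects API: `ChatRoom` and `RoomRegistry` are hypothetical names, and on the real platform the lookup by name would be done by the runtime rather than by application code. The sketch only illustrates the "exactly one instance per name" property.

    ```typescript
    // Hypothetical stand-in for a Durable Object: one stateful instance per chat room.
    class ChatRoom {
      readonly name: string;
      private readonly history: string[] = [];

      constructor(name: string) {
        this.name = name;
      }

      // Appends a message and returns the new message count for this room.
      post(user: string, text: string): number {
        this.history.push(`${user}: ${text}`);
        return this.history.length;
      }

      messages(): readonly string[] {
        return this.history;
      }
    }

    // Hypothetical registry: the same name always resolves to the same instance.
    class RoomRegistry {
      private readonly rooms = new Map<string, ChatRoom>();

      get(name: string): ChatRoom {
        let room = this.rooms.get(name);
        if (room === undefined) {
          room = new ChatRoom(name);
          this.rooms.set(name, room);
        }
        return room;
      }
    }

    const registry = new RoomRegistry();
    registry.get("lobby").post("alice", "hello");
    registry.get("lobby").post("bob", "hi");
    registry.get("dev").post("carol", "deploying");
    ```

    Each room keeps its own state, so the rooms never contend with each other, and all writers to one room go through a single instance.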

  • greg-m 5 years ago

    Hey, I'm the PM at Cloudflare for WebSockets on Workers. Support is still in beta, so we're still working through the timeout details here.

    With the Workers Bundled plan (https://developers.cloudflare.com/workers/platform/pricing#b...), your WebSocket connection will stay open until your 50ms of CPU time expires. On Unbound (https://blog.cloudflare.com/introducing-workers-unbound/), which does not have a CPU time limit, your WebSocket connection will stay alive as long as it remains active and your Worker doesn't exceed its memory limits. If the connection goes idle, it may be terminated. We're currently considering an idle timeout on the order of 1-10 minutes.

    • dwwoelfel 5 years ago

      I'm having some trouble understanding the pricing of Workers Unbound. How much would it cost to keep one websocket connection open for a month?

      • kentonv 5 years ago

        We haven't published pricing for WebSockets yet. Obviously directly applying the Workers Unbound duration-based pricing wouldn't work very well; we'll figure out something better.

  • kentonv 5 years ago

    WebSockets are a feature of Durable Objects.

    Workers handle WebSockets in a pretty straightforward way. The server-side API is literally the same WebSocket JavaScript API as in browsers. sock.addEventListener("message", callback), etc.

    But if you aren't using Durable Objects, then WebSockets on Workers aren't particularly useful, because there's no way to contact the specific Worker instance that is handling a particular client's WebSocket session, in order to send a message down.

    Durable Objects fixes exactly that. Now you can have worker instances that are named, so you can call back to them.

    Here's a complete demo that uses WebSockets to implement chat: https://github.com/cloudflare/workers-chat-demo/blob/main/ch...

    > Are they automatically terminated after the worker spends too much time active?

    We're still tweaking timeouts and limits, but in general you should be able to keep a WebSocket alive long-term, and reconnect any time it drops (which any WebSocket application has to do anyway, the internet being unreliable).

    > If so, doesn’t handling the WS handshake have a lot more overhead than a fetch call?

    Not sure what you mean here. A WS handshake in itself isn't terribly expensive. I suppose a typical application will have to establish some state after the socket opens, and that could be expensive, but it depends on the app.
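
    The "reconnect any time it drops" advice above could look roughly like the sketch below. It relies only on the browser-style `addEventListener("message" | "close", ...)` shape mentioned in this comment. The `SocketLike` interface, `backoffDelayMs`, `keepAlive`, and the 500 ms / 30 s backoff values are all assumptions made for illustration, not part of any Cloudflare API.

    ```typescript
    // Minimal shape of the browser-style WebSocket API that this sketch relies on.
    interface SocketLike {
      addEventListener(
        type: "message" | "close",
        listener: (event: { data?: unknown }) => void,
      ): void;
    }

    // Exponential backoff capped at maxMs; attempt 0 is the first retry.
    function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
      return Math.min(maxMs, baseMs * 2 ** attempt);
    }

    // Opens a socket through the caller-supplied factory and reopens it on close.
    function keepAlive(
      open: () => SocketLike,
      onMessage: (data: unknown) => void,
      schedule: (fn: () => void, ms: number) => void = (fn, ms) => {
        setTimeout(fn, ms);
      },
    ): void {
      let attempt = 0;
      const connect = (): void => {
        const sock = open();
        sock.addEventListener("message", (event) => {
          attempt = 0; // traffic arrived, so treat the connection as healthy
          onMessage(event.data);
        });
        sock.addEventListener("close", () => {
          schedule(connect, backoffDelayMs(attempt));
          attempt += 1;
        });
      };
      connect();
    }
    ```

    The `open` factory is where a real client would construct its socket, and it is also the place to re-establish any per-session state after each reconnect.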

minxomat 5 years ago

I've reached out via the beta form because I'm so damn excited for this. I have long tried to map a project of mine (a serverless game server for a TCG, based on Workers) onto the available storage tech (KV), but the consistency guarantees just weren't where I needed them to be. Hope they'll find a slot soon.

Edit:

(If someone from CF has questions, you can reach me at hn@elasticwaffle.com)

My use case is pretty much exactly the one described in the blog post (over on CF) for multiplayer games. Using DO for ongoing 1v1 matches in a trading-card-game style turn-by-turn strategy game. Players would play against a simple AI (which itself is a worker) or other players. The game is served from Workers Static Sites. But multiplayer requires stronger coordination, and the delay for KV writes to reach all PoPs is a deal-breaker. DO solves not only that, but the interface (just being classes) allows the game design to meld with the backend design.

  • geelen (OP) 5 years ago

    Beta invites are definitely flowing now, but I'd recommend pinging someone on Twitter or elsewhere with a quick description of what you're going to build. I think they had so many signups they're worried about opening the floodgates, but on a case-by-case basis there's plenty of room. Maybe?

  • throwaway894345 5 years ago

    I’d love to hear more about your use case and the consistency limitations you’re running into!

    • minxomat 5 years ago

      Feel free to reach out (email in parent post).

      • haneefmubarak 5 years ago

        I'm on the Cloudflare Workers team and while I can't help you get access any quicker than the usual route through the form, I'd still love to hear about your specific usecase(s) and limitations. Is it okay if I email you?

        • fiddlerwoaroof 5 years ago

          Does there happen to be a local simulator one could use to build out an app?

          • haneefmubarak 5 years ago

            I don't know about a full-blown local sim, but perhaps https://blog.cloudflare.com/trailblazing-a-development-envir... is close enough to what you have in mind?

            Also perhaps try playing around in https://cloudflareworkers.com/

            I don't believe Workers _Durable Objects_ specifically is available to play around with outside of the beta, however - apologies if that's what you actually wanted to play with. We're working hard on building and perfecting it into a product that everyone will be able to use soon, so keep an eye on this space.

            Is there a specific usecase you have in mind? I'd love to hear about it!

            • fiddlerwoaroof 5 years ago

              I had a little e-vite app I was trying to build with the KV store, but the consistency model wasn’t a great fit. I’ve sent an email in for the Durable Objects beta, but it’d be nice to have access to a simulator of some sort just so I can see if it’s a better fit.

              • haneefmubarak 5 years ago

                Let's talk consistency models and more! Is there a good email I could reach you at - or alternatively, could you send me an email with your requirements (haneef@)?

          • jitl 5 years ago

            I started building one of these as a weekend project on top of SQLite for funsies but I abandoned it during the election chaos. Oh well.

        • minxomat 5 years ago

          Yeah, go ahead.

mcintyre1994 5 years ago

This is a great write up, thanks for sharing it! I’ve built out a similar websockets on AWS setup at work to what they show at the end, and it’s definitely not as nice as this looks to work with. That was really just an MVP to push live results, I’d like to extend it to do logs/progress eventually but I’ll definitely evaluate Durable Objects before adding anything to that solution because this looks way better and cleaner.

Is there any sort of run-it-locally testing story yet? In theory paid localstack supports websocket API gateways on the AWS side, though I haven’t played with that yet either so not sure how good it is. Looking at the API being used and the fact it’s all dynamic JS land, it looks like maybe you could inject some implementations to run the websockets and store some state locally?
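
The "inject some implementations" idea above could be sketched as follows. This is a guess at how such a test harness might look, not a description of any existing simulator: `KeyValueStorage`, `MemoryStorage`, and `ResultCounter` are hypothetical names, and the async get/put shape is an assumption about what a storage interface would expose.

```typescript
// Hypothetical subset of a key-value storage interface, modeled as async get/put.
interface KeyValueStorage {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
}

// In-memory implementation to inject when running locally.
class MemoryStorage implements KeyValueStorage {
  private readonly data = new Map<string, number>();

  async get(key: string): Promise<number | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: number): Promise<void> {
    this.data.set(key, value);
  }
}

// Hypothetical stateful object that depends only on the injected storage.
class ResultCounter {
  private readonly storage: KeyValueStorage;

  constructor(storage: KeyValueStorage) {
    this.storage = storage;
  }

  // Reads the stored count, adds one, writes it back, and returns the new value.
  async increment(): Promise<number> {
    const next = ((await this.storage.get("count")) ?? 0) + 1;
    await this.storage.put("count", next);
    return next;
  }
}
```

Because the object only sees the interface, a local test can pass in `MemoryStorage` while a deployed build passes in whatever the platform provides. A stub like this does not reproduce the platform's consistency or concurrency behavior, so it only covers application logic.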

swyx 5 years ago

@glen - hey great piece!

Wondering if you've also tried the newer real-time sync services like replicache.dev or roomservice.dev for this use case? They seem to do the same thing, except client-side. Also, on the server side, curious if you've evaluated Temporal.io.

  • geelen (OP) 5 years ago

    I have not! It was never such a pressing issue that I investigated a dedicated solution. But given that we're already using Cloudflare, and that this is a solution to a much bigger problem (global coordination + actual storage) that also works really nicely for our smaller usage, it felt like a nice way to dip a toe into this new model.

gauravphoenix 5 years ago

it would be so great to see support for other languages like Java, Python, Go etc...

jdndbfbf 5 years ago

Are there any large companies that use cloudflare workers in production?

  • sfeng 5 years ago

    The list on their website is: 23andMe, Broadcom, Codepen, Discord, Doordash, Glossier, Marketo, Maxmind, npm, and ProPublica. That said it's common that companies won't want to be featured publicly in that way, so the real list is longer.

    Obviously Cloudflare is also a big user of Workers in production as well.

  • mattweinberg 5 years ago

    I run an agency and we’ve implemented Cloudflare Workers in production for very large companies, including doing the implementation for two of the big companies listed in the sibling comments on this thread.

    It works very well and their CLI tool Wrangler is easy to integrate into CI/CD. We’ll probably use it for more. Happy to answer questions people have: matt@happycog.com

  • jgrahamc 5 years ago

    Yes: https://www.cloudflare.com/case-studies/?usecase=Deploy+cust...

    From our Q3 earnings call:

    "Turning to Cloudflare Workers, it's incredibly exciting to see how the platform is taking off. In Q3, more than 27,000 developers wrote and deployed their first Cloudflare Workers. That's up from 15,000 a year ago. History proves with new computing platforms, the more developers they have, the more quickly they improve and the more likely they are to win. Looking at GitHub and other sources of data on developer engagement, we believe more developers write and deploy real applications and code on Cloudflare Workers every month than on every other edge computing platform combined. So what are they building?

    - One of the most viewed publications during the 2020 elections used Cloudflare Workers to power their elections news platform and ensure it scaled during the unprecedented spike in traffic last Tuesday as well as Wednesday and today.

    - A popular health foods company uses Workers to power their online ordering system.

    - An online marketing firm working with major brands uses Workers to customize content on a per visitor basis.

    - A publicly traded electronics testing firm uses Workers to bridge their on-premise and cloud-based infrastructure.

    - An innovative start-up is using Workers to power an online crypto scavenger hunt.

    - And one of the largest online learning platforms uses Workers to deliver their customized content during this time of skyrocketing demand.

    It's great to see more use cases every quarter, but I think we're just scratching the surface. Most use cases today have focused on performance. Over time, I expect those use cases will pale in comparison to what is a much bigger opportunity, helping customers manage the challenges of compliance. As governments around the world increasingly insist on data localization and data residency, sending all your users' data back to AWS feeds for processing will become unacceptable. What our largest, most sophisticated, most compliance-sensitive customers are looking to Workers for is as a way to manage this increasingly complex regulatory environment. That's why during Cloudflare's Birthday Week, our announcement of Durable Objects may have been one of the most important edge computing developments you may have missed. Durable Objects allows developers to define a data structure and store it safely on our network close to users that need to access it in order to ensure performance and consistency. It also allows developers to define where that data can move across our network and where it cannot, such as this user's data may never leave the EU or this user's data may never leave Brazil.

    Given Cloudflare's network spans more than 200 cities in more than 100 countries worldwide, Durable Objects provides fine-grained control over where data is stored and processed. That functionality is critical for the increasingly complex compliance challenges that face every global company today. In other words, the future of edge computing will be defined as much by intelligent edge storage as it is by computing. And while others are still working to launch their edge computing platforms, we have products like Durable Objects in market that are defining that future today."

    • ignoramous 5 years ago

      We run censorship resistant proxies on Workers neatly domain fronted (well, IP fronted) by Cloudflare IPs [0]. It works so well.

      > Given Cloudflare's network spans more than 200 cities in more than 100 countries worldwide...

      I think only enterprise customers can truly claim the benefit of all 200 PoPs. Free/Pro/Business(?) plans aren't necessarily routed to all 200. If that's not the case, then it doesn't match our interaction with Cloudflare's support.

      ---

      That said, I absolutely love Workers. It is quite easily the best value for money of any edge computing platform. This blog post drives those arguments home: https://medium.com/@zackbloom/serverless-pricing-and-costs-a...

      [0] On the flip side, because of CNAME Flattening, it is hard to block privacy eroding solutions such as these: https://www.cloudflare.com/apps/google-analytics

rawoke083600 5 years ago

PS. Live Demo link at the bottom of page is broken.

mrkeen 5 years ago

So is it CA or CP? No, just C.

  • ithkuil 5 years ago

    Being "partition tolerant" just means the other aspect holds: CP means "consistent in case of a partition", while AP means "available in case of a partition".

    A CP system cannot be available in both partitions, since by definition the two partitions cannot communicate (otherwise they wouldn't be partitions) and thus it's logically impossible for two clients in two different partitions to affect a shared state consistently. Thus at least in one partition the service will be unavailable.

    Forcing all state changes to go through a single "master^Wmain" node (for some key/shard) is a simple way to achieve CP.

    This main node can change over time, but in case of a partition it can never move to the side of the partition which has no quorum for a master election.
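
    The quorum rule above can be sketched with a strict-majority check. The `hasQuorum` helper and the 5-node cluster are illustrative assumptions; nothing here describes how Durable Objects actually elect or place a main node.

    ```typescript
    // True when a partition side can reach a strict majority of the cluster's nodes.
    function hasQuorum(reachableNodes: number, clusterSize: number): boolean {
      return reachableNodes > Math.floor(clusterSize / 2);
    }

    // Hypothetical 5-node cluster split 3/2 by a network partition.
    const clusterSize = 5;
    const sides = [3, 2];
    const canHostMain = sides.map((nodes) => hasQuorum(nodes, clusterSize));

    console.log(canHostMain); // prints [ true, false ]
    ```

    Since two disjoint sides cannot both hold a strict majority, at most one side can host the main node, and the other side is unavailable for writes. With an even split such as 2/2, neither side has quorum.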

  • eastdakota 5 years ago

    CP. Sacrifices some Availability in the form of latency if accessed from outside local region.
