Ask HN: Are you using actors in production? Why/Why not?
In the realm of distributed computing, it seems like there are folks that believe very strongly in the actor pattern. However, they generally seem to be pretty niche? That said, this perspective could be entirely due to my bubble, and so I was curious to hear from folks that are using them in production, and how well they’ve worked out in practice.
Or, if you’ve evaluated using them in production, but decided against it, I’d love to hear about that as well!

At Discord our entire real time system is built on top of Elixir. Everything is an "actor" in that model. Every single Discord server, websocket connection, voice call, screenshare, etc. is distributed using a consistent hash ring. It's an incredibly great model for these things. We've been able to scale this system from hundreds to hundreds of millions of users with very few changes to the underlying architecture.

I love the Discord developer blog. You folks really know your stuff.

I have been using Akka in production for several years now and it's one of the best things I ever did. The actor model itself is really good; it makes you think about problems differently, and in my opinion it's better than traditional OOP. The only issue I have had with Akka is the tooling around metrics/logging of async calls. The rest, what can I say: it's in production and I am barely on call because of issues. Certainly not because I write bug-free code, but because I believe the framework is really solid, while the actor model makes you think in a different and probably more simplified way, which might help with developing simpler solutions. I also have the feeling that it's niche, because simpler frameworks like Spring Boot offer most of what you need out of the box and don't require additional "learning".

Yeah, the debugging story around async stuff is not ideal, but I don’t know what part of the stack is really at fault there. I want what Tokio recently got.

I agree with the niche use case. Using them in production here for 4 years (Akka), with our own actors that dynamically spawn each other depending on the load (along with some messages sent with timers). You usually have to think about how to correctly handle lost messages or full mailboxes, but once you do, it usually works reliably.
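The "distributed using a consistent hash ring" placement mentioned above can be sketched roughly in Go. This is a minimal illustration under stated assumptions, not Discord's actual implementation: the node names, the FNV hash, and the virtual-node count are all made up for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring maps keys (e.g. a guild or session ID) to nodes via consistent hashing.
type Ring struct {
	points []uint32          // sorted hash points on the ring
	owner  map[uint32]string // point -> node name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places each node at `replicas` virtual points, which smooths
// out the distribution of keys across nodes.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hashOf(fmt.Sprintf("%s#%d", n, i))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// NodeFor returns the first node clockwise from the key's hash.
func (r *Ring) NodeFor(key string) string {
	h := hashOf(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	// The same ID always lands on the same node, giving each actor
	// (guild process, voice session, ...) a stable home.
	fmt.Println(ring.NodeFor("guild:12345") == ring.NodeFor("guild:12345")) // true
}
```

The point of the ring (versus `hash(key) % numNodes`) is that adding or removing a node only moves the keys adjacent to its points, so most actors stay where they are.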
I think actors are more of a low-level concurrency Lego block you can use to build higher-level stuff with (just like Akka Streams, etc.). If you want to use them, I strongly encourage typed actors, so unlike Erlang or Elixir and more like Akka Typed or the Pony language: it really helps with not missing messages, because the compiler can make sure you don’t forget anything. Akka monitoring is not great without paying for the monitoring tool, though.

How much of the value in actors do you think is related to the awesomeness of the Akka framework specifically? I only ask since it seems like there’s a decent correlation between people using Akka and those that identify as using “actors”. In general, I’m curious whether there’s significant value for someone who is using Node or Python (for example) in looking for a way to use the actor pattern. Or if actors are primarily great when used along with platforms that specifically “elevate” this concurrency model? (e.g. Erlang/Elixir, Java/Scala, .NET)

Imho it’s not so much the actor pattern as the single-writer principle. Actors are a tool, and not one to be used in isolation. Need to encapsulate state? Actors can help. Need concurrency? Perhaps futures are a better fit. The question you ask is problematic in that it sees actors as a single solution.

Let me second this. For example in Go, the simplest way to use a channel is to queue work to it from several endpoints. Then another goroutine can pull it off, usually becoming the exclusive writer of whatever the next step of the computation is. So even though Go has this green/micro thread behavior behind the scenes (goroutines), its main innovation is the concurrency-safe queue (the channel) so those threads can communicate. Now with actors you can imagine more of a pattern where each holds its own state and messages cause them to modify themselves or send messages to other actors.
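A minimal Go sketch of both patterns just described: a channel as a concurrency-safe queue drained by a single exclusive writer, and an actor-style goroutine that owns its own state and only mutates it in response to messages. All names here are illustrative, not from any library.

```go
package main

import "fmt"

// Pattern 1: the channel as a queue. Many goroutines may enqueue jobs;
// exactly one goroutine drains them and is the sole writer of `total`.
func singleWriter(jobs <-chan int, done chan<- int) {
	total := 0 // only this goroutine ever touches total
	for j := range jobs {
		total += j
	}
	done <- total
}

// Pattern 2: an actor-style goroutine. Its state is private and is
// modified only by processing messages from its mailbox.
type msg struct {
	delta int
	get   chan int // non-nil means "also tell me your current value"
}

func counterActor(mailbox <-chan msg) {
	state := 0 // encapsulated; no locks needed
	for m := range mailbox {
		state += m.delta
		if m.get != nil {
			m.get <- state
		}
	}
}

func main() {
	jobs, done := make(chan int), make(chan int)
	go singleWriter(jobs, done)
	for i := 1; i <= 3; i++ {
		jobs <- i
	}
	close(jobs)
	fmt.Println(<-done) // 6

	mailbox := make(chan msg)
	go counterActor(mailbox)
	mailbox <- msg{delta: 5}
	reply := make(chan int)
	mailbox <- msg{delta: 2, get: reply}
	fmt.Println(<-reply) // 7
}
```

The difference is mostly one of emphasis: the first shapes the program as a pipeline of steps, the second as independent entities reacting to events.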
This behaves more like AI in a game simulation and is useful if independence, rather than coordination, is a good solution to the problem. For lots of services this is not the case: the fastest path through all the queues is the best path, and the problem isn’t about deciding what to do independently or hiding state within the actor, but throughput through known paths. So that’s the difference. If you are tagging video with ML you want queues and throughput; if you are routing network messages based on simple rules (Erlang) you want actors.

I _really_ appreciate this reply! Naively speaking, when I look at actors, I just can’t think of why they’d be superior to coordinated queues/channels, except within an arbitrarily configurable, rules-based system (e.g. game NPC behavior, network routing). But I couldn’t help but assume this was dramatically oversimplifying the problem/missing the point. If someone were using goroutines and channels, what do you think is the “clear” moment when they would benefit from actors? When the sequence of steps in a workflow is non-deterministic?

That they scale to more than one node is a very important feature.

True! But you don’t need actors to enable horizontal scaling?

We use them both implicitly (underlying Akka Streams) and explicitly (both classic untyped actors and typed actors). For the explicit actors, we use them to model state machines around I/O (e.g. stateful protocols). I've found that the application area is fairly niche, as many patterns of async work are clearer through queues, futures, or streams. The use of actors is insulated from the rest of the application code through interfaces that use futures or streams. But, internally, if you have to manage complex state where events can occur at any time and a mailbox-like internal queue is sufficient, then they tend to be easy to understand... once the initial ramp-up period is over.
Additionally, I've found them to require very little maintenance, as developers tend to get to 100% _flow_ test coverage without a lot of difficulty.

Are you using Akka for your explicit actors as well? And could you share the distinction between typed and untyped actors? Is that related to how the caller addresses/accesses methods on that actor?

Yes, we use Akka for all the actors. Akka has two types of actors: typed and untyped. Typed actors allow the compiler to type-check messages, while untyped (or "classic") actors perform runtime checking. This also means references to actors can be typed, so if you have multiple implementations of actors that implement the same protocol, you can substitute between the different actors and verify the protocol at compile time. https://doc.akka.io/docs/akka/current/typed/from-classic.htm... I had played with Erlang before using Scala/Akka, so untyped actors were a familiar experience. My team decided to use typed actors going forward after an Akka version update, since that seems to be the strategic direction of Akka (and it does help to have the compiler complain if we try sending a message that the actor doesn't understand).

Ah! That makes total sense. Thanks so much for that additional context.

I worked on a project where a device was to be paired to a Bluetooth Low Energy device by a non-technical user and send that data somewhere through a 3G dongle. It had to run untouched, update its own software, work all the time, and recover when there was any problem (lost BLE connection, poor internet coverage, data sync, taking a shot at a deliver-once scheme, connection errors, device swap, and a variety of other issues; I had actual, literal nightmares related to UTC and time zones). We used the actor model for that project because it made it easier to deal with exceptions.
Even things that usually worked didn't at some point, and taking into account all that could go wrong would otherwise have been much more difficult, because there was always something new that went wrong.

I use Akka.NET in production. It’s been good and I don’t have any major concerns, but I wonder sometimes about depending on a third-party library in case it becomes obsolete. One difficulty coming from non-actor systems is getting a callback or response from messages. By default, actor messages are outgoing-only, so the idea of a callback or response message needs to be implemented on top.

We run Flink for stream processing and Spark for batch processing, both of which are backed by Akka (and therefore the actor model).

Very dumb question: when you say backed by Akka, what exactly do you mean? That you’ve configured a “binding” between Flink/Spark and Akka, which spins up actor instances (Java/Scala classes?) whenever some events occur? Or that Flink/Spark use Akka behind the scenes?

They both use Akka behind the scenes. I don’t know the internal details exactly, but they’re both capable of showing you the processing pipeline you’ve asked for as a DAG visualization. I think what’s happening is something like each node on that DAG is an actor, receiving data from the previous node and sending it along to the next. You don’t write that, though; you just write map/filter/etc.

Flink uses Akka for control-plane kind of stuff. Data-plane stuff, like piping data between tasks, doesn't. Check out https://flink.apache.org/2019/06/05/flink-network-stack.html for some (maybe outdated) details.

I'm using both Scala/Akka and Elixir in production. Once you get to know the paradigm better, you'll never ever want anything else. And if by niche you mean resilient backends, yes, it's pretty niche then :)

Any insights that you’d want to share with someone not using Java/Scala or Erlang/Elixir?
I’d love to get the benefits of actors in Node, Python or Go, but I’m not entirely sure what I’m missing yet.

Not very sure, but actors kinda make sense in multi-threaded environments, so you can forget about Node and Python. In Go, you can easily replace actors with goroutines and channels.

Life is but a stage, and we are all actors upon it. It's impossible to make many useful statements regarding architecture unless there is proper context: goals, resources, constraints, expectations. Use the right tool for the job.

Then how do any of these distributed systems tools/frameworks write a meaningful or concise landing page? Making informed trade-offs is obviously important. But in a world of saturated information, it seems like there’s even more value in being hyper-clear about what the purpose of a thing is. And even better, what it’s _not_. Unfortunately, many of these tools claim to be broadly applicable, in a way that makes it hard to understand their true purpose without also investing significant time into them. That seems kind of unfortunate.

Yeah, well, the logical reaction to that is to use minimal tools and Unix, and to actually measure a supposed fault before attempting to work on it. First get it working, then get it working well. Usually the overriding business architectural considerations are achieving functionality within reasonable time and money, using existing HR, and without creating too much technical debt. Characteristics of specific components are generally not high on the list. This can be different in high-availability systems, safety-related systems, long-term-stability fault-recovery-required systems and other special use cases. But for 99% of enterprise, HTTP @ 10Mbps + solid tools like the filesystem or sqlite3 will do it. Scaling becomes easy because you're on 100% known interfaces, so swapping any given component out is trivial, i.e. future-proofing exists by default. Interfaces before implementation.
For more wisdom tidbits try https://github.com/globalcitizen/taoup ;)