Java Virtual Threads: A Case Study

infoq.com

167 points by mighty_plant a year ago · 198 comments

pron a year ago

Virtual threads do one thing: they allow creating lots of threads. This helps throughput due to Little's law [1]. But because this server here saturates the CPU with only a few threads (it doesn't do the fanout modern servers tend to do), this means that no significant improvements can be provided by virtual threads (or asynchronous programming, which operates on the same principle) while keeping everything else in the system the same, especially since everything else in that server was optimised for over two decades under the constraints of expensive threads (such as the deployment strategy to many small instances with little CPU).

So it looks like their goal was: try adopting a new technology without changing any of the aspects designed for an old technology and optimised around it.

[1]: https://youtu.be/07V08SB1l8c
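
For concreteness, a back-of-the-envelope application of Little's law (L = λW), with made-up numbers that are not from the article:

```java
public class LittlesLaw {
    public static void main(String[] args) {
        // Little's law: mean concurrency L = throughput (λ) × latency (W).
        // Illustrative numbers, not from the article:
        double throughputPerSec = 10_000; // λ: requests per second
        double latencySec = 0.1;          // W: seconds per request
        long concurrency = (long) (throughputPerSec * latencySec);
        // Sustaining 10k req/s at 100 ms each needs ~1000 requests
        // in flight at once -- i.e. ~1000 (hopefully cheap) threads.
        System.out.println(concurrency);
    }
}
```
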

  • stelfer a year ago

    It goes deeper than Little's Law. Every decent textbook on introductory queuing theory has the result that on a normalized basis, fast server > multi-server > multi-queue. That analysis admits almost arbitrary levels of depth of analysis and still holds true.

    Your observation that computing architectures have chased fast server for decades is apt. There's a truism in computing that those who build systems are doomed to relearn the lessons of the early ages of networks, whether they studied them in school or not. But kudos to whoever went through the exercise again.

  • jayceedenton a year ago

    I guess at least their work has confirmed what we probably already knew intuitively: if you have CPU-intensive tasks, without waiting on anything, and you want to execute these concurrently, use traditional threads.

    The advice "don't use virtual threads for that, it will be inefficient" really does need some evidence.

    Mildly infuriating though that people may read this and think that somehow the JVM has problems in its virtual thread implementation. I admit their 'Unexpected findings' section is very useful work, but the moral of this story is: don't use virtual threads for things they were not intended for. Use them when you want a very large number of processes executing concurrently, those processes have idle stages, and you want a simpler model to program with than other kinds of async.

    • pron a year ago

      I'll put it this way: to benefit from virtual threads (or, indeed, from any kind of change to scheduling, such as with asynchronous code) you clearly need 1. some free computational resources and 2. lots of concurrent tasks. The server here could perhaps have both with some changes to its deployment and coding style, but as it was tested -- it had neither. I'm not sure what they were hoping to achieve.

  • hitekker a year ago

    This take sounds reasonable to me. But I'm not an expert, and I'd be curious to hear an opposing view if there's one.

    • michaelt a year ago

      Standard/OS threads in Java use about a megabyte of memory per thread, so running 256 threads uses about 256 MB of memory before you've even started allocating things on the heap.

      Virtual threads are therefore useful if you're writing something like a proxy server, where you want to allow lots of concurrent connections, and you want to use the familiar thread-per-connection programming model.
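      A minimal sketch of that thread-per-connection style (Java 21+; the sleep is a stand-in for blocking socket I/O in a real proxy):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) throws Exception {
        // One virtual thread per task; the JVM multiplexes them onto a
        // small pool of carrier (OS) threads, so 10,000 "connections"
        // don't reserve ~10,000 MB of stack address space.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                exec.submit(() -> {
                    Thread.sleep(10); // stands in for blocking socket I/O
                    return null;
                });
            }
        } // close() waits for all tasks to finish
        System.out.println("done");
    }
}
```
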

      • layer8 a year ago

        Only 1 MB of address space is reserved (which can still be a problem); actual memory usage is limited to the memory pages that are actually accessed by the program within that address space.

    • kaba0 a year ago

      He is as much of an expert as it gets, as he is the leader of the Loom project.

    • binary132 a year ago

      Greenlets ultimately have to be scheduled onto system threads at the end of the day unless you have a lightweight thread model of some sort supported by the OS, so it’s a little bit misleading depending on how far down the stack you want to think about optimizing for greenlets. You could potentially have a poor implementation of task scheduling for some legacy compatibility reason, however. I guess I’d be curious about the specifics of what pron is discussing.

      • troupo a year ago

        Even though yes, in the end you have to map onto system threads, there are still quite a few things you can do. But this is infeasible for Java, unfortunately.

        For example, in Erlang the entire VM is built around green threads with a huge amount of guarantees and mechanisms: https://news.ycombinator.com/item?id=40989995

        When your entire system is optimized for green threads, the question of "it still needs to map onto OS threads" loses its significance

        • binary132 a year ago

          I really don’t think it’s useful to be this nonspecific. You could give an example of what a Java greenlet cannot do or how it cannot be optimized, for example. If your whole point is actually just “I prefer the semantics of BEAM threads”, then just say that.

          • troupo a year ago

            Those semantics are exactly what cannot be done in Java for many reasons (including legacy code etc.).

            And yes, those semantics are important, but sadly most people stop at "yay we have green threads now" and then a null pointer exception kills their entire app, or the thread that handles requests, or...

            • binary132 a year ago

              So let’s be clear, your point is that you find the API of non-BEAM greenlets less useful, not that they’re somehow necessarily less efficient. Right?

        • MaxBarraclough a year ago

          > When your entire system is optimized for green threads, the question of "it still needs to map onto OS threads" loses its significance

          How's that? What about parallelism?

cayhorstmann a year ago

I looked at the replication instructions at https://github.com/blueperf/demo-vt-issues/tree/main, which reference this project: https://github.com/blueperf/acmeair-authservice-java/tree/ma...

What "CPU-intensive apps" did they test with? Surely not acmeair-authservice-java. A request does next to nothing. It authenticates a user and generates a token. I thought it would at least connect to some auth provider, but if I understand it correctly, it just uses a test config with a single test user (https://openliberty.io/docs/latest/reference/config/quickSta...). Which would not be a blocking call.

If the request tasks don't block, this is not an interesting benchmark. Using virtual threads for non-blocking tasks is not useful.

So, let's hope that some of the tests were with tasks that block. The authors describe that a modest number of concurrent requests (< 10K) didn't show the increase in throughput that virtual threads promise. That's not a lot of concurrent requests, but one would expect an improvement in throughput once the number of concurrent requests exceeds the pool size. Except that may be hard to see because OpenLiberty's default is to keep spawning new threads (https://openliberty.io/blog/2019/04/03/liberty-threadpool-au...). I would imagine that in actual deployments with high concurrency, the pool size will be limited, to prevent the app from running out of memory.

If it never gets to the point where the number of concurrent requests significantly exceeds the pool size, this is not an interesting benchmark either.

pansa2 a year ago

Are these Virtual Threads the feature that was previously known as “Project Loom”? Lightweight threads, more-or-less equivalent to Go’s goroutines?

  • giamma a year ago

    Yes, at EclipseCon 2022 an Oracle manager working on the Helidon framework presented their results of replacing the Helidon core, which was based on Netty (and reactive programming), with Virtual Threads (using imperative programming) [1].

    Unfortunately the slides from that presentation were not uploaded to the conference site, but this article summarizes [2] the most significant metrics. The Oracle guy claimed that by using Virtual Threads Oracle was able to implement, using imperative Java, a new engine for Helidon (called Nima) that had identical performance to the old engine based on Netty, which is (at least in Oracle's opinion) the top performing reactive HTTP engine.

    The conclusion of the presentation was that based on Oracle's experience imperative code is much easier to write, read and maintain with respect to reactive code. Given the identical performance achieved with Virtual Threads, Oracle was going to abandon reactive programming in favor of imperative programming and virtual threads in all its products.

    [1] https://www.eclipsecon.org/2022/sessions/helidon-nima-loom-b...

    [2] https://medium.com/helidon/helidon-n%C3%ADma-helidon-on-virt...

  • pgwhalen a year ago

    Yes. It's not that the feature was previously known under a different name - Project Loom is the OpenJDK project, and Virtual Threads are the main feature that has come out of that project.

  • tomp a year ago

    They're not equivalent to Go's goroutines.

    Go's goroutines are preemptive (and Go's development team went through a lot of pain to make them such).

    Java's lightweight threads aren't.

    Java's repeating the same mistakes that Go made (and learned from) 10 years ago.

    • unscaled a year ago

      I would put it more charitably as "Java Virtual Threads are new and have not seen massive use and optimization yet".

      This is crucial, because Java wouldn't necessarily require the same optimizations Go needed.

      Making Virtual Threads fully preemptive could be useful, but it's probably not as crucial as it was for Go.

      Go does not have a native mechanism to spawn OS threads that are separate from the scheduler pool, so if you want to run a long CPU-heavy task, you can only run it on the same pool as you run your I/O-bound Goroutines. This could lead to starvation, and adding partial preemption and later full preemption was a neat way to solve that issue.

      On the other hand, Java still has OS threads, so you can put those long-running CPU-bound tasks on a separate thread-pool. Yes, it means programmers need to be extra careful with the type of code they run on Virtual Threads, but it's not the same situation as Go faced: in Java they DO have a native escape hatch.
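      A rough sketch of that escape hatch (Java 21+; names and workloads are illustrative): CPU-bound work goes on a fixed pool of platform threads, blocking work on virtual threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SplitPools {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Platform threads for long-running CPU-bound tasks...
        try (ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
             // ...virtual threads for the many blocking, I/O-bound ones.
             ExecutorService ioPool = Executors.newVirtualThreadPerTaskExecutor()) {
            var sum = cpuPool.submit(() -> {
                long s = 0;
                for (long i = 0; i < 1_000_000; i++) s += i; // busy work
                return s;
            });
            var fetch = ioPool.submit(() -> {
                Thread.sleep(5); // stands in for a blocking call
                return "response";
            });
            System.out.println(fetch.get() + " / " + sum.get());
        }
    }
}
```
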

      I'm not saying a preemptive scheduler won't be helpful in Java, but it just isn't as direly needed as it was with Go. One of the most painful issues with Java Virtual Threads right now is thread pinning when a synchronized method call is executed. Unfortunately, a lot of existing Java code heavily uses synchronized methods[1], so it's very easy to unknowingly introduce a method call that pins an OS thread. Preemption could solve this issue, but it's not the only way to solve it.

      ---

      [1] One of my pet peeves with the Java standard library is that almost any class or method that was added before Java 5 is using synchronized methods excessively. One of the best examples is StringBuffer, the precursor of StringBuilder, where all mutating methods are synchronized, as if it was a common use case to build a string across multiple threads. I'm still running into StringBuffers today in legacy codebases, but even newer codebases tend to use synchronized methods over ReentrantLocks or atomic operations, since they're just so easy to use.
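      The usual workaround for the pinning issue (on JDKs where blocking inside synchronized still pins): guard the critical section with a ReentrantLock instead, which lets a blocked virtual thread unmount from its carrier. A minimal sketch:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockInsteadOfSynchronized {
    private static final ReentrantLock lock = new ReentrantLock();
    private static long counter = 0;

    // On older JDKs, blocking inside a synchronized block pins the
    // virtual thread to its carrier OS thread. A ReentrantLock lets
    // the virtual thread unmount while it waits for the lock instead.
    static void increment() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[1000];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = Thread.ofVirtual().start(LockInsteadOfSynchronized::increment);
        }
        for (Thread t : ts) t.join();
        System.out.println(counter);
    }
}
```
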

    • jayd16 a year ago

      Virtual threads could be scheduled preemptively, but currently the scheduler will wait for some kind of thread sleep before scheduling another virtual thread. That's just a scheduler implementation detail, and the spec is such that a time-slice scheduler could be implemented.

      • tomp a year ago

        Yes, but the problem is that the spec is such that preemptive scheduling doesn't need to be implemented.

        That means that Java programmers have to be very careful when writing code, lest they block the entire underlying (OS) thread!

        Again, Go already went through that experience. It was painful. Java should have learned and implemented it from the start

        • SureshG a year ago

          > That means that Java programmers have to be very careful when writing code

          From JEP 444:

          The scheduler does not currently implement time sharing for virtual threads. Time sharing is the forceful preemption of a thread that has consumed an allotted quantity of CPU time. While time sharing can be effective at reducing the latency of some tasks when there are a relatively small number of platform threads and CPU utilization is at 100%, it is not clear that time sharing would be as effective with a million virtual threads.

          Also, in this scenario, I think the current scheduler (ForkJoinPool) will use a managed blocker to compensate for those pinned carrier threads.

        • jayd16 a year ago

          I don't know. The language already has Thread.yield(). If your use case is such that you have starvation and care about it, it seems trivial to work around.

          Still, an annoying gotcha if it hits you unexpectedly.
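          The workaround sketched here would be a cooperative yield inside the CPU-bound loop, something like:

```java
public class CooperativeYield {
    public static void main(String[] args) throws InterruptedException {
        Thread t = Thread.ofVirtual().start(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                // A pure CPU loop never blocks, so the current scheduler
                // would never switch it out; an occasional yield lets
                // other virtual threads onto the carrier thread.
                if (i % 100_000 == 0) Thread.yield();
            }
        });
        t.join();
        System.out.println("finished");
    }
}
```
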

      • nimish a year ago

        This is a really unfortunate gotcha that's not at all obvious. Does it kick preemption up a layer to the OS then?

        • Jtsummers a year ago

          The "not at all obvious" gotcha is described in the documentation near the top, under the heading "What is a Virtual Thread?":

          https://docs.oracle.com/en/java/javase/21/core/virtual-threa...

          > Like a platform thread, a virtual thread is also an instance of java.lang.Thread. However, a virtual thread isn't tied to a specific OS thread. A virtual thread still runs code on an OS thread. However, when code running in a virtual thread calls a blocking I/O operation, the Java runtime suspends the virtual thread until it can be resumed. The OS thread associated with the suspended virtual thread is now free to perform operations for other virtual threads.

          It's not been hidden at all in their presentation on virtual threads.

          The OS thread that the virtual thread is mounted to can still be preempted, but that won't free up the OS thread for another virtual thread. However, if you use them for what they're intended for, this shouldn't be a problem. In practice, it will be, because no one can be bothered to RTFM.
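          The quoted point that a virtual thread is still an instance of java.lang.Thread is easy to see in the API (Java 21+; names are illustrative):

```java
public class StillAThread {
    public static void main(String[] args) throws InterruptedException {
        // Built with a different factory, but the same Thread API:
        Thread vt = Thread.ofVirtual().name("worker-1").start(
                () -> System.out.println(Thread.currentThread().isVirtual()));
        vt.join(); // join, interrupt, etc. all work as usual
    }
}
```
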

          • nimish a year ago

            All that says is the Java runtime will suspend on blocking IO, not that it _only_ suspends on blocking IO.

            > what they're intended for

            Java prides itself on careful and deliberate changes to eliminate foot guns, but this seems like a pretty major restriction. Usually these kinds of cooperative threads are called fibers or something else to distinguish them from truly preempt-able threads.

            Expecting developers to read the minutiae of documentation (there's another restriction around synchronized blocks) is a fool's errand TBH. Principle of least surprise, etc.

          • cstrahan a year ago

            As a sibling comment points out, there's nothing in what you quoted that logically implies that blocking I/O is the only reason for a virtual thread to be suspended.

            The best info I could find was this blog post:

            https://blogs.oracle.com/javamagazine/post/going-inside-java...

            "Virtual threads, however, are handled differently than platform threads. None of the existing schedulers for virtual threads uses time slices to preempt virtual threads."

            The next handful of paragraphs are also interesting.

  • Skinney a year ago

    Yes

exabrial a year ago

What is the virtual thread / event loop pattern seeking to optimize? Is it context switching?

A number of years ago I remember trying to have a sane discussion about “non blocking” and I remember saying “something” will block eventually no matter what… anything from the buffer being full on the NIC to your cpu being at anything less than 100%. Does it shake out to any real advantage?

  • gregopet a year ago

    It's a brave attempt to release the programmer from worrying or even thinking about thread pools and blocking code. Java has gone all in - they even cancelled a non-blocking rewrite of their database driver architecture, because why have that if you won't have to worry about blocking code? And the JVM really is a marvel of engineering, it's really really good at what it does, so what team is better placed to pull this off?

    So far, they're not quite there yet: the issue of "thread pinning" is something developers still have to be aware of. I hear the newest JVM version has removed a few more cases where it happens, but will we ever truly 100% not have to care about all that anymore?

    I have to say things are already pretty awesome however. If you avoid the few thread pinning causes (and can avoid libraries that use them - although most if not all modern libraries have already adapted), you can write really clean code. We had to rewrite an old app that made a huge mess tracking a process where multiple event sources can act independently, and virtual threads seemed the perfect thing for it. Now our business logic looks more like a game loop and not the complicated mix of pollers, request handlers, intermediate state persisters (with their endless thirst for various mappers) and whatnot that it was before (granted, all those things weren't there just because of threading.. the previous version was really really shittily written).

    It's true that virtual threads sometimes hurt performance (since their main benefit is cleaner simpler code). Not by much, usually, but a precisely written and carefully tuned piece of performance critical code can often still do things better than automatic threading code. And as a fun aside, some very popular libraries assumed the developer is using thread pools (before virtual threads, which non trivial Java app didn't? - ok nobody answer that, I'm sure there are cases :D) so these libraries had performance tricks (ab)using thread pool code specifics. So that's another possible performance issue with virtual threads - like always with performance of course: don't just assume, try it and measure! :P

    • pragmatick a year ago

      > although most if not all modern libraries have already adapted

      Unfortunately kafka, for example, has not: https://github.com/spring-projects/spring-kafka/commit/ae775...

    • haspok a year ago

      Just a side note, async JDBC was a thing way before Loom came about, and it failed miserably. I'm not sure why, but my guess would be that most enterprise software is not web-scale, so JDBC worked well as it was.

      Also, all the database vendors provided their drivers implementing the JDBC API - good luck getting Oracle or IBM to contribute to R2DBC.. (Actually, I stand corrected: there is an Oracle R2DBC driver now - it was released fairly recently though.)

      EDIT: "failed miserably" is maybe too strong - but R2DBC certainly doesn't have the support and acceptance of JDBC.

      • vbezhenar a year ago

        R2DBC allows you to efficiently maintain millions of connections to the database. But what database supports millions of connections? Not Postgres for sure, and probably no other conventional database. So using a reactive JDBC driver makes little sense: if you're going to use 1000 connections, 1000 threads will do just fine and bring little overhead. Those who use Java don't care about spending 100 more MB of RAM when their service already eats 60 GB.

        • merb a year ago

          Reactive drivers were not about 1000 connections, they were about reusing a single connection better, by queuing a little bit more efficiently over a single connection. Reactive programming is not about parallelism, it’s about concurrency.

          • vbezhenar a year ago

            It is not possible to reuse a single connection better, if we're talking about postgres. You must conduct the transaction over a single connection and you cannot mix different transactions simultaneously over a single connection. That's the way the postgres wire protocol works. I think there are some rudimentary async capabilities, but they don't change anything fundamentally.

            It might be different for some exotic databases, but I don't see any reason why ordinary JDBC driver couldn't reuse single TCP connection for multiple logical JDBC connections in this case.

      • frevib a year ago

        It could also be that there just isn’t enough demand for a non-blocking JDBC. For example, the PostgreSQL server does not cope very well with lots of simultaneous connections, due to (among other things) its process-per-connection model. From the client side (JDBC), a small thread pool would be enough to max out the PostgreSQL server. And there is almost no benefit to using non-blocking vs a small thread pool.

        • haspok a year ago

          I would argue the main benefit would be that the threadpool that the developer would create anyway would instead be created by the async database driver, which has more intimate knowledge about the server's capabilities. Maybe it knows the limits to the number of connections, or can do other smart optimizations. In any case, for the developer it would be a more streamlined experience, with less code needed, and better defaults.

          • frevib a year ago

            I think we’re confusing async and non-blocking? Non-blocking is the part that makes virtual threads more efficient than threads. Async is the programming style, e.g. doing things concurrently. Async can be implemented with threads or non-blocking, if the API supports it. I was merely arguing that a non-blocking JDBC has little merit, as the connections to a DB are limited. Non-blocking APIs are only beneficial when there are lots (> 10k) of connections.

            JDBC knows nothing about the number of connections a server can handle, other than trying connections until it won’t connect any more.

            | In any case, for the developer it would be a more streamlined experience, with less code needed, and better defaults.

            I agree it would be best not to bother the dev with what is going on under the hood.

    • exabrial a year ago

      Thank you for a very candid response, I enjoyed reading it!

      My question is though: Why even do alleged “non-blocking” _at all_? What are people trying to optimize against?

      • jandrewrogers a year ago

        The short answer is that blocking is expensive due to the overhead of the implied context switch and poor locality. As computers become faster, a larger percentage of the CPU time is dedicated to context-switching overhead and non-blocking architectures eliminate that. For applications like databases where this problem is more severe, the difference in throughput between a blocking architecture and a non-blocking architecture can be 10x on the same hardware, so it is a very important optimization if you want your software to have performance that is competitive.

        A modern thread-per-core shared-nothing architecture takes this even further and tries to eliminate blocking at the hardware level for the same basic reason.

    • immibis a year ago

      So... What is it seeking to optimize? Why did you need a thread pool before but not any more? What resource was exhausted to prevent you from putting every request on a thread?

      • chipdart a year ago

        > So... What is it seeking to optimize?

        The goal is to maximize the number of tasks you can run concurrently, while imposing on the developers a low cognitive load to write and maintain the code.

        > Why did you need a thread pool before but not any more?

        You still need a thread pool. Except with virtual threads you are no longer bound to running a single task per thread. This is especially desirable when workloads are IO-bound and will expectedly idle while waiting for external events. If you have a never-ending queue of tasks waiting to run, why should you block a thread consuming that task queue by running a task that stays idle while waiting for something to happen? You're better off starting the task and setting it aside the moment it waits for something to happen.
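        A toy illustration of that effect (Java 21+; the sleeps stand in for I/O waits, and the timings are rough):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThroughputSketch {
    // Run `tasks` 50 ms sleeps on the executor, return wall-clock millis.
    static long run(ExecutorService exec, int tasks) throws Exception {
        Instant start = Instant.now();
        try (exec) { // close() waits for all submitted tasks
            for (int i = 0; i < tasks; i++) {
                exec.submit(() -> { Thread.sleep(50); return null; });
            }
        }
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) throws Exception {
        // 10 platform threads serialize 200 waits into ~20 batches (~1 s);
        // one virtual thread per task overlaps them all (~50 ms).
        long pooled  = run(Executors.newFixedThreadPool(10), 200);
        long virtual = run(Executors.newVirtualThreadPerTaskExecutor(), 200);
        System.out.println("virtual faster: " + (virtual < pooled));
    }
}
```
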

        > What resource was exhausted to prevent you from putting every request on a thread?

        • riku_iki a year ago

          > why should you block a thread

          if creating a gazillion threads on modern hardware is super cheap, why not? I have transparency and debuggability of what threads are running, and can check the stacktrace of each and what they are blocked on.

          virtual threads add lots of magic under the hood, and if there is some bug, or a lib in your infra with no virtual-thread support, it is absolutely not clear how to debug it.

          • chipdart a year ago

            > if creating gazillion threads on modern hardware is super cheap why not?

            Virtual threads are a performance improvement over threads, no matter how cheap threads are to create. Virtual threads run on threads. If threads become cheaper to create, so do virtual threads. They are not mutually exclusive.

            Virtual threads are on top of that a developer experience improvement. Code is easier to write and maintain.

            Virtual threads improve throughput because the moment a task is waiting for anything like IO, the thread is able to service any other task in the queue.

            • riku_iki a year ago

              > Virtual threads are on top of that a developer experience improvement. Code is easier to write and maintain.

              except now you need to prove somehow that all 100 libs in your project support virtual threads.

              > Virtual threads improve throughput because the moment a task is waiting for anything like IO, the thread is able to service any other task in the queue.

              from reading similar discussions, linux for example doesn't have a true async IO API; you just move the blocking of a Java thread to the blocking of a thread in the kernel

          • stoperaticless a year ago

            Each thread adds overhead.

            Some usage types don’t care, some do.

            From what I gather virtual threads are an alternative to “callback-hell” (js) or async coloring (python).

            • riku_iki a year ago

              > Some usage types don’t care, some do.

              I suspect if you care about thread overhead, you won't pick Java, because there will be overhead in other areas too

              > From what I gather virtual threads are an alternative to “callback-hell” (js) or async coloring (python).

              there is also existing ExecutorService and futures in Java

              • stoperaticless a year ago

                > there is also existing ExecutorService and futures in Java

                Yes, virtual threads are an alternative also to those. (Kind of)

                • riku_iki a year ago

                  And my frustration is that Java has had that API for 20 years, it is used everywhere and is absolutely battle-tested, and now they are adding these virtual threads, which break third-party libs and make the JVM more complicated, with various degradations, in exchange for benefits most will not notice..

      • gregopet a year ago

        It's mainly trying to make you not worry about how many threads you create (and not worry about the caveats that come with optimising how many threads you create, which is something you are very often forced to do).

        You can create a thread in your code and not worry whether that thing will then be some day run in a huge loop or receive thousands of requests and therefore spend all your memory on thread overhead. Go and other languages (in Java's ecosystem there's Kotlin for example) employ similar mechanisms to avoid native thread overhead, but you have to think about them. Like, there's tutorial code where everything is nice & simple, and then there's real world code where a lot of it must run in these special constructs that may have little to do with what you saw in those first "Hello, world" samples.

        Java's approach tries to erase the difference between virtual and real threads. The programmer should have to employ no special techniques when using virtual threads and should be able to use everything the language has to offer (this isn't true in many languages' virtual/green threads implementations). Old libraries should continue working and perhaps not even be aware they're being run on virtual threads (although, caveats do apply for low level/high performance stuff, see above posts). And libraries that you interact with don't have to care what "model" of green threading you're using or specifically expose "red" and "blue" functions.

        • giamma a year ago

          You will still have to worry: too many virtual threads will imply too much context switching. However, virtual threads will always be interruptible on I/O, as they are not mapped to actual OS threads, but rather simulated by the JVM, which will execute a number of instructions for each virtual thread.

          This gives the JVM the chance to use real threads more efficiently, avoiding threads remaining unused while waiting on I/O (e.g. a response from a stream). As soon as the JVM detects that a physical thread is blocked on I/O, a semaphore, a lock or anything else, it will reallocate that physical thread to running a new virtual thread. This will reduce latency and context-switch time (the switching is done by the JVM, which already globally manages the memory of the Java process in its heap) and will avoid, or at least largely reduce, the chance that a real thread remains allocated but idle because it's blocked on I/O or something else.

          • frant-hartm a year ago

            What do you mean by context switching?

            My understanding is that virtual threads mostly eliminate context switching - for N CPUs JVM creates N platform threads and they run virtual threads as needed. There is no real context switching apart from GC and other JVM internal threads.

            A platform thread picking another virtual thread to run after its current virtual thread blocks on IO is not a context switch, which is an expensive OS-level operation.

            • giamma a year ago

              The JVM will need to do context switching when reallocating the real thread that is running a blocked virtual thread to the next available virtual thread. It won't be CPU context switching, but context switching happens at the JVM level and still has a cost.

              • frant-hartm a year ago

                Ok. This JVM-level switching is called mounting/un-mounting of the virtual thread and is supposed to be several orders of magnitude cheaper compared to normal context switch. You should be fine with millions of virtual threads.
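                This is straightforward to try; spawning a million virtual threads typically completes in a few seconds on ordinary hardware (don't try this with platform threads):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MillionThreads {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // One virtual thread per task; a million of them is fine.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000_000; i++) {
                exec.submit(() -> { done.incrementAndGet(); });
            }
        } // close() waits for all tasks
        System.out.println(done.get());
    }
}
```
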

            • anonymousDan a year ago

              Does Java's implementation of virtual threads perform any kind of work stealing when a particular physical thread has no virtual threads to run (e.g. they are all blocked on I/O)?

              • mike_hearn a year ago

                It does. They get scheduled onto the ForkJoinPool which is a work stealing pool.

            • immibis a year ago

              "they run virtual threads as needed" - so when one virtual thread is no longer needed and another one is needed, they switch context, yes?

              • frant-hartm a year ago

                This is called mounting/un-mounting and is much cheaper than a context switch.

                • immibis a year ago

                  This is a type of context switch. You are saying dollars are cheaper than money.

                  • peeters a year ago

                    It's been a really long time since I dealt with anything this low level, but in my very limited and ancient experience when people talk about context switching they're talking specifically about the userland process yielding execution back to the kernel so that the processor can be reassigned to a different process/thread. Naively, if the JVM isn't actually yielding control back to the kernel, it has the freedom to do things in a much more lightweight manner than the kernel would have to.

                    So I think it's meaningful to define what we mean by context switch here.

                    • giamma a year ago

                      When a real thread is reassigned from one virtual thread to another, the JVM needs to save the stack of the first virtual thread to the heap and restore the stack of the second virtual thread from the heap; see slide 13 of [1]. This is in fact called mounting/unmounting, as already pointed out, and occurs via Java Continuations, but from the JVM's perspective this is a context switch. It's called the JVM and the V stands for Virtual, so yes, it's not the kernel doing it, but it's happening, and it happens more often the more virtual threads you have in your application.

                      [1] https://www.eclipsecon.org/sites/default/files/slides/JavaLo...

                    • immibis a year ago

                      swapcontext(3) is a userland context switch, and is named so.

        • immibis a year ago

          It seems that the answer to the question was "memory". Stack allocations, presumably. You have answered by telling us that virtual threads are better than real threads because real threads suck, but you didn't say why they suck or why virtual threads don't suck in the same way.

          • mike_hearn a year ago

            Real threads don't suck but they pay a price for generality. The kernel doesn't know what software you're going to run, and there's no standards for how that software might use the stack. So the kernel can't optimize by making any assumptions.

            Virtual threads are less general than kernel threads. If you use a virtual thread to call out of the JVM you lose their benefits, because the JVM becomes like the kernel and can't make any assumptions about the stack.

            But if you are running code controlled by the JVM, then it becomes possible to do optimizations (mostly stack related) that otherwise can't be done, because the GC and the compiler and the threads runtime are all developed together and work together.

            Specifically, HotSpot can move stack frames to and from the heap very fast, in a way that interacts well with the GC. For instance, if a virtual thread resumes, iterates in a loop and suspends again, the stack frames are never copied out of the heap onto the kernel stack at all. HotSpot can incrementally "page" stack frames out of the heap. Additionally, the storage used for a suspended virtual thread's stack is a lot smaller than a suspended kernel stack, because a lot of administrative goop doesn't need to be saved at all.

          • brabel a year ago

            OS Threads do not suck, they're great. But they are expensive to create as they require a syscall, and they're expensive to maintain as they consume quite a bit of memory just to exist, even if you don't need it (they must pre-allocate a stack, which is apparently around 2MB initially and can't be made much smaller, since in most cases you will need even more, so shrinking it would make most cases worse).

            Virtual Threads are very fast to create and allocate only the memory needed by the actual call stack, which can be much less than for OS Threads.

            Also, blocking code is very simple compared to the equivalent async code. So using blocking code makes your code much easier to follow. Check out examples of reactive frameworks for Java and you will quickly understand why.

            • kllrnohj a year ago

              > and they're expensive to maintain as they consume quite a bit of memory just to exist, even if you don't need it (due to how they must pre-allocate a stack which apparently is around 2MB initially,

              I'm not familiar with Windows, but this certainly isn't the case on Linux. It only costs 2MB-8MB of virtual address space, not actual physical memory. And there's no particular reason to believe the JVM can keep a list of threads and their states more efficiently than the kernel can.

              All you really save is the syscall to create it and some context switching costs as the JVM doesn't need to deal with saving/restoring registers as there's no preemption.

              The downside though is you don't have any preemption, which depending on your usage is a really fucking massive downside.

              • brabel a year ago

                > And there's no particular reason to believe the JVM can have a list of threads and their states more efficiently than the kernel can.

                Of course there is. The JVM is able to store the current stack for the Thread efficiently in the pre-allocated heap. Switching execution between Virtual Threads is very cheap. Experiments show you can have millions of VTs, but only a few thousand OS Threads.

                I don't know why you think preemption is a big downside?! The JVM only suspends a Thread at safe points, and those are points where it knows exactly how to resume. I don't believe there are any downsides at all.

              • Someone a year ago

                > The downside though is you don't have any preemption, which depending on your usage is a […] massive downside.

                Nobody is taking OS threads away, so you can choose to use them when they better fit your use case.

      • jmaker a year ago

        Briefly: The cost of spawning schedulable entities, memory and the time to execution. Virtual threads, i.e., fibers, entertain lightweight stacks. You can spawn as many as you like immediately. Your runtime system won’t go out of memory as easily. In addition, the spawning happens much faster in user space. You’re not creating kernel threads, which is a limited and not cheap resource, whence the pooling you’re comparing it to. With virtual threads you can do thread per request explicitly. It makes most sense for IO-bound tasks.

      • davidgay a year ago

        A thread per request has a high risk of overcommitting on CPU use, leading to a different set of problems. Virtual threads are scheduled on a fixed-size (based on number of cores) underlying (non-virtual) thread pool to avoid this problem.

        • immibis a year ago

          Why can't virtual threads overcommit CPU use? If I have 4 CPUs and 4000 virtual threads running CPU-bound code, is that not overcommit? A system without overcommit would refuse to create the 5th thread.

          • detinho a year ago

            I think parent is saying overcommit with OS threads. 4k requests = 4k OS threads. That would lead to the problems parent is talking about.

            • immibis a year ago

              Why wouldn't 4k virtual threads lead to the same problems?

              • troupo a year ago

                Because they don't create 4k real threads, and can be scheduled on n=CPU Cores OS threads

                • immibis a year ago

                  4k "real" threads can also be scheduled on 4 CPU cores. What's the difference?

                  • troupo a year ago

                    Real threads are extremely expensive, both in terms of memory and CPU time, compared to virtual threads. I think the main issue is not even that, but context switching when switching threads, which is also very expensive.

                    Virtual threads usually require significantly fewer resources to spawn and run. And, if the underlying system is implemented with them in mind, they can use fewer context switches, and possibly even fewer cache misses etc.

      • gifflar a year ago

        This article nicely describes the differences between threads and virtual threads: https://www.infoq.com/articles/java-virtual-threads/

        I think it’s definitely worth a read.

      • twic a year ago

        The memory overhead of threads.

  • fzeindl a year ago

    Does it shake out to any real advantage?

    To put it shortly: Writing single-threaded blocking code is far easier for most people and has many other benefits, like more understandable and readable programs: https://www.youtube.com/watch?v=449j7oKQVkc

    The main reason why non-blocking IO, with its style of intertwining concurrency and algorithms, came along is that starting a thread for every request was too expensive. With virtual threads that problem is eliminated, so we can go back to writing blocking code.
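    A minimal sketch of that "blocking code, thread per connection" style (assuming Java 21+; the one-byte echo handler and the in-process demo client are just for illustration):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingStyleServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            // Accept loop: one virtual thread per connection, plain blocking handler
            Thread.startVirtualThread(() -> {
                try {
                    while (true) {
                        Socket s = server.accept();
                        Thread.startVirtualThread(() -> handle(s));
                    }
                } catch (IOException ignored) { /* server socket closed */ }
            });
            // Demo client: send one byte and read the echo back
            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.getOutputStream().write('x');
                System.out.println((char) client.getInputStream().read());
            }
        }
    }

    // Sequential, blocking, easy to read: each read() parks only this virtual thread
    static void handle(Socket socket) {
        try (socket) {
            socket.getOutputStream().write(socket.getInputStream().read());
        } catch (IOException ignored) { /* per-connection failure stays local */ }
    }
}
```

    The handler is exactly the code you would have written pre-NIO; the only concession to scale is `startVirtualThread` instead of `new Thread(...)`.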

    • nlitened a year ago

      > is far easier for most people

      I’d say that writing single-threaded code is far easier for _all_ people, even async code experts :)

      Also, single-threaded code is supported by programming language facilities: you have a proper call stack, thread-local vars, exceptions bubbling up, structured concurrency, simple resource management (RAII, try-with-resources, defer). Easy to reason and debug on language level.

      Async runtimes are always complicated, filled with leaky abstractions, it’s like another language that one has to learn in addition, but with a less thought-out, ad-hoc design. Difficult to reason and debug, especially in edge cases

      • bheadmaster a year ago

        > Async runtimes are always complicated, filled with leaky abstractions, it’s like another language that one has to learn in addition, but with a less thought-out, ad-hoc design. Difficult to reason and debug, especially in edge cases

        Async runtimes themselves are simply attempts to bolt-on green threads on top of a language that doesn't support them on a language level. In JavaScript, async/await uses Promises to enable callback-code to interact with key language features like try/catch, for/while/break, return, etc. In Python, async/await is just syntax sugar for coroutines, which are again just syntax sugar for CPS-style classes with methods split at each "yield". Not sure about Rust, but it probably also uses some Rust macro magic to do something similar.

        • derriz a year ago

          Indeed. Async runtimes/styles are attempts to provide a more readable/usable syntax for CPS[1]. CPS originally had nothing to do with blocking/non-blocking or multi-threading but arose as a technique for structuring compiler code.

          Its attraction for non-blocking coding is that it allows hiding the multi-threaded event dispatching loop. But as the parent comment suggests, this abstraction is extremely leaky. And in addition, CPS in non-functional languages or without syntactic sugar has poor readability. Improving the readability requires compiler changes in the host language - so many languages have added compiler support to further hide the CPS underpinnings of their async model.

          I've always felt this was a big mistake in our industry - all this effort not only in compilers but also in debuggers/IDE - building on a leaky abstraction. Adding more layers of leaky abstractions has only made the issue worse. Async code, at first glance, looks simple but is a minefield for inexperienced/non-professional software engineers.

          It's annoying that Rust switched to async style - the abstraction leakiness immediately hits you, as the "hidden event dispatching loop" remains a real dependency even if it's not explicit in the code. Thus libraries using async cannot generally be used together, although last time I looked, tokio seems to have become the de-facto standard.

          [1] https://en.wikipedia.org/wiki/Continuation-passing_style

          • kaba0 a year ago

            I absolutely agree that the virtual/green thread style is much better: more ergonomic, more likely to be correct, etc. But I can’t fault Rust’s choice, given that it's a low-level language without a fat runtime, making it possible to be called into from other runtimes. What the JVM does is simply not possible that way.

        • logicchains a year ago

          >Async runtimes themselves are simply attempts to bolt-on green threads on top of a language that doesn't support them on a language level.

          Haskell supports async code while also supporting green threads on a language level, and the async code has most of the same issues as async code in any other languages.

          • whateveracct a year ago

            What problems exactly? Haskell has a few things that imo it does better than most languages in this area:

            - All IO is non-blocking by default.

            - FFI support for interruptible.

            - Haskell threads can be preempted externally - this allows you to ensure they never leak. Vs a goroutine that can just spin forever if it doesn't explicitly yield.

            - There are various stdlib abstractions for building concurrent programs in a compositional way.

            • kbolino a year ago

              > Haskell threads can be preempted externally - this allows you to ensure they never leak. Vs a goroutine that can just spin forever if it doesn't explicitly yield.

              Goroutines are preemptible by the runtime (since https://go.dev/doc/go1.14#runtime) but they're still not addressable or killable through the language itself.

              • whateveracct a year ago

                The GHC runtime has lots of cool concurrency features.

                Async exceptions as a way to pass messages (and kill threads!)

                Allocation limits for threads.

                Software Transactional Memory.

        • dwattttt a year ago

          > Not sure about Rust, but it probably also uses some Rust macro magic to do something similar.

          Much the same as JavaScript I understand, but no macros; the compiler turns them into Futures that can be polled

      • xxs a year ago

        >I’d say that writing single-threaded code is far easier for _all_ people, even async code experts :)

        While 'async' is just a name, underneath it's epoll - and virtual threads would not perform better than a proper NIO (epoll) server. I don't consider myself an 'async expert' but I've had my share of writing NIO code (dare say it's not terrible at all)

        • kaba0 a year ago

          Virtual threads literally replace the “blocking” IO call issued by the user with a proper NIO call, mounting the issuing virtual thread again when the IO completes.

    • chipdart a year ago

      > To put it shortly: Writing single-threaded blocking code is far easier for most people and has many other benefits, like more understandable and readable programs:

      I think you're missing the whole point.

      The reason why so many smart people invest their time on "virtual threads" is developer experience. The goal is to turn writing event-driven concurrent code into something that's as easy as writing single-threaded blocking code.

      Check why C#'s async/await implementation is such a huge success and replaced all past approaches overnight. Check why node.js is such a huge success. Check why Rust's async support is such a hot mess. It's all about developer experience.

      • kitd a year ago

        I think he was making the same point as you: writing for virtual threads is like writing for single-threaded blocking code.

      • written-beyond a year ago

        As someone who has written multiple productions services with Async Rust, that are under constant load, I disagree. I've had team members who have only written in C, pick up and start building very comprehensive and performant services in Rust in a matter of days.

        How do developers spew such strong opinions without taking a moment to think about what they're about to say? Rust cannot be directly compared to C#, Java or even Go.

        You don't get a runtime or a GC with Rust. The developer experience is excellent; you get a lot of control over everything you're building with it. Yes, it's not as magical as the languages and runtimes you've mentioned, but the fact that I can at any time rip those abstractions off and make my service extremely lightweight and performant is not something those languages will allow you to do.

        And this is coming from someone who's written non blocking services before Async rust was a thing with just MIO.

        The very fact Rust gets mentioned alongside these languages should be a tribute to the efforts of its maintainers and core team. The amount of tooling and features they've added to the language gives developers of every realm the liberty to try and build what they want.

        Honestly, you can hold whatever opinion you want on any language but your comparison really doesn't make sense.

    • Nullabillity a year ago

      > To put it shortly: Writing single-threaded blocking code is far easier for most people. [snip] With virtual threads that problem is eliminated so we can go back to writing blocking code.

      This is the core misunderstanding/dishonesty behind the Loom/Virtual Threads hype. Single-threaded blocking code is easy, yes. But that ease comes from being single-threaded, not from not having to await a few Futures.

      But Loom doesn't magically solve the threading problem. It hides the Futures, but that just means that you're now writing a multi-threaded program, without the guardrails that modern Future-aware APIs provide. It's the worst of all worlds. It's the scenario that gave multi-threading such a bad reputation for inscrutable failures in the first place.

  • chipdart a year ago

    > What is the virtual thread / event loop pattern seeking to optimize? Is it context switching?

    Throughput.

    Some workloads are not CPU-bound or memory-bound, and spend the bulk of their time waiting for external processes to make data available.

    If your workloads are expected to stay idle while waiting for external events, you can switch to other tasks while you wait for those external events to trigger.

    This is particularly convenient if the other tasks you're hoping to run are also tasks that are bound to stay idle while waiting for external events.

    One of the textbook scenarios that suits this pattern well is making HTTP requests. Another one is request handlers, such as the controller pattern used so often in HTTP servers.

    Perhaps the poster child of this pattern is Node.js. It might not be the performance king and might be single-threaded, but it features in the top spots of performance benchmarks such as TechEmpower's. Node.js is also highly favoured in function-as-a-service applications, as its event-driven architecture is well suited for applications involving a hefty dose of network calls running on memory- and CPU-constrained systems.
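    A hedged sketch of the fan-out scenario described above (assuming Java 21+; the sleep stands in for a real network call, and the 100 ms / five-task numbers are made up for illustration). The blocking "calls" overlap because each one parks only its own virtual thread:

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

public class FanOut {
    // Stand-in for a blocking network call; sleeping unmounts the virtual thread
    static Callable<String> fetch(int id) {
        return () -> {
            Thread.sleep(Duration.ofMillis(100));
            return "response-" + id;
        };
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = executor.invokeAll(
                List.of(fetch(1), fetch(2), fetch(3), fetch(4), fetch(5)));
            for (var f : futures) System.out.println(f.get());
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // All five 100 ms "calls" overlap, so total is ~100 ms, not ~500 ms
        System.out.println(elapsedMs < 450);
    }
}
```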

  • kevingadd a year ago

    One of the main reasons to do virtual threads is that it allows you to write naive "thread per request" code and still scale up significantly without hitting the kind of scaling limits you would with OS threads.

    • hashmash a year ago

      The problem with the naïve design is that even with virtual threads, you risk running out of (heap) memory if the threads ever block. Each task makes a bit of progress, allocates some objects, and then lets another one do the same thing.

      With virtual threads, you can limit the damage by using a semaphore, but you still need to tune the size. This isn't much different than sizing a traditional thread pool, and so I'm not sure what benefit virtual threads will really have in practice. You're swapping one config for another.
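        A sketch of the semaphore-bounding idea described here (assuming Java 21+; the limit of 10 and the 1,000-task count are arbitrary). Unbounded virtual threads plus a semaphore gives the same admission control as a sized thread pool, while keeping thread-per-task code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedVirtualThreads {
    public static void main(String[] args) throws Exception {
        Semaphore permits = new Semaphore(10);   // at most 10 tasks in flight
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    permits.acquireUninterruptibly(); // back-pressure point
                    try {
                        int now = inFlight.incrementAndGet();
                        maxObserved.accumulateAndGet(now, Math::max);
                        Thread.sleep(1);              // simulated work
                        inFlight.decrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        permits.release();
                    }
                });
            }
        }
        // The semaphore, not a pool size, capped the concurrency
        System.out.println(maxObserved.get() <= 10);
    }
}
```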

      • dikei a year ago

        > The problem with the naïve design is that even with virtual threads, you risk running out of (heap) memory if the threads ever block.

        The key with virtual threads is they are so lightweight that you can have thousands of them running concurrently: even when they block for I/O, it doesn't matter. It's similar to lightweight coroutines in other languages like Go or Kotlin.

      • imtringued a year ago

        What you are complaining about has nothing to do with thread pools or virtual threads. You're pointing out the fact that more parallelism will also need more hardware and that a finite hardware budget will need a back pressure strategy to keep resource consumption within a limit. While you might be correct that "sizing a traditional thread pool" is a back pressure strategy that can be applied to virtual threads, the problem with it is that IO bound threads will prevent CPU bound threads from making progress. You don't want to apply back pressure based on the number of tasks. You want back pressure to be in response to resource utilization, so that enough tasks get scheduled to max out the hardware.

        This is a common problem with people using Java parallel streams, because they by default share a single global thread pool and the way to use your own thread pool is also extremely counterintuitive, because it essentially relies on some implicit thread local magic to choose to distribute the stream in the thread pool that the parallel stream was launched on, instead of passing it as a parameter.

        It would be best if people came up with more dynamic back pressure strategies, because this is a more general problem that goes way beyond thread pools. In fact, one of the key problems of automatic parallelization is deciding at what point there is too much parallelization.

      • initplus a year ago

        The benefits from virtual threads come from the simple API that it presents to the programmer. It's not a performance optimization.

        • hashmash a year ago

          But that same benefit was always available with platform threads -- a simple API. What is the real gain by using virtual threads? It's either going to be performance or memory utilization.

          • groestl a year ago

            It's combining the benefits from async models (state machines separated from os threads, thus more optimal for I/O bound workload), with the benefits from proper threading models (namely the simpler human interface).

            Memory utilization & performance is going to be similar to the async callback mess.

            • hashmash a year ago

              Why is an async model better than using OS threads for an I/O bound workload? The OS is doing async stuff internally and shielding the complexity with threads. With virtual threads this work has shifted to the JVM. Can the JVM do threads better than the OS?

              • mrsilencedogood a year ago

                "Why is an async model better than using OS threads for an I/O bound workload?"

                Because evented/callback-driven code is a nightmare to reason about and breaks lots of very basic tools, like the humble stack trace.

                Another big thing for me is resource management - try/finally don't work across callback boundaries, but do work within a virtual thread. I recently ported a netty-based evented system to virtual threads and a very long-standing issue - resource leakage - turned into one very nice try/finally block.
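                A tiny sketch of the try/finally point (assuming Java 21+; the string stands in for a real resource like a connection). Cleanup runs in one place even though the thread blocks, and unmounts, in the middle - something a callback chain can't give you:

```java
public class TryFinallyDemo {
    public static void main(String[] args) throws Exception {
        Thread vt = Thread.startVirtualThread(() -> {
            var resource = "connection";           // stand-in for a real resource
            try {
                Thread.sleep(10);                  // blocking I/O stand-in; thread unmounts here
                System.out.println("used " + resource);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                System.out.println("released " + resource);  // runs no matter what
            }
        });
        vt.join();
    }
}
```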

              • zokier a year ago

                > Can the JVM do threads better than the OS?

                Yes. The JVM has far more opportunities for optimizing threads because it doesn't need to uphold 50 years of accumulated invariants and compatibility that current OSes do, and JVM has more visibilty on the application internals.

              • adgjlsfhk1 a year ago

                it can do a much better job because there isn't a security boundary. OS thread scheduling requires sys calls and invalidate a bunch of cache to prevent timing leaks

          • CrimsonRain a year ago

            Create 100k platform threads and you'll find out.

          • lichtenberger a year ago

            Throughput. The code can be "suspended" on a blocking call (I/O, where the platform thread usually is wasted, as the CPU has nothing to do during this time). So, the platform thread can do other work in the meantime.

      • packetlost a year ago

        Yeah, and it's generally good to be RAM limited instead of CPU, no? The alternative is blowing a bunch of time on syscalls and OS scheduler overhead.

        Also the virtual threads run on a "traditional" thread pool to my understanding, so you can just tweak the number of worker threads to cap the total concurrency.

        The benefit is it's overall more efficient (in the general case) and lets you write linear blocking code (as opposed to function coloring). You don't have to use it, but it's nice that it's there. Now hopefully Valhalla actually makes it in eventually

        • hashmash a year ago

          The OS scheduler is still there (for the carrier threads), but now you've added on top of that FJ pool based scheduler overhead. Although virtual threads don't have the syscall overhead when they block, there's a new cost caused by allocating the internal continuation object, and copying state into it. This puts more pressure on the garbage collector. Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

          I've not yet seen a study that shows that virtual threads offer a huge benefit. The Open Liberty study suggests that they're worse than the existing platform threads.

          • zokier a year ago

            > The OS scheduler is still there (for the carrier threads), but now you've added on top of that FJ pool based scheduler overhead.

            Ideally carrier threads would be pinned to isolated cpu cores, which removes most aspects of OS scheduler from the picture

          • zokier a year ago

            > I've not yet seen a study that shows that virtual threads offer a huge benefit.

            Not exactly Java virtual threads, but a study on how userland threads beat kernel threads.

            https://cs.uwaterloo.ca/~mkarsten/papers/sigmetrics2020.html

            For quick results, check figures 11 and 15 from the (preprint) paper. Userland threads ("fred") have ~50% higher throughput while having orders of magnitude better latency at high load levels, in a real-world application (memcached).

          • packetlost a year ago

            The study says there are surprising performance problems with Java's virtual thread implementation. Their test of throughput was also hilarious: they put 2000 OS threads vs 2000 virtual threads, and most of the time OS threads don't start falling apart until 100k+ threads. You can architect an application such that you can handle 200k simultaneous connections using platform-thread-per-core, but it's harder to reason about than the linear, blocking code that virtual threads and async allow for.

            > Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

            Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

            > there's a new cost caused by allocating the internal continuation object, and copying state into it.

            This is more of a problem with the implementation (not every virtual thread language does it this way), but yeah this is more overhead on the application. I assume there's improvements that can be made to ease GC pressure, like using object pools.

            Usually virtual threads are a memory vs CPU tradeoff that you typically use in massively concurrent IO-bound applications. Total throughput should take over platform threads with hundreds of thousands of connections, but below that probably perform worse, I'm not that surprised by that.

            • electroly a year ago

              > Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

              Java virtual threads are stackful; they have to save and restore the stack every time they mount a different virtual thread to the platform thread. They do this by naive[0] copying of the stack out to a heap allocation and then back again, every time. That's clearly a context switch that you're paying for; it's just not in the kernel. I believe this is what the person you're replying to is talking about.

              [0] Not totally naive. They do take some effort to copy only subsets of the stack if they can get away with it. But it's still all done by copies. I don't know enough to understand why they need to copy and can't just swap stack pointers. I think it's related to the need to dynamically grow the stack when the thread is active vs. having a fixed size heap allocation to store the stack copy.

      • immibis a year ago

        Async does exactly the same by the way.

  • pron a year ago

    No, it optimises hardware utilisation by simply allowing more tasks to concurrently make progress. This allows throughput to reach the maximum the hardware allows. See https://youtu.be/07V08SB1l8c.

  • duped a year ago

    imo the biggest difference between "virtual" threads in a managed runtime and "os" threads is that the latter uses a fixed size stack whereas the former is allowed to resize, it can grow on demand and shrink under pressure.

    When you spawn an OS thread you are paying at worst the full cost of it, and at best the max depth seen so far in the program, and stack overflows can happen even if the program is written correctly. Whereas a virtual thread can grow the stack to be exactly the size it needs at any point, and when GC runs it can rewrite pointers to any data on the stack safely.

    Virtual/green/user space threads aka stackful coroutines have proven to be an excellent tool for scaling concurrency in real programs, while threads and processes have always played catchup.

    > “something” will block eventually no matter what…

    The point is to allow everything else to make progress while that resource is busy.

    ---

    At a broader scale, as a programming model it lets you architect programs that are designed to scale horizontally. With the commoditization of compute in the cloud, that means it's very easy to write a program that can be distributed as I/O demand increases. In principle, a "virtual" thread could be spawned on a different machine entirely.

  • frevib a year ago

    They indeed optimize thread context switching. Taking a thread on and off the CPU becomes expensive when there are thousands of threads.

    You are right that everything blocks; even going to L1 cache you have to wait about a nanosecond. But blocking in this context means waiting for “real” IO like a network request or spinning-disk access. Virtual threads take away the problem of the thread sitting there doing nothing while it waits for data, before it is context switched.

    Virtual threads won’t improve CPU-bound blocking. There the thread is actually occupying the CPU, so there is no problem of the thread doing nothing as with IO-bound blocking.

  • kbolino a year ago

    The hardware now is just as concurrent/parallel as the software. High-end NVMe SSDs and server-grade NICs can do hundreds to thousands of things simultaneously. Even if one lane does get blocked, there are other lanes which are open.

  • lmm a year ago

    > I remember saying “something” will block eventually no matter what… anything from the buffer being full on the NIC to your cpu being at anything less than 100%.

    Nope. You can go async all the way down, right to the electrical signals if you want. We usually impose some amount of synchronous clocking/polling for sanity, at various levels, but you don't have to; the world is not synchronised, the fastest way to respond to a stimulus will always be to respond when it happens.

    > Does it shake out to any real advantage?

    Of course it does - did you miss the whole C10K discussions 20+ years ago? Whether it matters for your business is another question, but you can absolutely get a lot more throughput by being nonblocking, and if you're doing request-response across the Internet you generally can't afford not to.

bberrry a year ago

I don't understand these benchmarks at all. How could it possibly take virtual threads 40-50 seconds to reach maximum throughput when a batch of tasks is submitted at once?

LinXitoW a year ago

From my very limited exposure to virtual threads and the older solution (thread pools), the biggest hurdle was the extensive use of ThreadLocals by most popular libraries.

In one project I had to basically turn a reactive framework into a one thread per request framework, because passing around the MDC (a kv map of extra logging information) was a horrible pain. Getting it to actually jump ship from thread to thread AND deleting it at the correct time was basically impossible.

Has that improved yet?

  • joshlemer a year ago

    I faced this issue once. I solved it by creating a wrapping/delegating Executor, which would capture the MDC from the scheduling thread at schedule-time, and then at execute-time, set the MDC for the executing thread, and then clear the MDC after the execution completes. Something like...

        class MyExecutor implements Executor {
            private final Executor delegate;
            public MyExecutor(Executor delegate) {
                this.delegate = delegate;
            }
            @Override
            public void execute(Runnable command) {
                // capture the MDC on the scheduling thread...
                var mdc = MDC.getCopyOfContextMap();
                delegate.execute(() -> {
                    // ...restore it on the executing thread...
                    MDC.setContextMap(mdc);
                    try {
                        command.run();
                    } finally {
                        // ...and clear it once the task completes
                        MDC.clear();
                    }
                });
            }
        }
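A self-contained sketch of the same capture/restore pattern, with a plain `ThreadLocal` map standing in for SLF4J's `MDC` (which is an external dependency):

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

public class ContextPropagation {
    // stands in for MDC's per-thread key/value map
    static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(Map::of);

    // capture on the scheduling thread, restore on the executing thread, clear after
    static Executor propagating(Executor delegate) {
        return command -> {
            Map<String, String> captured = CONTEXT.get();
            delegate.execute(() -> {
                CONTEXT.set(captured);
                try {
                    command.run();
                } finally {
                    CONTEXT.remove();
                }
            });
        };
    }

    public static void main(String[] args) throws Exception {
        CONTEXT.set(Map.of("requestId", "42"));
        AtomicReference<String> seen = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        // delegate spawns a fresh virtual thread per task (JDK 21+)
        Executor executor = propagating(r -> Thread.ofVirtual().start(r));
        executor.execute(() -> {
            seen.set(CONTEXT.get().get("requestId")); // context visible on the worker
            done.countDown();
        });
        done.await();
        System.out.println("requestId=" + seen.get());
    }
}
```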
  • vbezhenar a year ago

    What do you mean by hurdle? ThreadLocals work just fine with virtual threads.

    • brabel a year ago

      It's not recommended though.

      See https://openjdk.org/jeps/429

      If you keep ThreadLocal variables, they get inherited by child Threads. If you make many thousands of them, the memory footprint becomes completely unacceptable. If the memory used by ThreadLocal variables is large, it also makes it more expensive to create new Threads (virtual or not), so you lose most advantages of Virtual Threads by doing that.

      • bberrry a year ago

        I don't think that's correct. ThreadLocals should behave just like on regular OS threads, the difference is that you can suddenly create millions of them.

        You used to be able to depend on OS threads getting reused because you were pooling them. You can do the same with virtual threads if you wish and you will get the same behavior. The difference is we ought to spawn new threads per task now.

        Side note, you have to specifically use InheritableThreadLocal to get the inheritance behavior you speak of.
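A quick demonstration of that distinction (JDK 21+): a plain `ThreadLocal` is not visible in a newly started thread, while an `InheritableThreadLocal` is copied into it at start:

```java
import java.util.concurrent.atomic.AtomicReference;

public class InheritanceDemo {
    static final ThreadLocal<String> plain = new ThreadLocal<>();
    static final InheritableThreadLocal<String> inheritable = new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        plain.set("parent-value");
        inheritable.set("parent-value");
        AtomicReference<String> seenPlain = new AtomicReference<>();
        AtomicReference<String> seenInheritable = new AtomicReference<>();
        // virtual threads inherit InheritableThreadLocals by default
        Thread child = Thread.ofVirtual().start(() -> {
            seenPlain.set(String.valueOf(plain.get()));   // not inherited
            seenInheritable.set(inheritable.get());       // inherited copy
        });
        child.join();
        System.out.println(seenPlain.get() + " / " + seenInheritable.get());
    }
}
```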

  • bberrry a year ago

    If you are already in a reactive framework, why would you change to virtual threads? Those frameworks pool threads and have their own event loop so I would say they are not suitable for virtual thread migration.

    • brabel a year ago

      Yes, if you're happy with the reactive frameworks there's no reason to migrate. Most people, however, would love to remove their complexities from their code bases. Virtual Threads are much, much easier to program with. There are downsides, like not being able to easily limit concurrency, having to implement your own timeout mechanisms, etc., but that will probably be provided by a common lib sooner or later, which hopefully offers identical features to the reactive frameworks while being much, much simpler.

      • munksbeer a year ago

        I've not looked too deeply. We use the eventloop model, and we're guaranteed that data is only mutated by a single unit of work at a time, which means you don't need to use any concurrent data types, volatile, etc. This is great for micro performance.

        Does the same apply to virtual threads?

        Edit: I think I answered my own question. Java virtual threads have the same memory model as regular Java threads, so yes, I need to use the same semantics. That rules out replacing the eventloop model for us.

davidtos a year ago

I did some similar testing a few days ago[1], comparing platform threads to virtual threads doing API calls. They mention the right conditions, like having high task delays, but it also depends on what the task is: Thread.sleep(1) performs better on virtual threads than platform threads, but a REST call taking a few ms performs worse.

[1] https://davidvlijmincx.com/posts/virtual-thread-performance-...

taspeotis a year ago

My rough understanding is that this is similar to async/await in .NET?

It’s a shame this article paints a neutral (or even negative) experience with virtual threads.

We rewrote a boring CRUD app that spent 99% of its time waiting for the database to respond to be async/await from top to bottom. CPU and memory usage went way down on the web server because so many requests could be handled by far fewer threads.

  • jsiepkes a year ago

    > My rough understanding is that this is similar to async/await in .NET?

    Well, somewhat, but also not really. They are green threads, like async/await, but their use is more transparent, unlike async/await.

    So there are no special "async methods". You just instantiate a "VirtualThread" where you normally instantiate a (kernel) "Thread" and then use it like any other (kernel) thread. This works because, for example, all blocking IO APIs are automatically converted to non-blocking IO under the hood.
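A small illustration of that transparency (JDK 21+): a loopback socket stands in for real network IO, and no virtual-thread-specific API appears anywhere in the IO path.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class BlockingOnVirtual {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral loopback port
            StringBuilder received = new StringBuilder();
            // Plain blocking accept/read, written exactly as for a platform thread;
            // the JDK parks the virtual thread (not the carrier) while it waits.
            Thread reader = Thread.ofVirtual().start(() -> {
                try (Socket conn = server.accept();
                     InputStream in = conn.getInputStream()) {
                    byte[] buf = new byte[64];
                    int n = in.read(buf); // blocks until data arrives
                    received.append(new String(buf, 0, n, StandardCharsets.UTF_8));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 OutputStream out = client.getOutputStream()) {
                out.write("hello".getBytes(StandardCharsets.UTF_8));
            }
            reader.join();
            System.out.println(received);
        }
    }
}
```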

  • peteri a year ago

    It's a different model. Microsoft did work on green threads a while ago and decided against continuing.

    Links:

    https://github.com/dotnet/runtimelab/issues/2398

    https://github.com/dotnet/runtimelab/blob/feature/green-thre...

    • pjmlp a year ago

      It should be pointed out that the main reason they didn't go further was the added complexity in .NET, when async/await already exists.

      > Green threads introduce a completely new async programming model. The interaction between green threads and the existing async model is quite complex for .NET developers. For example, invoking async methods from green thread code requires a sync-over-async code pattern that is a very poor choice if the code is executed on a regular thread.

      Note also that even the current model is complex enough to warrant a FAQ:

      https://devblogs.microsoft.com/dotnet/configureawait-faq

      https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/b...

      • neonsunset a year ago

        This FAQ is a bit outdated in places, and is not something most users should worry about in practice.

        JVM Green Threads here serve predominantly back-end scenarios, where most of the items on the list are not of concern. This list also exists to address bad habits that carried over from before the tasks were introduced, many years ago.

        In general, the perceived want of green threads is in part caused by misunderstanding of that one bad article about function coloring. And that one bad article about function coloring also does not talk about the way you do async in C#.

        Async/await in C# on the back end is a very easy model to work with, with an explicit understanding of where a method returns an operation that promises to complete in the future (or not), and composing tasks[0] for easy (massive) concurrency is significantly more idiomatic than doing so with green threads or the completable futures that existed in Java before. And as evidenced by the adoption of green threads in large-scale Java projects, it turns out the failure modes share similarities, except green threads end up violating far more expectations, and the code author may have no indication or explicit mechanism to address this, like using AsyncLocal.

        Also, one change to look for is the "Runtime Handled Tasks" project in .NET, which will replace Roslyn-generated state machine code with a runtime-provided suspension mechanism that only ever suspends at true suspension points, where a task's execution actually yields asynchronously. So far the numbers show at least a 5x decrease in overhead, which is massive and will bring the performance of computation-heavy async paths in line with sync ones:

        https://github.com/dotnet/runtimelab/blob/feature/async2-exp...

        Note that you were trivially able to have millions of scheduled tasks even before that as they are very lightweight.

        [0]: e.g. sending requests in parallel is just this

            using var http = new HttpClient() {
                BaseAddress = new("https://news.ycombinator.com/news")
            };
        
            var requests = Enumerable
                .Range(1, 4)
                .Select(n => $"?p={n}")
                .Select(http.GetStringAsync);
        
            var pages = await Task.WhenAll(requests);
        • no_wizard a year ago

          I take your point about the aforementioned article[0][1] being a popular reference when discussing async/await (and, to a lesser extent, async programming in modern languages more generally), but I think its popularity highlights the fact that it is a pain point for folks.

          Take Go, for instance. It is well liked in part because it's so easy to do concurrency with goroutines: they're easy to reason about, easy to call, easy to write, and, for how much heavy lifting they do, relatively simple to understand.

          The reason Java is getting a lot of kudos here for its implementation of green threads is exactly the same reason people talk about Go being an easy language for concurrency: it doesn't gate code behind specialized idioms/syntax/features specific to asynchronous work. Rather, it largely uses the same idioms and syntax as synchronous code, and is therefore easier to reason about, adopt, and, I think history is starting to show, use.

          Java is taking an approach paved by Go, and ultimately I think it's the right choice: having worked extensively with C# and other languages that use async/await, there are simply fewer footguns for the average developer to hit when you reduce the surface area of having to understand async/sync boundaries.

          [0]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

          [1]: HN discussion: https://news.ycombinator.com/item?id=8984648

          • neonsunset a year ago

            Green Threads increase the footgun count as methods which return tasks are rather explicit about their nature. The domain of async/await is well-studied, and enables crucial patterns that, like in my previous example, Green Threads do nothing to improve the UX of in any way. This also applies to Go approach which expects you to use Channels, which have their own plethora of footguns, even for things trivially solved by firing off a couple of tasks and awaiting their result. In Go, you are also expected to use explicit synchronization primitives for trivial concurrent code that require no cognitive effort in C# whatsoever. C# does have channels that work well, but turns out you rarely need them when you can just write simple task-based code instead.

            I'm tired of this; that one article is bad and incorrect, promotes straight-up harmful intuition, and probably set the industry back by 10 years in concurrent and asynchronous programming, in the same way misinterpreting Donald Knuth's quote did for performance.

            • kaba0 a year ago

              That’s a very simplistic view. Especially that java does/will provide “structured concurrency” as something analogous to structured control flow, vs gotos.

              Also, nothing prevents you from building your own, more limited but safer (the two always come together!) abstraction on top of it, but you couldn't express Loom with async as the primitive.

        • ffsm8 a year ago

          I don't think this would be a good showcase for Virtual Threads. The "async" API for Java is CompletableFuture, right? That's been stable for something like 10 years, so no real change since Java 8.

          You'd just have to define a ThreadPool with n threads before, where each request would've blocked one pooled thread. Now it just keeps going.

          So your equivalent Java example would've been something like this, but again: the CompletableFuture API is pretty old at this point.

              @HttpExchange(value = "https://news.ycombinator.com")
              interface HnClient {
                  @GetExchange("news?p={page}")
                  CompletableFuture<String> getNews(@PathVariable("page") Integer page);
              }
          
              @RequiredArgsConstructor
              @Service
              class HnService {
                  private final HnClient hnClient;
                  List<String> getNews() {
                      var requests = IntStream.rangeClosed(1, 4)
                                              .boxed().map(hnClient::getNews).toList();
                      return requests.stream().map(CompletableFuture::join).toList();
                  }
              }
          • vips7L a year ago

            Structured concurrency is still being developed: https://openjdk.org/jeps/453

            Also, I wouldnt consider that the equivalent Java code. That is all Spring and Lombok magic. Just write the code and just use java.net.HttpClient.

            • ffsm8 a year ago

              > and just use java.net.HttpClient.

              No.

              • no_wizard a year ago

                it might be obvious to others, but why the 'No'?

                • vips7L a year ago

                  The standard HTTP client doesn't have as nice a UX as other community libs. Most of us (including me) don't like to use it.

                  That being said, IMO you can't call something equivalent when it relies on a bunch of Spring magic. This also disregards that the OP's logic isn't equivalent at all: it waits for each future one by one instead of using something like CompletableFuture.allOf, or Promise.all in JS.
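A minimal, self-contained sketch of the `CompletableFuture.allOf` composition mentioned here, with dummy suppliers standing in for the HTTP calls:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AllOfDemo {
    public static void main(String[] args) {
        // dummy suppliers; in real code these would be async HTTP calls
        List<CompletableFuture<String>> futures = List.of(
                CompletableFuture.supplyAsync(() -> "page-1"),
                CompletableFuture.supplyAsync(() -> "page-2"),
                CompletableFuture.supplyAsync(() -> "page-3"));
        // allOf completes only when every future does, so the joins
        // inside thenApply never block
        CompletableFuture<List<String>> all = CompletableFuture
                .allOf(futures.toArray(CompletableFuture[]::new))
                .thenApply(ignored ->
                        futures.stream().map(CompletableFuture::join).toList());
        System.out.println(all.join());
    }
}
```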

      • jayd16 a year ago

        It would break a lot of the native interop and UI code devx of the language. Java was never as nice in those categories so it had less to lose going this path.

  • devjab a year ago

    > My rough understanding is that this is similar to async/await in .NET?

    Not really. What C# does is sort of similar but it has the disadvantages of splitting your code ecosystem into non-blocking/blocking code. This means you can “accidentally” start your non-blocking code. Something which may cause your relatively simple API to consume a ridiculous amount of resources. It also makes it much more complicated to update and maintain your code as it grows over the years. What is perhaps worse is that C# lacks an interruption model.

    Java's approach is much more modern, but then it kind of had to be, because the JVM already supported structured concurrency from Kotlin. That means Java's "async/await" had to work in a way that wouldn't break what was already there. Because Java is like that.

    I think you can sort of view it as another example of how Java has overtaken C# (for now), but I imagine C# will get an improved async/await model in the next couple of years. Neither approach is something you would actually choose if concurrency is important to what you build and you don't have a legacy reason to continue to build on Java/C#. This is because Go or Erlang would be the obvious choice, but it's nice that you at least have the option if your organisation is married to a specific language.

    • za3faran a year ago

      I would not argue that golang is the obvious choice for concurrency. Java's approach is actually superior to golang's. It takes it a step further by offering structured concurrency[1].

      Kotlin's design had no bearing on Java's or the JVM's implementation.

      C# has an interruption model through CancellationToken as far as I'm aware.

      [1] https://openjdk.org/jeps/453

    • jayd16 a year ago

      It's foolish to say that green threads are strictly better and ignore async/await as something outdated. It can do a lot that green threads can't.

      For example, you can actually share a thread with another runtime.

      Cooperative threading allows for implicit critical sections that can be cumbersome in preemptive threading.

      Async/await and virtual threads are solving different problems.

      > What is perhaps worse is that C# lacks an interruption model

      Btw, You'd just use OS threads if you really needed pre-emptively scheduled threads. Async tasks run on top of OS threads so you get both co-opertive scheduling within threads and pre-emptive scheduling of threads onto cores.

      • devjab a year ago

        > It's foolish to say that green threads are strictly better and ignore async/await as something outdated

        I'm not sure I said outdated, but I can see what you mean by how I called Java's approach "more modern". What I should have called Java's approach is "correctly designed".

        C#'s async/await isn't all terrible, as you point out, but it's designed wrong from the bottom up, because computation should always be blocking by default. The fact that you can accidentally start running your code asynchronously is just… Aside from trapping developers with simple mistakes, it's also part of what has led to the ecosystem being irrecoverably split in two.

        I was actually a little surprised to see Microsoft make the whole .NET Framework to .NET Core transition without addressing some of async/await's glaring issues, when that massive disruption uprooted everything anyway.

        • jayd16 a year ago

          What do you think about the Structured Concurrency library Java is working with things like fork() and join()? Is that incorrectly designed? Why do you think there's a call for that if virtual threads serves every use case?

    • troupo a year ago

      Erlang, not Go, should be the obvious choice for concurrency, but it's impossible to retrofit Erlang's concurrency onto existing systems.

      • toast0 a year ago

        As an Erlang person, from reading about Java's Virtual Threads, it feels like it should get a significant portion of the Erlang concurrency story.

        With virtual threads, it seems like if you don't hit the gotchas, you can spawn a thread and run straight through blocking code without worrying about too many threads, etc. So you could do thread-per-connection/user chat servers and HTTP servers and whatnot.

        Yes, it's still shared memory, so you miss out on the simplifying effect of explicit communication instead of shared-memory communication, and how that makes it easy to work with remote and local communication partners. But you can build a mailbox system if you want (it's not going to be as nice as a built-in one, of course). I'm not sure if Java virtual threads can kill each other effectively, either.
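A crude sketch of such a mailbox, assuming JDK 21+: one `BlockingQueue` per virtual-thread "process". This is nowhere near Erlang's selective receive, links, or monitors, just the basic shape.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MailboxDemo {
    // an Erlang-ish mailbox: one queue owned by one virtual-thread "process"
    record Mailbox(BlockingQueue<String> queue) {
        void send(String msg) { queue.add(msg); }
        String receive() throws InterruptedException { return queue.take(); }
    }

    public static void main(String[] args) throws Exception {
        var mailbox = new Mailbox(new LinkedBlockingQueue<>());
        var reply = new LinkedBlockingQueue<String>();
        Thread.ofVirtual().start(() -> {
            try {
                // receive() blocks cheaply: the virtual thread parks, not an OS thread
                String msg = mailbox.receive();
                reply.add("echo: " + msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mailbox.send("hello");
        System.out.println(reply.take());
    }
}
```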

        • troupo a year ago

          Erlang's concurrency story isn't green threads.

          It's (with caveats, of course):

          - a thread crashing will not bring the system down

          - a thread cannot hog all processing time as the system ensures all threads get to run. The entire system is re-entrant and execution of each thread can be suspended to let other threads continue

          - all CPU cores can and will be utilized transparently to the user

          - you can monitor a thread and if it crashes you're guaranteed to receive info on why and how it crashed

          - immutable data structures play a huge part of it, of course, but the above is probably more important

          That's why Go's concurrency is not that good, actually. Goroutines are not even half-way there: an error in a goroutine can panic-kill your entire program, there are no good ways to monitor them etc.

          • kbolino a year ago

            Neither an error nor a recovered-from panic will cause a Go program to crash; only an unrecovered panic does that.

            The bigger problem with Go in this regard is how easy it is to cause a panic thanks to nil.

            • troupo a year ago

              In Erlang even a nil will not lead to an unrecovered panic (if it happens in the process aka green thread).

              Go made half a step in the right direction with goroutines, but never committed fully

              • kbolino a year ago

                Each has its tradeoffs. I had a case that cropped up more than once where RabbitMQ kept on trucking even though the process for an important queue had crashed; had it propagated all the way to the server itself it may have been easier to diagnose and fix (I'm assuming there's something like defer or finally in Erlang to ensure the mnesia database was synced properly on exit). Instead, I had to monitor for this condition and periodically run some command-line trickery to fix it (without ever really knowing why it happened). This was years ago, maybe RabbitMQ handles that better now.

                The Go authors are adamant that goroutines not be addressable (from without) or identifiable (from within). This is diametrically opposed to Erlang, where processes are meant to be addressed/identified. I can't say I've ever found a case where a problem couldn't be solved due to this constraint in Go, but it does complicate some things.

      • morsch a year ago

        Isn't that Akka?

    • szundi a year ago

      Maybe C# is going to have a new async/await model, but the fragmentation of libs and code probably cannot be undone.

      Java has the advantage that they make relatively more decisions about the language and the libs that they don't have to fix later. That's great value if you're not building throw-away software but SaaS or something that has to live long.

    • kaba0 a year ago

      > This is because Go or Erlang would be the obvious choice

      Why Go? It has a quite anemic standard library for concurrent data structures compared to Java, and it is a less expressive, and arguably worse, language on every count, verbosity included.

    • delusional a year ago

      From what I recall, and this was a while ago so bear with me, Java Virtual Threads still have a lot of pitfalls where the promise of concurrency isn't really fulfilled.

      I seem to remember it was some pretty basic operations (like maybe read or something) that caused the thread not to unmount and therefore just block the underlying OS thread. At that point you've just invented the world's most complicated thread pool.

      • mike_hearn a year ago

        Reading from sockets definitely works. It'd be pretty useless if it didn't.

        Some operations that don't cause a task switch to another virtual thread are:

        - If you've called into a native library and back into Java that then blocks. In practice this never happens because Java code doesn't rely on native libraries or frameworks that much and when it does happen it's nearly always in-and-out quickly without callbacks. This can't be fixed by the JVM, however.

        - File IO. No fundamental problem here, it can be fixed, it's just that not so many programs need tens of thousands of threads doing async file IO.

        - If you're holding a lock using 'synchronized'. No fundamental problem here, it's just annoying because of how HotSpot is implemented. They're fixing this at the moment.

        In practice it's mostly the last one that causes issues in real apps. It's not hard to work around, and eventually those workarounds won't be needed anymore.
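A common workaround for the `synchronized` pinning issue is to guard sections that may block with `java.util.concurrent.locks` instead, which lets the virtual thread unmount while parked (JEP 491 removes the need for this in later JDKs). A minimal sketch, assuming JDK 21+:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockInsteadOfSynchronized {
    // a ReentrantLock instead of a synchronized block: a virtual thread
    // blocked on lock() can unmount from its carrier (synchronized pins
    // the carrier on JDK 21)
    private final ReentrantLock lock = new ReentrantLock();
    private long counter;

    long incrementAndGet() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        var c = new LockInsteadOfSynchronized();
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                exec.submit(c::incrementAndGet);
            }
        } // close() waits for all 1000 increments
        System.out.println(c.incrementAndGet());
    }
}
```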

      • za3faran a year ago

        You're referring to thread pinning, and this is being addressed.

  • kimi a year ago

    It's more like Erlang threads - they appear to be blocking, so existing code will work with zero changes. But you can create a gazillion of them.

  • he0001 a year ago

    > My rough understanding is that this is similar to async/await in .NET?

    The biggest difference is that C# async/await code is rewritten by the compiler to be able to be async. This means that you see artifacts in the stack that weren’t there when you wrote the code.

    There are no rewrites with virtual threads and the code is presented on the stack just as you write it.

    They solve the same problem but in very different ways.

    • pansa2 a year ago

      > They solve the same problem but in very different ways.

      Yes. Async/await is stackless, which leads to the “coloured functions” problem (because it can only suspend function calls one-by-one). Threads are stackful (the whole stack can be suspended at once), which avoids the issue.

    • jayd16 a year ago

      There is overlap but they really don't solve the same problem. Cooperative threading has its own advantages and patterns that won't be served by virtual threads.

      • he0001 a year ago

        What patterns does async/await solve which virtual threads don’t?

        • jayd16 a year ago

          If you need to be explicit about thread contexts because you're using a thread that's bound to some other runtime (say, a GL Context) or you simply want to use a single thread for synchronization like is common in UI programming with a Main/UI Thread, async/await does quite well. The async/await sugar ends up being a better devx than thread locking and implicit threading just doesn't cut it.

          In Java they're working on a structured concurrency library to bridge this gap, but IMO, it'll end up looking like async/await with all its ups and downs but with less sugar.

          • he0001 a year ago

            What’s stopping you from using a single thread for synchronization?

            • jayd16 a year ago

              You can use virtual threads running on a single OS thread and that will work but then everything will be on that one thread. You'll have synchronization but you'll also always be blocking on that one thread as well.

              Async/await is able to achieve good UX around explicitly defining what goes on your Main thread and what goes elsewhere. It's trivial to mix UI-thread and background-thread code by bouncing between synchronization contexts as needed.

              When the threading model is implicit, it's impossible to have this control.

        • neonsunset a year ago

          "Green Threads" as implemented in Java is a solution that solves only a single problem - blocking/multiplexing.

          It does not enable easy concurrency and task/future composition the way C#/JS/Rust do, which offer strictly better and more comprehensive model.

          • za3faran a year ago

            Structured concurrency[1] offers task composition and more.

            [1] https://openjdk.org/jeps/453

          • he0001 a year ago

            What do you mean? It implements the Future/Task interface and you can definitely use that. In fact, you can't tell the difference between a virtual thread and a platform one, and it's available everywhere. I for one think it's much easier to use than the async/await pattern, as I don't need any special syntax to use it.

  • fulafel a year ago

    Can you expand on how the benefit in your rewrite came about? Threads don't consume CPU when they're waiting for the DB, after all. And threads share memory with each other.

    (I guess, scaling to ridiculous levels, you could be approaching trouble if you have O(100k) outstanding DB queries per application server; hope you have a DB that can handle millions of outstanding queries then!)

    • segfaltnh a year ago

      In large numbers, the cost of switching between threads does consume CPU while they're waiting for the database. This is why green threads exist: to have a large amount of in-flight work executing over a smaller number of OS threads.

      • fulafel a year ago

        When using OS threads, there's no switching when they are waiting for a socket (db connection). The OS knows to wake the thread up only when there's something new to see on the connection.

        • kbolino a year ago

          Both sides of a sleep/awake transition with conventional blocking system calls involve heavyweight context switches: the CPU protection level changes and the thread registers get saved out or loaded back in.

  • xxs a year ago

    >My rough understanding is that this is similar to async/await in .NET?

    No, the I/O is still blocking with respect to the application code.

tzahifadida a year ago

Similarly, the power of golang concurrent programming is that you write non-blocking code as you write normal code. You don't have to wrap it in functions and pollute the code; moreover, not every coder on the planet knows how to handle blocking code properly, and that is the main advantage. Most programming languages can do anything the other languages can do; the problem is that not all coders can make use of it. This is why I see languages like golang as an advantage.

  • jillesvangurp a year ago

    Kotlin embraced the same thing via coroutines, which are conceptually similar to goroutines. It adds a few useful concepts around this, though; mainly that of a coroutine context, which encapsulates the notion that a tree of coroutine calls needs some form of failure handling and cancellation. Additionally, coroutines are dispatched to a dispatcher. A dispatcher can run on the same thread or actually use a thread pool, or, as of recent Java versions, a virtual thread pool. There's actually very little point in using virtual threads in Kotlin; they are basically a slightly more heavyweight way of doing coroutines. The main benefit is dealing with legacy blocking Java libraries.

    But the bottom line with virtual threads, goroutines, or Kotlin's coroutines is that they indeed allow for imperative-style code that is easy to read and understand. Of course you still need to understand all the pitfalls of concurrency bugs and all the weird and wonderful ways things can fail to work as you expect. And while Java's virtual threads are designed to work like magic pixie dust, they do have some nasty failure modes where a single virtual thread can end up blocking all your virtual threads. Having a lot of synchronized blocks in legacy code can cause that.

    • tzahifadida a year ago

      Kotlin is not a language I learned so I will avoid commenting.

      However, my use of Java is for admin backends or heavyweight services for the enterprises or startups I coded for, so for my taste I can't use it without Spring or JBoss, etc., and in that way simplicity went out the window a long, long time ago :) It took me years to learn all the quirks of these frameworks... and the worst thing about it is that they keep changing every few months...

      • jillesvangurp a year ago

        Kotlin makes a lot of that stuff easier to deal with and there is also a growing number of things that work without Java libraries. Or even the JVM. I use it with Spring Boot. But we also have a lot of kotlin-js code running in a browser. And I use quite a few multiplatform libraries for Kotlin that work pretty much anywhere. I've even written a few myself. It's pretty easy to write portable code in Kotlin these days.

        For example ktor works on the JVM but you can also build native applications with it. And I use ktor client in the browser. When running in the browser it uses the browser fetch API. When running on the jvm you can configure it to use any of a wide range of Java http clients. On native it uses curl.

  • juyjf_3 a year ago

    Can we stop pretending Erlang does not exist?

    Go is a next-gen trumpian language that rejects sum types, pattern matching, non-nil pointers, and for years, generics; it's unhinged.

    • seabrookmx a year ago

      While I generally agree with your take that it's a regression in PL design, there's no need to be inflammatory. There's lots of good software written in it.

      > pretending Erlang does not exist

      For better or worse it doesn't to most programmers. The syntax is not nearly as approachable as GoLang. Luckily Elixir exists.
