Java Virtual Threads: A Case Study

infoq.com

167 points by mighty_plant a year ago · 198 comments

pron a year ago

Virtual threads do one thing: they allow creating lots of threads. This helps throughput due to Little's law [1]. But because this server here saturates the CPU with only a few threads (it doesn't do the fanout modern servers tend to do), this means that no significant improvements can be provided by virtual threads (or asynchronous programming, which operates on the same principle) while keeping everything else in the system the same, especially since everything else in that server was optimised for over two decades under the constraints of expensive threads (such as the deployment strategy to many small instances with little CPU).

So it looks like their goal was: try adopting a new technology without changing any of the aspects designed for an old technology and optimised around it.

[1]: https://youtu.be/07V08SB1l8c
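
For concreteness, a back-of-the-envelope application of Little's law (L = λW), with made-up numbers that are not from the article:

```java
public class LittlesLaw {
    public static void main(String[] args) {
        // Little's law: mean concurrency L = throughput (λ) × latency (W).
        // Illustrative numbers, not from the article:
        double throughputPerSec = 10_000; // λ: requests per second
        double latencySec = 0.1;          // W: seconds per request
        long concurrency = (long) (throughputPerSec * latencySec);
        // Sustaining 10k req/s at 100 ms each needs ~1000 requests
        // in flight at once -- i.e. ~1000 (hopefully cheap) threads.
        System.out.println(concurrency);
    }
}
```
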

  • stelfer a year ago

    It goes deeper than Little's Law. Every decent textbook on introductory queuing theory has the result that on a normalized basis, fast server > multi-server > multi-queue. That analysis admits almost arbitrary levels of depth of analysis and still holds true.

    Your observation that computing architectures have chased fast server for decades is apt. There's a truism in computing that those who build systems are doomed to relearn the lessons of the early ages of networks, whether they studied them in school or not. But kudos to whoever went through the exercise again.

  • jayceedenton a year ago

    I guess at least their work has confirmed what we probably already knew intuitively: if you have CPU-intensive tasks, without waiting on anything, and you want to execute these concurrently, use traditional threads.

    The advice "don't use virtual threads for that, it will be inefficient" really does need some evidence.

    Mildly infuriating though that people may read this and think that somehow the JVM has problems in its virtual thread implementation. I admit their 'Unexpected findings' section is very useful work, but the moral of this story is: don't use virtual threads for things they were not intended for. Use them when you want a very large number of processes executing concurrently, those processes have idle stages, and you want a simpler model to program with than other kinds of async.

    • pron a year ago

      I'll put it this way: to benefit from virtual threads (or, indeed, from any kind of change to scheduling, such as with asynchronous code) you clearly need 1. some free computational resources and 2. lots of concurrent tasks. The server here could perhaps have both with some changes to its deployment and coding style, but as it was tested -- it had neither. I'm not sure what they were hoping to achieve.

  • hitekker a year ago

    This take sounds reasonable to me. But I'm not an expert, and I'd be curious to hear an opposing view if there's one.

    • michaelt a year ago

      Standard/OS threads in Java use about a megabyte of memory per thread, so running 256 threads uses about 256 MB of memory before you've even started allocating things on the heap.

      Virtual threads are therefore useful if you're writing something like a proxy server, where you want to allow lots of concurrent connections, and you want to use the familiar thread-per-connection programming model.
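      A minimal sketch of that thread-per-connection style (Java 21+; the sleep is a stand-in for blocking socket I/O in a real proxy):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) throws Exception {
        // One virtual thread per task; the JVM multiplexes them onto a
        // small pool of carrier (OS) threads, so 10,000 "connections"
        // don't reserve ~10,000 MB of stack address space.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                exec.submit(() -> {
                    Thread.sleep(10); // stands in for blocking socket I/O
                    return null;
                });
            }
        } // close() waits for all tasks to finish
        System.out.println("done");
    }
}
```
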

      • layer8 a year ago

        Only 1 MB of address space is reserved (which can still be a problem); actual memory usage is limited to the memory pages that are actually accessed by the program within that address space.

    • kaba0 a year ago

      He is as much of an expert as it gets, as he is the leader of the Loom project.

    • binary132 a year ago

      Greenlets ultimately have to be scheduled onto system threads at the end of the day unless you have a lightweight thread model of some sort supported by the OS, so it’s a little bit misleading depending on how far down the stack you want to think about optimizing for greenlets. You could potentially have a poor implementation of task scheduling for some legacy compatibility reason, however. I guess I’d be curious about the specifics of what pron is discussing.

      • troupo a year ago

        Even though yes, in the end you have to map onto system threads, there are still quite a few things you can do. But this is infeasible for Java, unfortunately.

        For example, in Erlang the entire VM is built around green threads with a huge amount of guarantees and mechanisms: https://news.ycombinator.com/item?id=40989995

        When your entire system is optimized for green threads, the question of "it still needs to map onto OS threads" loses its significance

        • binary132 a year ago

          I really don’t think it’s useful to be this nonspecific. You could give an example of what a Java greenlet cannot do or how it cannot be optimized, for example. If your whole point is actually just “I prefer the semantics of BEAM threads”, then just say that.

          • troupo a year ago

            Those semantics are exactly what cannot be done in Java for many reasons (including legacy code etc.).

            And yes, those semantics are important, but sadly most people stop at "yay we have green threads now" and then a null pointer exception kills their entire app, or the thread that handles requests, or...

            • binary132 a year ago

              So let’s be clear, your point is that you find the API of non-BEAM greenlets less useful, not that they’re somehow necessarily less efficient. Right?

        • MaxBarraclough a year ago

          > When your entire system is optimized for green threads, the question of "it still needs to map onto OS threads" loses its significance

          How's that? What about parallelism?

cayhorstmann a year ago

I looked at the replication instructions at https://github.com/blueperf/demo-vt-issues/tree/main, which reference this project: https://github.com/blueperf/acmeair-authservice-java/tree/ma...

What "CPU-intensive apps" did they test with? Surely not acmeair-authservice-java. A request does next to nothing. It authenticates a user and generates a token. I thought it would at least connect to some auth provider, but if I understand it correctly, it just uses a test config with a single test user (https://openliberty.io/docs/latest/reference/config/quickSta...). Which would not be a blocking call.

If the request tasks don't block, this is not an interesting benchmark. Using virtual threads for non-blocking tasks is not useful.

So, let's hope that some of the tests were with tasks that block. The authors describe that a modest number of concurrent requests (< 10K) didn't show the increase in throughput that virtual threads promise. That's not a lot of concurrent requests, but one would expect an improvement in throughput once the number of concurrent requests exceeds the pool size. Except that may be hard to see because OpenLiberty's default is to keep spawning new threads (https://openliberty.io/blog/2019/04/03/liberty-threadpool-au...). I would imagine that in actual deployments with high concurrency, the pool size will be limited, to prevent the app from running out of memory.

If it never gets to the point where the number of concurrent requests significantly exceeds the pool size, this is not an interesting benchmark either.

pansa2 a year ago

Are these Virtual Threads the feature that was previously known as “Project Loom”? Lightweight threads, more-or-less equivalent to Go’s goroutines?

  • giamma a year ago

    Yes, at EclipseCon 2022 an Oracle manager working on the Helidon framework presented their results of replacing the Helidon core, which was based on Netty (and reactive programming), with Virtual Threads (using imperative programming) [1].

    Unfortunately the slides from that presentation were not uploaded to the conference site, but this article summarizes [2] the most significant metrics. The Oracle guy claimed that by using Virtual Threads Oracle was able to implement, using imperative Java, a new engine for Helidon (called Nima) that had identical performance to the old engine based on Netty, which is (at least in Oracle's opinion) the top performing reactive HTTP engine.

    The conclusion of the presentation was that based on Oracle's experience imperative code is much easier to write, read and maintain with respect to reactive code. Given the identical performance achieved with Virtual Threads, Oracle was going to abandon reactive programming in favor of imperative programming and virtual threads in all its products.

    [1] https://www.eclipsecon.org/2022/sessions/helidon-nima-loom-b...

    [2] https://medium.com/helidon/helidon-n%C3%ADma-helidon-on-virt...

  • pgwhalen a year ago

    Yes. It's not that the feature was previously known under a different name - Project Loom is the OpenJDK project, and Virtual Threads are the main feature that has come out of that project.

  • tomp a year ago

    They're not equivalent to Go's goroutines.

    Go's goroutines are preemptive (and Go's development team went through a lot of pain to make them such).

    Java's lightweight threads aren't.

    Java's repeating the same mistakes that Go made (and learned from) 10 years ago.

    • unscaled a year ago

      I would put it more charitably as "Java Virtual Threads are new and have not seen massive use and optimization yet".

      This is crucial, because Java wouldn't necessarily require the same optimizations Go needed.

      Making Virtual Threads fully preemptive could be useful, but it's probably not as crucial as it was for Go.

      Go does not have a native mechanism to spawn OS threads that are separate from the scheduler pool, so if you want to run a long CPU-heavy task, you can only run it on the same pool as you run your I/O-bound Goroutines. This could lead to starvation, and adding partial preemption and later full preemption was a neat way to solve that issue.

      On the other hand, Java still has OS threads, so you can put those long-running CPU-bound tasks on a separate thread-pool. Yes, it means programmers need to be extra careful with the type of code they run on Virtual Threads, but it's not the same situation as Go faced: in Java they DO have a native escape hatch.
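      A rough sketch of that escape hatch (Java 21+; names and workloads are illustrative): CPU-bound work goes on a fixed pool of platform threads, blocking work on virtual threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SplitPools {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Platform threads for long-running CPU-bound tasks...
        try (ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
             // ...virtual threads for the many blocking, I/O-bound ones.
             ExecutorService ioPool = Executors.newVirtualThreadPerTaskExecutor()) {
            var sum = cpuPool.submit(() -> {
                long s = 0;
                for (long i = 0; i < 1_000_000; i++) s += i; // busy work
                return s;
            });
            var fetch = ioPool.submit(() -> {
                Thread.sleep(5); // stands in for a blocking call
                return "response";
            });
            System.out.println(fetch.get() + " / " + sum.get());
        }
    }
}
```
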

      I'm not saying a preemptive scheduler won't be helpful in Java, but it just isn't as direly needed as it was with Go. One of the most painful issues with Java Virtual Threads right now is thread pinning when a synchronized method call is executed. Unfortunately, a lot of existing Java code heavily uses synchronized methods[1], so it's very easy to unknowingly introduce a method call that pins an OS thread. Preemption could solve this issue, but it's not the only way to solve it.

      ---

      [1] One of my pet peeves with the Java standard library is that almost any class or method that was added before Java 5 is using synchronized methods excessively. One of the best examples is StringBuffer, the precursor of StringBuilder, where all mutating methods are synchronized, as if it was a common use case to build a string across multiple threads. I'm still running into StringBuffers today in legacy codebases, but even newer codebases tend to use synchronized methods over ReentrantLocks or atomic operations, since they're just so easy to use.
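      The usual workaround for the pinning issue (on JDKs where blocking inside synchronized still pins): guard the critical section with a ReentrantLock instead, which lets a blocked virtual thread unmount from its carrier. A minimal sketch:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockInsteadOfSynchronized {
    private static final ReentrantLock lock = new ReentrantLock();
    private static long counter = 0;

    // On older JDKs, blocking inside a synchronized block pins the
    // virtual thread to its carrier OS thread. A ReentrantLock lets
    // the virtual thread unmount while it waits for the lock instead.
    static void increment() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[1000];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = Thread.ofVirtual().start(LockInsteadOfSynchronized::increment);
        }
        for (Thread t : ts) t.join();
        System.out.println(counter);
    }
}
```
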

    • jayd16 a year ago

      Virtual threads could be scheduled preemptively, but currently the scheduler will wait for some kind of thread sleep before scheduling another virtual thread. That's just a scheduler implementation detail, and the spec is such that a time-slice scheduler could be implemented.

      • tomp a year ago

        Yes, but the problem is that the spec is such that preemptive scheduling doesn't need to be implemented.

        That means that Java programmers have to be very careful when writing code, lest they block the entire underlying (OS) thread!

        Again, Go already went through that experience. It was painful. Java should have learned and implemented it from the start

        • SureshG a year ago

          > That means that Java programmers have to be very careful when writing code

          From JEP 444:

          The scheduler does not currently implement time sharing for virtual threads. Time sharing is the forceful preemption of a thread that has consumed an allotted quantity of CPU time. While time sharing can be effective at reducing the latency of some tasks when there are a relatively small number of platform threads and CPU utilization is at 100%, it is not clear that time sharing would be as effective with a million virtual threads.

          Also, in this scenario, I think the current scheduler (ForkJoinPool) will use a managed blocker to compensate for those pinned carrier threads.

        • jayd16 a year ago

          I don't know. The language already has Thread.yield(). If your use case is such that you have starvation and care about it, it seems trivial to work around.

          Still, an annoying gotcha if it hits you unexpectedly.
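          The workaround sketched here would be a cooperative yield inside the CPU-bound loop, something like:

```java
public class CooperativeYield {
    public static void main(String[] args) throws InterruptedException {
        Thread t = Thread.ofVirtual().start(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                // A pure CPU loop never blocks, so the current scheduler
                // would never switch it out; an occasional yield lets
                // other virtual threads onto the carrier thread.
                if (i % 100_000 == 0) Thread.yield();
            }
        });
        t.join();
        System.out.println("finished");
    }
}
```
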

      • nimish a year ago

        This is a really unfortunate gotcha that's not at all obvious. Does it kick preemption up a layer to the OS then?

        • Jtsummers a year ago

          The "not at all obvious" gotcha is described in the documentation near the top, under the heading "What is a Virtual Thread?":

          https://docs.oracle.com/en/java/javase/21/core/virtual-threa...

          > Like a platform thread, a virtual thread is also an instance of java.lang.Thread. However, a virtual thread isn't tied to a specific OS thread. A virtual thread still runs code on an OS thread. However, when code running in a virtual thread calls a blocking I/O operation, the Java runtime suspends the virtual thread until it can be resumed. The OS thread associated with the suspended virtual thread is now free to perform operations for other virtual threads.

          It's not been hidden at all in their presentation on virtual threads.

          The OS thread that the virtual thread is mounted to can still be preempted, but that won't free up the OS thread for another virtual thread. However, if you use them for what they're intended for, this shouldn't be a problem. In practice, it will be, because no one can be bothered to RTFM.
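          The quoted point that a virtual thread is still an instance of java.lang.Thread is easy to see in the API (Java 21+; names are illustrative):

```java
public class StillAThread {
    public static void main(String[] args) throws InterruptedException {
        // Built with a different factory, but the same Thread API:
        Thread vt = Thread.ofVirtual().name("worker-1").start(
                () -> System.out.println(Thread.currentThread().isVirtual()));
        vt.join(); // join, interrupt, etc. all work as usual
    }
}
```
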

          • nimish a year ago

            All that says is the Java runtime will suspend on blocking IO, not that it _only_ suspends on blocking IO.

            > what they're intended for

            Java prides itself on careful and deliberate changes to eliminate foot guns, but this seems like a pretty major restriction. Usually these kinds of cooperative threads are called fibers or something else to distinguish them from truly preempt-able threads.

            Expecting developers to read the minutiae of documentation (there's another restriction around synchronized blocks) is a fool's errand TBH. Principle of least surprise, etc.

          • cstrahan a year ago

            As a sibling comment points out, there's nothing in what you quoted that logically implies that blocking I/O is the only reason for a virtual thread to be suspended.

            The best info I could find was this blog post:

            https://blogs.oracle.com/javamagazine/post/going-inside-java...

            "Virtual threads, however, are handled differently than platform threads. None of the existing schedulers for virtual threads uses time slices to preempt virtual threads."

            The next handful of paragraphs are also interesting.

  • Skinney a year ago

    Yes

exabrial a year ago

What is the virtual thread / event loop pattern seeking to optimize? Is it context switching?

A number of years ago I remember trying to have a sane discussion about “non blocking” and I remember saying “something” will block eventually no matter what… anything from the buffer being full on the NIC to your cpu being at anything less than 100%. Does it shake out to any real advantage?

  • gregopet a year ago

    It's a brave attempt to release the programmer from worrying or even thinking about thread pools and blocking code. Java has gone all in - they even cancelled a non-blocking rewrite of their database driver architecture, because why have that if you won't have to worry about blocking code? And the JVM really is a marvel of engineering, it's really really good at what it does, so what team is better placed to pull this off?

    So far, they're not quite there yet: the issue of "thread pinning" is something developers still have to be aware of. I hear the newest JVM version has removed a few more cases where it happens, but will we ever truly 100% not have to care about all that anymore?

    I have to say things are already pretty awesome however. If you avoid the few thread pinning causes (and can avoid libraries that use them - although most if not all modern libraries have already adapted), you can write really clean code. We had to rewrite an old app that made a huge mess tracking a process where multiple event sources can act independently, and virtual threads seemed the perfect thing for it. Now our business logic looks more like a game loop and not the complicated mix of pollers, request handlers, intermediate state persisters (with their endless thirst for various mappers) and whatnot that it was before (granted, all those things weren't there just because of threading.. the previous version was really really shittily written).

    It's true that virtual threads sometimes hurt performance (since their main benefit is cleaner simpler code). Not by much, usually, but a precisely written and carefully tuned piece of performance critical code can often still do things better than automatic threading code. And as a fun aside, some very popular libraries assumed the developer is using thread pools (before virtual threads, which non trivial Java app didn't? - ok nobody answer that, I'm sure there are cases :D) so these libraries had performance tricks (ab)using thread pool code specifics. So that's another possible performance issue with virtual threads - like always with performance of course: don't just assume, try it and measure! :P

    • pragmatick a year ago

      > although most if not all modern libraries have already adapted

      Unfortunately kafka, for example, has not: https://github.com/spring-projects/spring-kafka/commit/ae775...

    • haspok a year ago

      Just a side note, async JDBC was a thing way before Loom came about, and it failed miserably. I'm not sure why, but my guess would be that most enterprise software is not web-scale, so JDBC worked well as it was.

      Also, all the database vendors provided their drivers implementing the JDBC API - good luck getting Oracle or IBM to contribute to R2DBC.. (Actually, I stand corrected: there is an Oracle R2DBC driver now - it was released fairly recently though.)

      EDIT: "failed miserably" is maybe too strong - but R2DBC certainly doesn't have the support and acceptance of JDBC.

      • vbezhenar a year ago

        R2DBC allows you to efficiently maintain millions of connections to the database. But what database supports millions of connections? Not Postgres for sure, and probably no other conventional database. So using a reactive JDBC driver makes little sense: if you're going to use 1000 connections, 1000 threads will do just fine and bring little overhead. Those who use Java don't care about spending 100 more MB of RAM when their service already eats 60 GB.

        • merb a year ago

          Reactive drivers were not about 1000 connections, they were about reusing a single connection better, by queuing a little bit more efficiently over a single connection. Reactive programming is not about parallelism, it’s about concurrency.

          • vbezhenar a year ago

            It is not possible to reuse a single connection better, if we're talking about postgres. You must conduct the transaction over a single connection and you cannot mix different transactions simultaneously over a single connection. That's the way the postgres wire protocol works. I think there are some rudimentary async capabilities, but they don't change anything fundamentally.

            It might be different for some exotic databases, but I don't see any reason why ordinary JDBC driver couldn't reuse single TCP connection for multiple logical JDBC connections in this case.

      • frevib a year ago

        It could also be that there just isn’t enough demand for a non-blocking JDBC. For example, the PostgreSQL server does not cope very well with lots of simultaneous connections, due to (among other things) its process-per-connection model. From the client side (JDBC), a small thread pool would be enough to max out the PostgreSQL server. And there is almost no benefit to using non-blocking vs a small thread pool.

        • haspok a year ago

          I would argue the main benefit would be that the threadpool that the developer would create anyway would instead be created by the async database driver, which has more intimate knowledge about the server's capabilities. Maybe it knows the limits to the number of connections, or can do other smart optimizations. In any case, for the developer it would be a more streamlined experience, with less code needed, and better defaults.

          • frevib a year ago

            I think we’re confusing async and non-blocking? Non-blocking is the part that makes virtual threads more efficient than threads. Async is the programming style, e.g. doing things concurrently. Async can be implemented with threads or non-blocking, if the API supports it. I was merely arguing that a non-blocking JDBC has little merit, as the connections to a DB are limited. Non-blocking APIs are only beneficial when there are lots (> 10k) of connections.

            JDBC knows nothing about the number of connections a server can handle, other than trying connections until it won’t connect any more.

            | In any case, for the developer it would be a more streamlined experience, with less code needed, and better defaults.

            I agree it would be best not to bother the dev with what is going on under the hood.

    • exabrial a year ago

      Thank you for a very candid response, I enjoyed reading it!

      My question is though: Why even do alleged “non-blocking” _at all_? What are people trying to optimize against?

      • jandrewrogers a year ago

        The short answer is that blocking is expensive due to the overhead of the implied context switch and poor locality. As computers become faster, a larger percentage of the CPU time is dedicated to context-switching overhead and non-blocking architectures eliminate that. For applications like databases where this problem is more severe, the difference in throughput between a blocking architecture and a non-blocking architecture can be 10x on the same hardware, so it is a very important optimization if you want your software to have performance that is competitive.

        A modern thread-per-core shared-nothing architecture takes this even further and tries to eliminate blocking at the hardware level for the same basic reason.

    • immibis a year ago

      So... What is it seeking to optimize? Why did you need a thread pool before but not any more? What resource was exhausted to prevent you from putting every request on a thread?

      • chipdart a year ago

        > So... What is it seeking to optimize?

        The goal is to maximize the number of tasks you can run concurrently, while imposing on the developers a low cognitive load to write and maintain the code.

        > Why did you need a thread pool before but not any more?

        You still need a thread pool. Except with virtual threads you are no longer bound to running a single task per thread. This is especially desirable when workloads are IO-bound and will expectedly idle while waiting for external events. If you have a never-ending queue of tasks waiting to run, why should you block a thread consuming that task queue by running a task that stays idle while waiting for something to happen? You're better off starting the task and setting it aside the moment it waits for something to happen.
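        A toy illustration of that effect (Java 21+; the sleeps stand in for I/O waits, and the timings are rough):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThroughputSketch {
    // Run `tasks` 50 ms sleeps on the executor, return wall-clock millis.
    static long run(ExecutorService exec, int tasks) throws Exception {
        Instant start = Instant.now();
        try (exec) { // close() waits for all submitted tasks
            for (int i = 0; i < tasks; i++) {
                exec.submit(() -> { Thread.sleep(50); return null; });
            }
        }
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) throws Exception {
        // 10 platform threads serialize 200 waits into ~20 batches (~1 s);
        // one virtual thread per task overlaps them all (~50 ms).
        long pooled  = run(Executors.newFixedThreadPool(10), 200);
        long virtual = run(Executors.newVirtualThreadPerTaskExecutor(), 200);
        System.out.println("virtual faster: " + (virtual < pooled));
    }
}
```
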

        > What resource was exhausted to prevent you from putting every request on a thread?

        • riku_iki a year ago

          > why should you block a thread

          if creating a gazillion threads on modern hardware is super cheap, why not? I have transparency and debuggability of what threads are running, and can check the stacktrace of each and what they are blocked on.

          virtual threads add lots of magic under the hood, and if there is some bug, or a lib in your infra with no virtual-thread support, it is absolutely not clear how to debug it.

          • chipdart a year ago

            > if creating gazillion threads on modern hardware is super cheap why not?

            Virtual threads are a performance improvement over threads, no matter how cheap threads are to create. Virtual threads run on threads. If threads become cheaper to create, so do virtual threads. They are not mutually exclusive.

            Virtual threads are on top of that a developer experience improvement. Code is easier to write and maintain.

            Virtual threads improve throughput because the moment a task is waiting for anything like IO, the thread is able to service any other task in the queue.

            • riku_iki a year ago

              > Virtual threads are on top of that a developer experience improvement. Code is easier to write and maintain.

              except now you need to prove somehow that all 100 libs in your project support virtual threads.

              > Virtual threads improve throughput because the moment a task is waiting for anything like IO, the thread is able to service any other task in the queue.

              from reading similar discussions, linux for example doesn't have a true async IO API; you just move the blocking of a Java thread to the blocking of a thread in the kernel

          • stoperaticless a year ago

            Each thread adds overhead.

            Some usage types don’t care, some do.

            From what I gather virtual threads are an alternative to “callback-hell” (js) or async coloring (python).

            • riku_iki a year ago

              > Some usage types don’t care, some do.

              I suspect if you care about thread overhead, you won't pick Java, because there will be overhead in other areas too

              > From what I gather virtual threads are an alternative to “callback-hell” (js) or async coloring (python).

              there is also existing ExecutorService and futures in Java

              • stoperaticless a year ago

                > there is also existing ExecutorService and futures in Java

                Yes, virtual threads are an alternative also to those. (Kind of)

                • riku_iki a year ago

                  And my frustration is that Java has had that API for 20 years, it is used everywhere and is absolutely battle-tested, and now they are adding these virtual threads, which break third-party libs and make the JVM more complicated, with various degradations, in exchange for benefits most will not notice..

      • gregopet a year ago

        It's mainly trying to make you not worry about how many threads you create (and not worry about the caveats that come with optimising how many threads you create, which is something you are very often forced to do).

        You can create a thread in your code and not worry whether that thing will then be some day run in a huge loop or receive thousands of requests and therefore spend all your memory on thread overhead. Go and other languages (in Java's ecosystem there's Kotlin for example) employ similar mechanisms to avoid native thread overhead, but you have to think about them. Like, there's tutorial code where everything is nice & simple, and then there's real world code where a lot of it must run in these special constructs that may have little to do with what you saw in those first "Hello, world" samples.

        Java's approach tries to erase the difference between virtual and real threads. The programmer should have to employ no special techniques when using virtual threads and should be able to use everything the language has to offer (this isn't true in many languages' virtual/green threads implementations). Old libraries should continue working and perhaps not even be aware they're being run on virtual threads (although, caveats do apply for low level/high performance stuff, see above posts). And libraries that you interact with don't have to care what "model" of green threading you're using or specifically expose "red" and "blue" functions.

        • giamma a year ago

          You will still have to worry: too many virtual threads will imply too much context switching. However, virtual threads will always be interruptible on I/O, as they are not mapped to actual OS threads, but rather simulated by the JVM, which will execute a number of instructions for each virtual thread.

          This gives the JVM the chance to use real threads more efficiently, avoiding threads remaining unused while waiting on I/O (e.g. a response from a stream). As soon as the JVM detects that a physical thread is blocked on I/O, a semaphore, a lock or anything else, it will reallocate that physical thread to running a new virtual thread. This will reduce latency and context-switch time (the switching is done by the JVM, which already globally manages the memory of the Java process in its heap) and will avoid, or at least largely reduce, the chance that a real thread remains allocated but idle because it's blocked on I/O or something else.

          • frant-hartm a year ago

            What do you mean by context switching?

            My understanding is that virtual threads mostly eliminate context switching - for N CPUs JVM creates N platform threads and they run virtual threads as needed. There is no real context switching apart from GC and other JVM internal threads.

            A platform thread picking another virtual thread to run after its current virtual thread blocks on IO is not a context switch, which is an expensive OS-level operation.

            • giamma a year ago

              The JVM will need to do context switching when reallocating the real thread that is running a blocked virtual thread to the next available virtual thread. It won't be CPU context switching, but context switching happens at the JVM level and still has a cost.

              • frant-hartm a year ago

                Ok. This JVM-level switching is called mounting/un-mounting of the virtual thread and is supposed to be several orders of magnitude cheaper compared to normal context switch. You should be fine with millions of virtual threads.
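                This is straightforward to try; spawning a million virtual threads typically completes in a few seconds on ordinary hardware (don't try this with platform threads):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MillionThreads {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // One virtual thread per task; a million of them is fine.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000_000; i++) {
                exec.submit(() -> { done.incrementAndGet(); });
            }
        } // close() waits for all tasks
        System.out.println(done.get());
    }
}
```
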

            • anonymousDan a year ago

              Does Java's implementation of virtual threads perform any kind of work stealing when a particular physical thread has no virtual threads to run (e.g. they are all blocked on I/O)?

              • mike_hearn a year ago

                It does. They get scheduled onto the ForkJoinPool which is a work stealing pool.

            • immibis a year ago

              "they run virtual threads as needed" - so when one virtual thread is no longer needed and another one is needed, they switch context, yes?

              • frant-hartm a year ago

                This is called mounting/un-mounting and is much cheaper than a context switch.

                • immibis a year ago

                  This is a type of context switch. You are saying dollars are cheaper than money.

                  • peeters a year ago

                    It's been a really long time since I dealt with anything this low level, but in my very limited and ancient experience when people talk about context switching they're talking specifically about the userland process yielding execution back to the kernel so that the processor can be reassigned to a different process/thread. Naively, if the JVM isn't actually yielding control back to the kernel, it has the freedom to do things in a much more lightweight manner than the kernel would have to.

                    So I think it's meaningful to define what we mean by context switch here.

                    • giamma a year ago

                      When a real thread is reassigned from one virtual thread to another, the JVM needs to save the stack of the first virtual thread to the heap and restore the stack of the second virtual thread from the heap; see slide 13 of [1]. This is in fact called mounting/unmounting, as already pointed out, and occurs via Java Continuations, but from the JVM's perspective this is a context switch. It's called the JVM and the V stands for Virtual, so yes, it's not the kernel doing it, but it's happening, and it happens more often the more virtual threads you have in your application.

                      [1] https://www.eclipsecon.org/sites/default/files/slides/JavaLo...

                    • immibis a year ago

                      swapcontext(3) is a userland context switch, and is named so.

        • immibis a year ago

          It seems that the answer to the question was "memory". Stack allocations, presumably. You have answered by telling us that virtual threads are better than real threads because real threads suck, but you didn't say why they suck or why virtual threads don't suck in the same way.

          • mike_hearn a year ago

            Real threads don't suck but they pay a price for generality. The kernel doesn't know what software you're going to run, and there's no standards for how that software might use the stack. So the kernel can't optimize by making any assumptions.

            Virtual threads are less general than kernel threads. If you use a virtual thread to call out of the JVM you lose their benefits, because the JVM becomes like the kernel and can't make any assumptions about the stack.

            But if you are running code controlled by the JVM, then it becomes possible to do optimizations (mostly stack related) that otherwise can't be done, because the GC and the compiler and the threads runtime are all developed together and work together.

            Specifically, HotSpot can move stack frames to and from the heap very fast, in a way that interacts well with the GC. For instance, if a virtual thread resumes, iterates in a loop and suspends again, the stack frames are never copied out of the heap onto the kernel stack at all. HotSpot can incrementally "page" stack frames out of the heap. Additionally, the storage used for a suspended virtual thread's stack is a lot smaller than a suspended kernel stack, because a lot of administrative goop doesn't need to be saved at all.

          • brabel a year ago

            OS Threads do not suck, they're great. But they are expensive to create as they require a syscall, and they're expensive to maintain as they consume quite a bit of memory just to exist, even if you don't need it (they must pre-allocate a stack, which is apparently around 2MB initially and can't be made much smaller, since in most cases you will need even more, so shrinking it would make most cases worse).

            Virtual Threads are very fast to create and allocate only the memory needed by the actual call stack, which can be much less than for OS Threads.

            Also, blocking code is very simple compared to the equivalent async code. So using blocking code makes your code much easier to follow. Check out examples of reactive frameworks for Java and you will quickly understand why.

            • kllrnohj a year ago

              > and they're expensive to maintain as they consume quite a bit of memory just to exist, even if you don't need it (due to how they must pre-allocate a stack which apparently is around 2MB initially,

              I'm not familiar with Windows, but this certainly isn't the case on Linux. It only costs 2MB-8MB of virtual address space, not actual physical memory. And there's no particular reason to believe the JVM can keep a list of threads and their states more efficiently than the kernel can.

              All you really save is the syscall to create it and some context switching costs as the JVM doesn't need to deal with saving/restoring registers as there's no preemption.

              The downside though is you don't have any preemption, which depending on your usage is a really fucking massive downside.

              • brabel a year ago

                > And there's no particular reason to believe the JVM can have a list of threads and their states more efficiently than the kernel can.

                Of course there is. The JVM is able to store the current stack for the Thread efficiently in the pre-allocated heap. Switching execution between Virtual Threads is very cheap. Experiments show you can have millions of VTs, but only a few thousand OS Threads.

                I don't know why you think preemption is a big downside?! The JVM only suspends a Thread at safe points, and those are points where it knows exactly how to resume. I don't believe there are any downsides at all.

              • Someone a year ago

                > The downside though is you don't have any preemption, which depending on your usage is a […] massive downside.

                Nobody is taking OS threads away, so you can choose to use them when they better fit your use case.

      • jmaker a year ago

        Briefly: The cost of spawning schedulable entities, memory and the time to execution. Virtual threads, i.e., fibers, entertain lightweight stacks. You can spawn as many as you like immediately. Your runtime system won’t go out of memory as easily. In addition, the spawning happens much faster in user space. You’re not creating kernel threads, which is a limited and not cheap resource, whence the pooling you’re comparing it to. With virtual threads you can do thread per request explicitly. It makes most sense for IO-bound tasks.

      • davidgay a year ago

        A thread per request has a high risk of overcommitting on CPU use, leading to a different set of problems. Virtual threads are scheduled on a fixed-size (based on number of cores) underlying (non-virtual) thread pool to avoid this problem.

        • immibis a year ago

          Why can't virtual threads overcommit CPU use? If I have 4 CPUs and 4000 virtual threads running CPU-bound code, is that not overcommit? A system without overcommit would refuse to create the 5th thread.

          • detinho a year ago

            I think parent is saying overcommit with OS threads. 4k requests = 4k OS threads. That would lead to the problems parent is talking about.

            • immibis a year ago

              Why wouldn't 4k virtual threads lead to the same problems?

              • troupo a year ago

                Because they don't create 4k real threads, and can be scheduled on n=CPU Cores OS threads

                • immibis a year ago

                  4k "real" threads can also be scheduled on 4 CPU cores. What's the difference?

                  • troupo a year ago

                    Real threads are extremely expensive, both in terms of memory and CPU time, compared to virtual threads. I think the main issue is not even that, but context switching when switching threads, which is also very expensive.

                    Virtual threads usually require significantly fewer resources to spawn and run. And, if the underlying system is implemented with them in mind, they can use fewer context switches, and possibly even fewer cache misses etc.

      • gifflar a year ago

        This article nicely describes the differences between threads and virtual threads: https://www.infoq.com/articles/java-virtual-threads/

        I think it’s definitely worth a read.

      • twic a year ago

        The memory overhead of threads.

  • fzeindl a year ago

    Does it shake out to any real advantage?

    To put it shortly: Writing single-threaded blocking code is far easier for most people and has many other benefits, like more understandable and readable programs: https://www.youtube.com/watch?v=449j7oKQVkc

    The main reason why non-blocking IO, with its style of intertwining concurrency and algorithms, came along is that starting a thread for every request was too expensive. With virtual threads that problem is eliminated, so we can go back to writing blocking code.
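    A minimal sketch of that "blocking code, thread per connection" style (assuming Java 21+; the one-byte echo handler and the in-process demo client are just for illustration):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingStyleServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            // Accept loop: one virtual thread per connection, plain blocking handler
            Thread.startVirtualThread(() -> {
                try {
                    while (true) {
                        Socket s = server.accept();
                        Thread.startVirtualThread(() -> handle(s));
                    }
                } catch (IOException ignored) { /* server socket closed */ }
            });
            // Demo client: send one byte and read the echo back
            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.getOutputStream().write('x');
                System.out.println((char) client.getInputStream().read());
            }
        }
    }

    // Sequential, blocking, easy to read: each read() parks only this virtual thread
    static void handle(Socket socket) {
        try (socket) {
            socket.getOutputStream().write(socket.getInputStream().read());
        } catch (IOException ignored) { /* per-connection failure stays local */ }
    }
}
```

    The handler is exactly the code you would have written pre-NIO; the only concession to scale is `startVirtualThread` instead of `new Thread(...)`.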

    • nlitened a year ago

      > is far easier for most people

      I’d say that writing single-threaded code is far easier for _all_ people, even async code experts :)

      Also, single-threaded code is supported by programming language facilities: you have a proper call stack, thread-local vars, exceptions bubbling up, structured concurrency, simple resource management (RAII, try-with-resources, defer). Easy to reason and debug on language level.

      Async runtimes are always complicated, filled with leaky abstractions, it’s like another language that one has to learn in addition, but with a less thought-out, ad-hoc design. Difficult to reason and debug, especially in edge cases

      • bheadmaster a year ago

        > Async runtimes are always complicated, filled with leaky abstractions, it’s like another language that one has to learn in addition, but with a less thought-out, ad-hoc design. Difficult to reason and debug, especially in edge cases

        Async runtimes themselves are simply attempts to bolt-on green threads on top of a language that doesn't support them on a language level. In JavaScript, async/await uses Promises to enable callback-code to interact with key language features like try/catch, for/while/break, return, etc. In Python, async/await is just syntax sugar for coroutines, which are again just syntax sugar for CPS-style classes with methods split at each "yield". Not sure about Rust, but it probably also uses some Rust macro magic to do something similar.

        • derriz a year ago

          Indeed. Async runtimes/styles are attempts to provide a more readable/usable syntax for CPS[1]. CPS originally had nothing to do with blocking/non-blocking or multi-threading but arose as a technique for structuring compiler code.

          Its attraction for non-blocking coding is that it allows hiding the multi-threaded event dispatching loop. But as the parent comment suggests, this abstraction is extremely leaky. And in addition, CPS in non-functional languages or without syntactic sugar has poor readability. Improving the readability requires compiler changes in the host language - so many languages have added compiler support to further hide the CPS underpinnings of their async model.

          I've always felt this was a big mistake in our industry - all this effort not only in compilers but also in debuggers/IDE - building on a leaky abstraction. Adding more layers of leaky abstractions has only made the issue worse. Async code, at first glance, looks simple but is a minefield for inexperienced/non-professional software engineers.

          It's annoying that Rust switched to async style - the abstraction leakiness immediately hits you, as the "hidden event dispatching loop" remains a real dependency even if it's not explicit in the code. Thus libraries using async cannot generally be used together, although last time I looked, tokio seems to have become the de-facto standard.

          [1] https://en.wikipedia.org/wiki/Continuation-passing_style

          • kaba0 a year ago

            I absolutely agree that the virtual/green thread style is much better: more ergonomic, more likely to be correct, etc. But I can’t fault Rust’s choice, given that it's a low-level language without a fat runtime, making it possible to be called into from other runtimes. What the JVM does is simply not possible that way.

        • logicchains a year ago

          >Async runtimes themselves are simply attempts to bolt-on green threads on top of a language that doesn't support them on a language level.

          Haskell supports async code while also supporting green threads on a language level, and the async code has most of the same issues as async code in any other languages.

          • whateveracct a year ago

            What problems exactly? Haskell has a few things that imo it does better than most languages in this area:

            - All IO is non-blocking by default.

            - FFI support for interruptible.

            - Haskell threads can be preempted externally - this allows you to ensure they never leak. Vs a goroutine that can just spin forever if it doesn't explicitly yield.

            - There are various stdlib abstractions for building concurrent programs in a compositional way.

            • kbolino a year ago

              > Haskell threads can be preempted externally - this allows you to ensure they never leak. Vs a goroutine that can just spin forever if it doesn't explicitly yield.

              Goroutines are preemptible by the runtime (since https://go.dev/doc/go1.14#runtime) but they're still not addressable or killable through the language itself.

              • whateveracct a year ago

                The GHC runtime has lots of cool concurrency features.

                Async exceptions as a way to pass messages (and kill threads!)

                Allocation limits for threads.

                Software Transactional Memory.

        • dwattttt a year ago

          > Not sure about Rust, but it probably also uses some Rust macro magic to do something similar.

          Much the same as JavaScript I understand, but no macros; the compiler turns them into Futures that can be polled

      • xxs a year ago

        >I’d say that writing single-threaded code is far easier for _all_ people, even async code experts :)

        While 'async' is just a name, underneath it's epoll - and virtual threads would not perform better than a proper NIO (epoll) server. I don't consider myself an 'async expert' but I've had my share of writing NIO code (dare say it's not terrible at all)

        • kaba0 a year ago

          Virtual threads literally replace the “blocking” IO call issued by the user with a proper NIO call, mounting the issuing virtual thread again when the IO completes.

    • chipdart a year ago

      > To put it shortly: Writing single-threaded blocking code is far easier for most people and has many other benefits, like more understandable and readable programs:

      I think you're missing the whole point.

      The reason why so many smart people invest their time on "virtual threads" is developer experience. The goal is to turn writing event-driven concurrent code into something that's as easy as writing single-threaded blocking code.

      Check why C#'s async/await implementation is such a huge success and replaced all past approaches overnight. Check why node.js is such a huge success. Check why Rust's async support is such a hot mess. It's all about developer experience.

      • kitd a year ago

        I think he was making the same point as you: writing for virtual threads is like writing for single-threaded blocking code.

      • written-beyond a year ago

        As someone who has written multiple productions services with Async Rust, that are under constant load, I disagree. I've had team members who have only written in C, pick up and start building very comprehensive and performant services in Rust in a matter of days.

        How do developers spew such strong opinions without taking a moment to think about what they're about to say? Rust cannot be directly compared to C#, Java or even Go.

        You don't get a runtime or a GC with Rust. The developer experience is excellent; you get a lot of control over everything you're building with it. Yes, it's not as magical as the languages and runtimes you've mentioned, but the fact that I can at any time rip those abstractions off and make my service extremely lightweight and performant is not something those languages will allow you to do.

        And this is coming from someone who's written non blocking services before Async rust was a thing with just MIO.

        The very fact Rust gets mentioned alongside these languages should be a tribute to the efforts of its maintainers and core team. The amount of tooling and features they've added to the language gives developers of every realm the liberty to try and build what they want.

        Honestly, you can hold whatever opinion you want on any language but your comparison really doesn't make sense.

    • Nullabillity a year ago

      > To put it shortly: Writing single-threaded blocking code is far easier for most people. [snip] With virtual threads that problem is eliminated so we can go back to writing blocking code.

      This is the core misunderstanding/dishonesty behind the Loom/Virtual Threads hype. Single-threaded blocking code is easy, yes. But that ease comes from being single-threaded, not from not having to await a few Futures.

      But Loom doesn't magically solve the threading problem. It hides the Futures, but that just means that you're now writing a multi-threaded program, without the guardrails that modern Future-aware APIs provide. It's the worst of all worlds. It's the scenario that gave multi-threading such a bad reputation for inscrutable failures in the first place.

  • chipdart a year ago

    > What is the virtual thread / event loop pattern seeking to optimize? Is it context switching?

    Throughput.

    Some workloads are not CPU-bound or memory-bound, and spend the bulk of their time waiting for external processes to make data available.

    If your workloads are expected to stay idle while waiting for external events, you can switch to other tasks while you wait for those external events to trigger.

    This is particularly convenient if the other tasks you're hoping to run are also tasks that are bound to stay idle while waiting for external events.

    One of the textbook scenarios that suits this pattern well is making HTTP requests. Another one is request handlers, such as the controller pattern used so often in HTTP servers.

    Perhaps the poster child of this pattern is Node.js. It might not be the performance king and might be single-threaded, but it features in the top spots of performance benchmarks such as TechEmpower's. Node.js is also highly favoured in function-as-a-service applications, as its event-driven architecture is well suited for applications involving a hefty dose of network calls running on memory- and CPU-constrained systems.
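    A hedged sketch of the fan-out scenario described above (assuming Java 21+; the sleep stands in for a real network call, and the 100 ms / five-task numbers are made up for illustration). The blocking "calls" overlap because each one parks only its own virtual thread:

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

public class FanOut {
    // Stand-in for a blocking network call; sleeping unmounts the virtual thread
    static Callable<String> fetch(int id) {
        return () -> {
            Thread.sleep(Duration.ofMillis(100));
            return "response-" + id;
        };
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = executor.invokeAll(
                List.of(fetch(1), fetch(2), fetch(3), fetch(4), fetch(5)));
            for (var f : futures) System.out.println(f.get());
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // All five 100 ms "calls" overlap, so total is ~100 ms, not ~500 ms
        System.out.println(elapsedMs < 450);
    }
}
```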

  • kevingadd a year ago

    One of the main reasons to do virtual threads is that it allows you to write naive "thread per request" code and still scale up significantly without hitting the kind of scaling limits you would with OS threads.

    • hashmash a year ago

      The problem with the naïve design is that even with virtual threads, you risk running out of (heap) memory if the threads ever block. Each task makes a bit of progress, allocates some objects, and then lets another one do the same thing.

      With virtual threads, you can limit the damage by using a semaphore, but you still need to tune the size. This isn't much different than sizing a traditional thread pool, and so I'm not sure what benefit virtual threads will really have in practice. You're swapping one config for another.
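        A sketch of the semaphore-bounding idea described here (assuming Java 21+; the limit of 10 and the 1,000-task count are arbitrary). Unbounded virtual threads plus a semaphore gives the same admission control as a sized thread pool, while keeping thread-per-task code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedVirtualThreads {
    public static void main(String[] args) throws Exception {
        Semaphore permits = new Semaphore(10);   // at most 10 tasks in flight
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    permits.acquireUninterruptibly(); // back-pressure point
                    try {
                        int now = inFlight.incrementAndGet();
                        maxObserved.accumulateAndGet(now, Math::max);
                        Thread.sleep(1);              // simulated work
                        inFlight.decrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        permits.release();
                    }
                });
            }
        }
        // The semaphore, not a pool size, capped the concurrency
        System.out.println(maxObserved.get() <= 10);
    }
}
```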

      • dikei a year ago

        > The problem with the naïve design is that even with virtual threads, you risk running out of (heap) memory if the threads ever block.

        The key with virtual threads is they are so lightweight that you can have thousands of them running concurrently: even when they block for I/O, it doesn't matter. It's similar to lightweight coroutines in other languages like Go or Kotlin.

      • imtringued a year ago

        What you are complaining about has nothing to do with thread pools or virtual threads. You're pointing out the fact that more parallelism will also need more hardware and that a finite hardware budget will need a back pressure strategy to keep resource consumption within a limit. While you might be correct that "sizing a traditional thread pool" is a back pressure strategy that can be applied to virtual threads, the problem with it is that IO bound threads will prevent CPU bound threads from making progress. You don't want to apply back pressure based on the number of tasks. You want back pressure to be in response to resource utilization, so that enough tasks get scheduled to max out the hardware.

        This is a common problem with people using Java parallel streams, because they by default share a single global thread pool and the way to use your own thread pool is also extremely counterintuitive, because it essentially relies on some implicit thread local magic to choose to distribute the stream in the thread pool that the parallel stream was launched on, instead of passing it as a parameter.

        It would be best if people came up with more dynamic back pressure strategies, because this is a more general problem that goes way beyond thread pools. In fact, one of the key problems of automatic parallelization is deciding at what point there is too much parallelization.

      • initplus a year ago

        The benefits from virtual threads come from the simple API that it presents to the programmer. It's not a performance optimization.

        • hashmash a year ago

          But that same benefit was always available with platform threads -- a simple API. What is the real gain by using virtual threads? It's either going to be performance or memory utilization.

          • groestl a year ago

            It's combining the benefits from async models (state machines separated from os threads, thus more optimal for I/O bound workload), with the benefits from proper threading models (namely the simpler human interface).

            Memory utilization & performance is going to be similar to the async callback mess.

            • hashmash a year ago

              Why is an async model better than using OS threads for an I/O bound workload? The OS is doing async stuff internally and shielding the complexity with threads. With virtual threads this work has shifted to the JVM. Can the JVM do threads better than the OS?

              • mrsilencedogood a year ago

                "Why is an async model better than using OS threads for an I/O bound workload?"

                Because evented/callback-driven code is a nightmare to reason about and breaks lots of very basic tools, like the humble stack trace.

                Another big thing for me is resource management - try/finally don't work across callback boundaries, but do work within a virtual thread. I recently ported a netty-based evented system to virtual threads and a very long-standing issue - resource leakage - turned into one very nice try/finally block.
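                A tiny sketch of the try/finally point (assuming Java 21+; the string stands in for a real resource like a connection). Cleanup runs in one place even though the thread blocks, and unmounts, in the middle - something a callback chain can't give you:

```java
public class TryFinallyDemo {
    public static void main(String[] args) throws Exception {
        Thread vt = Thread.startVirtualThread(() -> {
            var resource = "connection";           // stand-in for a real resource
            try {
                Thread.sleep(10);                  // blocking I/O stand-in; thread unmounts here
                System.out.println("used " + resource);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                System.out.println("released " + resource);  // runs no matter what
            }
        });
        vt.join();
    }
}
```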

              • zokier a year ago

                > Can the JVM do threads better than the OS?

                Yes. The JVM has far more opportunities for optimizing threads because it doesn't need to uphold 50 years of accumulated invariants and compatibility that current OSes do, and JVM has more visibilty on the application internals.

              • adgjlsfhk1 a year ago

                it can do a much better job because there isn't a security boundary. OS thread scheduling requires sys calls and invalidate a bunch of cache to prevent timing leaks

          • CrimsonRain a year ago

            Create 100k platform threads and you'll find out.

          • lichtenberger a year ago

            Throughput. The code can be "suspended" on a blocking call (I/O, where the platform thread usually is wasted, as the CPU has nothing to do during this time). So, the platform thread can do other work in the meantime.

      • packetlost a year ago

        Yeah, and it's generally good to be RAM limited instead of CPU, no? The alternative is blowing a bunch of time on syscalls and OS scheduler overhead.

        Also the virtual threads run on a "traditional" thread pool to my understanding, so you can just tweak the number of worker threads to cap the total concurrency.

        The benefit is it's overall more efficient (in the general case) and lets you write linear blocking code (as opposed to function coloring). You don't have to use it, but it's nice that it's there. Now hopefully Valhalla actually makes it in eventually

        • hashmash a year ago

          The OS scheduler is still there (for the carrier threads), but now you've added on top of that FJ pool based scheduler overhead. Although virtual threads don't have the syscall overhead when they block, there's a new cost caused by allocating the internal continuation object, and copying state into it. This puts more pressure on the garbage collector. Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

          I've not yet seen a study that shows that virtual threads offer a huge benefit. The Open Liberty study suggests that they're worse than the existing platform threads.

          • zokier a year ago

            > The OS scheduler is still there (for the carrier threads), but now you've added on top of that FJ pool based scheduler overhead.

            Ideally carrier threads would be pinned to isolated cpu cores, which removes most aspects of OS scheduler from the picture

          • zokier a year ago

            > I've not yet seen a study that shows that virtual threads offer a huge benefit.

            Not exactly Java virtual threads, but a study on how userland threads beat kernel threads.

            https://cs.uwaterloo.ca/~mkarsten/papers/sigmetrics2020.html

            For quick results, check figures 11 and 15 from the (preprint) paper. Userland threads ("fred") have ~50% higher throughput while having orders of magnitude better latency at high load levels, in a real-world application (memcached).

          • packetlost a year ago

            The study says there are surprising performance problems with Java's virtual thread implementation. Their test of throughput was also hilarious: they put 2000 OS threads vs 2000 virtual threads, and most of the time OS threads don't start falling apart until 100k+ threads. You can architect an application such that you can handle 200k simultaneous connections using platform-thread-per-core, but it's harder to reason about than the linear, blocking code that virtual threads and async allow for.

            > Context switching cost due to CPU cache thrashing doesn't go away regardless of which type of thread you're using.

            Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

            > there's a new cost caused by allocating the internal continuation object, and copying state into it.

            This is more of a problem with the implementation (not every virtual thread language does it this way), but yeah this is more overhead on the application. I assume there's improvements that can be made to ease GC pressure, like using object pools.

            Usually virtual threads are a memory vs CPU tradeoff that you typically use in massively concurrent IO-bound applications. Total throughput should take over platform threads with hundreds of thousands of connections, but below that probably perform worse, I'm not that surprised by that.

            • electroly a year ago

              > Except it's not a context switch? You're jumping to another instruction in the program, one that should be very predictable. You might lose your cache, but it will depend on a ton of factors.

              Java virtual threads are stackful; they have to save and restore the stack every time they mount a different virtual thread to the platform thread. They do this by naive[0] copying of the stack out to a heap allocation and then back again, every time. That's clearly a context switch that you're paying for; it's just not in the kernel. I believe this is what the person you're replying to is talking about.

              [0] Not totally naive. They do take some effort to copy only subsets of the stack if they can get away with it. But it's still all done by copies. I don't know enough to understand why they need to copy and can't just swap stack pointers. I think it's related to the need to dynamically grow the stack when the thread is active vs. having a fixed size heap allocation to store the stack copy.

      • immibis a year ago

        Async does exactly the same by the way.

  • pron a year ago

    No, it optimises hardware utilisation by simply allowing more tasks to concurrently make progress. This allows throughput to reach the maximum the hardware allows. See https://youtu.be/07V08SB1l8c.

  • duped a year ago

    imo the biggest difference between "virtual" threads in a managed runtime and "os" threads is that the latter uses a fixed size stack whereas the former is allowed to resize, it can grow on demand and shrink under pressure.

    When you spawn an OS thread you are paying at worst the full cost of it, and at best the max depth seen so far in the program, and stack overflows can happen even if the program is written correctly. Whereas a virtual thread can grow the stack to be exactly the size it needs at any point, and when GC runs it can rewrite pointers to any data on the stack safely.

    Virtual/green/user space threads aka stackful coroutines have proven to be an excellent tool for scaling concurrency in real programs, while threads and processes have always played catchup.

    > “something” will block eventually no matter what…

    The point is to allow everything else to make progress while that resource is busy.

    ---

    At a broader scale, as a programming model it lets you architect programs that are designed to scale horizontally. With the commoditization of compute in the cloud, that means it's very easy to write a program that can be distributed as I/O demand increases. In principle, a "virtual" thread could be spawned on a different machine entirely.

  • frevib a year ago

    They indeed optimize thread context switching. Taking a thread on and off the CPU becomes expensive when there are thousands of threads.

    You are right that everything blocks; even going to L1 cache you have to wait about a nanosecond. But blocking in this context means waiting for “real” IO like a network request or spinning-disk access. Virtual threads take away the problem of the thread sitting there doing nothing while it waits for data, before it is context switched.

    Virtual threads won’t improve CPU-bound blocking. There the thread is actually occupying the CPU, so there is no problem of the thread doing nothing as with IO-bound blocking.

  • kbolino a year ago

    The hardware now is just as concurrent/parallel as the software. High-end NVMe SSDs and server-grade NICs can do hundreds to thousands of things simultaneously. Even if one lane does get blocked, there are other lanes which are open.

  • lmm a year ago

    > I remember saying “something” will block eventually no matter what… anything from the buffer being full on the NIC to your cpu being at anything less than 100%.

    Nope. You can go async all the way down, right to the electrical signals if you want. We usually impose some amount of synchronous clocking/polling for sanity, at various levels, but you don't have to; the world is not synchronised, the fastest way to respond to a stimulus will always be to respond when it happens.

    > Does it shake out to any real advantage?

    Of course it does - did you miss the whole C10K discussions 20+ years ago? Whether it matters for your business is another question, but you can absolutely get a lot more throughput by being nonblocking, and if you're doing request-response across the Internet you generally can't afford not to.

bberrry a year ago

I don't understand these benchmarks at all. How could it possibly take virtual threads 40-50 seconds to reach maximum throughput when a batch of tasks is submitted at once?

LinXitoW a year ago

From my very limited exposure to virtual threads and the older solution (thread pools), the biggest hurdle was the extensive use of ThreadLocals by most popular libraries.

In one project I had to basically turn a reactive framework into a one thread per request framework, because passing around the MDC (a kv map of extra logging information) was a horrible pain. Getting it to actually jump ship from thread to thread AND deleting it at the correct time was basically impossible.

Has that improved yet?

  • joshlemer a year ago

    I faced this issue once. I solved it by creating a wrapping/delegating Executor, which would capture the MDC from the scheduling thread at schedule-time, and then at execute-time, set the MDC for the executing thread, and then clear the MDC after the execution completes. Something like...

        class MyExecutor implements Executor {
            private final Executor delegate;
            public MyExecutor(Executor delegate) {
                this.delegate = delegate;
            }
            @Override
            public void execute(Runnable command) {
                // capture the MDC on the scheduling thread...
                var mdc = MDC.getCopyOfContextMap();
                delegate.execute(() -> {
                    // ...restore it on the executing thread...
                    MDC.setContextMap(mdc);
                    try {
                        command.run();
                    } finally {
                        // ...and clear it once the task completes
                        MDC.clear();
                    }
                });
            }
        }
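A self-contained sketch of the same capture/restore pattern, with a plain `ThreadLocal` map standing in for SLF4J's `MDC` (which is an external dependency):

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

public class ContextPropagation {
    // stands in for MDC's per-thread key/value map
    static final ThreadLocal<Map<String, String>> CONTEXT = ThreadLocal.withInitial(Map::of);

    // capture on the scheduling thread, restore on the executing thread, clear after
    static Executor propagating(Executor delegate) {
        return command -> {
            Map<String, String> captured = CONTEXT.get();
            delegate.execute(() -> {
                CONTEXT.set(captured);
                try {
                    command.run();
                } finally {
                    CONTEXT.remove();
                }
            });
        };
    }

    public static void main(String[] args) throws Exception {
        CONTEXT.set(Map.of("requestId", "42"));
        AtomicReference<String> seen = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        // delegate spawns a fresh virtual thread per task (JDK 21+)
        Executor executor = propagating(r -> Thread.ofVirtual().start(r));
        executor.execute(() -> {
            seen.set(CONTEXT.get().get("requestId")); // context visible on the worker
            done.countDown();
        });
        done.await();
        System.out.println("requestId=" + seen.get());
    }
}
```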
  • vbezhenar a year ago

    What do you mean by hurdle? ThreadLocals work just fine with virtual threads.

    • brabel a year ago

      It's not recommended though.

      See https://openjdk.org/jeps/429

      If you keep ThreadLocal variables, they get inherited by child Threads. If you make many thousands of them, the memory footprint becomes completely unacceptable. If the memory used by ThreadLocal variables is large, it also makes it more expensive to create new Threads (virtual or not), so you lose most advantages of Virtual Threads by doing that.

      • bberrry a year ago

        I don't think that's correct. ThreadLocals should behave just like on regular OS threads, the difference is that you can suddenly create millions of them.

        You used to be able to depend on OS threads getting reused because you were pooling them. You can do the same with virtual threads if you wish and you will get the same behavior. The difference is we ought to spawn new threads per task now.

        Side note, you have to specifically use InheritableThreadLocal to get the inheritance behavior you speak of.
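A quick demonstration of that distinction (JDK 21+): a plain `ThreadLocal` is not visible in a newly started thread, while an `InheritableThreadLocal` is copied into it at start:

```java
import java.util.concurrent.atomic.AtomicReference;

public class InheritanceDemo {
    static final ThreadLocal<String> plain = new ThreadLocal<>();
    static final InheritableThreadLocal<String> inheritable = new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        plain.set("parent-value");
        inheritable.set("parent-value");
        AtomicReference<String> seenPlain = new AtomicReference<>();
        AtomicReference<String> seenInheritable = new AtomicReference<>();
        // virtual threads inherit InheritableThreadLocals by default
        Thread child = Thread.ofVirtual().start(() -> {
            seenPlain.set(String.valueOf(plain.get()));   // not inherited
            seenInheritable.set(inheritable.get());       // inherited copy
        });
        child.join();
        System.out.println(seenPlain.get() + " / " + seenInheritable.get());
    }
}
```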

  • bberrry a year ago

    If you are already in a reactive framework, why would you change to virtual threads? Those frameworks pool threads and have their own event loop so I would say they are not suitable for virtual thread migration.

    • brabel a year ago

      Yes, if you're happy with the reactive frameworks there's no reason to migrate. Most people, however, would love to remove their complexities from their code bases. Virtual Threads are much, much easier to program with. There are downsides, like not being able to easily limit concurrency, having to implement your own timeout mechanisms, etc., but that will probably be provided by a common lib sooner or later, which hopefully offers identical features to the reactive frameworks while being much, much simpler.

      • munksbeer a year ago

        I've not looked too deeply. We use the eventloop model, and we're guaranteed that data is only mutated by a single unit of work at a time, which means you don't need to use any concurrent data types, volatile, etc. This is great for micro performance.

        Does the same apply to virtual threads?

        Edit: I think I answered my own question. Java virtual threads have the same memory model as regular Java threads, so yes, I need to use the same semantics. That rules out replacing the eventloop model for us.

davidtos a year ago

I did some similar testing a few days ago[1], comparing platform threads to virtual threads doing API calls. They mention the right conditions, like having high task delays, but it also depends on what the task is: Thread.sleep(1) performs better on virtual threads than platform threads, but a REST call taking a few ms performs worse.

[1] https://davidvlijmincx.com/posts/virtual-thread-performance-...

taspeotis a year ago

My rough understanding is that this is similar to async/await in .NET?

It’s a shame this article paints a neutral (or even negative) experience with virtual threads.

We rewrote a boring CRUD app that spent 99% of its time waiting for the database to respond to be async/await from top to bottom. CPU and memory usage went way down on the web server because so many requests could be handled by far fewer threads.

  • jsiepkes a year ago

    > My rough understanding is that this is similar to async/await in .NET?

    Well, somewhat, but also not really. They are green threads, like async/await, but their use is more transparent, unlike async/await.

    So there are no special "async methods". You just instantiate a "VirtualThread" where you normally instantiate a (kernel) "Thread" and then use it like any other (kernel) thread. This works because, for example, all blocking IO APIs are automatically converted to non-blocking IO under the hood.
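A small illustration of that transparency (JDK 21+): a loopback socket stands in for real network IO, and no virtual-thread-specific API appears anywhere in the IO path.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class BlockingOnVirtual {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral loopback port
            StringBuilder received = new StringBuilder();
            // Plain blocking accept/read, written exactly as for a platform thread;
            // the JDK parks the virtual thread (not the carrier) while it waits.
            Thread reader = Thread.ofVirtual().start(() -> {
                try (Socket conn = server.accept();
                     InputStream in = conn.getInputStream()) {
                    byte[] buf = new byte[64];
                    int n = in.read(buf); // blocks until data arrives
                    received.append(new String(buf, 0, n, StandardCharsets.UTF_8));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 OutputStream out = client.getOutputStream()) {
                out.write("hello".getBytes(StandardCharsets.UTF_8));
            }
            reader.join();
            System.out.println(received);
        }
    }
}
```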

  • peteri a year ago

    It's a different model. Microsoft did work on green threads a while ago and decided against continuing.

    Links:

    https://github.com/dotnet/runtimelab/issues/2398

    https://github.com/dotnet/runtimelab/blob/feature/green-thre...

    • pjmlp a year ago

      It should be pointed out that the main reason they didn't go further was the added complexity in .NET, when async/await already exists.

      > Green threads introduce a completely new async programming model. The interaction between green threads and the existing async model is quite complex for .NET developers. For example, invoking async methods from green thread code requires a sync-over-async code pattern that is a very poor choice if the code is executed on a regular thread.

      Note also that even the current model is complex enough to warrant a FAQ:

      https://devblogs.microsoft.com/dotnet/configureawait-faq

      https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/b...

      • neonsunset a year ago

        This FAQ is a bit outdated in places, and is not something most users should worry about in practice.

        JVM Green Threads here serve predominantly back-end scenarios, where most of the items on the list are not of concern. This list also exists to address bad habits that carried over from before the tasks were introduced, many years ago.

        In general, the perceived want of green threads is in part caused by misunderstanding of that one bad article about function coloring. And that one bad article about function coloring also does not talk about the way you do async in C#.

        Async/await in C# on the back end is a very easy model to work with, with an explicit understanding of where a method returns an operation that promises to complete in the future (or not), and composing tasks[0] for easy (massive) concurrency is significantly more idiomatic than doing so with green threads or the completable futures that existed in Java before. And as evidenced by the adoption of green threads in large-scale Java projects, it turns out the failure modes share similarities, except green threads end up violating far more expectations, and the code author may have no indication or explicit mechanism to address this, like using AsyncLocal.

        Also, one change to look for is the "Runtime Handled Tasks" project in .NET, which will replace Roslyn-generated state machine code with a runtime-provided suspension mechanism that only ever suspends at true suspension points, where a task's execution actually yields asynchronously. So far the numbers show at least a 5x decrease in overhead, which is massive and will bring the performance of computation-heavy async paths in line with sync ones:

        https://github.com/dotnet/runtimelab/blob/feature/async2-exp...

        Note that you were trivially able to have millions of scheduled tasks even before that as they are very lightweight.

        [0]: e.g. sending requests in parallel is just this

            using var http = new HttpClient() {
                BaseAddress = new("https://news.ycombinator.com/news")
            };
        
            var requests = Enumerable
                .Range(1, 4)
                .Select(n => $"?p={n}")
                .Select(http.GetStringAsync);
        
            var pages = await Task.WhenAll(requests);
        • no_wizard a year ago

          I take your point about the aforementioned article[0][1] being a popular reference when discussing async/await (and, to a lesser extent, async programming in modern languages more generally), but I think its popularity highlights the fact that it is a pain point for folks.

          Take Go, for instance. It is well liked in part because it's so easy to do concurrency with goroutines: they're easy to reason about, easy to call, easy to write, and, for how much heavy lifting they do, relatively simple to understand.

          The reason Java is getting a lot of kudos here for its implementation of green threads is exactly the same reason people talk about Go being an easy language for concurrency: it doesn't gate code behind specialized idioms/syntax/features specific to asynchronous work. Rather, it largely uses the same idioms and syntax as synchronous code, and is therefore easier to reason about, adopt, and, I think history is starting to show, use.

          Java is taking an approach paved by Go, and ultimately I think it's the right choice: having worked extensively with C# and other languages that use async/await, there are simply fewer footguns for the average developer to hit when you reduce the surface area of having to understand async/sync boundaries.

          [0]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

          [1]: HN discussion: https://news.ycombinator.com/item?id=8984648

          • neonsunset a year ago

            Green Threads increase the footgun count as methods which return tasks are rather explicit about their nature. The domain of async/await is well-studied, and enables crucial patterns that, like in my previous example, Green Threads do nothing to improve the UX of in any way. This also applies to Go approach which expects you to use Channels, which have their own plethora of footguns, even for things trivially solved by firing off a couple of tasks and awaiting their result. In Go, you are also expected to use explicit synchronization primitives for trivial concurrent code that require no cognitive effort in C# whatsoever. C# does have channels that work well, but turns out you rarely need them when you can just write simple task-based code instead.

            I'm tired of this; that one article is bad and incorrect, promotes straight-up harmful intuition, and probably set the industry back by 10 years in concurrent and asynchronous programming, in the same way misinterpreting Donald Knuth's quote did for performance.

            • kaba0 a year ago

              That’s a very simplistic view. Especially that java does/will provide “structured concurrency” as something analogous to structured control flow, vs gotos.

              Also, nothing prevents you from building your own, more limited but safer (the two always come together!) abstraction on top of it, but you couldn't express Loom with async as the primitive.

        • ffsm8 a year ago

          I don't think this would be a good showcase for Virtual Threads. The "async" API for Java is CompletableFuture, right? That's been stable for something like 10 years, so no real change since Java 8.

          You'd just have to define a ThreadPool with n threads before, where each request would've blocked one pooled thread. Now it just keeps going.

          So your equivalent Java example would've been something like this, but again: the CompletableFuture API is pretty old at this point.

              @HttpExchange(value = "https://news.ycombinator.com")
              interface HnClient {
                  @GetExchange("news?p={page}")
                  CompletableFuture<String> getNews(@PathVariable("page") Integer page);
              }
          
              @RequiredArgsConstructor
              @Service
              class HnService {
                  private final HnClient hnClient;
                  List<String> getNews() {
                      var requests = IntStream.rangeClosed(1, 4)
                                              .boxed().map(hnClient::getNews).toList();
                      return requests.stream().map(CompletableFuture::join).toList();
                  }
              }
          • vips7L a year ago

            Structured concurrency is still being developed: https://openjdk.org/jeps/453

            Also, I wouldnt consider that the equivalent Java code. That is all Spring and Lombok magic. Just write the code and just use java.net.HttpClient.

            • ffsm8 a year ago

              > and just use java.net.HttpClient.

              No.

              • no_wizard a year ago

                it might be obvious to others, but why the 'No'?

                • vips7L a year ago

                  The standard HTTP client doesn't have as nice a UX as other community libs. Most of us (including me) don't like to use it.

                  That being said, IMO you can't call something equivalent when it relies on a bunch of Spring magic. This also disregards that the OP's logic isn't equivalent at all: it waits for each future one by one instead of using something like CompletableFuture.allOf, or Promise.all in JS.
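A minimal, self-contained sketch of the `CompletableFuture.allOf` composition mentioned here, with dummy suppliers standing in for the HTTP calls:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AllOfDemo {
    public static void main(String[] args) {
        // dummy suppliers; in real code these would be async HTTP calls
        List<CompletableFuture<String>> futures = List.of(
                CompletableFuture.supplyAsync(() -> "page-1"),
                CompletableFuture.supplyAsync(() -> "page-2"),
                CompletableFuture.supplyAsync(() -> "page-3"));
        // allOf completes only when every future does, so the joins
        // inside thenApply never block
        CompletableFuture<List<String>> all = CompletableFuture
                .allOf(futures.toArray(CompletableFuture[]::new))
                .thenApply(ignored ->
                        futures.stream().map(CompletableFuture::join).toList());
        System.out.println(all.join());
    }
}
```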

      • jayd16 a year ago

        It would break a lot of the native interop and UI code devx of the language. Java was never as nice in those categories so it had less to lose going this path.

  • devjab a year ago

    > My rough understanding is that this is similar to async/await in .NET?

    Not really. What C# does is sort of similar but it has the disadvantages of splitting your code ecosystem into non-blocking/blocking code. This means you can “accidentally” start your non-blocking code. Something which may cause your relatively simple API to consume a ridiculous amount of resources. It also makes it much more complicated to update and maintain your code as it grows over the years. What is perhaps worse is that C# lacks an interruption model.

    Java's approach is much more modern, but then it kind of had to be, because the JVM already supported structured concurrency from Kotlin. That means Java's "async/await" had to work in a way that wouldn't break what was already there. Because Java is like that.

    I think you can sort of view it as another example of how Java has overtaken C# (for now), but I imagine C# will get an improved async/await model in the next couple of years. Neither approach is something you would actually choose if concurrency is important to what you build and you don't have a legacy reason to continue to build on Java/C#. This is because Go or Erlang would be the obvious choice, but it's nice that you at least have the option if your organisation is married to a specific language.

    • za3faran a year ago

      I would not argue that golang is the obvious choice for concurrency. Java's approach is actually superior to golang's. It takes it a step further by offering structured concurrency[1].

      Kotlin's design had no bearing on Java's or the JVM's implementation.

      C# has an interruption model through CancellationToken as far as I'm aware.

      [1] https://openjdk.org/jeps/453

    • jayd16 a year ago

      It's foolish to say that green threads are strictly better and ignore async/await as something outdated. It can do a lot that green threads can't.

      For example, you can actually share a thread with another runtime.

      Cooperative threading allows for implicit critical sections that can be cumbersome in preemptive threading.

      Async/await and virtual threads are solving different problems.

      > What is perhaps worse is that C# lacks an interruption model

      Btw, You'd just use OS threads if you really needed pre-emptively scheduled threads. Async tasks run on top of OS threads so you get both co-opertive scheduling within threads and pre-emptive scheduling of threads onto cores.

      • devjab a year ago

        > It's foolish to say that green threads are strictly better and ignore async/await as something outdated

        I'm not sure I said outdated, but I can see what you mean by how I called Java's approach "more modern". What I should have called Java's approach is "correctly designed".

        C#'s async/await isn't all terrible, as you point out, but it's designed wrong from the bottom up, because computation should always be blocking by default. The fact that you can accidentally start running your code asynchronously is just… Aside from trapping developers with simple mistakes, it's also part of what has led to the ecosystem being irrecoverably split in two.

        I was actually a little surprised to see Microsoft make the whole .NET Framework to .NET Core transition without addressing some of async/await's glaring issues, when that massive disruption uprooted everything anyway.

        • jayd16 a year ago

          What do you think about the Structured Concurrency library Java is working with things like fork() and join()? Is that incorrectly designed? Why do you think there's a call for that if virtual threads serves every use case?

    • troupo a year ago

      Erlang, not Go, should be the obvious choice for concurrency, but it's impossible to retrofit Erlang's concurrency onto existing systems.

      • toast0 a year ago

        As an Erlang person, from reading about Java's Virtual Threads, it feels like it should get a significant portion of the Erlang concurrency story.

        With virtual threads, it seems like if you don't hit the gotchas, you can spawn a thread and run straight through blocking code without worrying about too many threads, etc. So you could do thread-per-connection/user chat servers and HTTP servers and whatnot.

        Yes, it's still shared memory, so you miss out on the simplifying effect of explicit communication instead of shared-memory communication, and how that makes it easy to work with remote and local communication partners. But you can build a mailbox system if you want (it's not going to be as nice as a built-in one, of course). I'm not sure if Java virtual threads can kill each other effectively, either.
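A crude sketch of such a mailbox, assuming JDK 21+: one `BlockingQueue` per virtual-thread "process". This is nowhere near Erlang's selective receive, links, or monitors, just the basic shape.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MailboxDemo {
    // an Erlang-ish mailbox: one queue owned by one virtual-thread "process"
    record Mailbox(BlockingQueue<String> queue) {
        void send(String msg) { queue.add(msg); }
        String receive() throws InterruptedException { return queue.take(); }
    }

    public static void main(String[] args) throws Exception {
        var mailbox = new Mailbox(new LinkedBlockingQueue<>());
        var reply = new LinkedBlockingQueue<String>();
        Thread.ofVirtual().start(() -> {
            try {
                // receive() blocks cheaply: the virtual thread parks, not an OS thread
                String msg = mailbox.receive();
                reply.add("echo: " + msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mailbox.send("hello");
        System.out.println(reply.take());
    }
}
```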

        • troupo a year ago

          Erlang's concurrency story isn't green threads.

          It's (with caveats, of course):

          - a thread crashing will not bring the system down

          - a thread cannot hog all processing time as the system ensures all threads get to run. The entire system is re-entrant and execution of each thread can be suspended to let other threads continue

          - all CPU cores can and will be utilized transparently to the user

          - you can monitor a thread and if it crashes you're guaranteed to receive info on why and how it crashed

          - immutable data structures play a huge part of it, of course, but the above is probably more important

          That's why Go's concurrency is not that good, actually. Goroutines are not even half-way there: an error in a goroutine can panic-kill your entire program, there are no good ways to monitor them etc.

          • kbolino a year ago

            Neither an error nor a recovered-from panic will cause a Go program to crash; only an unrecovered panic does that.

            The bigger problem with Go in this regard is how easy it is to cause a panic thanks to nil.

            • troupo a year ago

              In Erlang even a nil will not lead to an unrecovered panic (if it happens in the process aka green thread).

              Go made half a step in the right direction with goroutines, but never committed fully

              • kbolino a year ago

                Each has its tradeoffs. I had a case that cropped up more than once where RabbitMQ kept on trucking even though the process for an important queue had crashed; had it propagated all the way to the server itself it may have been easier to diagnose and fix (I'm assuming there's something like defer or finally in Erlang to ensure the mnesia database was synced properly on exit). Instead, I had to monitor for this condition and periodically run some command-line trickery to fix it (without ever really knowing why it happened). This was years ago, maybe RabbitMQ handles that better now.

                The Go authors are adamant that goroutines not be addressable (from without) or identifiable (from within). This is diametrically opposed to Erlang, where processes are meant to be addressed/identified. I can't say I've ever found a case where a problem couldn't be solved due to this constraint in Go, but it does complicate some things.

      • morsch a year ago

        Isn't that Akka?

    • szundi a year ago

      Maybe C# is going to have a new async/await model, but the fragmentation of libs and code probably cannot be undone.

      Java has the advantage that they make relatively more decisions about the language and the libs that they don't have to fix later. That's great value if you're not building throw-away software but SaaS or something that has to live long.

    • kaba0 a year ago

      > This is because Go or Erlang would be the obvious choice

      Why Go? It has a quite anemic standard library for concurrent data structures compared to Java, and it is a less expressive, and arguably worse, language on every count, verbosity included.

    • delusional a year ago

      From what I recall, and this was a while ago so bear with me, Java Virtual Threads still have a lot of pitfalls where the promise of concurrency isn't really fulfilled.

      I seem to remember it was some pretty basic operations (like maybe read or something) that caused the thread not to unmount and therefore just block the underlying OS thread. At that point you've just invented the world's most complicated thread pool.

      • mike_hearn a year ago

        Reading from sockets definitely works. It'd be pretty useless if it didn't.

        Some operations that don't cause a task switch to another virtual thread are:

        - If you've called into a native library and back into Java that then blocks. In practice this never happens because Java code doesn't rely on native libraries or frameworks that much and when it does happen it's nearly always in-and-out quickly without callbacks. This can't be fixed by the JVM, however.

        - File IO. No fundamental problem here, it can be fixed, it's just that not so many programs need tens of thousands of threads doing async file IO.

        - If you're holding a lock using 'synchronized'. No fundamental problem here, it's just annoying because of how HotSpot is implemented. They're fixing this at the moment.

        In practice it's mostly the last one that causes issues in real apps. It's not hard to work around, and eventually those workarounds won't be needed anymore.
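A common workaround for the `synchronized` pinning issue is to guard sections that may block with `java.util.concurrent.locks` instead, which lets the virtual thread unmount while parked (JEP 491 removes the need for this in later JDKs). A minimal sketch, assuming JDK 21+:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockInsteadOfSynchronized {
    // a ReentrantLock instead of a synchronized block: a virtual thread
    // blocked on lock() can unmount from its carrier (synchronized pins
    // the carrier on JDK 21)
    private final ReentrantLock lock = new ReentrantLock();
    private long counter;

    long incrementAndGet() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        var c = new LockInsteadOfSynchronized();
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                exec.submit(c::incrementAndGet);
            }
        } // close() waits for all 1000 increments
        System.out.println(c.incrementAndGet());
    }
}
```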

      • za3faran a year ago

        You're referring to thread pinning, and this is being addressed.

  • kimi a year ago

    It's more like Erlang threads - they appear to be blocking, so existing code will work with zero changes. But you can create a gazillion of them.

  • he0001 a year ago

    > My rough understanding is that this is similar to async/await in .NET?

    The biggest difference is that C# async/await code is rewritten by the compiler to be able to be async. This means that you see artifacts in the stack that weren’t there when you wrote the code.

    There are no rewrites with virtual threads and the code is presented on the stack just as you write it.

    They solve the same problem but in very different ways.

    • pansa2 a year ago

      > They solve the same problem but in very different ways.

      Yes. Async/await is stackless, which leads to the “coloured functions” problem (because it can only suspend function calls one-by-one). Threads are stackful (the whole stack can be suspended at once), which avoids the issue.

    • jayd16 a year ago

      There is overlap but they really don't solve the same problem. Cooperative threading has its own advantages and patterns that won't be served by virtual threads.

      • he0001 a year ago

        What patterns does async/await solve which virtual threads don’t?

        • jayd16 a year ago

          If you need to be explicit about thread contexts because you're using a thread that's bound to some other runtime (say, a GL Context) or you simply want to use a single thread for synchronization like is common in UI programming with a Main/UI Thread, async/await does quite well. The async/await sugar ends up being a better devx than thread locking and implicit threading just doesn't cut it.

          In Java they're working on a structured concurrency library to bridge this gap, but IMO, it'll end up looking like async/await with all its ups and downs but with less sugar.

          • he0001 a year ago

            What’s stopping you from using a single thread for synchronization?

            • jayd16 a year ago

              You can use virtual threads running on a single OS thread and that will work but then everything will be on that one thread. You'll have synchronization but you'll also always be blocking on that one thread as well.

              Async/await is able to achieve good UX around explicitly defining what goes on your Main thread and what goes elsewhere. It's trivial to mix UI-thread and background-thread code by bouncing between synchronization contexts as needed.

              When the threading model is implicit, it's impossible to have this control.

        • neonsunset a year ago

          "Green Threads" as implemented in Java is a solution that solves only a single problem - blocking/multiplexing.

          It does not enable easy concurrency and task/future composition the way C#/JS/Rust do, which offer strictly better and more comprehensive model.

          • za3faran a year ago

            Structured concurrency[1] offers task composition and more.

            [1] https://openjdk.org/jeps/453

          • he0001 a year ago

            What do you mean? It implements the Future/Task interface and you can definitely use that. In fact, you can't tell the difference between a virtual thread and a platform one, and it's available everywhere. I for one think it's much easier to use than the async/await pattern, as I don't need any special syntax to use it.

  • fulafel a year ago

    Can you expand on how the benefit in your rewrite came about? Threads don't consume CPU when they're waiting for the DB, after all. And threads share memory with each other.

    (I guess, scaling to ridiculous levels, you could be approaching trouble if you have O(100k) outstanding DB queries per application server; hope you have a DB that can handle millions of outstanding queries then!)

    • segfaltnh a year ago

      In large numbers, the cost of switching between threads does consume CPU while they're waiting for the database. This is why green threads exist: to have a large amount of in-flight work executing over a smaller number of OS threads.

      • fulafel a year ago

        When using OS threads, there's no switching when they are waiting for a socket (db connection). The OS knows to wake the thread up only when there's something new to see on the connection.

        • kbolino a year ago

          Both sides of a sleep/awake transition with conventional blocking system calls involve heavyweight context switches: the CPU protection level changes and the thread registers get saved out or loaded back in.

  • xxs a year ago

    >My rough understanding is that this is similar to async/await in .NET?

    No, the I/O is still blocking with respect to the application code.

tzahifadida a year ago

Similarly, the power of golang concurrent programming is that you write non-blocking code as you write normal code. You don't have to wrap it in functions and pollute the code; moreover, not every coder on the planet knows how to handle blocking code properly, and that is the main advantage. Most programming languages can do anything the other languages can do; the problem is that not all coders can make use of it. This is why I see languages like golang as an advantage.

  • jillesvangurp a year ago

    Kotlin embraced the same thing via coroutines, which are conceptually similar to goroutines. It adds a few useful concepts around this, though; mainly that of a coroutine context, which encapsulates the notion that a tree of coroutine calls needs some form of failure handling and cancellation. Additionally, coroutines are dispatched to a dispatcher. A dispatcher can run on the same thread or actually use a thread pool, or, as of recent Java versions, a virtual thread pool. There's actually very little point in using virtual threads in Kotlin; they are basically a slightly more heavyweight way of doing coroutines. The main benefit is dealing with legacy blocking Java libraries.

    But the bottom line with virtual threads, goroutines, or Kotlin's coroutines is that they indeed allow for imperative-style code that is easy to read and understand. Of course you still need to understand all the pitfalls of concurrency bugs and all the weird and wonderful ways things can fail to work as you expect. And while Java's virtual threads are designed to work like magic pixie dust, they do have some nasty failure modes where a single virtual thread can end up blocking all your virtual threads. Having a lot of synchronized blocks in legacy code can cause that.

    • tzahifadida a year ago

      Kotlin is not a language I learned so I will avoid commenting.

      However, my use of Java is for admin backends or heavyweight services for the enterprises or startups I coded for, so for my taste I can't use it without Spring or JBoss, etc., and in that way simplicity went out the window a long, long time ago :) It took me years to learn all the quirks of these frameworks... and the worst thing about it is that they keep changing every few months...

      • jillesvangurp a year ago

        Kotlin makes a lot of that stuff easier to deal with and there is also a growing number of things that work without Java libraries. Or even the JVM. I use it with Spring Boot. But we also have a lot of kotlin-js code running in a browser. And I use quite a few multiplatform libraries for Kotlin that work pretty much anywhere. I've even written a few myself. It's pretty easy to write portable code in Kotlin these days.

        For example ktor works on the JVM but you can also build native applications with it. And I use ktor client in the browser. When running in the browser it uses the browser fetch API. When running on the jvm you can configure it to use any of a wide range of Java http clients. On native it uses curl.

  • juyjf_3 a year ago

    Can we stop pretending Erlang does not exist?

    Go is a next-gen trumpian language that rejects sum types, pattern matching, non-nil pointers, and for years, generics; it's unhinged.

    • seabrookmx a year ago

      While I generally agree with your take that it's a regression in PL design, there's no need to be inflammatory. There's lots of good software written in it.

      > pretending Erlang does not exist

      For better or worse it doesn't to most programmers. The syntax is not nearly as approachable as GoLang. Luckily Elixir exists.
