Playing with Loom

If there’s one topic that has kept the Java community excited over the last few years, it’s Project Loom. We all know it’s coming someday, but when? What will it look like? And how will it change the way we write concurrent code? In this blog, I’ll play a bit with what Loom currently looks like.

So before we continue, here’s a little disclaimer: what I am about to share here is the current state of affairs, as delivered in OpenJDK 19-loom+4-115. You can download this yourself through the Project Loom Early-Access Builds page.

About Project Loom

The goal of Project Loom is two-fold: delivering Java Virtual Machine (JVM) features, and Java APIs on top of them. Both the JVM features and the APIs enable “lightweight concurrency” for the Java platform. This may sound familiar to people who already know Kotlin’s “coroutines” or Go’s “goroutines”. What they all share is the promise to make it easier to write code that makes the best use of the available hardware. How exactly? In this intro, I will skip the JVM features and show you some of the APIs. Also, I will focus on “Virtual Threads” (JEP 8277131) and not pay attention to the “Structured Concurrency” (JEP 8277129) part. Both originated from Project Loom, but they focus on different parts of its goals.

But before we do that, let’s dive into the “why”. Threads have been around in Java since forever - why should we care about “lightweight concurrency” today?

Threads, Runnables and scheduling

In the conventional approach, you would create a new Thread, pass it a Runnable and start it. The Runnable is the code that you want to run in parallel with your “main” code. The Thread is a one-to-one representation of the underlying concept that runs your code: an operating-system (OS) thread. These threads have considerable cost in terms of creation time and memory allocation.

But more importantly: given that a Thread represents an OS thread, the CPU scheduler of your operating system gets to decide when your Runnable will run. On top of that, it also decides for how long it can run. When the scheduler decides another thread should use the CPU, it performs a context switch. This means storing the state of the thread, so that it can be restored and resume execution at a later point in time.

The disadvantage of this approach is that the CPU scheduler does not know when would be a good time to postpone execution of your code. It may decide to do so when your code is actually blocking on some network call. That would be a great decision, as the code isn’t using the CPU at that moment. But the scheduler could also decide to do the context switch during an expensive computation, when your code is actually utilising the CPU to the max. That wouldn’t be great: the work-in-progress isn’t completed, your other code can’t use it yet, but the work is paused until further notice.

Meet Fibers: Virtual Threads

Let’s have a quick look at the new concept of Fibers or virtual threads.

In contrast to the conventional approach, a virtual thread does not map one-to-one to an OS thread. Instead, many virtual threads share one or more traditional threads that run their code.

Rather than relying on the operating system to decide when your code gets to run, and when it will be paused, the JVM will do the context switching. It does so by inspecting the code you run. When it encounters certain method calls, the JVM will release the underlying OS thread to do other work. As an application developer, you should typically not need to care about when exactly the JVM will pause your virtual thread. But if you’re curious, the Blocking Operations page of the OpenJDK wiki lists them all.

As an example: invoking connect, read or write on a java.net.Socket tells the JVM that your code will not be using the CPU for the next couple of cycles. In other words: “this may be a good moment to do some other work”. But rather than pausing the underlying OS thread, the JVM will pick up work from a different virtual thread. This keeps the OS thread as busy as possible.
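We can observe this unmounting from within a program. The sketch below is my own illustration (not from the Loom documentation) and uses Thread.sleep as the blocking call, which is also on the list of operations that release the carrier. On recent builds, the toString of a virtual thread includes the carrier it is mounted on, so printing the current thread before and after the blocking call may show two different carriers:

```java
public class YieldPointDemo {
    public static void main(final String... args) throws InterruptedException {
        final Thread vt = Thread.ofVirtual().start(() -> {
            // The toString of a mounted virtual thread includes its carrier,
            // e.g. VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
            System.out.println("Before blocking: " + Thread.currentThread());
            try {
                // A blocking call: the JVM unmounts this virtual thread and
                // frees the carrier to run other virtual threads.
                Thread.sleep(100);
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // After resuming, this virtual thread may well be mounted on a
            // different carrier than before.
            System.out.println("After blocking:  " + Thread.currentThread());
        });
        vt.join();
    }
}
```

Whether the carrier actually changes depends on what the scheduler was doing in the meantime; the point is that it is allowed to.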

A virtual thread is a thread that gets scheduled by the Java Virtual Machine rather than the operating system.

Show me code!

The conventional approach would look like this:

package it.mulders.loom.playground;

public class ThreadedApp {
    static class PrintThreadNameJob implements Runnable {
        @Override
        public void run() {
            var name = Thread.currentThread().getName();
            System.out.printf("This code runs in thread %s%n", name);
        }
    }

    public static void main(final String... args) throws InterruptedException {
        System.out.printf("Starting conventional threading app...%n");
        Thread thread = new Thread(new PrintThreadNameJob());
        thread.start();
        thread.join();
    }
}

In Project Loom, a new API makes it more explicit what type of thread you are creating. A plain new Thread(...) may leave you wondering what type of thread you’re creating, but Thread.ofPlatform() leaves no doubt. Of course, in contrast, there is also Thread.ofVirtual(), which creates virtual threads.

Using it looks like this:

public static void main(final String... args) throws InterruptedException {
    System.out.printf("Starting virtual threading app...%n");
    Thread thread = Thread.ofVirtual()
            .start(new PrintThreadNameJob());
    thread.join();
}

It’s striking how similar the two programs look! One difference: Thread.ofVirtual() returns a Thread.Builder, and its start method both creates and starts the virtual thread.

But does it work?

All right, the code may not be too complicated, but how do the two approaches behave at runtime?

In order to run both programs, we must start the JVM with the --enable-preview switch. Running both programs shows the following output:

WARNING: Using incubator modules: jdk.incubator.concurrent
Starting conventional threading app...
  ... took 1 milliseconds

WARNING: Using incubator modules: jdk.incubator.concurrent
Starting virtual threading app...
  ... took 10 milliseconds

So yes, it works. Nothing to be excited about, though. A bit of a disappointment, maybe: the code that uses Loom seems a bit slower than the traditional approach.

Numbers, numbers, numbers

Let’s see how those numbers change when we start increasing the number of jobs.

Number of jobs   Conventional threads   Virtual threads
1                1 ms                   1 ms
100              13 ms                  16 ms
100.000          5.522 ms               214 ms
1.000.000        57.512 ms              803 ms
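For reference, the measurements above came from a harness along these lines. This is a simplified sketch of my setup (the class and method names are mine, and the actual job body did slightly more than nothing):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class ThreadBenchmark {
    // Starts the given number of trivial jobs with the supplied builder
    // (Thread.ofPlatform() or Thread.ofVirtual()) and returns the elapsed
    // wall-clock time in milliseconds.
    static long measure(final int jobs, final Thread.Builder builder) throws InterruptedException {
        final Instant start = Instant.now();
        final List<Thread> threads = new ArrayList<>(jobs);
        for (int i = 0; i < jobs; i++) {
            threads.add(builder.start(() -> { /* a trivial job */ }));
        }
        for (final Thread thread : threads) {
            thread.join();
        }
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(final String... args) throws InterruptedException {
        final int jobs = 1_000;
        System.out.printf("Conventional threads: %d ms%n", measure(jobs, Thread.ofPlatform()));
        System.out.printf("Virtual threads: %d ms%n", measure(jobs, Thread.ofVirtual()));
    }
}
```

Note that a single Thread.Builder can be reused to create many threads, which keeps the loop body identical for both variants.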

That’s already a very interesting development. The reason for it becomes quite obvious when we measure how many (conventional) threads our little application starts. We can do that using the Java Flight Recorder (JFR). To enable JFR, add the following to the JVM invocation:

-XX:+FlightRecorder -XX:StartFlightRecording=duration=200s,filename=conventional.jfr

This creates a conventional.jfr (or virtual.jfr) file with a lot of diagnostic data. We can inspect that data visually with VisualVM, or from the command line with the jfr tool. When we use jfr, it is important to take the executable from the same Java distribution that we use to run the program.

For instance, we can convert a Flight Recorder file to JSON using jfr print --json conventional.jfr. By adding --events '<event type>', we can filter the events that we are interested in. Combining this, jfr print --events 'jdk.ThreadStart' --json conventional.jfr, shows output like this:

{
  "recording": {
    "events": [
      {
        "type": "jdk.ThreadStart",
        "values": {
          "startTime": "2022-03-02T20:52:33.823949676+01:00",
          "eventThread": {
            "osName": "JFR Recording Scheduler",
            "osThreadId": 1107880,
            "javaName": "JFR Recording Scheduler",
            "javaThreadId": 27,
            "group": {
              "parent": {
                "parent": null,
                "name": "system"
              },
              "name": "main"
            },
            "isVirtual": false
          },
          "stackTrace": null,
          "thread": {
            "osName": "JFR Recording Scheduler",
            "osThreadId": 1107880,
            "javaName": "JFR Recording Scheduler",
            "javaThreadId": 27,
            "group": {
              "parent": {
                "parent": null,
                "name": "system"
              },
              "name": "main"
            },
            "isVirtual": false
          },
          "parentThread": {
            "osName": "Permissionless thread",
            "osThreadId": 1107878,
            "javaName": "Permissionless thread",
            "javaThreadId": 25,
            "group": {
              "parent": {
                "parent": null,
                "name": "system"
              },
              "name": "main"
            },
            "isVirtual": false
          }
        }
      },
      ...
    ]
  }
}

We then use the excellent jq tool to process the roughly 10 megabytes of JSON output and count the number of jdk.ThreadStart events the JVM has emitted:

$ jfr print --events 'jdk.ThreadStart' --json conventional.jfr | jq '.recording.events | length'
1006

The order of magnitude of this number should not come as a surprise, as our application starts 1.000 threads. Running the same analysis for our virtual threads application:

$ jfr print --events 'jdk.ThreadStart' --json virtual.jfr | jq '.recording.events | length'
11

Now this number may be a surprise - unless you’ve paid attention a few minutes ago. Rather than starting an operating system thread for every new job, the JVM delegates the workload to a small set of “carrier” threads. This keeps the number of operating system threads extremely low, and allows the JVM to utilise the hardware a lot more efficiently than relying on the CPU scheduling of the operating system.
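We can make those carrier threads visible from within the program, too. The sketch below is my own illustration (and note that the toString format of a virtual thread is an implementation detail that may change): it starts a batch of virtual threads and records which carrier each one was mounted on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CarrierCount {
    public static void main(final String... args) throws InterruptedException {
        final Set<String> carriers = ConcurrentHashMap.newKeySet();
        final List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                // The toString of a mounted virtual thread ends in the name of
                // its carrier, e.g. VirtualThread[#42]/runnable@ForkJoinPool-1-worker-3
                final String name = Thread.currentThread().toString();
                final int at = name.lastIndexOf('@');
                if (at >= 0) {
                    carriers.add(name.substring(at + 1));
                }
            }));
        }
        for (final Thread thread : threads) {
            thread.join();
        }
        System.out.println("Distinct carrier threads: " + carriers.size());
    }
}
```

On my machine, the number of distinct carriers stays close to the number of CPU cores, even with ten thousand virtual threads.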

For good measure, let’s run the same experiments as above, this time counting how many operating system threads the JVM creates:

Number of jobs   Conventional threads   Virtual threads
1                7                      7
100              106                    9
100.000          100.006                17
1.000.000        1.000.006              17
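Counting these events doesn’t strictly require jq, by the way. The jdk.jfr.consumer API can read a recording from Java itself. The sketch below (a minimal example I put together, not the benchmark code from this article) records its own jdk.ThreadStart events and counts them:

```java
import java.nio.file.Files;
import java.nio.file.Path;

import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

public class CountThreadStarts {
    // Records jdk.ThreadStart events while starting a single platform thread,
    // dumps the recording to a temporary file and counts the recorded events.
    static long recordAndCount() throws Exception {
        final Path dump = Files.createTempFile("threads", ".jfr");
        try (final Recording recording = new Recording()) {
            recording.enable("jdk.ThreadStart");
            recording.start();
            final Thread thread = new Thread(() -> { /* trivial job */ });
            thread.start();
            thread.join();
            recording.stop();
            recording.dump(dump);
        }
        return RecordingFile.readAllEvents(dump).stream()
                .filter(event -> "jdk.ThreadStart".equals(event.getEventType().getName()))
                .count();
    }

    public static void main(final String... args) throws Exception {
        System.out.printf("jdk.ThreadStart events: %d%n", recordAndCount());
    }
}
```

Expect the count to be at least one (the thread we started), plus whatever housekeeping threads the JVM happens to start while the recording is active.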

Wrapping up

This was my first encounter with Project Loom, and I have to say, I’m pretty excited about it. Of course, the numbers are “just numbers”, but I think there’s a huge potential for writing multi-threaded code without the usual hassle of worrying about hardware utilisation. If you’re curious, too, I encourage you to check out the sample code which is available on GitHub. Feel free to play around, experiment, and let me know what you found!