Virtual Threads in Java 21: Handling Millions of Concurrent Requests with Low Resource Usage – ITFROMZERO

The Real Problem: When the Thread Pool Gets Choked During Peak Hours

Late last year, I was brought in to optimize a Java Spring Boot service receiving around 50,000 requests per second. The symptoms were familiar: CPU wasn’t spiking, RAM wasn’t full, but latency was skyrocketing and Tomcat kept throwing “No threads available”. Adding more threads to the pool only moved the pressure to memory: each platform thread reserves roughly 512KB–1MB of stack space by default, and that memory is allocated outside the Java heap.

The core issue wasn’t a lack of CPU. The thread-per-request model simply doesn’t scale for I/O-bound workloads — 80% of the time, threads just sit waiting for the database to return results or for a response from a downstream API, doing absolutely nothing.

Comparing 3 Approaches to Concurrency in Java

1. Platform Threads — Simple but Doesn’t Scale

This model has been around since Java 1.0. Each request gets its own OS thread, code runs sequentially, it’s easy to read, and debugging is straightforward. But OS threads are expensive — slow to create and destroy, context switches have overhead, and the count is hard-limited by the kernel.

// Traditional thread pool
ExecutorService executor = Executors.newFixedThreadPool(200);

Future<String> future = executor.submit(() -> {
    String dbResult = queryDatabase();     // blocks ~50ms
    String apiResult = callExternalApi();  // blocks ~100ms
    return dbResult + apiResult;
});

With 200 threads and an average latency of 150ms per request, maximum throughput is 200 / 0.15s, or around 1,300 req/s. Scaling up to 2,000 threads reserves up to ~2GB just for thread stacks (at 1MB each), and that is on top of the application’s heap.
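The arithmetic behind those two numbers is worth keeping around as a quick sanity check. Here is a minimal sketch; the class and method names are my own, and the 1MB stack size is an assumption (the real default depends on the platform and on -Xss):

```java
// Back-of-the-envelope capacity math for a blocking thread pool.
final class PoolCapacity {

    // Little's Law rearranged: throughput = concurrency / latency.
    static double maxThroughputPerSecond(int threads, double latencyMillis) {
        return threads * 1000.0 / latencyMillis;
    }

    // Upper bound on stack reservation, assuming a fixed stack size per thread.
    static long stackReservationMb(int threads, int stackSizeKb) {
        return (long) threads * stackSizeKb / 1024;
    }

    public static void main(String[] args) {
        System.out.printf("200 threads @ 150ms: ~%.0f req/s%n",
                maxThroughputPerSecond(200, 150));
        System.out.printf("2,000 threads @ 1MB stack: up to ~%d MB%n",
                stackReservationMb(2_000, 1024));
    }
}
```

The point of the formula is that with blocking threads, throughput is capped by the pool size no matter how idle the CPU is.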

2. Reactive Programming (WebFlux / Project Reactor)

Reactive programming went mainstream in the Java ecosystem around 2017–2018, when Spring 5 shipped WebFlux, as the answer to this scalability problem. Using non-blocking I/O with an event loop, far fewer threads can handle far more requests.

// Reactive with Spring WebFlux
@GetMapping("/data")
public Mono<ResponseData> getData(@RequestParam String id) {
    return webClient.get()
        .uri("/external/{id}", id)
        .retrieve()
        .bodyToMono(ExternalData.class)
        .flatMap(ext -> repository.findByKey(ext.getKey()))
        .map(entity -> new ResponseData(entity));
}

High throughput, efficient memory, but the trade-off is real. I once spent 2 days debugging a bug in a reactive chain that would have taken 15 minutes with synchronous code. Reactive stack traces are full of lambda$0 and onNext, so you often have no idea where the error originated. More importantly, the entire codebase must be reactive from top to bottom: a single blocking call on an event-loop thread stalls every request that thread is serving.
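To make the "one blocking call stalls the loop" point concrete, here is a rough sketch that uses a plain single-threaded executor as a stand-in for an event loop. It is not Reactor or Netty code, and the class and method names are made up for illustration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A single-threaded executor stands in for an event loop (this is NOT Reactor).
final class EventLoopStall {

    // Returns how long (ms) a trivial task had to wait behind one blocking task.
    static long queueDelayMillis(long blockMillis) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            // A handler that blocks the only thread, e.g. a JDBC call inside a reactive chain.
            eventLoop.submit(() -> {
                Thread.sleep(blockMillis);
                return null;
            });
            long submittedAt = System.nanoTime();
            // An unrelated, instant task that now has to wait its turn.
            Callable<Long> probe = System::nanoTime;
            Future<Long> startedAt = eventLoop.submit(probe);
            return (startedAt.get() - submittedAt) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Instant task waited ~" + queueDelayMillis(300) + " ms");
    }
}
```

The second task does no work at all, yet it waits for roughly the full blocking time of the first one. On a real event loop, every connection assigned to that thread pays the same penalty.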

3. Virtual Threads (Project Loom) — GA Since Java 21

Virtual Threads are lightweight threads managed by the JVM, not the OS. You can create millions of them: their stacks start small, live on the Java heap, and grow and shrink as needed, so no fixed per-thread stack is reserved at the OS level.

Here’s how it works: when a virtual thread blocks on I/O, the JVM automatically unmounts it from the OS thread it was running on (its carrier thread), so that carrier can serve another virtual thread. Once the I/O completes, the scheduler mounts the virtual thread on a carrier again (not necessarily the same one) and resumes execution. With platform threads, this unmount/mount step doesn’t exist: a blocked thread means a blocked OS thread.

// Virtual Thread — code is still written in regular blocking style
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    Future<String> future = executor.submit(() -> {
        String dbResult = queryDatabase();     // JVM handles this automatically, OS thread is not blocked
        String apiResult = callExternalApi();  // same as above
        return dbResult + apiResult;
    });
    System.out.println(future.get());
}

The code looks identical to platform threads. But the JVM handles all the scheduling underneath — no callbacks, no chains, no reactive mindset required.
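The unmount/mount behavior described above is easy to observe indirectly. In this sketch (names are mine, and Thread.sleep stands in for real I/O), thousands of blocking tasks finish in about the time of a single one, even though the default scheduler only uses about as many carrier threads as there are CPU cores:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class UnmountDemo {

    // Runs `tasks` blocking tasks on virtual threads; returns {completed, elapsedMillis}.
    static long[] runBlockingTasks(int tasks, Duration blockTime) {
        AtomicInteger completed = new AtomicInteger();
        long start = System.nanoTime();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(blockTime); // the virtual thread unmounts here, freeing its carrier
                    return completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { completed.get(), elapsedMillis };
    }

    public static void main(String[] args) {
        long[] result = runBlockingTasks(10_000, Duration.ofMillis(200));
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(result[0] + " tasks finished in " + result[1]
                + " ms on a machine with " + cores + " cores");
    }
}
```

If each sleeping task held on to an OS thread, 10,000 tasks of 200ms on a handful of threads would take minutes rather than a fraction of a second.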

Pros and Cons Analysis: Which Approach Should You Choose?

Here’s my summary from real-world experience:

Platform Threads: Best suited for low concurrency (<500 simultaneous requests) or CPU-bound workloads (computation, image processing). Avoid when I/O-heavy.
Reactive: Best when the team is comfortable with reactive mindset, needs extreme throughput, and is building greenfield applications. Migrating a legacy codebase to reactive is essentially a full rewrite.
Virtual Threads: Best fit for I/O-bound services (REST APIs, microservices calling DB/downstream), especially when you want to increase throughput without rewriting code. Migration from platform threads is nearly painless.

Simple rule of thumb: if your service spends most of its time waiting (DB queries, HTTP calls, file I/O) — Virtual Threads are the most practical answer you can apply right now.

Practical Guide to Implementing Virtual Threads

Step 1: Ensure Java 21+

java -version
# Requires: openjdk 21.0.x or higher

# If using SDKMAN
sdk install java 21.0.3-tem
sdk use java 21.0.3-tem

Step 2: Standalone — No Framework Required

import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;

public class VirtualThreadDemo {
    public static void main(String[] args) throws Exception {
        // Create 100,000 virtual threads — try this with platform threads and watch it OOM
        try (ExecutorService vte = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                final int taskId = i;
                vte.submit(() -> {
                    Thread.sleep(1000); // simulate I/O wait
                    System.out.println("Task " + taskId + " done on: " + Thread.currentThread());
                    return null;
                });
            }
        } // auto-shutdown and await termination
    }
}
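The executor is not the only way in. If you need a single virtual thread with a readable name (handy in logs and thread dumps), the Thread.ofVirtual() builder does the job. A small sketch, with the io-worker- prefix and the class name being arbitrary choices of mine:

```java
final class VirtualThreadBuilderDemo {

    // Starts one named virtual thread and reports what it saw about itself.
    static String describeWorker() {
        String[] seen = new String[1];
        Thread worker = Thread.ofVirtual()
                .name("io-worker-", 0) // threads from this builder are named io-worker-0, io-worker-1, ...
                .start(() -> seen[0] = Thread.currentThread().getName()
                        + " virtual=" + Thread.currentThread().isVirtual());
        try {
            worker.join(); // join() also makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(describeWorker()); // prints "io-worker-0 virtual=true"
    }
}
```

For fire-and-forget cases without a name, Thread.startVirtualThread(runnable) is the one-line equivalent.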

Step 3: Integration with Spring Boot 3.2+

Spring Boot 3.2 supports virtual threads natively with just one line of config:

# application.yml
spring:
  threads:
    virtual:
      enabled: true

Or declare the bean manually if you need more control:

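import java.util.concurrent.Executors;

import org.springframework.boot.web.embedded.tomcat.TomcatProtocolHandlerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.AsyncTaskExecutor;
import org.springframework.core.task.support.TaskExecutorAdapter;

@Configuration
public class ThreadConfig {

    // Tomcat hands every incoming request to a fresh virtual thread
    @Bean
    public TomcatProtocolHandlerCustomizer<?> virtualThreadTomcatCustomizer() {
        return protocolHandler -> protocolHandler
            .setExecutor(Executors.newVirtualThreadPerTaskExecutor());
    }

    // For @Async tasks
    @Bean
    public AsyncTaskExecutor applicationTaskExecutor() {
        return new TaskExecutorAdapter(
            Executors.newVirtualThreadPerTaskExecutor());
    }
}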

On Spring Boot 3.0/3.1 the property above is not available, so declaring the beans manually is the way to go. Without Spring, set the executor directly on the server:

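import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

// With HttpServer (JDK built-in)
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
server.start();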
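To check that handlers really run on virtual threads, here is a self-contained sketch built on the same JDK HttpServer. The /thread endpoint, the class name, and the choice of an ephemeral loopback port are all mine, picked so the example can run anywhere without clashing with port 8080:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

final class VirtualThreadHttpServer {

    // Starts a server on an ephemeral loopback port; each exchange runs on a new virtual thread.
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/thread", exchange -> {
            byte[] body = ("virtual=" + Thread.currentThread().isVirtual())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
        server.start();
        return server;
    }

    // Calls the hypothetical /thread endpoint once and returns the response body.
    static String probe() {
        HttpServer server = null;
        try {
            server = start();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/thread");
            HttpClient client = HttpClient.newHttpClient();
            return client.send(HttpRequest.newBuilder(uri).build(),
                    HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(probe()); // prints "virtual=true"
    }
}
```

The same Thread.currentThread().isVirtual() check works inside a Spring controller if you want to confirm that the configuration took effect there.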

Step 4: Pinning — The Trap to Avoid

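On Java 21, a virtual thread becomes pinned (unable to unmount from its carrier thread) in two cases: while it runs inside a synchronized block or method, and while it is executing a native method. Pinning only hurts when the thread blocks while pinned, because the carrier thread is then blocked along with it. Many developers overlook this and then wonder why performance hasn’t improved even after enabling virtual threads. (JDK 24 removed the synchronized case with JEP 491, but it still applies if you are on Java 21.)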

// RISKY on Java 21: a virtual thread that blocks inside synchronized stays pinned to its carrier
public synchronized String getFromCache(String key) {
    return cache.get(key); // if this call blocks (e.g. a remote cache), the carrier thread is blocked too
}

// GOOD — use ReentrantLock instead of synchronized
private final ReentrantLock lock = new ReentrantLock();

public String getFromCache(String key) {
    lock.lock();
    try {
        return cache.get(key);
    } finally {
        lock.unlock();
    }
}

Detect pinning with the JVM flag:

java -Djdk.tracePinnedThreads=full -jar your-app.jar

This prints a full stack trace whenever a virtual thread blocks while pinned, so you can identify and fix the exact location. The flag applies to Java 21; it was removed in JDK 24 together with synchronized pinning.
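Here is a slightly fuller sketch of the ReentrantLock pattern with an actual blocking call inside the critical section. The class, the method names, and the 1ms sleep are placeholders of mine for a real exclusive resource such as a single-connection client:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

final class LockedResource {
    private final ReentrantLock lock = new ReentrantLock();
    private int calls; // guarded by lock

    // Stand-in for a blocking call that must not run concurrently (hypothetical).
    void slowExclusiveCall() throws InterruptedException {
        lock.lock(); // a waiting virtual thread parks and unmounts instead of pinning its carrier
        try {
            Thread.sleep(1); // blocking while holding a ReentrantLock does not pin
            calls++;
        } finally {
            lock.unlock();
        }
    }

    int calls() {
        lock.lock();
        try {
            return calls;
        } finally {
            lock.unlock();
        }
    }

    // Hammers the resource from `tasks` virtual threads and returns the final call count.
    static int runFromVirtualThreads(int tasks) {
        LockedResource resource = new LockedResource();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    resource.slowExclusiveCall();
                    return null;
                });
            }
        } // close() waits for every task
        return resource.calls();
    }

    public static void main(String[] args) {
        System.out.println("calls = " + runFromVirtualThreads(500));
    }
}
```

On Java 21, I would expect a run with the tracing flag above to stay silent for this class, whereas swapping the lock for a synchronized method should start producing pinned-thread traces.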

Step 5: Thread-local Variables — Use with Caution

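Each virtual thread gets its own copy of every ThreadLocal value it touches, so millions of virtual threads can mean millions of copies. Caching large objects in a ThreadLocal made sense with a small pool of reused threads, but with one short-lived thread per task it can balloon memory use at scale. Consider ScopedValue for new use cases; it is a preview API in Java 21, so it needs --enable-preview at both compile time and run time: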

// ScopedValue — a virtual thread-friendly replacement for ThreadLocal
static final ScopedValue<User> CURRENT_USER = ScopedValue.newInstance();

// Set value for the scope
ScopedValue.where(CURRENT_USER, user).run(() -> {
    processRequest(); // CURRENT_USER.get() returns the user within this scope
});
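The ThreadLocal concern is easy to underestimate, so here is a small sketch that counts allocations. The 8KB scratch buffer and all names are hypothetical; the pattern mirrors the classic "cache an expensive object per thread" trick:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalCopies {

    // Counts how many separate buffers one ThreadLocal allocates across `tasks` virtual threads.
    static int buffersAllocated(int tasks) {
        AtomicInteger allocations = new AtomicInteger();
        ThreadLocal<byte[]> buffer = ThreadLocal.withInitial(() -> {
            allocations.incrementAndGet();
            return new byte[8 * 1024]; // hypothetical 8KB per-thread scratch buffer
        });
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                // The first get() in each thread triggers a fresh allocation.
                executor.submit(() -> buffer.get().length);
            }
        } // close() waits for every task
        return allocations.get();
    }

    public static void main(String[] args) {
        System.out.println("buffers allocated: " + buffersAllocated(10_000));
    }
}
```

With a 200-thread pool the same code would allocate at most 200 buffers and reuse them. With a virtual thread per task, nothing is reused: every task pays for its own copy.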

Benchmark Results and Real-World Numbers

After migrating the service from a 400-thread platform thread pool to virtual threads, here are the measurements from production (load tested with k6):

Throughput: from ~2,600 req/s up to ~47,000 req/s (workload was primarily DB queries + downstream HTTP)
P99 latency: dropped from 3.2s to 180ms at the same load level
Heap usage: down ~40% with no more stack memory for 400 platform threads
Migration effort: <2 hours, with no changes to business logic

Virtual Threads won’t speed up CPU-bound tasks like encryption or image resizing, because the bottleneck there is CPU, not I/O. But for the vast majority of typical web services (CRUD, API gateways, microservices calling a DB), this is probably the simplest change with the biggest impact you can make when upgrading to Java 21.