BEAM Processes vs Threads and async

Why a BEAM process beats OS threads, green threads, and async/await: preemptive scheduling, per-process heaps, and millions of cheap actors with no function coloring.

Almost every mainstream runtime offers you a choice between two unhappy concurrency models. You can have OS threads - heavyweight, preemptive, and expensive enough that tens of thousands will strain a machine - or you can have async/await - cheap and scalable, but cooperative, infectious, and prone to one bad function stalling everything.

The BEAM, the virtual machine behind Erlang, Elixir, Gleam, LFE, and Luerl, refuses the trade. Its processes are as cheap as the lightest green threads and as fair as preemptive OS threads, while sharing nothing and needing no await keyword anywhere. You get both halves of the deal at once. This article puts the BEAM's model side by side with OS threads, green threads, and async/await to show exactly why - and where the costs really land.

A quick taxonomy of concurrency models

Before the comparison, four reference points:

OS threads (Java's Thread, C's pthread, Python's threading, and the worker threads underneath Go's M:N scheduler). The kernel schedules them preemptively across cores. Each carries a large stack (often 1 MB or more reserved by default) and a kernel control block; creating one is a syscall. They share the process's address space, so you coordinate with locks, mutexes, and atomics - and inherit data races and deadlocks.
Green threads / user threads (Go goroutines, Java 21+ virtual threads, GHC Haskell threads). Scheduled by a runtime in user space, many-to-few onto OS threads. Far cheaper than OS threads. Most still share memory, so the locking problems remain even though the threads are light.
async/await (JavaScript, Python asyncio, Rust, C#). A single OS thread (or a small pool of them) runs an event loop; await is an explicit yield point where a coroutine suspends so the loop can run another. Cooperative, not preemptive - control only changes hands at an await.
BEAM processes (this article). Lightweight, share-nothing actors scheduled preemptively by the VM, communicating only by copying immutable messages. Millions per node.

The BEAM sits in a category by itself: it has the cost profile of green threads, the fairness of OS threads, and the isolation of separate OS processes.

Cheap like green threads: millions of processes

An OS thread is a budget item. A typical default thread stack reserves around 1 MB of address space, and creating one is a trip into the kernel; spin up tens of thousands and you are measuring memory in gigabytes and creation in milliseconds. This is the wall behind the old "C10K problem": serving ten thousand simultaneous connections was hard precisely because thread-per-connection did not fit.

A BEAM process starts with roughly 233 words of heap, which also holds its stack - under 2 KB on a 64-bit machine, or roughly 2.6 KB once the VM's own bookkeeping for the process is counted. That footprint is deliberately kept tiny so that systems can hold millions of them. Spawning is a cheap VM operation measured in microseconds, not a syscall. The default ceiling on recent OTP is 1,048,576 (2^20) simultaneous processes (older releases defaulted to 262,144), but that is just a default: the +P startup flag tunes it anywhere in the range 1,024 to 134,217,727 (2^27 − 1). The heap grows, at garbage-collection time, only as a process actually needs it, so idle actors stay cheap.
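These figures are easy to check against your own node. The sketch below uses only standard introspection calls; the module name ProcCost is made up for illustration, and the byte count it reports will vary with the OTP release and word size.

```elixir
defmodule ProcCost do
  # Hypothetical helper: reports the cost of one freshly spawned, idle process
  # together with the node's process limit.
  def report do
    pid = spawn(fn -> receive do :stop -> :ok end end)

    # Total bytes for the process: heap, stack, and internal bookkeeping.
    {:memory, bytes} = Process.info(pid, :memory)

    result = %{
      bytes_per_process: bytes,
      word_size: :erlang.system_info(:wordsize),
      process_limit: :erlang.system_info(:process_limit),
      process_count: :erlang.system_info(:process_count)
    }

    send(pid, :stop)
    result
  end
end

IO.inspect(ProcCost.report())
```

The process_limit value reflects whatever +P setting the node was started with, so it is the number to compare against before spawning a very large swarm.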

This is not theoretical headroom. WhatsApp famously drove 2-3 million concurrent TCP connections on a single node - one lightweight process per connection - far beyond the ten thousand that had once been considered a hard problem. Thread-per-connection cannot practically reach those numbers; process-per-connection is the normal way to write BEAM code.

%% Spawn one hundred thousand processes that each just wait for a message.
%% On the BEAM this is unremarkable; with OS threads it would be impractical.
-module(swarm).
-export([run/1]).

run(N) ->
    Pids = [spawn(fun idle/0) || _ <- lists:seq(1, N)],
    length(Pids).      %% swarm:run(100000) => 100000

idle() ->
    receive
        stop -> ok
    end.

The same swarm in Elixir:

defmodule Swarm do
  # Start N processes, each blocked on receive, and return the count.
  def run(n) do
    1..n
    |> Enum.map(fn _ -> spawn(fn -> idle() end) end)
    |> length()
  end

  defp idle do
    receive do
      :stop -> :ok
    end
  end
end

# Swarm.run(1_000_000) returns 1_000_000, provided the node's process limit
# (1,048,576 by default on recent OTP; raise it with +P) and RAM allow it.

Nothing here is exotic. A million blocked processes consume a few gigabytes of memory (roughly 2.6 KB each on a 64-bit VM) and zero CPU, because a process waiting in receive is not scheduled at all.
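To see that a blocked process really is parked rather than polling, you can ask the VM for its status. This is a minimal sketch: IdleCheck and its function names are invented for the example, and the short polling loop is there only because a newly spawned process needs a moment to reach its receive.

```elixir
defmodule IdleCheck do
  # Spawn n processes that block in receive until told to stop.
  def spawn_idle(n) do
    Enum.map(1..n, fn _ ->
      spawn(fn ->
        receive do
          :stop -> :ok
        end
      end)
    end)
  end

  # Poll until the process is parked in receive (status :waiting).
  def wait_until_parked(pid, tries \\ 100)
  def wait_until_parked(_pid, 0), do: :not_parked

  def wait_until_parked(pid, tries) do
    case Process.info(pid, :status) do
      {:status, :waiting} ->
        :waiting

      _ ->
        Process.sleep(10)
        wait_until_parked(pid, tries - 1)
    end
  end

  # Release every process so the swarm does not outlive the experiment.
  def stop_all(pids), do: Enum.each(pids, &send(&1, :stop))
end

pids = IdleCheck.spawn_idle(10_000)
IO.inspect(IdleCheck.wait_until_parked(List.last(pids)))
IO.inspect(Enum.count(pids, &Process.alive?/1))
IdleCheck.stop_all(pids)
```

A process in the :waiting state sits outside every run queue until a message arrives, which is why idle actors cost memory but no scheduler time.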

Fair like OS threads: preemptive scheduling

Here is where the BEAM diverges sharply from every cooperative model. Async event loops are cooperative, and green-thread runtimes often are too: a task keeps the CPU until it reaches a yield point (an await, a channel op, an allocation safepoint). The details vary by runtime - Go has preempted long-running goroutines asynchronously since 1.14, while Java's virtual threads are not time-sliced - but wherever scheduling is cooperative, one CPU-bound task with no yield point can monopolise its thread and starve every other task on it. That is the classic event-loop freeze when someone runs a tight loop in JavaScript or forgets an await asyncio.sleep.

The BEAM is preemptive, and it achieves this without OS thread switches by counting reductions. A reduction is roughly one function call or BIF (built-in) invocation. Each process is handed a budget - 4,000 reductions by default in recent OTP - and when it spends that budget the scheduler suspends it mid-flight and runs the next runnable process. There is no yield point to forget, because the VM inserts the checkpoint for you on every function call.

The consequence is profound: a single runaway process cannot freeze the system. This loop never yields voluntarily, yet it is interrupted thousands of times and every other process keeps making progress:

%% A deliberately greedy, never-yielding loop. On a cooperative runtime
%% this would starve everything sharing its thread. On the BEAM the
%% scheduler preempts it every ~4000 reductions, so the rest of the
%% system stays responsive.
spin(0) -> done;
spin(N) -> spin(N - 1).

%% spin(1_000_000_000) pins one core but never blocks other processes.

The BEAM runs one scheduler thread per CPU core by default, each with its own run queue, and it migrates processes between schedulers to balance load - so the preemption is also genuinely parallel across cores. That is something a single-threaded async event loop cannot do at all: asyncio and Node's loop give you concurrency on one core, and reaching for multiple cores means spawning whole worker processes and serialising messages between them.
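One way to observe this fairness is to saturate every scheduler with spinning processes and then time an ordinary message round-trip. The following is a sketch under assumptions: the module name Fairness, the choice of two spinners per scheduler, and the 5-second timeout are all arbitrary choices for the demonstration.

```elixir
defmodule Fairness do
  # A tight countdown with no voluntary yield, in the spirit of spin/1.
  def spin(0), do: :done
  def spin(n), do: spin(n - 1)

  # Saturate every scheduler with spinners, then time a ping round-trip.
  def ping_under_load do
    spinners =
      Enum.map(1..(System.schedulers_online() * 2), fn _ ->
        spawn(fn -> spin(10_000_000_000) end)
      end)

    echo =
      spawn(fn ->
        receive do
          {:ping, from} -> send(from, :pong)
        end
      end)

    started = System.monotonic_time(:millisecond)
    send(echo, {:ping, self()})

    result =
      receive do
        :pong -> {:ok, System.monotonic_time(:millisecond) - started}
      after
        5_000 -> :starved
      end

    # The spinners would otherwise run for a long time; kill them explicitly.
    Enum.each(spinners, &Process.exit(&1, :kill))
    result
  end
end

IO.inspect(Fairness.ping_under_load())
```

If scheduling were cooperative, the echo process would never get a turn and the call would return :starved; on the BEAM it should return {:ok, elapsed_ms} with a small elapsed time, though the exact figure depends on the machine.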

What about blocking and CPU-heavy work?

Preemption by reduction-counting works because Erlang code calls functions often. Two cases could still hurt fairness, and the BEAM handles both:

Long-running native code (NIFs) does not increment reductions, so a slow C function could hog a scheduler. The runtime provides dirty CPU schedulers for exactly this - CPU-bound NIFs run on a separate pool so they never stall the main schedulers.
Blocking I/O is handled off the main schedulers: file operations run on dirty I/O schedulers, and sockets are watched by the VM's dedicated polling threads, so a process waiting on a file or socket does not tie up a scheduler core.

In every case the principle holds: blocking and number-crunching are pushed off the schedulers that keep ordinary processes fair.
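The separate pools are visible from a running node. This small sketch simply reads the scheduler counts via :erlang.system_info/1; the variable name pools is arbitrary, and the numbers depend on the machine and on startup flags.

```elixir
# Count the normal schedulers and the two dirty scheduler pools on this node.
pools = %{
  schedulers_online: System.schedulers_online(),
  dirty_cpu_schedulers: :erlang.system_info(:dirty_cpu_schedulers),
  dirty_io_schedulers: :erlang.system_info(:dirty_io_schedulers)
}

IO.inspect(pools)
```

By default the dirty CPU pool is sized to match the normal schedulers, while the dirty I/O pool has a fixed size that does not depend on the core count.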

Isolated like OS processes: no shared memory, no locks

OS threads and most green threads share one heap. That sharing is the source of their hardest bugs: forget a lock and you get a data race; take two locks in the wrong order and you deadlock; read a field mid-update and you see a torn value.

BEAM processes share nothing. Each has its own heap, so garbage collection is per-process - collecting one actor never stops the world - and when a process dies its heap is reclaimed wholesale. Processes communicate only by sending immutable messages, which are copied from sender heap to receiver heap (large binaries, those over 64 bytes, are the exception: they live in a shared, reference-counted off-heap area and pass by reference). Because everything is immutable, the copy is safe and there is simply nothing to lock.
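A small experiment makes the isolation concrete: hand a map to another process, let that process "change" it, and check that the sender's value is untouched. This is an illustrative sketch; the module name Isolation and the message shapes are invented for the example.

```elixir
defmodule Isolation do
  # The child receives its own copy of the map, "updates" it, and reports back.
  def demo do
    original = %{visits: 1}
    parent = self()

    child =
      spawn(fn ->
        receive do
          {:state, map} ->
            # Map.put builds a new map on the child's heap; nothing is mutated.
            send(parent, {:child_sees, Map.put(map, :visits, 99)})
        end
      end)

    send(child, {:state, original})

    receive do
      {:child_sees, updated} -> {original, updated}
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(Isolation.demo())   # {%{visits: 1}, %{visits: 99}}
```

The parent still holds %{visits: 1} after the child has produced %{visits: 99}, because the two values live on different heaps and neither can be modified in place.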

Compare a shared counter. In a threaded language you need a mutex or an atomic:

# Thread-style pseudocode (NOT how you'd write Elixir): a shared counter
# guarded by a lock. Forget the lock and increments are lost to a race.
#
#   lock.acquire()
#   counter = counter + 1
#   lock.release()

On the BEAM the counter is just a process that owns its state. Concurrent "mutation" is safe because only one process ever touches that state, and it does so one message at a time:

defmodule Counter do
  # The state lives inside one process; no lock, no shared memory.
  def start, do: spawn(fn -> loop(0) end)

  defp loop(count) do
    receive do
      :inc ->
        loop(count + 1)

      {:get, from} ->
        send(from, {:count, count})
        loop(count)
    end
  end
end

# c = Counter.start()
# send(c, :inc); send(c, :inc)
# send(c, {:get, self()})
# receive do {:count, n} -> n end   #=> 2

A thousand processes can hammer this counter with :inc messages concurrently and not a single increment is lost - the mailbox serialises them, and there was never any shared memory to corrupt. That loop/1 carrying state through a receive is, in miniature, exactly what an OTP GenServer is.
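For comparison, here is roughly what the same counter might look like as a GenServer, hammered by a thousand concurrent tasks. It is a sketch rather than a canonical implementation: TallyServer and its functions are hypothetical names, and the increment is a synchronous call (rather than a fire-and-forget message) so that every increment is known to have been applied before the final read.

```elixir
defmodule TallyServer do
  use GenServer

  # Client API (illustrative names).
  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)
  def inc(pid), do: GenServer.call(pid, :inc)
  def get(pid), do: GenServer.call(pid, :get)

  # Server callbacks: the count is the process state, touched by one process only.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:inc, _from, count), do: {:reply, :ok, count + 1}
  def handle_call(:get, _from, count), do: {:reply, count, count}
end

{:ok, counter} = TallyServer.start_link()

# 1,000 concurrent tasks, each incrementing 10 times.
1..1_000
|> Enum.map(fn _ ->
  Task.async(fn -> Enum.each(1..10, fn _ -> TallyServer.inc(counter) end) end)
end)
|> Task.await_many()

total = TallyServer.get(counter)
IO.inspect(total)   # 10000
```

No lock appears anywhere: the server's mailbox orders the requests, and each handle_call clause runs to completion before the next message is taken.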

No async/await, and no function coloring

The most quietly painful tax of async/await is function coloring: async functions can only be awaited from other async functions, so "asyncness" infects everything up the call stack. You end up with two parallel universes of functions, duplicated libraries (sync and async variants), and the rule that a synchronous function cannot simply await an async one. The keyword is also a leaky reminder that I/O is special - you must remember to await or you get a dangling promise instead of a value.

On the BEAM there is no async/await, because every process is already "async." A process that does I/O simply blocks itself in receive or a call; the scheduler runs other processes meanwhile. Blocking code and non-blocking code look identical, every function is the same "color," and a slow call only ever suspends the one process making it - never the loop, never a sibling.

Here a process makes what looks like an ordinary blocking call, but the rest of the system keeps running because only the caller waits:

# Looks synchronous; behaves async for the whole system. This process
# blocks on the reply, but every other process keeps being scheduled.
defmodule Client do
  def fetch_sum(server) do
    ref = make_ref()
    send(server, {:sum, [1, 2, 3], self(), ref})

    receive do
      {^ref, result} -> result          # blocks ONLY this process
    after
      5_000 -> {:error, :timeout}        # no dangling promise; a real value or a timeout
    end
  end
end

That make_ref()/match-on-ref pattern is the heart of an OTP call: a synchronous-looking request/reply that never colors your code. Idiomatic Elixir hides it entirely behind GenServer.call/2 and Task.await/1, and idiomatic Erlang behind gen_server:call/2 - friendly faces over plain message passing, not a separate execution model.
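As a rough illustration of the friendlier face, the snippet below runs a slow computation with Task.async/1 and collects it with Task.await/2. The 100 ms sleep is only a stand-in for slow I/O, and the variable names are arbitrary.

```elixir
# The slow work runs in its own process; nothing here is marked "async".
task =
  Task.async(fn ->
    Process.sleep(100)          # stand-in for a slow I/O call
    Enum.sum([1, 2, 3])
  end)

# The caller is free to do other work before collecting the reply.
other = String.upcase("still running")

# Blocks only this process, for at most 5 seconds.
result = Task.await(task, 5_000)
IO.inspect({other, result})     # {"STILL RUNNING", 6}
```

The anonymous function passed to Task.async/1 is an ordinary function; the same code could be called directly, which is what it means for the model to have no function coloring.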

The same model across five languages

Because concurrency lives in the VM, every BEAM language inherits the identical model - preemptive, share-nothing, lock-free - and only the surface syntax changes.

Erlang exposes the primitives raw: spawn, the ! send operator, and receive.

Pid = spawn(fun() -> receive {ping, From} -> From ! pong end end),
Pid ! {ping, self()},
receive pong -> ok end.

LFE (Lisp Flavoured Erlang) is the same runtime as S-expressions - same spawn, same !, same receive:

(let ((pid (spawn (lambda ()
                    (receive
                      ((tuple 'ping from) (! from 'pong)))))))
  (! pid (tuple 'ping (self)))
  (receive ('pong 'ok)))

Gleam keeps the lightweight, preemptive processes but makes the mailbox statically typed. A Subject(a) is a typed handle to a process that can carry only a-typed messages, and process.receive returns a Result with a mandatory timeout instead of blocking forever - so the type system closes the "unexpected message" hole without changing the runtime model at all.

import gleam/erlang/process

pub fn main() {
  let inbox = process.new_subject()        // typed: carries Int here
  process.spawn(fn() { process.send(inbox, 42) })

  case process.receive(inbox, within: 1000) {
    Ok(n) -> n          // 42
    Error(_) -> 0       // timed out
  }
}

Luerl is the outlier: it runs Lua 5.x on the BEAM, but plain Lua has no processes, no spawn, and no mailbox of its own. Inside one Luerl interpreter, code is sequential. To get the preemptive, isolated concurrency described above, the host Erlang/Elixir program spawns real BEAM processes and runs a Lua chunk inside each:

-- Inside a single Luerl interpreter this is just sequential Lua;
-- there is no spawn and no preemption here.
local function work(x) return x * x end
print(work(7))   --> 49

-- Real concurrency comes from the HOST: the embedding BEAM app spawns
-- processes and runs chunks like this one inside each via luerl:do/2,
-- then collects results over normal BEAM message passing. The
-- parallelism lives in Erlang; Lua only supplies the per-actor logic.

So four of the five languages give you the BEAM model directly; Luerl borrows it from whatever BEAM program embeds it.

Side by side

	OS threads	Green threads	async/await	BEAM processes
Scheduled by	kernel	runtime	event loop	the VM
Scheduling	preemptive	varies by runtime, often cooperative	cooperative	preemptive (reductions)
Cost to create	high (syscall, ~1 MB stack)	low	very low	very low (~233-word initial heap)
Practical count	thousands	millions	many	millions
Memory	shared heap	usually shared	shared	share-nothing, per-process heap
Coordination	locks / mutexes	locks / channels	the loop	immutable message passing
Uses all cores	yes	yes	only with a multi-threaded executor	yes (scheduler per core)
Function coloring	no	no	yes	no
A runaway task can starve peers	no	depends on runtime	yes	no
GC pauses	global, if any	usually global	global, if any	per-process

The pattern is clear. Each rival model wins some rows and loses others; the BEAM is the only column with the favourable answer in every row that matters - cheap to create, preemptively fair, multi-core, isolated, and free of function coloring.

What it costs

This is engineering, not magic, and the trade-offs are real:

Message passing copies data. Sending a large term duplicates it on the receiver's heap (binaries excepted). You reason about this cost explicitly rather than sharing a pointer - the price of share-nothing isolation.
No back-pressure by default. send is asynchronous and always "succeeds" locally; a slow consumer can let its mailbox grow unbounded. Bounded queues and flow control (e.g. GenStage, gen_server:call round-trips) are something you add deliberately.
Raw throughput per process is not the goal - the BEAM trades a little single-threaded speed for fairness, isolation, and soft-real-time latency. For pure number-crunching a tight C loop wins; for a million concurrent, independent, fault-tolerant activities, nothing else comes close.
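The missing back-pressure is easy to demonstrate. In this sketch a consumer deliberately ignores its mailbox while a producer keeps sending; Backlog and its functions are hypothetical names, and 10,000 is an arbitrary message count.

```elixir
defmodule Backlog do
  # A consumer that ignores its mailbox until it is told to drain it.
  def start do
    spawn(fn ->
      receive do
        :drain -> drain()
      end
    end)
  end

  # Number of messages currently waiting in the process's mailbox.
  def queue_len(pid) do
    {:message_queue_len, len} = Process.info(pid, :message_queue_len)
    len
  end

  # Consume queued jobs until the mailbox is empty, then exit.
  defp drain do
    receive do
      {:job, _n} -> drain()
    after
      0 -> :ok
    end
  end
end

consumer = Backlog.start()

# Every send returns immediately, whether or not anyone is reading.
Enum.each(1..10_000, fn n -> send(consumer, {:job, n}) end)

backlog = Backlog.queue_len(consumer)
IO.inspect(backlog)   # 10000
send(consumer, :drain)
```

Nothing slowed the producer down as the queue grew. Replacing the bare send with a request/reply round-trip makes each producer wait for the consumer, which is the simplest form of flow control.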

The takeaway

OS threads give you preemption but are too heavy to use by the million. Green threads are cheap but usually share memory and often yield only cooperatively. async/await is cheap and non-blocking but cooperative, often single-core, and colors your whole codebase. The BEAM process is the rare design that is cheap and preemptive and isolated and multi-core and colorless - which is why "a process per connection," "a process per user," even "a process per cell in a spreadsheet" are ordinary BEAM advice rather than performance suicide. Learn spawn, send, and receive, and you are programming a runtime where concurrency is the default, not a library you bolt on.