Distribution: clustering the BEAM

How the BEAM stretches its actor model across machines: named nodes, transparent message passing, global and pg for cluster-wide naming, and the trade-offs of a fully-meshed, trust-everything network.

ErlangElixir

The BEAM's actor model has a property most concurrency runtimes can only envy: it does not stop at the edge of one machine. The same Pid ! Message that delivers to a process on the next core can deliver to a process on a server in another rack, another datacentre, another continent. You spawn, you send, you receive - and whether the peer is in your own VM or across the network is, syntactically, none of your business. This is Distributed Erlang, and it has been baked into the runtime since the early 1990s, long before "distributed system" was a phrase you put on a CV.

This article walks the machinery: what a node is, how nodes find and connect to each other, what location transparency really means (and where it leaks), the cluster-wide naming facilities global and pg, and the honest trade-offs of a model that assumes the whole cluster is one trusted, fully-connected family. Distribution is fundamentally an Erlang/OTP runtime feature, so the spotlight is on Erlang and Elixir; Gleam, LFE, and Luerl all inherit the same wire protocol and we note where each fits.

Nodes: a named, running BEAM

A node is a single running BEAM instance with a name. Start erl or iex with a name and it becomes alive - able to talk to other nodes:

# Two terminals on the same host. Each VM gets a name and a shared cookie.
erl -sname alice -setcookie clustersecret
erl -sname bob   -setcookie clustersecret

Names come in two flavours. A short name (-sname alice) is alice@hostname and only works among nodes on the same DNS short-name domain. A long name (-name alice@10.0.0.7 or alice@host.example.com) carries a fully-qualified host and is what you use across real networks. The two kinds cannot connect to each other - a long-named node and a short-named node will refuse to cluster, which is a classic first-day gotcha.

Inside the VM the node's name is just an atom. You can ask who you are and who you can see:

node().               %% => alice@myhost      (my own name)
nodes().              %% => [bob@myhost]      (other connected nodes)
is_alive().           %% => true             (am I distributed at all?)

Elixir exposes the identical runtime through the Node module:

Node.self()           # => :alice@myhost
Node.list()           # => [:bob@myhost]
Node.alive?()         # => true

A node that was started without a name is nonode@nohost and is not alive; it cannot participate in distribution at all until renamed (via net_kernel:start/1). For Elixir releases this is why your rel/env.sh or vm.args sets -name/-sname - distribution is opt-in, switched on by giving the VM an identity.

Before two nodes will speak, they must present the same magic cookie - an atom that acts as a cluster-wide shared password. If the cookies differ, the connection is refused and you get a {:error, :not_allowed}-shaped failure. It is set with -setcookie on the command line, read from ~/.erlang.cookie if not given, or changed at runtime:

erlang:set_cookie(node(), clustersecret).
erlang:get_cookie().                       %% => clustersecret

Be clear about what the cookie is and is not. It is a coarse "are you one of us?" check performed once during the connection handshake. It is not encryption, not per-message authentication, and not a defence against anyone who can read the cookie file or sniff the (by default plaintext) handshake. A node that knows the cookie can do anything on any connected node - there is no permission model below the cluster boundary. We return to the security consequences below; for now, treat the cookie as the front door key to the entire cluster.

EPMD and how nodes find each other

When a named node starts, it registers with the Erlang Port Mapper Daemon (EPMD) - a tiny service that listens on TCP port 4369 and maps node names to the port numbers their distribution listeners chose. EPMD is the cluster's phone book: it is launched automatically by the first node on a host (unless you pass -start_epmd false).

Connecting is a two-step dance. To reach bob@myhost, your node asks EPMD on myhost:4369 "what port is bob listening on?", then opens a direct TCP connection to that port and performs the cookie handshake. After that, EPMD is out of the loop entirely - the two nodes talk peer-to-peer.

You rarely connect explicitly, because any interaction with a remote node triggers a connection automatically. But you can force it:

net_kernel:connect_node('bob@myhost').   %% => true
net_adm:ping('bob@myhost').              %% => pong  (or pang if unreachable)

Node.connect(:"bob@myhost")              # => true
Node.ping(:"bob@myhost")                 # => :pong

A pong means the connection is up and the cookies matched; a pang means it failed. Note net_adm:ping/1 is also the laziest possible health check - it is what most "is the cluster healthy?" scripts are quietly built on.

Location-transparent message passing

Here is the payoff, and the reason distribution feels like magic the first time. A Pid is globally unique across the cluster - it encodes the node it lives on. So sending to a remote process is exactly the same code as sending to a local one:

%% Pid may live on this node or any connected node; the code is identical.
Pid ! {hello, self()}.

To reach a named process on another node, you send to a {RegisteredName, Node} tuple. This is the canonical pattern, because you usually do not have a remote PID in hand - you have a name and a node:

%% Send to the locally-registered process `logger` on node bob@myhost.
{logger, 'bob@myhost'} ! {log, "user signed in"}.

Elixir spells the same thing with send/2, which accepts the very same {name, node} form:

# Fire a message at the `:logger` process running on bob@myhost.
send({:logger, :"bob@myhost"}, {:log, "user signed in"})

Spawning is transparent too - spawn/4 and friends take a node argument, so you can start a process on another machine and get back a PID you treat like any other:

%% Run worker:loop/0 over on bob, get a normal Pid back.
RemotePid = spawn('bob@myhost', worker, loop, []),
RemotePid ! {task, 42}.

remote_pid = Node.spawn(:"bob@myhost", fn -> Worker.loop() end)
send(remote_pid, {:task, 42})

This is location transparency: the semantics of spawn, send, receive, links, and monitors are the same whether the peer is local or remote. The same property extends all the way up the stack - GenServer.call({:cache, :"bob@myhost"}, :get) performs a synchronous call to a GenServer on another node with no special "remote" API. OTP was designed for this from the start.

Where transparency leaks (location transparency, not location invisibility)

"Transparent" does not mean "free" or "identical in failure behaviour," and pretending otherwise is how distributed BEAM systems get into trouble. The honest caveats:

Latency and cost. A local send is a heap copy; a remote send serialises the term, pushes it across TCP, and deserialises it on the far side. Sending a 100 MB binary to a process one core away is cheap (binaries are reference-counted); sending it to another node copies all 100 MB over the wire.
send is still fire-and-forget - and still "succeeds." A remote ! returns immediately and gives no indication the peer (or its node) is gone. A synchronous gen_server:call to a dead remote node is what actually surfaces the failure, via the monitor it sets up internally.
Ordering is per-pair only. Message order is guaranteed between a given sender and a given receiver, exactly as locally. There is no global ordering across the cluster, and a node going down and coming back can reorder relative to a new connection.
Partitions are real. TCP between two nodes can drop while both stay alive - the dreaded netsplit. Each side sees the other as down (nodedown), and when the link heals you can have two halves that diverged. The runtime hands you the events; reconciling the state is your problem.

The rule of thumb: distribution makes the plumbing invisible so you can spend your design budget on the failure modes, which never become invisible.

Monitoring nodes and remote processes

Because failure is the interesting part, the runtime lets you subscribe to it. monitor_node/2 (or Node.monitor/2, or :net_kernel.monitor_nodes/1) turns a node going up or down into a message in your mailbox:

net_kernel:monitor_nodes(true),
receive
    {nodeup,   Node} -> io:format("~p joined~n", [Node]);
    {nodedown, Node} -> io:format("~p left~n", [Node])
end.

And the same monitors and links from the single-node world work across nodes: monitor(process, {Name, Node}) gives you a 'DOWN' message if that remote process or its whole node disappears. This is how a supervisor on one machine can react to the death of a worker on another.

Cluster-wide naming, part 1: `global`

Local name registration (register/2, or a GenServer's name: option) only registers a name on one node. The moment you have a cluster, you want names that mean the same process everywhere - "the one true order sequencer," wherever it happens to be running. That is the global module.

global maintains a replicated name table on every node, kept consistent by a distributed locking protocol, so any node can translate a global name to a PID locally and instantly. Registration is atomic and cluster-wide:

%% On whichever node owns the sequencer:
yes = global:register_name(order_seq, self()).

%% From ANY node in the cluster:
Pid  = global:whereis_name(order_seq),   %% the one global PID, or undefined
ok2  = global:send(order_seq, {next, self()}).  %% send by global name

Crucially, register_name returns yes or no - no if the name is already taken - so it doubles as a cluster-wide mutex for "elect exactly one of these." Elixir reaches global either directly via :global or, more idiomatically, through the :global via tuple, which plugs straight into GenServer:

# Register a GenServer under a cluster-wide global name.
GenServer.start_link(Sequencer, :ok, name: {:global, :order_seq})

# Call it from any node without knowing which node it lives on.
GenServer.call({:global, :order_seq}, :next)

When a netsplit heals and global discovers the same name was registered on both sides, it must pick a winner. By default it resolves the clash by killing one of the two (its predefined random_exit_name resolver), but you can supply your own resolve function. This is the moment to remember that global favours availability and convenience over strong consistency: during a partition both halves can believe they own the name, and reconciliation is lossy by design.

The cost of `global`

global is wonderfully convenient and does not scale infinitely. Every registration takes a cluster-wide lock and replicates to every node, so a workload that registers and unregisters names rapidly, or a cluster of hundreds of nodes, will feel the chatter. global is the right tool for a small number of singleton services (one sequencer, one cluster coordinator), not for naming a million short-lived sessions.

Cluster-wide naming, part 2: `pg` (process groups)

The complement to "one named process" is "a group of processes, possibly many per node, that I want to address together." That is pg - distributed named process groups, introduced in OTP 23 as a faster, more scalable replacement for the now-removed pg2. (pg2 was deprecated in OTP 23 and removed in OTP 24; reach for pg.)

A process joins a group by name; any node can list the group's members and fan a message out to all of them. This is the natural substrate for pub/sub, presence, and "broadcast to every chat-room subscriber across the cluster":

%% Start the default pg scope once per node (often done by your app's supervisor):
pg:start_link().

%% A subscriber joins the "room:42" group:
pg:join(room42, self()).

%% A publisher fans a message out to every member, cluster-wide:
[ Pid ! {msg, "hi"} || Pid <- pg:get_members(room42) ].

:pg.start_link()
:pg.join(:room42, self())

for pid <- :pg.get_members(:room42), do: send(pid, {:msg, "hi"})

Two design points matter. First, pg uses scopes: every group lives inside a scope (an atom, defaulting to pg), letting you run independent, isolated group namespaces - Phoenix.PubSub and the Elixir Registry-adjacent ecosystem lean on exactly this. Second - and this is the deliberate trade-off versus global - pg is eventually consistent and lock-free. Joins propagate as asynchronous broadcasts; there is no cluster-wide lock per operation, so pg scales to large, churning memberships where global would choke. The price is that get_members may briefly lag a join or a node failure. For pub/sub and presence that is exactly the right bargain; for "elect one leader" it is not, which is why the two modules coexist.

A member that dies is removed from its groups automatically - pg monitors every member - so you never broadcast to a dead PID lingering in a list.

Connection topology: the fully-connected mesh

By default, Distributed Erlang forms a fully-connected mesh (a clique): the instant node A connects to node B, and B already knows C, A transitively connects to C as well. Every node ends up with a direct TCP link to every other node. This is what makes global and cluster-wide PID routing simple - everyone can reach everyone directly.

The flip side is scaling math. A fully-connected cluster of N nodes has N·(N−1)/2 connections, each carrying periodic heartbeat (tick) traffic to detect dead peers. This is comfortable into the dozens of nodes and gets strained in the low hundreds; classic Erlang distribution was never meant for thousand-node clusters. Two escape hatches exist:

-connect_all false (or the kernel connect_all setting) disables transitive auto-connection, so you can build a hidden or hub-and-spoke topology by hand and avoid the mesh entirely.
Hidden nodes (-hidden) connect to specific peers without joining the global mesh or appearing in everyone's nodes() - the standard trick for tooling, observers, and bridges that should not become full cluster members.

For genuinely large clusters, teams reach above raw distribution: libcluster for automatic node discovery (DNS, Kubernetes, gossip) that forms the cluster, and libraries like Partisan that replace the default mesh with alternative overlays. But the default - small, fully-meshed, everyone-trusts-everyone - is the model you should understand first, because it is what global, pg, and OTP assume.

Security: the cluster is one trust domain

This is the trade-off that most surprises newcomers, and it deserves to be stated plainly. Distributed Erlang was designed for a trusted network of cooperating nodes - originally machines inside a telecom rack. The security model reflects that origin:

A correct cookie grants total control. Any node that connects can spawn processes, read ETS tables, call any function, and even rpc:call(Node, os, cmd, ["rm -rf /"]) on every peer. There is no sandbox between connected nodes. The cluster is a single trust domain.
The default handshake and traffic are plaintext. Cookies cross the wire (in older versions, weakly obscured), and messages are unencrypted unless you enable distribution over TLS.
EPMD leaks topology. Anyone who can reach port 4369 can enumerate every node name on a host without authenticating.

The mitigations are real but must be chosen deliberately:

%% Run distribution over TLS instead of plain TCP (in vm.args / sys.config):
%%   -proto_dist inet_tls
%%   -ssl_dist_optfile /etc/erlang/ssl_dist.conf
%% This authenticates and encrypts node-to-node traffic with certificates,
%% replacing "the cookie is the only check" with real mutual TLS.

The non-negotiable rule: never expose the distribution ports (the dynamic dist port and EPMD's 4369) to an untrusted network. Put the cluster on a private subnet or VPC, restrict by firewall to known peers, and - for anything crossing a trust boundary - turn on TLS distribution. Treat the cookie as a convenience for an already-isolated network, not as a security boundary.

The other three languages

Distribution is a property of the BEAM and OTP, so every language on the VM gets it - but with different ergonomics.

LFE is Erlang with Lisp syntax; it calls the distribution modules natively. Everything above is available verbatim, just parenthesised:

;; Connect to a node and globally register this process, in LFE.
(net_kernel:connect_node 'bob@myhost)   ; => true
(global:register_name 'order-seq (self))
(global:whereis_name 'order-seq)        ; the one global pid

Gleam is statically typed, which collides with distribution's deeply dynamic nature. gleam_erlang provides a typed node API for the cluster mechanics - node.self(), node.visible() (the connected nodes), and node.connect(atom) returning a Result(Node, ConnectError):

import gleam/erlang/atom
import gleam/erlang/node

pub fn join_cluster() -> Result(node.Node, node.ConnectError) {
  // Returns Error(FailedToConnect) or Error(LocalNodeIsNotAlive) honestly,
  // instead of Erlang's bare `true`/`false`.
  node.connect(atom.create("bob@myhost"))
}

But the typed part stops at the node boundary. A Gleam Subject is an opaque, process-local handle (a PID plus a unique tag) - it cannot be serialised and shipped to another node. So cross-node Gleam falls back to sending to named processes, where the receiver gets a Dynamic value it must decode by hand, surrendering the very type safety Gleam exists for. The community answer is libraries that wrap :global/pg with explicit binary codecs so mismatches are caught at compile time before a term ever leaves the process. Distributed Gleam works; it is simply where Gleam's static guarantees and the BEAM's dynamic wire format have to negotiate.

Luerl sits below distribution entirely. It is a Lua interpreter running inside a single BEAM process, with no node identity, no PID, and no mailbox of its own - so Lua code cannot send a cross-node message any more than it can spawn a local process. Distribution is the host's job: an Erlang or Elixir node embeds the Luerl interpreter, and when that node clusters, the host can ship work to a remote node, run a Lua chunk there, and return the result. The parallelism and the distribution live in the BEAM around Luerl; Lua supplies only the per-task logic.

-- Plain Lua in a Luerl interpreter. No nodes, no PIDs, no sends:
-- distribution happens in the Erlang/Elixir host that drives this chunk.
local function score(events)
  local total = 0
  for _, e in ipairs(events) do total = total + e.weight end
  return total          -- handed back to the host, which may live on any node
end

Putting it together

Distributed Erlang takes the actor model that already worked across cores and stretches it across machines with startling economy: name your VMs, share a cookie, and send/spawn/monitor keep working with their meaning intact. global gives you cluster-wide singletons with strong-ish, lock-based naming; pg gives you scalable, eventually-consistent process groups for pub/sub and presence. The defaults - a fully-connected mesh on a trusted network - are the assumptions OTP is built on, and they are exactly the assumptions you must revisit at scale (mesh size) and at the edge (security).

The deal the BEAM offers is honest. It makes the plumbing of distribution disappear so completely that your three-node chat server and your three-machine one read almost identically. What it refuses to hide - latency, partitions, the single trust domain, the cost of cluster-wide consensus - are the things no runtime can make disappear, only manage. Learn where transparency ends, and you have learned how to build BEAM systems that span machines without lying to yourself about the network underneath.