The Bayesian workflow in sbi

In this tutorial, we will perform a full Bayesian workflow with sbi using the Lotka-Volterra simulator.

Here is a code snippet that you will learn to understand:

from sbi.inference import NPE
from sbi.diagnostics import run_sbc

num_simulations = 1000
theta = prior.sample((num_simulations,))
x = simulate(theta)

# Train neural network.
inference = NPE(prior)
posterior_net = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior()

# Diagnostics with SBC.
num_sbc_samples = 200
thetas = prior.sample((num_sbc_samples,))
xs = simulate(thetas)
ranks, dap_samples = run_sbc(
    thetas, xs, posterior, num_posterior_samples=1_000,
)

Let’s get started!

Defining the simulator and prior

In this example, we will use the Lotka-Volterra simulator. This simulator models the populations of prey and predators, given four parameters:

  • \(\alpha\): Prey birth rate: The rate at which prey reproduce in the absence of predators
  • \(\beta\): Predation rate: The rate at which predators consume prey, reducing the prey population
  • \(\delta\): Predator reproduction rate: The rate at which predators reproduce based on prey consumption
  • \(\gamma\): Predator death rate: The rate at which predators die in the absence of prey
import numpy as np

_ = np.random.seed(0)

def lotka_volterra(y, alpha, beta, delta, gamma):
    prey, predator = y
    dprey_dt = alpha * prey - beta * prey * predator
    dpredator_dt = delta * prey * predator - gamma * predator
    return np.asarray([dprey_dt, dpredator_dt])

def simulate(parameters):
    alpha = parameters[0]
    beta = parameters[1]
    delta = parameters[2]
    gamma = parameters[3]

    y0 = np.asarray([40.0, 9.0])  # Initial populations
    t_span = 200  # Total simulation time
    dt = 0.1  # Time step

    timesteps = int(t_span / dt)
    y = np.zeros((timesteps, 2))
    y[0] = y0

    # Forward-Euler integration of the ODE.
    for i in range(1, timesteps):
        y[i] = y[i-1] + lotka_volterra(y[i-1], alpha, beta, delta, gamma) * dt

    return y
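
The loop above uses simple forward-Euler integration. If you want to double-check its accuracy, you could compare it against an off-the-shelf ODE solver. Here is a minimal sketch using scipy.integrate.solve_ivp; scipy and the helper name simulate_scipy are assumptions for illustration and are not used in the rest of this tutorial:

from scipy.integrate import solve_ivp

def simulate_scipy(parameters):
    alpha, beta, delta, gamma = parameters
    # Adaptive-step reference solution for the Euler scheme above.
    sol = solve_ivp(
        lambda t, y: lotka_volterra(y, alpha, beta, delta, gamma),
        t_span=(0.0, 200.0),
        y0=[40.0, 9.0],
        t_eval=np.arange(0, 200, 0.1),
    )
    return sol.y.T  # Shape (2000, 2), matching `simulate`.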

Let’s inspect this simulator for a set of example parameters:

import matplotlib.pyplot as plt

time_vec = np.arange(0, 200, 0.1)
observation = simulate(np.asarray([0.1, 0.02, 0.01, 0.1]))
fig, ax = plt.subplots(1, 1, figsize=(6, 3))
_ = ax.plot(time_vec, observation)
_ = ax.legend(["Prey", "Predator"])
_ = ax.set_xlabel("Time")
_ = ax.set_ylabel("Population")

[Figure: simulated prey and predator populations over time]

Typically, we will not be able to observe populations exactly, but only with noise. In addition, in this example, we will aim to reproduce summary statistics of these simulations. Let’s add noise and define these statistics:

def summarize_simulation(simulation_result):
    # Add Gaussian observation noise to the (timesteps, 2) population traces.
    observation_noise = np.random.randn(*simulation_result.shape)
    noisy_sim = simulation_result + observation_noise

    prey_population = noisy_sim[:, 0]
    predator_population = noisy_sim[:, 1]
    summary = [
        np.max(prey_population).item(),
        np.max(predator_population).item(),
        np.mean(prey_population).item(),
        np.mean(predator_population).item()
    ]
    return np.asarray(summary)

sbi can also perform inference given the raw time series by passing an embedding network, as sketched below. This is described in detail in this how-to guide.
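
As a brief sketch of how this could look: the architecture below is an illustrative assumption (a small fully-connected network over the flattened 2000 x 2 time series), and embedding_net and density_estimator_fn are hypothetical names, not part of this tutorial:

import torch.nn as nn
from sbi.neural_nets import posterior_nn

# Illustrative embedding: flatten the (2000, 2) time series and map it to 8 features.
embedding_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(2000 * 2, 64),
    nn.ReLU(),
    nn.Linear(64, 8),
)
density_estimator_fn = posterior_nn(model="nsf", embedding_net=embedding_net)
# The builder could then be passed to NPE: NPE(prior, density_estimator=density_estimator_fn).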

Let’s define a prior over the four parameters. In this case, we assume a uniform prior:

import torch

_ = torch.manual_seed(42)
from sbi.utils import BoxUniform

lower_bound = torch.as_tensor([0.05, 0.01, 0.005, 0.005])
upper_bound = torch.as_tensor([0.15, 0.03, 0.03, 0.15])
prior = BoxUniform(low=lower_bound, high=upper_bound)

Generating the training dataset

Using the prior and the simulator, we can generate simulated data. We first generate prior samples:

theta = prior.sample((10_000,))

As noted before, you can run simulations offline (e.g. on a cluster). For this example, we parallelize simulations via joblib and define a small helper to run them in parallel:

from joblib import Parallel, delayed


def parallel_simulate(theta):
    # Our simulator uses numpy, but prior samples are in PyTorch.
    theta_np = theta.numpy()

    num_workers = 8
    simulation_outputs = Parallel(n_jobs=num_workers)(
        delayed(simulate)(batch)
        for batch in theta_np
    )
    return np.asarray(simulation_outputs)

And we can run the simulations:

# This takes a few seconds.
simulation_outputs = parallel_simulate(theta)
x = torch.as_tensor(
    np.asarray([summarize_simulation(sim) for sim in simulation_outputs]),
    dtype=torch.float32,
)

print("theta.shape", theta.shape)
print("x.shape", x.shape)
theta.shape torch.Size([10000, 4])
x.shape torch.Size([10000, 4])

Having generated this dataset, we recommend that you briefly check whether the simulations roughly cover the observation. Let’s first compute the summary statistics of the observation:

x_obs = summarize_simulation(observation)
print(x_obs.shape)
(4,)

And then we can visualize simulation outputs and the observation:

from sbi.analysis import pairplot

_ = pairplot(
    samples=x,
    points=x_obs[None, :],  # `points` needs a batch dimension.
    limits=[[40, 70], [0, 200], [0, 20], [0, 50]],
    figsize=(4, 4),
)

[Figure: pairplot of simulated summary statistics (blue histograms) with the observation (orange lines)]

Notice two things:

  • First, the simulations (blue histogram) roughly cover the observation (orange line). This is a good sign. If the observation lies far outside of the simulations, sbi will likely fail; change your prior or simulator.
  • Second, there are no extreme outliers in the simulation data. Remember that we will train a neural network on the simulated data; extreme outliers can lead to poor neural network convergence. For NPE, outliers can be safely ignored (i.e., removed from the dataset), as sketched below.
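
A minimal sketch of such outlier removal, assuming simple per-statistic quantile thresholds; the 0.1% and 99.9% quantiles and the names theta_clean, x_clean are illustrative choices, not part of this tutorial:

# Keep only simulations whose summary statistics lie within broad quantile bounds.
lower = torch.quantile(x, 0.001, dim=0)
upper = torch.quantile(x, 0.999, dim=0)
is_valid = ((x >= lower) & (x <= upper)).all(dim=1)
theta_clean, x_clean = theta[is_valid], x[is_valid]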

Setting up the neural network

We now set up a neural network for this example. We will use Neural Posterior Estimation (NPE) with a neural spline flow (nsf) as the density estimator. See here for the collection of available neural networks and here for a guide on which method to choose.

from sbi.inference import NPE

inference = NPE(density_estimator="nsf")
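
If you want finer control over the density estimator than the string shortcut above, you can build it with posterior_nn; the hyperparameter values below (hidden_features, num_transforms) are illustrative choices, and inference_custom is a hypothetical alternative to the inference object used in this tutorial:

from sbi.neural_nets import posterior_nn

# Build a customized neural spline flow and hand the builder to NPE.
density_estimator_fn = posterior_nn(model="nsf", hidden_features=64, num_transforms=5)
inference_custom = NPE(density_estimator=density_estimator_fn)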

We then train the network on our simulations:

posterior_net = inference.append_simulations(theta, x).train()
 Neural network successfully converged after 251 epochs.

We can check the convergence of the training loop by inspecting its loss curve:

from sbi.analysis import plot_summary

_ = plot_summary(
    inference,
    tags=["training_loss", "validation_loss"],
    figsize=(10, 2),
)
# All training logs are available in `trainer.summary`.
For an interactive, detailed view of the summary, launch tensorboard  with 'tensorboard --logdir=/Users/michaeldeistler/Documents/phd/sbi/docs/tutorials/sbi-logs/NPE_C/2025-03-27T08_04_09.315358' from a terminal on your machine, visit http://127.0.0.1:6006 afterwards. Requires port forwarding if tensorboard runs on a remote machine, as e.g. https://stackoverflow.com/a/42445070/7770835 explains.

Valid tags are: ['best_validation_loss', 'epoch_durations_sec', 'epochs_trained', 'training_loss', 'validation_loss'].

[Figure: training and validation loss curves]

In this case, the validation loss seems to still be going down, so it might be worth setting a higher value, e.g. .train(stop_after_epochs=30) (the default value is 20). This parameter sets how many epochs must pass without the validation loss reaching a new minimum before training stops. For this example, we will not retrain.

After training, we can build the posterior:

posterior = inference.build_posterior()
print(posterior)
Posterior p(θ|x) of type DirectPosterior. It samples the posterior network and rejects samples that
            lie outside of the prior bounds.

For NPE, the posterior and the posterior_net are almost identical; the posterior merely adds convenience functionality. For example, it automatically rejects samples outside of the prior support, and it can compute the maximum-a-posteriori estimate with .map().
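
A brief sketch of this convenience functionality; the calls follow the sbi posterior API, and running .map() may take a moment since it performs a gradient-based search:

# Fix the observation for subsequent calls, then compute the MAP estimate.
posterior.set_default_x(x_obs)
map_estimate = posterior.map()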

Inferring the posterior

Let’s aim to infer the posterior given our observation:

print("Observation: ", x_obs)
Observation:  [50.2497606  24.65478002  9.52736287  6.05178971]

We can sample from the NPE-posterior:

samples = posterior.sample((1_000,), x=x_obs)
Drawing 1000 posterior samples for 1 observations:   0%|          | 0/1000 [00:00<?, ?it/s]

Again, you can infer the posterior distribution for any observation \(x_{obs}\) without having to run new simulations and without having to re-train (amortization), as sketched below.
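
A minimal sketch of this amortization; the second parameter set below is a hypothetical example chosen from within the prior bounds, only to generate a new observation:

# Simulate a new observation from a different parameter set...
x_obs_2 = summarize_simulation(simulate(np.asarray([0.12, 0.025, 0.015, 0.12])))
# ...and sample its posterior with the same trained network, without retraining.
samples_2 = posterior.sample((1_000,), x=x_obs_2)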

Visualizing the posterior

Next, we visualize these posterior samples with the pairplot function:

from sbi.analysis import pairplot

_ = pairplot(
    samples,
    limits=[[0.05, 0.2], [0.01, 0.05], [0.005, 0.05], [0.005, 0.2]],
    ticks=[[0.05, 0.2], [0.01, 0.05], [0.005, 0.05], [0.005, 0.2]],
    figsize=(5, 5),
    labels=[r"$\alpha$", r"$\beta$", r"$\delta$", r"$\gamma$"]
)

[Figure: pairplot of posterior samples over the four parameters]

Diagnosing potential issues in the posterior

How do we know that the posterior is correct? The sbi toolbox implements a wide range of methods that diagnose potential issues (more detail in this how-to guide). In this tutorial, we will perform posterior predictive checks (PPC) and simulation-based calibration (SBC).

Posterior predictive checks

Let’s first perform posterior predictive checks. For these, we run simulations from posterior samples and compare the simulations to the observation.

posterior_samples = posterior.sample((10,), x=x_obs)
posterior_predictives = parallel_simulate(posterior_samples)

posterior_predictive_summary_stats = torch.as_tensor(
    np.asarray([summarize_simulation(sim) for sim in posterior_predictives]),
    dtype=torch.float32
)
Drawing 10 posterior samples for 1 observations:   0%|          | 0/10 [00:00<?, ?it/s]
fig, ax = plt.subplots(1, 1, figsize=(6, 3))
_ = ax.plot(time_vec, observation[:, 0], c="b")
_ = ax.plot(time_vec, observation[:, 1], c="orange")
for i in range(10):
    _ = ax.plot(time_vec, posterior_predictives[i, :, 0], c="b", alpha=0.2)
    _ = ax.plot(time_vec, posterior_predictives[i, :, 1], c="orange", alpha=0.2)
_ = ax.legend(["Prey", "Predator"])
_ = ax.set_xlabel("Time")
_ = ax.set_ylabel("Population")

[Figure: observation with posterior predictive simulations overlaid]

As you can see, the posterior predictives (light blue and light orange) look similar to the observation (blue and orange). They do not match it exactly, but this is expected: we only used the maximum and mean of the two traces as summary statistics. If you want a closer fit, define more summary statistics, or use the full time series with an embedding network.

Simulation-based calibration

Posterior predictive checks are an ad-hoc heuristic for assessing posterior quality. Simulation-based calibration (SBC) provides a quantitative assessment. It allows you to check, for every parameter, whether the parameter is (on average) estimated too low or too high, and whether its uncertainty is too low or too high. To run SBC, you first have to generate more simulations based on prior samples:

num_sbc_samples = 200  # Number of SBC runs; should be on the order of a few hundred.

prior_samples = prior.sample((num_sbc_samples,))

prior_predictives = parallel_simulate(prior_samples)
prior_predictive_summary_stats = torch.as_tensor(
    np.asarray([summarize_simulation(sim) for sim in prior_predictives]),
    dtype=torch.float32
)

SBC is implemented in sbi and can be applied to any sbi posterior. To run it, we only need to call run_sbc with appropriate arguments.

from sbi.diagnostics import run_sbc

# run SBC: for each inference we draw 1000 posterior samples.
num_posterior_samples = 1_000
ranks, dap_samples = run_sbc(
    prior_samples,
    prior_predictive_summary_stats,
    posterior,
    reduce_fns=lambda theta, x: -posterior.log_prob(theta, x),
    num_posterior_samples=num_posterior_samples,
    use_batched_sampling=False,  # `True` can give speed-ups, but can cause memory issues.
)
Sampling 200 times (1000,) posterior samples.: 100%|████████████████████| 200/200 [00:01<00:00, 122.30it/s]



Calculating ranks for 200 sbc samples.:   0%|          | 0/200 [00:00<?, ?it/s]
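
Conceptually, SBC draws, for each pair (theta_i, x_i), posterior samples given x_i and records the rank of theta_i among them; note that with the reduce_fns argument used above, the ranking is computed on the posterior log-probability rather than per parameter dimension. A minimal sketch of the per-dimension idea for a single pair (illustrative only; run_sbc handles all of this internally):

# Rank of one ground-truth parameter among its posterior samples, per dimension.
theta_i, x_i = prior_samples[0], prior_predictive_summary_stats[0]
post_samples = posterior.sample(
    (num_posterior_samples,), x=x_i, show_progress_bars=False
)
rank_per_dim = (post_samples < theta_i).sum(dim=0)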

For amortized neural posteriors (like in this tutorial), running SBC is expected to be fast. For posteriors that require MCMC for inference and are therefore slow, run_sbc can parallelize across multiple internal workers: for example, add num_workers=2 to use two workers, as sketched below. See the API documentation for details.
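
A sketch of such a parallelized call, keeping all other arguments from above:

# Same SBC run, but with two parallel workers for the inference step.
ranks, dap_samples = run_sbc(
    prior_samples,
    prior_predictive_summary_stats,
    posterior,
    num_posterior_samples=num_posterior_samples,
    num_workers=2,
)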

If the posterior approximation is faithful, the ranks should be uniformly distributed. We can evaluate this by plotting the distribution of the resulting ranks (below, as empirical CDFs):

from sbi.analysis.plot import sbc_rank_plot

fig, ax = sbc_rank_plot(
    ranks,
    num_posterior_samples,
    plot_type="cdf",
    num_bins=20,
    figsize=(5, 3),
)

[Figure: empirical CDFs of the SBC ranks, with the gray uniformity band]

The gray area marks the region which is still sufficiently uniform (meaning that we cannot reject the null hypothesis that the ranks are uniform). In this case, all ranks (in red) lie within this gray band. This is a good sign, but it is only a necessary, not a sufficient, condition for the posterior to be correct. To this end, sbi implements a wide range of diagnostic methods, which are described here.
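
Beyond this visual check, sbi also provides quantitative SBC statistics via check_sbc; the snippet below assumes its current signature and return keys, so consult the API reference if in doubt:

from sbi.diagnostics import check_sbc

# Quantitative SBC statistics, e.g. Kolmogorov-Smirnov p-values per parameter.
check_stats = check_sbc(
    ranks, prior_samples, dap_samples, num_posterior_samples=num_posterior_samples
)
print(check_stats["ks_pvals"])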

Next steps

Next, we recommend that you check our how-to guide for brief tutorials on specific features, our advanced tutorials for detailed explanations, or our API reference, which provides a complete list of all features in the sbi toolbox.