Skip to content

API Reference

Inference

Runs simulation-based inference and returns the posterior.

This function provides a simple interface to run sbi. Inference is run for a single round and hence the returned posterior \(p(\theta|x)\) can be sampled and evaluated for any \(x\) (i.e. it is amortized).

The scope of this function is limited to the most essential features of sbi. For more flexibility (e.g. multi-round inference, different density estimators) please use the flexible interface described here: https://www.mackelab.org/sbi/tutorial/02_flexible_interface/

Parameters:

Name Type Description Default
simulator Callable

A function that takes parameters \(\theta\) and maps them to simulations, or observations, x, \(\mathrm{sim}(\theta)\to x\). Any regular Python callable (i.e. function or class with __call__ method) can be used.

required
prior Distribution

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. Any object with .log_prob()and .sample() (for example, a PyTorch distribution) can be used.

required
method str

What inference method to use. Either of SNPE, SNLE or SNRE.

required
num_simulations int

Number of simulation calls. More simulations means a longer runtime, but a better posterior estimate.

required
num_workers int

Number of parallel workers to use for simulations.

1

Returns: Posterior over parameters conditional on observations (amortized).

Source code in sbi/inference/base.py
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
def infer(
    simulator: Callable,
    prior: Distribution,
    method: str,
    num_simulations: int,
    num_workers: int = 1,
) -> NeuralPosterior:
    r"""Runs simulation-based inference and returns the posterior.

    This function provides a simple interface to run sbi. Inference is run for a single
    round and hence the returned posterior $p(\theta|x)$ can be sampled and evaluated
    for any $x$ (i.e. it is amortized).

    The scope of this function is limited to the most essential features of sbi. For
    more flexibility (e.g. multi-round inference, different density estimators) please
    use the flexible interface described here:
    https://www.mackelab.org/sbi/tutorial/02_flexible_interface/

    Args:
        simulator: A function that takes parameters $\theta$ and maps them to
            simulations, or observations, `x`, $\mathrm{sim}(\theta)\to x$. Any
            regular Python callable (i.e. function or class with `__call__` method)
            can be used.
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. Any
            object with `.log_prob()`and `.sample()` (for example, a PyTorch
            distribution) can be used.
        method: What inference method to use. Either of SNPE, SNLE or SNRE.
        num_simulations: Number of simulation calls. More simulations means a longer
            runtime, but a better posterior estimate.
        num_workers: Number of parallel workers to use for simulations.

    Returns: Posterior over parameters conditional on observations (amortized).
    """

    try:
        method_fun: Callable = getattr(sbi.inference, method.upper())
    except AttributeError:
        raise NameError(
            "Method not available. `method` must be one of 'SNPE', 'SNLE', 'SNRE'."
        )

    simulator, prior = prepare_for_sbi(simulator, prior)

    inference = method_fun(prior=prior)
    theta, x = simulate_for_sbi(
        simulator=simulator,
        proposal=prior,
        num_simulations=num_simulations,
        num_workers=num_workers,
    )
    _ = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior()

    return posterior

Prepare simulator and prior for usage in sbi.

NOTE: This is a wrapper around process_prior and process_simulator which can be used in isolation as well.

Attempts to meet the following requirements by reshaping and type-casting:

  • the simulator function receives as input and returns a Tensor.
  • the simulator can simulate batches of parameters and return batches of data.
  • the prior does not produce batches and samples and evaluates to Tensor.
  • the output shape is a torch.Size((1,N)) (i.e, has a leading batch dimension 1).

If this is not possible, a suitable exception will be raised.

Parameters:

Name Type Description Default
simulator Callable

Simulator as provided by the user.

required
prior

Prior as provided by the user.

required

Returns:

Type Description
Tuple[Callable, Distribution]

Tuple (simulator, prior) checked and matching the requirements of sbi.

Source code in sbi/utils/user_input_checks.py
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
def prepare_for_sbi(simulator: Callable, prior) -> Tuple[Callable, Distribution]:
    """Prepare simulator and prior for usage in sbi.

    NOTE: This is a wrapper around `process_prior` and `process_simulator` which can be
    used in isolation as well.

    Attempts to meet the following requirements by reshaping and type-casting:

    - the simulator function receives as input and returns a Tensor.<br/>
    - the simulator can simulate batches of parameters and return batches of data.<br/>
    - the prior does not produce batches and samples and evaluates to Tensor.<br/>
    - the output shape is a `torch.Size((1,N))` (i.e, has a leading batch dimension 1).

    If this is not possible, a suitable exception will be raised.

    Args:
        simulator: Simulator as provided by the user.
        prior: Prior as provided by the user.

    Returns:
        Tuple (simulator, prior) checked and matching the requirements of sbi.
    """

    # Check prior, return PyTorch prior.
    prior, _, prior_returns_numpy = process_prior(prior)

    # Check simulator, returns PyTorch simulator able to simulate batches.
    simulator = process_simulator(simulator, prior, prior_returns_numpy)

    # Consistency check after making ready for sbi.
    check_sbi_inputs(simulator, prior)

    return simulator, prior

Returns (\(\theta, x\)) pairs obtained from sampling the proposal and simulating.

This function performs two steps:

  • Sample parameters \(\theta\) from the proposal.
  • Simulate these parameters to obtain \(x\).

Parameters:

Name Type Description Default
simulator Callable

A function that takes parameters \(\theta\) and maps them to simulations, or observations, x, \(\text{sim}(\theta)\to x\). Any regular Python callable (i.e. function or class with __call__ method) can be used.

required
proposal Any

Probability distribution that the parameters \(\theta\) are sampled from.

required
num_simulations int

Number of simulations that are run.

required
num_workers int

Number of parallel workers to use for simulations.

1
simulation_batch_size int

Number of parameter sets that the simulator maps to data x at once. If None, we simulate all parameter sets at the same time. If >= 1, the simulator has to process data of shape (simulation_batch_size, parameter_dimension).

1
seed Optional[int]

Seed for reproducibility.

None
show_progress_bar bool

Whether to show a progress bar for simulating. This will not affect whether there will be a progressbar while drawing samples from the proposal.

True

Returns: Sampled parameters \(\theta\) and simulation-outputs \(x\).

Source code in sbi/inference/base.py
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
def simulate_for_sbi(
    simulator: Callable,
    proposal: Any,
    num_simulations: int,
    num_workers: int = 1,
    simulation_batch_size: int = 1,
    seed: Optional[int] = None,
    show_progress_bar: bool = True,
) -> Tuple[Tensor, Tensor]:
    r"""Returns ($\theta, x$) pairs obtained from sampling the proposal and simulating.

    This function performs two steps:

    - Sample parameters $\theta$ from the `proposal`.
    - Simulate these parameters to obtain $x$.

    Args:
        simulator: A function that takes parameters $\theta$ and maps them to
            simulations, or observations, `x`, $\text{sim}(\theta)\to x$. Any
            regular Python callable (i.e. function or class with `__call__` method)
            can be used.
        proposal: Probability distribution that the parameters $\theta$ are sampled
            from.
        num_simulations: Number of simulations that are run.
        num_workers: Number of parallel workers to use for simulations.
        simulation_batch_size: Number of parameter sets that the simulator
            maps to data x at once. If None, we simulate all parameter sets at the
            same time. If >= 1, the simulator has to process data of shape
            (simulation_batch_size, parameter_dimension).
        seed: Seed for reproducibility.
        show_progress_bar: Whether to show a progress bar for simulating. This will not
            affect whether there will be a progressbar while drawing samples from the
            proposal.

    Returns: Sampled parameters $\theta$ and simulation-outputs $x$.
    """

    theta = proposal.sample((num_simulations,))

    x = simulate_in_batches(
        simulator=simulator,
        theta=theta,
        sim_batch_size=simulation_batch_size,
        num_workers=num_workers,
        seed=seed,
        show_progress_bars=show_progress_bar,
    )

    return theta, x

Bases: PosteriorEstimator

Source code in sbi/inference/snpe/snpe_a.py
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
class SNPE_A(PosteriorEstimator):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        density_estimator: Union[str, Callable] = "mdn_snpe_a",
        num_components: int = 10,
        device: str = "cpu",
        logging_level: Union[int, str] = "WARNING",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""SNPE-A [1].

        [1] _Fast epsilon-free Inference of Simulation Models with Bayesian Conditional
            Density Estimation_, Papamakarios et al., NeurIPS 2016,
            https://arxiv.org/abs/1605.06376.

        This class implements SNPE-A. SNPE-A trains across multiple rounds with a
        maximum-likelihood-loss. This will make training converge to the proposal
        posterior instead of the true posterior. To correct for this, SNPE-A applies a
        post-hoc correction after training. This correction has to be performed
        analytically. Thus, SNPE-A is limited to Gaussian distributions for all but the
        last round. In the last round, SNPE-A can use a Mixture of Gaussians.

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. Any
                object with `.log_prob()`and `.sample()` (for example, a PyTorch
                distribution) can be used.
            density_estimator: If it is a string (only "mdn_snpe_a" is valid), use a
                pre-configured mixture of densities network. Alternatively, a function
                that builds a custom neural network can be provided. The function will
                be called with the first batch of simulations (theta, x), which can
                thus be used for shape inference and potentially for z-scoring. It
                needs to return a PyTorch `nn.Module` implementing the density
                estimator. The density estimator needs to provide the methods
                `.log_prob` and `.sample()`. Note that until the last round only a
                single (multivariate) Gaussian component is used for training (see
                Algorithm 1 in [1]). In the last round, this component is replicated
                `num_components` times, its parameters are perturbed with a very small
                noise, and then the last training round is done with the expanded
                Gaussian mixture as estimator for the proposal posterior.
            num_components: Number of components of the mixture of Gaussians in the
                last round. This overrides the `num_components` value passed to
                `posterior_nn()`.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during training.
        """

        # Catch invalid inputs.
        if not ((density_estimator == "mdn_snpe_a") or callable(density_estimator)):
            raise TypeError(
                "The `density_estimator` passed to SNPE_A needs to be a "
                "callable or the string 'mdn_snpe_a'!"
            )

        # `num_components` will be used to replicate the Gaussian in the last round.
        self._num_components = num_components
        self._ran_final_round = False

        # WARNING: sneaky trick ahead. We proxy the parent's `train` here,
        # requiring the signature to have `num_atoms`, save it for use below, and
        # continue. It's sneaky because we are using the object (self) as a namespace
        # to pass arguments between functions, and that's implicit state management.
        kwargs = utils.del_entries(
            locals(),
            entries=("self", "__class__", "num_components"),
        )
        super().__init__(**kwargs)

    def train(
        self,
        final_round: bool = False,
        training_batch_size: int = 50,
        learning_rate: float = 5e-4,
        validation_fraction: float = 0.1,
        stop_after_epochs: int = 20,
        max_num_epochs: int = 2**31 - 1,
        clip_max_norm: Optional[float] = 5.0,
        calibration_kernel: Optional[Callable] = None,
        resume_training: bool = False,
        retrain_from_scratch: bool = False,
        show_train_summary: bool = False,
        dataloader_kwargs: Optional[Dict] = None,
        component_perturbation: float = 5e-3,
    ) -> nn.Module:
        r"""Return density estimator that approximates the proposal posterior.

        [1] _Fast epsilon-free Inference of Simulation Models with Bayesian Conditional
            Density Estimation_, Papamakarios et al., NeurIPS 2016,
            https://arxiv.org/abs/1605.06376.

        Training is performed with maximum likelihood on samples from the latest round,
        which leads the algorithm to converge to the proposal posterior.

        Args:
            final_round: Whether we are in the last round of training or not. For all
                but the last round, Algorithm 1 from [1] is executed. In last the
                round, Algorithm 2 from [1] is executed once.
            training_batch_size: Training batch size.
            learning_rate: Learning rate for Adam optimizer.
            validation_fraction: The fraction of data to use for validation.
            stop_after_epochs: The number of epochs to wait for improvement on the
                validation set before terminating training.
            max_num_epochs: Maximum number of epochs to run. If reached, we stop
                training even when the validation loss is still decreasing. Otherwise,
                we train until validation loss increases (see also `stop_after_epochs`).
            clip_max_norm: Value at which to clip the total gradient norm in order to
                prevent exploding gradients. Use None for no clipping.
            calibration_kernel: A function to calibrate the loss with respect to the
                simulations `x`. See Lueckmann, Gonçalves et al., NeurIPS 2017.
            resume_training: Can be used in case training time is limited, e.g. on a
                cluster. If `True`, the split between train and validation set, the
                optimizer, the number of epochs, and the best validation log-prob will
                be restored from the last time `.train()` was called.
            force_first_round_loss: If `True`, train with maximum likelihood,
                i.e., potentially ignoring the correction for using a proposal
                distribution different from the prior.
            retrain_from_scratch: Whether to retrain the conditional density
                estimator for the posterior from scratch each round. Not supported for
                SNPE-A.
            show_train_summary: Whether to print the number of epochs and validation
                loss and leakage after the training.
            dataloader_kwargs: Additional or updated kwargs to be passed to the training
                and validation dataloaders (like, e.g., a collate_fn)
            component_perturbation: The standard deviation applied to all weights and
                biases when, in the last round, the Mixture of Gaussians is build from
                a single Gaussian. This value can be problem-specific and also depends
                on the number of mixture components.

        Returns:
            Density estimator that approximates the distribution $p(\theta|x)$.
        """

        assert not retrain_from_scratch, """Retraining from scratch is not supported in
            SNPE-A yet. The reason for this is that, if we reininitialized the density
            estimator, the z-scoring would change, which would break the posthoc
            correction. This is a pure implementation issue."""

        kwargs = utils.del_entries(
            locals(),
            entries=(
                "self",
                "__class__",
                "final_round",
                "component_perturbation",
            ),
        )

        # SNPE-A always discards the prior samples.
        kwargs["discard_prior_samples"] = True
        kwargs["force_first_round_loss"] = True

        self._round = max(self._data_round_index)

        if final_round:
            # If there is (will be) only one round, train with Algorithm 2 from [1].
            if self._round == 0:
                self._build_neural_net = partial(
                    self._build_neural_net, num_components=self._num_components
                )
            # Run Algorithm 2 from [1].
            elif not self._ran_final_round:
                # Now switch to the specified number of components. This method will
                # only be used if `retrain_from_scratch=True`. Otherwise,
                # the MDN will be built from replicating the single-component net for
                # `num_component` times (via `_expand_mog()`).
                self._build_neural_net = partial(
                    self._build_neural_net, num_components=self._num_components
                )

                # Extend the MDN to the originally desired number of components.
                self._expand_mog(eps=component_perturbation)
            else:
                warnings.warn(
                    "You have already run SNPE-A with `final_round=True`. Running it"
                    "again with this setting will not allow computing the posthoc"
                    "correction applied in SNPE-A. Thus, you will get an error when "
                    "calling `.build_posterior()` after training.",
                    UserWarning,
                )
        else:
            # Run Algorithm 1 from [1].
            # Wrap the function that builds the MDN such that we can make
            # sure that there is only one component when running.
            self._build_neural_net = partial(self._build_neural_net, num_components=1)

        if final_round:
            self._ran_final_round = True

        return super().train(**kwargs)

    def correct_for_proposal(
        self,
        density_estimator: Optional[TorchModule] = None,
    ) -> "SNPE_A_MDN":
        r"""Build mixture of Gaussians that approximates the posterior.

        Returns a `SNPE_A_MDN` object, which applies the posthoc-correction required in
        SNPE-A.

        Args:
            density_estimator: The density estimator that the posterior is based on.
                If `None`, use the latest neural density estimator that was trained.

        Returns:
            Posterior $p(\theta|x)$  with `.sample()` and `.log_prob()` methods.
        """
        if density_estimator is None:
            density_estimator = deepcopy(
                self._neural_net
            )  # PosteriorEstimator.train() also returns a deepcopy, mimic this here
            # If internal net is used device is defined.
            device = self._device
        else:
            # Otherwise, infer it from the device of the net parameters.
            device = str(next(density_estimator.parameters()).device)

        # Set proposal of the density estimator.
        # This also evokes the z-scoring correction if necessary.
        if (
            self._proposal_roundwise[-1] is self._prior
            or self._proposal_roundwise[-1] is None
        ):
            proposal = self._prior
            assert isinstance(
                proposal, (MultivariateNormal, utils.BoxUniform)
            ), """Prior must be `torch.distributions.MultivariateNormal` or `sbi.utils.
                BoxUniform`"""
        else:
            assert isinstance(
                self._proposal_roundwise[-1], DirectPosterior
            ), """The proposal you passed to `append_simulations` is neither the prior
                nor a `DirectPosterior`. SNPE-A currently only supports these scenarios.
                """
            proposal = self._proposal_roundwise[-1]

        # Create the SNPE_A_MDN
        wrapped_density_estimator = SNPE_A_MDN(
            flow=density_estimator,  # type: ignore
            proposal=proposal,
            prior=self._prior,
            device=device,
        )
        return wrapped_density_estimator

    def build_posterior(
        self,
        density_estimator: Optional[TorchModule] = None,
        prior: Optional[Distribution] = None,
    ) -> "DirectPosterior":
        r"""Build posterior from the neural density estimator.

        This method first corrects the estimated density with `correct_for_proposal`
        and then returns a `DirectPosterior`.

        Args:
            density_estimator: The density estimator that the posterior is based on.
                If `None`, use the latest neural density estimator that was trained.
            prior: Prior distribution.

        Returns:
            Posterior $p(\theta|x)$  with `.sample()` and `.log_prob()` methods.
        """
        if prior is None:
            assert (
                self._prior is not None
            ), """You did not pass a prior. You have to pass the prior either at
                initialization `inference = SNPE_A(prior)` or to `.build_posterior
                (prior=prior)`."""
            prior = self._prior

        wrapped_density_estimator = self.correct_for_proposal(
            density_estimator=density_estimator
        )
        self._posterior = DirectPosterior(
            posterior_estimator=wrapped_density_estimator,  # type: ignore
            prior=prior,
        )
        return deepcopy(self._posterior)

    def _log_prob_proposal_posterior(
        self,
        theta: Tensor,
        x: Tensor,
        masks: Tensor,
        proposal: Optional[Any],
    ) -> Tensor:
        """Return the log-probability of the proposal posterior.

        For SNPE-A this is the same as `self._neural_net.log_prob(theta, x)` in
        `_loss()` to be found in `snpe_base.py`.

        Args:
            theta: Batch of parameters θ.
            x: Batch of data.
            masks: Mask that is True for prior samples in the batch in order to train
                them with prior loss.
            proposal: Proposal distribution.

        Returns: Log-probability of the proposal posterior.
        """
        return self._neural_net.log_prob(theta, x)

    def _expand_mog(self, eps: float = 1e-5):
        """
        Replicate a singe Gaussian trained with Algorithm 1 before continuing
        with Algorithm 2. The weights and biases of the associated MDN layers
        are repeated `num_components` times, slightly perturbed to break the
        symmetry such that the gradients in the subsequent training are not
        all identical.

        Args:
            eps: Standard deviation for the random perturbation.
        """
        assert isinstance(self._neural_net._distribution, MultivariateGaussianMDN)

        # Increase the number of components
        self._neural_net._distribution._num_components = self._num_components

        # Expand the 1-dim Gaussian.
        for name, param in self._neural_net.named_parameters():
            if any(
                key in name for key in ["logits", "means", "unconstrained", "upper"]
            ):
                if "bias" in name:
                    param.data = param.data.repeat(self._num_components)
                    param.data.add_(torch.randn_like(param.data) * eps)
                    param.grad = None  # let autograd construct a new gradient
                elif "weight" in name:
                    param.data = param.data.repeat(self._num_components, 1)
                    param.data.add_(torch.randn_like(param.data) * eps)
                    param.grad = None  # let autograd construct a new gradient

__init__(prior=None, density_estimator='mdn_snpe_a', num_components=10, device='cpu', logging_level='WARNING', summary_writer=None, show_progress_bars=True)

SNPE-A [1].

[1] Fast epsilon-free Inference of Simulation Models with Bayesian Conditional Density Estimation, Papamakarios et al., NeurIPS 2016, https://arxiv.org/abs/1605.06376.

This class implements SNPE-A. SNPE-A trains across multiple rounds with a maximum-likelihood-loss. This will make training converge to the proposal posterior instead of the true posterior. To correct for this, SNPE-A applies a post-hoc correction after training. This correction has to be performed analytically. Thus, SNPE-A is limited to Gaussian distributions for all but the last round. In the last round, SNPE-A can use a Mixture of Gaussians.

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. Any object with .log_prob()and .sample() (for example, a PyTorch distribution) can be used.

None
density_estimator Union[str, Callable]

If it is a string (only “mdn_snpe_a” is valid), use a pre-configured mixture of densities network. Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations (theta, x), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the density estimator. The density estimator needs to provide the methods .log_prob and .sample(). Note that until the last round only a single (multivariate) Gaussian component is used for training (see Algorithm 1 in [1]). In the last round, this component is replicated num_components times, its parameters are perturbed with a very small noise, and then the last training round is done with the expanded Gaussian mixture as estimator for the proposal posterior.

'mdn_snpe_a'
num_components int

Number of components of the mixture of Gaussians in the last round. This overrides the num_components value passed to posterior_nn().

10
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'WARNING'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during training.

True
Source code in sbi/inference/snpe/snpe_a.py
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
def __init__(
    self,
    prior: Optional[Distribution] = None,
    density_estimator: Union[str, Callable] = "mdn_snpe_a",
    num_components: int = 10,
    device: str = "cpu",
    logging_level: Union[int, str] = "WARNING",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""SNPE-A [1].

    [1] _Fast epsilon-free Inference of Simulation Models with Bayesian Conditional
        Density Estimation_, Papamakarios et al., NeurIPS 2016,
        https://arxiv.org/abs/1605.06376.

    This class implements SNPE-A. SNPE-A trains across multiple rounds with a
    maximum-likelihood-loss. This will make training converge to the proposal
    posterior instead of the true posterior. To correct for this, SNPE-A applies a
    post-hoc correction after training. This correction has to be performed
    analytically. Thus, SNPE-A is limited to Gaussian distributions for all but the
    last round. In the last round, SNPE-A can use a Mixture of Gaussians.

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. Any
            object with `.log_prob()`and `.sample()` (for example, a PyTorch
            distribution) can be used.
        density_estimator: If it is a string (only "mdn_snpe_a" is valid), use a
            pre-configured mixture of densities network. Alternatively, a function
            that builds a custom neural network can be provided. The function will
            be called with the first batch of simulations (theta, x), which can
            thus be used for shape inference and potentially for z-scoring. It
            needs to return a PyTorch `nn.Module` implementing the density
            estimator. The density estimator needs to provide the methods
            `.log_prob` and `.sample()`. Note that until the last round only a
            single (multivariate) Gaussian component is used for training (see
            Algorithm 1 in [1]). In the last round, this component is replicated
            `num_components` times, its parameters are perturbed with a very small
            noise, and then the last training round is done with the expanded
            Gaussian mixture as estimator for the proposal posterior.
        num_components: Number of components of the mixture of Gaussians in the
            last round. This overrides the `num_components` value passed to
            `posterior_nn()`.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during training.
    """

    # Catch invalid inputs.
    if not ((density_estimator == "mdn_snpe_a") or callable(density_estimator)):
        raise TypeError(
            "The `density_estimator` passed to SNPE_A needs to be a "
            "callable or the string 'mdn_snpe_a'!"
        )

    # `num_components` will be used to replicate the Gaussian in the last round.
    self._num_components = num_components
    self._ran_final_round = False

    # WARNING: sneaky trick ahead. We proxy the parent's `train` here,
    # requiring the signature to have `num_atoms`, save it for use below, and
    # continue. It's sneaky because we are using the object (self) as a namespace
    # to pass arguments between functions, and that's implicit state management.
    kwargs = utils.del_entries(
        locals(),
        entries=("self", "__class__", "num_components"),
    )
    super().__init__(**kwargs)

build_posterior(density_estimator=None, prior=None)

Build posterior from the neural density estimator.

This method first corrects the estimated density with correct_for_proposal and then returns a DirectPosterior.

Parameters:

Name Type Description Default
density_estimator Optional[TorchModule]

The density estimator that the posterior is based on. If None, use the latest neural density estimator that was trained.

None
prior Optional[Distribution]

Prior distribution.

None

Returns:

Type Description
DirectPosterior

Posterior \(p(\theta|x)\) with .sample() and .log_prob() methods.

Source code in sbi/inference/snpe/snpe_a.py
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
def build_posterior(
    self,
    density_estimator: Optional[TorchModule] = None,
    prior: Optional[Distribution] = None,
) -> "DirectPosterior":
    r"""Build posterior from the neural density estimator.

    This method first corrects the estimated density with `correct_for_proposal`
    and then returns a `DirectPosterior`.

    Args:
        density_estimator: The density estimator that the posterior is based on.
            If `None`, use the latest neural density estimator that was trained.
        prior: Prior distribution.

    Returns:
        Posterior $p(\theta|x)$  with `.sample()` and `.log_prob()` methods.
    """
    if prior is None:
        assert (
            self._prior is not None
        ), """You did not pass a prior. You have to pass the prior either at
            initialization `inference = SNPE_A(prior)` or to `.build_posterior
            (prior=prior)`."""
        prior = self._prior

    wrapped_density_estimator = self.correct_for_proposal(
        density_estimator=density_estimator
    )
    self._posterior = DirectPosterior(
        posterior_estimator=wrapped_density_estimator,  # type: ignore
        prior=prior,
    )
    return deepcopy(self._posterior)

correct_for_proposal(density_estimator=None)

Build mixture of Gaussians that approximates the posterior.

Returns a SNPE_A_MDN object, which applies the posthoc-correction required in SNPE-A.

Parameters:

Name Type Description Default
density_estimator Optional[TorchModule]

The density estimator that the posterior is based on. If None, use the latest neural density estimator that was trained.

None

Returns:

Type Description
SNPE_A_MDN

Posterior \(p(\theta|x)\) with .sample() and .log_prob() methods.

Source code in sbi/inference/snpe/snpe_a.py
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
def correct_for_proposal(
    self,
    density_estimator: Optional[TorchModule] = None,
) -> "SNPE_A_MDN":
    r"""Build mixture of Gaussians that approximates the posterior.

    Returns a `SNPE_A_MDN` object, which applies the posthoc-correction required in
    SNPE-A.

    Args:
        density_estimator: The density estimator that the posterior is based on.
            If `None`, use the latest neural density estimator that was trained.

    Returns:
        Posterior $p(\theta|x)$  with `.sample()` and `.log_prob()` methods.
    """
    if density_estimator is None:
        density_estimator = deepcopy(
            self._neural_net
        )  # PosteriorEstimator.train() also returns a deepcopy, mimic this here
        # If internal net is used device is defined.
        device = self._device
    else:
        # Otherwise, infer it from the device of the net parameters.
        device = str(next(density_estimator.parameters()).device)

    # Set proposal of the density estimator.
    # This also evokes the z-scoring correction if necessary.
    if (
        self._proposal_roundwise[-1] is self._prior
        or self._proposal_roundwise[-1] is None
    ):
        proposal = self._prior
        assert isinstance(
            proposal, (MultivariateNormal, utils.BoxUniform)
        ), """Prior must be `torch.distributions.MultivariateNormal` or `sbi.utils.
            BoxUniform`"""
    else:
        assert isinstance(
            self._proposal_roundwise[-1], DirectPosterior
        ), """The proposal you passed to `append_simulations` is neither the prior
            nor a `DirectPosterior`. SNPE-A currently only supports these scenarios.
            """
        proposal = self._proposal_roundwise[-1]

    # Create the SNPE_A_MDN
    wrapped_density_estimator = SNPE_A_MDN(
        flow=density_estimator,  # type: ignore
        proposal=proposal,
        prior=self._prior,
        device=device,
    )
    return wrapped_density_estimator

train(final_round=False, training_batch_size=50, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2 ** 31 - 1, clip_max_norm=5.0, calibration_kernel=None, resume_training=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None, component_perturbation=0.005)

Return density estimator that approximates the proposal posterior.

[1] Fast epsilon-free Inference of Simulation Models with Bayesian Conditional Density Estimation, Papamakarios et al., NeurIPS 2016, https://arxiv.org/abs/1605.06376.

Training is performed with maximum likelihood on samples from the latest round, which leads the algorithm to converge to the proposal posterior.

Parameters:

Name Type Description Default
final_round bool

Whether we are in the last round of training or not. For all but the last round, Algorithm 1 from [1] is executed. In last the round, Algorithm 2 from [1] is executed once.

False
training_batch_size int

Training batch size.

50
learning_rate float

Learning rate for Adam optimizer.

0.0005
validation_fraction float

The fraction of data to use for validation.

0.1
stop_after_epochs int

The number of epochs to wait for improvement on the validation set before terminating training.

20
max_num_epochs int

Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).

2 ** 31 - 1
clip_max_norm Optional[float]

Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

5.0
calibration_kernel Optional[Callable]

A function to calibrate the loss with respect to the simulations x. See Lueckmann, Gonçalves et al., NeurIPS 2017.

None
resume_training bool

Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

False
force_first_round_loss

If True, train with maximum likelihood, i.e., potentially ignoring the correction for using a proposal distribution different from the prior.

required
retrain_from_scratch bool

Whether to retrain the conditional density estimator for the posterior from scratch each round. Not supported for SNPE-A.

False
show_train_summary bool

Whether to print the number of epochs and validation loss and leakage after the training.

False
dataloader_kwargs Optional[Dict]

Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)

None
component_perturbation float

The standard deviation applied to all weights and biases when, in the last round, the Mixture of Gaussians is build from a single Gaussian. This value can be problem-specific and also depends on the number of mixture components.

0.005

Returns:

Type Description
Module

Density estimator that approximates the distribution \(p(\theta|x)\).

Source code in sbi/inference/snpe/snpe_a.py
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
def train(
    self,
    final_round: bool = False,
    training_batch_size: int = 50,
    learning_rate: float = 5e-4,
    validation_fraction: float = 0.1,
    stop_after_epochs: int = 20,
    max_num_epochs: int = 2**31 - 1,
    clip_max_norm: Optional[float] = 5.0,
    calibration_kernel: Optional[Callable] = None,
    resume_training: bool = False,
    retrain_from_scratch: bool = False,
    show_train_summary: bool = False,
    dataloader_kwargs: Optional[Dict] = None,
    component_perturbation: float = 5e-3,
) -> nn.Module:
    r"""Return density estimator that approximates the proposal posterior.

    [1] _Fast epsilon-free Inference of Simulation Models with Bayesian Conditional
        Density Estimation_, Papamakarios et al., NeurIPS 2016,
        https://arxiv.org/abs/1605.06376.

    Training is performed with maximum likelihood on samples from the latest round,
    which leads the algorithm to converge to the proposal posterior.

    Args:
        final_round: Whether we are in the last round of training or not. For all
            but the last round, Algorithm 1 from [1] is executed. In last the
            round, Algorithm 2 from [1] is executed once.
        training_batch_size: Training batch size.
        learning_rate: Learning rate for Adam optimizer.
        validation_fraction: The fraction of data to use for validation.
        stop_after_epochs: The number of epochs to wait for improvement on the
            validation set before terminating training.
        max_num_epochs: Maximum number of epochs to run. If reached, we stop
            training even when the validation loss is still decreasing. Otherwise,
            we train until validation loss increases (see also `stop_after_epochs`).
        clip_max_norm: Value at which to clip the total gradient norm in order to
            prevent exploding gradients. Use None for no clipping.
        calibration_kernel: A function to calibrate the loss with respect to the
            simulations `x`. See Lueckmann, Gonçalves et al., NeurIPS 2017.
        resume_training: Can be used in case training time is limited, e.g. on a
            cluster. If `True`, the split between train and validation set, the
            optimizer, the number of epochs, and the best validation log-prob will
            be restored from the last time `.train()` was called.
        force_first_round_loss: If `True`, train with maximum likelihood,
            i.e., potentially ignoring the correction for using a proposal
            distribution different from the prior.
        retrain_from_scratch: Whether to retrain the conditional density
            estimator for the posterior from scratch each round. Not supported for
            SNPE-A.
        show_train_summary: Whether to print the number of epochs and validation
            loss and leakage after the training.
        dataloader_kwargs: Additional or updated kwargs to be passed to the training
            and validation dataloaders (like, e.g., a collate_fn)
        component_perturbation: The standard deviation applied to all weights and
            biases when, in the last round, the Mixture of Gaussians is build from
            a single Gaussian. This value can be problem-specific and also depends
            on the number of mixture components.

    Returns:
        Density estimator that approximates the distribution $p(\theta|x)$.
    """

    assert not retrain_from_scratch, """Retraining from scratch is not supported in
        SNPE-A yet. The reason for this is that, if we reininitialized the density
        estimator, the z-scoring would change, which would break the posthoc
        correction. This is a pure implementation issue."""

    kwargs = utils.del_entries(
        locals(),
        entries=(
            "self",
            "__class__",
            "final_round",
            "component_perturbation",
        ),
    )

    # SNPE-A always discards the prior samples.
    kwargs["discard_prior_samples"] = True
    kwargs["force_first_round_loss"] = True

    self._round = max(self._data_round_index)

    if final_round:
        # If there is (will be) only one round, train with Algorithm 2 from [1].
        if self._round == 0:
            self._build_neural_net = partial(
                self._build_neural_net, num_components=self._num_components
            )
        # Run Algorithm 2 from [1].
        elif not self._ran_final_round:
            # Now switch to the specified number of components. This method will
            # only be used if `retrain_from_scratch=True`. Otherwise,
            # the MDN will be built from replicating the single-component net for
            # `num_component` times (via `_expand_mog()`).
            self._build_neural_net = partial(
                self._build_neural_net, num_components=self._num_components
            )

            # Extend the MDN to the originally desired number of components.
            self._expand_mog(eps=component_perturbation)
        else:
            warnings.warn(
                "You have already run SNPE-A with `final_round=True`. Running it"
                "again with this setting will not allow computing the posthoc"
                "correction applied in SNPE-A. Thus, you will get an error when "
                "calling `.build_posterior()` after training.",
                UserWarning,
            )
    else:
        # Run Algorithm 1 from [1].
        # Wrap the function that builds the MDN such that we can make
        # sure that there is only one component when running.
        self._build_neural_net = partial(self._build_neural_net, num_components=1)

    if final_round:
        self._ran_final_round = True

    return super().train(**kwargs)

Bases: PosteriorEstimator

Source code in sbi/inference/snpe/snpe_c.py
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
class SNPE_C(PosteriorEstimator):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        density_estimator: Union[str, Callable] = "maf",
        device: str = "cpu",
        logging_level: Union[int, str] = "WARNING",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""SNPE-C / APT [1].

        [1] _Automatic Posterior Transformation for Likelihood-free Inference_,
            Greenberg et al., ICML 2019, https://arxiv.org/abs/1905.07488.

        This class implements two loss variants of SNPE-C: the non-atomic and the atomic
        version. The atomic loss of SNPE-C can be used for any density estimator,
        i.e. also for normalizing flows. However, it suffers from leakage issues. On
        the other hand, the non-atomic loss can only be used only if the proposal
        distribution is a mixture of Gaussians, the density estimator is a mixture of
        Gaussians, and the prior is either Gaussian or Uniform. It does not suffer from
        leakage issues. At the beginning of each round, we print whether the non-atomic
        or the atomic version is used.

        In this codebase, we will automatically switch to the non-atomic loss if the
        following criteria are fulfilled:<br/>
        - proposal is a `DirectPosterior` with density_estimator `mdn`, as built
            with `utils.sbi.posterior_nn()`.<br/>
        - the density estimator is a `mdn`, as built with
            `utils.sbi.posterior_nn()`.<br/>
        - `isinstance(prior, MultivariateNormal)` (from `torch.distributions`) or
            `isinstance(prior, sbi.utils.BoxUniform)`

        Note that custom implementations of any of these densities (or estimators) will
        not trigger the non-atomic loss, and the algorithm will fall back onto using
        the atomic loss.

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them.
            density_estimator: If it is a string, use a pre-configured network of the
                provided type (one of nsf, maf, mdn, made). Alternatively, a function
                that builds a custom neural network can be provided. The function will
                be called with the first batch of simulations (theta, x), which can
                thus be used for shape inference and potentially for z-scoring. It
                needs to return a PyTorch `nn.Module` implementing the density
                estimator. The density estimator needs to provide the methods
                `.log_prob` and `.sample()`.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during training.
        """

        kwargs = del_entries(locals(), entries=("self", "__class__"))
        super().__init__(**kwargs)

    def train(
        self,
        num_atoms: int = 10,
        training_batch_size: int = 50,
        learning_rate: float = 5e-4,
        validation_fraction: float = 0.1,
        stop_after_epochs: int = 20,
        max_num_epochs: int = 2**31 - 1,
        clip_max_norm: Optional[float] = 5.0,
        calibration_kernel: Optional[Callable] = None,
        resume_training: bool = False,
        force_first_round_loss: bool = False,
        discard_prior_samples: bool = False,
        use_combined_loss: bool = False,
        retrain_from_scratch: bool = False,
        show_train_summary: bool = False,
        dataloader_kwargs: Optional[Dict] = None,
    ) -> nn.Module:
        r"""Return density estimator that approximates the distribution $p(\theta|x)$.

        Args:
            num_atoms: Number of atoms to use for classification.
            training_batch_size: Training batch size.
            learning_rate: Learning rate for Adam optimizer.
            validation_fraction: The fraction of data to use for validation.
            stop_after_epochs: The number of epochs to wait for improvement on the
                validation set before terminating training.
            max_num_epochs: Maximum number of epochs to run. If reached, we stop
                training even when the validation loss is still decreasing. Otherwise,
                we train until validation loss increases (see also `stop_after_epochs`).
            clip_max_norm: Value at which to clip the total gradient norm in order to
                prevent exploding gradients. Use None for no clipping.
            calibration_kernel: A function to calibrate the loss with respect to the
                simulations `x`. See Lueckmann, Gonçalves et al., NeurIPS 2017.
            resume_training: Can be used in case training time is limited, e.g. on a
                cluster. If `True`, the split between train and validation set, the
                optimizer, the number of epochs, and the best validation log-prob will
                be restored from the last time `.train()` was called.
            force_first_round_loss: If `True`, train with maximum likelihood,
                i.e., potentially ignoring the correction for using a proposal
                distribution different from the prior.
            discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
                from the prior. Training may be sped up by ignoring such less targeted
                samples.
            use_combined_loss: Whether to train the neural net also on prior samples
                using maximum likelihood in addition to training it on all samples using
                atomic loss. The extra MLE loss helps prevent density leaking with
                bounded priors.
            retrain_from_scratch: Whether to retrain the conditional density
                estimator for the posterior from scratch each round.
            show_train_summary: Whether to print the number of epochs and validation
                loss and leakage after the training.
            dataloader_kwargs: Additional or updated kwargs to be passed to the training
                and validation dataloaders (like, e.g., a collate_fn)

        Returns:
            Density estimator that approximates the distribution $p(\theta|x)$.
        """

        # WARNING: sneaky trick ahead. We proxy the parent's `train` here,
        # requiring the signature to have `num_atoms`, save it for use below, and
        # continue. It's sneaky because we are using the object (self) as a namespace
        # to pass arguments between functions, and that's implicit state management.
        self._num_atoms = num_atoms
        self._use_combined_loss = use_combined_loss
        kwargs = del_entries(
            locals(),
            entries=("self", "__class__", "num_atoms", "use_combined_loss"),
        )

        self._round = max(self._data_round_index)

        if self._round > 0:
            # Set the proposal to the last proposal that was passed by the user. For
            # atomic SNPE, it does not matter what the proposal is. For non-atomic
            # SNPE, we only use the latest data that was passed, i.e. the one from the
            # last proposal.
            proposal = self._proposal_roundwise[-1]
            self.use_non_atomic_loss = (
                isinstance(proposal, DirectPosterior)
                and isinstance(proposal.posterior_estimator._distribution, mdn)
                and isinstance(self._neural_net._distribution, mdn)
                and check_dist_class(
                    self._prior, class_to_check=(Uniform, MultivariateNormal)
                )[0]
            )

            algorithm = "non-atomic" if self.use_non_atomic_loss else "atomic"
            print(f"Using SNPE-C with {algorithm} loss")

            if self.use_non_atomic_loss:
                # Take care of z-scoring, pre-compute and store prior terms.
                self._set_state_for_mog_proposal()

        return super().train(**kwargs)

    def _set_state_for_mog_proposal(self) -> None:
        """Set state variables that are used at each training step of non-atomic SNPE-C.

        Three things are computed:
        1) Check if z-scoring was requested. To do so, we check if the `_transform`
            argument of the net had been a `CompositeTransform`. See pyknos mdn.py.
        2) Define a (potentially standardized) prior. It's standardized if z-scoring
            had been requested.
        3) Compute (Precision * mean) for the prior. This quantity is used at every
            training step if the prior is Gaussian.
        """

        self.z_score_theta = isinstance(self._neural_net._transform, CompositeTransform)

        self._set_maybe_z_scored_prior()

        if isinstance(self._maybe_z_scored_prior, MultivariateNormal):
            self.prec_m_prod_prior = torch.mv(
                self._maybe_z_scored_prior.precision_matrix,  # type: ignore
                self._maybe_z_scored_prior.loc,  # type: ignore
            )

    def _set_maybe_z_scored_prior(self) -> None:
        r"""Compute and store potentially standardized prior (if z-scoring was done).

        The proposal posterior is:
        $pp(\theta|x) = 1/Z * q(\theta|x) * prop(\theta) / p(\theta)$

        Let's denote z-scored theta by `a`: a = (theta - mean) / std
        Then pp'(a|x) = 1/Z_2 * q'(a|x) * prop'(a) / p'(a)$

        The ' indicates that the evaluation occurs in standardized space. The constant
        scaling factor has been absorbed into Z_2.
        From the above equation, we see that we need to evaluate the prior **in
        standardized space**. We build the standardized prior in this function.

        The standardize transform that is applied to the samples theta does not use
        the exact prior mean and std (due to implementation issues). Hence, the z-scored
        prior will not be exactly have mean=0 and std=1.
        """

        if self.z_score_theta:
            scale = self._neural_net._transform._transforms[0]._scale
            shift = self._neural_net._transform._transforms[0]._shift

            # Following the definintion of the linear transform in
            # `standardizing_transform` in `sbiutils.py`:
            # shift=-mean / std
            # scale=1 / std
            # Solving these equations for mean and std:
            estim_prior_std = 1 / scale
            estim_prior_mean = -shift * estim_prior_std

            # Compute the discrepancy of the true prior mean and std and the mean and
            # std that was empirically estimated from samples.
            # N(theta|m,s) = N((theta-m_e)/s_e|(m-m_e)/s_e, s/s_e)
            # Above: m,s are true prior mean and std. m_e,s_e are estimated prior mean
            # and std (estimated from samples and used to build standardize transform).
            almost_zero_mean = (self._prior.mean - estim_prior_mean) / estim_prior_std
            almost_one_std = torch.sqrt(self._prior.variance) / estim_prior_std

            if isinstance(self._prior, MultivariateNormal):
                self._maybe_z_scored_prior = MultivariateNormal(
                    almost_zero_mean, torch.diag(almost_one_std)
                )
            else:
                range_ = torch.sqrt(almost_one_std * 3.0)
                self._maybe_z_scored_prior = utils.BoxUniform(
                    almost_zero_mean - range_, almost_zero_mean + range_
                )
        else:
            self._maybe_z_scored_prior = self._prior

    def _log_prob_proposal_posterior(
        self,
        theta: Tensor,
        x: Tensor,
        masks: Tensor,
        proposal: DirectPosterior,
    ) -> Tensor:
        """Return the log-probability of the proposal posterior.

        If the proposal is a MoG, the density estimator is a MoG, and the prior is
        either Gaussian or uniform, we use non-atomic loss. Else, use atomic loss (which
        suffers from leakage).

        Args:
            theta: Batch of parameters θ.
            x: Batch of data.
            masks: Mask that is True for prior samples in the batch in order to train
                them with prior loss.
            proposal: Proposal distribution.

        Returns: Log-probability of the proposal posterior.
        """

        if self.use_non_atomic_loss:
            return self._log_prob_proposal_posterior_mog(theta, x, proposal)
        else:
            return self._log_prob_proposal_posterior_atomic(theta, x, masks)

    def _log_prob_proposal_posterior_atomic(
        self, theta: Tensor, x: Tensor, masks: Tensor
    ):
        """Return log probability of the proposal posterior for atomic proposals.

        We have two main options when evaluating the proposal posterior.
            (1) Generate atoms from the proposal prior.
            (2) Generate atoms from a more targeted distribution, such as the most
                recent posterior.
        If we choose the latter, it is likely beneficial not to do this in the first
        round, since we would be sampling from a randomly-initialized neural density
        estimator.

        Args:
            theta: Batch of parameters θ.
            x: Batch of data.
            masks: Mask that is True for prior samples in the batch in order to train
                them with prior loss.

        Returns:
            Log-probability of the proposal posterior.
        """

        batch_size = theta.shape[0]

        num_atoms = int(
            clamp_and_warn("num_atoms", self._num_atoms, min_val=2, max_val=batch_size)
        )

        # Each set of parameter atoms is evaluated using the same x,
        # so we repeat rows of the data x, e.g. [1, 2] -> [1, 1, 2, 2]
        repeated_x = repeat_rows(x, num_atoms)

        # To generate the full set of atoms for a given item in the batch,
        # we sample without replacement num_atoms - 1 times from the rest
        # of the theta in the batch.
        probs = ones(batch_size, batch_size) * (1 - eye(batch_size)) / (batch_size - 1)

        choices = torch.multinomial(probs, num_samples=num_atoms - 1, replacement=False)
        contrasting_theta = theta[choices]

        # We can now create our sets of atoms from the contrasting parameter sets
        # we have generated.
        atomic_theta = torch.cat((theta[:, None, :], contrasting_theta), dim=1).reshape(
            batch_size * num_atoms, -1
        )

        # Evaluate large batch giving (batch_size * num_atoms) log prob posterior evals.
        log_prob_posterior = self._neural_net.log_prob(atomic_theta, repeated_x)
        utils.assert_all_finite(log_prob_posterior, "posterior eval")
        log_prob_posterior = log_prob_posterior.reshape(batch_size, num_atoms)

        # Get (batch_size * num_atoms) log prob prior evals.
        log_prob_prior = self._prior.log_prob(atomic_theta)
        log_prob_prior = log_prob_prior.reshape(batch_size, num_atoms)
        utils.assert_all_finite(log_prob_prior, "prior eval")

        # Compute unnormalized proposal posterior.
        unnormalized_log_prob = log_prob_posterior - log_prob_prior

        # Normalize proposal posterior across discrete set of atoms.
        log_prob_proposal_posterior = unnormalized_log_prob[:, 0] - torch.logsumexp(
            unnormalized_log_prob, dim=-1
        )
        utils.assert_all_finite(log_prob_proposal_posterior, "proposal posterior eval")

        # XXX This evaluates the posterior on _all_ prior samples
        if self._use_combined_loss:
            log_prob_posterior_non_atomic = self._neural_net.log_prob(theta, x)
            masks = masks.reshape(-1)
            log_prob_proposal_posterior = (
                masks * log_prob_posterior_non_atomic + log_prob_proposal_posterior
            )

        return log_prob_proposal_posterior

    def _log_prob_proposal_posterior_mog(
        self, theta: Tensor, x: Tensor, proposal: DirectPosterior
    ) -> Tensor:
        """Return log-probability of the proposal posterior for MoG proposal.

        For MoG proposals and MoG density estimators, this can be done in closed form
        and does not require atomic loss (i.e. there will be no leakage issues).

        Notation:

        m are mean vectors.
        prec are precision matrices.
        cov are covariance matrices.

        _p at the end indicates that it is the proposal.
        _d indicates that it is the density estimator.
        _pp indicates the proposal posterior.

        All tensors will have shapes (batch_dim, num_components, ...)

        Args:
            theta: Batch of parameters θ.
            x: Batch of data.
            proposal: Proposal distribution.

        Returns:
            Log-probability of the proposal posterior.
        """

        # Evaluate the proposal. MDNs do not have functionality to run the embedding_net
        # and then get the mixture_components (**without** calling log_prob()). Hence,
        # we call them separately here.
        encoded_x = proposal.posterior_estimator._embedding_net(proposal.default_x)
        dist = (
            proposal.posterior_estimator._distribution
        )  # defined to avoid ugly black formatting.
        logits_p, m_p, prec_p, _, _ = dist.get_mixture_components(encoded_x)
        norm_logits_p = logits_p - torch.logsumexp(logits_p, dim=-1, keepdim=True)

        # Evaluate the density estimator.
        encoded_x = self._neural_net._embedding_net(x)
        dist = self._neural_net._distribution  # defined to avoid black formatting.
        logits_d, m_d, prec_d, _, _ = dist.get_mixture_components(encoded_x)
        norm_logits_d = logits_d - torch.logsumexp(logits_d, dim=-1, keepdim=True)

        # z-score theta if it z-scoring had been requested.
        theta = self._maybe_z_score_theta(theta)

        # Compute the MoG parameters of the proposal posterior.
        (
            logits_pp,
            m_pp,
            prec_pp,
            cov_pp,
        ) = self._automatic_posterior_transformation(
            norm_logits_p, m_p, prec_p, norm_logits_d, m_d, prec_d
        )

        # Compute the log_prob of theta under the product.
        log_prob_proposal_posterior = utils.mog_log_prob(
            theta, logits_pp, m_pp, prec_pp
        )
        utils.assert_all_finite(
            log_prob_proposal_posterior,
            """the evaluation of the MoG proposal posterior. This is likely due to a
            numerical instability in the training procedure. Please create an issue on
            Github.""",
        )

        return log_prob_proposal_posterior

    def _automatic_posterior_transformation(
        self,
        logits_p: Tensor,
        means_p: Tensor,
        precisions_p: Tensor,
        logits_d: Tensor,
        means_d: Tensor,
        precisions_d: Tensor,
    ):
        r"""Returns the MoG parameters of the proposal posterior.

        The proposal posterior is:
        $pp(\theta|x) = 1/Z * q(\theta|x) * prop(\theta) / p(\theta)$
        In words: proposal posterior = posterior estimate * proposal / prior.

        If the posterior estimate and the proposal are MoG and the prior is either
        Gaussian or uniform, we can solve this in closed-form. The is implemented in
        this function.

        This function implements Appendix A1 from Greenberg et al. 2019.

        We have to build L*K components. How do we do this?
        Example: proposal has two components, density estimator has three components.
        Let's call the two components of the proposal i,j and the three components
        of the density estimator x,y,z. We have to multiply every component of the
        proposal with every component of the density estimator. So, what we do is:
        1) for the proposal, build: i,i,i,j,j,j. Done with torch.repeat_interleave()
        2) for the density estimator, build: x,y,z,x,y,z. Done with torch.repeat()
        3) Multiply them with simple matrix operations.

        Args:
            logits_p: Component weight of each Gaussian of the proposal.
            means_p: Mean of each Gaussian of the proposal.
            precisions_p: Precision matrix of each Gaussian of the proposal.
            logits_d: Component weight for each Gaussian of the density estimator.
            means_d: Mean of each Gaussian of the density estimator.
            precisions_d: Precision matrix of each Gaussian of the density estimator.

        Returns: (Component weight, mean, precision matrix, covariance matrix) of each
            Gaussian of the proposal posterior. Has L*K terms (proposal has L terms,
            density estimator has K terms).
        """

        precisions_pp, covariances_pp = self._precisions_proposal_posterior(
            precisions_p, precisions_d
        )

        means_pp = self._means_proposal_posterior(
            covariances_pp, means_p, precisions_p, means_d, precisions_d
        )

        logits_pp = self._logits_proposal_posterior(
            means_pp,
            precisions_pp,
            covariances_pp,
            logits_p,
            means_p,
            precisions_p,
            logits_d,
            means_d,
            precisions_d,
        )

        return logits_pp, means_pp, precisions_pp, covariances_pp

    def _precisions_proposal_posterior(
        self, precisions_p: Tensor, precisions_d: Tensor
    ):
        """Return the precisions and covariances of the proposal posterior.

        Args:
            precisions_p: Precision matrices of the proposal distribution.
            precisions_d: Precision matrices of the density estimator.

        Returns: (Precisions, Covariances) of the proposal posterior. L*K terms.
        """

        num_comps_p = precisions_p.shape[1]
        num_comps_d = precisions_d.shape[1]

        precisions_p_rep = precisions_p.repeat_interleave(num_comps_d, dim=1)
        precisions_d_rep = precisions_d.repeat(1, num_comps_p, 1, 1)

        precisions_pp = precisions_p_rep + precisions_d_rep
        if isinstance(self._maybe_z_scored_prior, MultivariateNormal):
            precisions_pp -= self._maybe_z_scored_prior.precision_matrix

        covariances_pp = torch.inverse(precisions_pp)

        return precisions_pp, covariances_pp

    def _means_proposal_posterior(
        self,
        covariances_pp: Tensor,
        means_p: Tensor,
        precisions_p: Tensor,
        means_d: Tensor,
        precisions_d: Tensor,
    ):
        """Return the means of the proposal posterior.

        means_pp = C_ix * (P_i * m_i + P_x * m_x - P_o * m_o).

        Args:
            covariances_pp: Covariance matrices of the proposal posterior.
            means_p: Means of the proposal distribution.
            precisions_p: Precision matrices of the proposal distribution.
            means_d: Means of the density estimator.
            precisions_d: Precision matrices of the density estimator.

        Returns: Means of the proposal posterior. L*K terms.
        """

        num_comps_p = precisions_p.shape[1]
        num_comps_d = precisions_d.shape[1]

        # First, compute the product P_i * m_i and P_j * m_j
        prec_m_prod_p = batched_mixture_mv(precisions_p, means_p)
        prec_m_prod_d = batched_mixture_mv(precisions_d, means_d)

        # Repeat them to allow for matrix operations: same trick as for the precisions.
        prec_m_prod_p_rep = prec_m_prod_p.repeat_interleave(num_comps_d, dim=1)
        prec_m_prod_d_rep = prec_m_prod_d.repeat(1, num_comps_p, 1)

        # Means = C_ij * (P_i * m_i + P_x * m_x - P_o * m_o).
        summed_cov_m_prod_rep = prec_m_prod_p_rep + prec_m_prod_d_rep
        if isinstance(self._maybe_z_scored_prior, MultivariateNormal):
            summed_cov_m_prod_rep -= self.prec_m_prod_prior

        means_pp = batched_mixture_mv(covariances_pp, summed_cov_m_prod_rep)

        return means_pp

    @staticmethod
    def _logits_proposal_posterior(
        means_pp: Tensor,
        precisions_pp: Tensor,
        covariances_pp: Tensor,
        logits_p: Tensor,
        means_p: Tensor,
        precisions_p: Tensor,
        logits_d: Tensor,
        means_d: Tensor,
        precisions_d: Tensor,
    ):
        """Return the component weights (i.e. logits) of the proposal posterior.

        Args:
            means_pp: Means of the proposal posterior.
            precisions_pp: Precision matrices of the proposal posterior.
            covariances_pp: Covariance matrices of the proposal posterior.
            logits_p: Component weights (i.e. logits) of the proposal distribution.
            means_p: Means of the proposal distribution.
            precisions_p: Precision matrices of the proposal distribution.
            logits_d: Component weights (i.e. logits) of the density estimator.
            means_d: Means of the density estimator.
            precisions_d: Precision matrices of the density estimator.

        Returns: Component weights of the proposal posterior. L*K terms.
        """

        num_comps_p = precisions_p.shape[1]
        num_comps_d = precisions_d.shape[1]

        # Compute log(alpha_i * beta_j)
        logits_p_rep = logits_p.repeat_interleave(num_comps_d, dim=1)
        logits_d_rep = logits_d.repeat(1, num_comps_p)
        logit_factors = logits_p_rep + logits_d_rep

        # Compute sqrt(det()/(det()*det()))
        logdet_covariances_pp = torch.logdet(covariances_pp)
        logdet_covariances_p = -torch.logdet(precisions_p)
        logdet_covariances_d = -torch.logdet(precisions_d)

        # Repeat the proposal and density estimator terms such that there are LK terms.
        # Same trick as has been used above.
        logdet_covariances_p_rep = logdet_covariances_p.repeat_interleave(
            num_comps_d, dim=1
        )
        logdet_covariances_d_rep = logdet_covariances_d.repeat(1, num_comps_p)

        log_sqrt_det_ratio = 0.5 * (
            logdet_covariances_pp
            - (logdet_covariances_p_rep + logdet_covariances_d_rep)
        )

        # Compute for proposal, density estimator, and proposal posterior:
        # mu_i.T * P_i * mu_i
        exponent_p = batched_mixture_vmv(precisions_p, means_p)
        exponent_d = batched_mixture_vmv(precisions_d, means_d)
        exponent_pp = batched_mixture_vmv(precisions_pp, means_pp)

        # Extend proposal and density estimator exponents to get LK terms.
        exponent_p_rep = exponent_p.repeat_interleave(num_comps_d, dim=1)
        exponent_d_rep = exponent_d.repeat(1, num_comps_p)
        exponent = -0.5 * (exponent_p_rep + exponent_d_rep - exponent_pp)

        logits_pp = logit_factors + log_sqrt_det_ratio + exponent

        return logits_pp

    def _maybe_z_score_theta(self, theta: Tensor) -> Tensor:
        """Return potentially standardized theta if z-scoring was requested."""

        if self.z_score_theta:
            theta, _ = self._neural_net._transform(theta)

        return theta

__init__(prior=None, density_estimator='maf', device='cpu', logging_level='WARNING', summary_writer=None, show_progress_bars=True)

SNPE-C / APT [1].

[1] Automatic Posterior Transformation for Likelihood-free Inference, Greenberg et al., ICML 2019, https://arxiv.org/abs/1905.07488.

This class implements two loss variants of SNPE-C: the non-atomic and the atomic version. The atomic loss of SNPE-C can be used for any density estimator, i.e. also for normalizing flows. However, it suffers from leakage issues. On the other hand, the non-atomic loss can only be used only if the proposal distribution is a mixture of Gaussians, the density estimator is a mixture of Gaussians, and the prior is either Gaussian or Uniform. It does not suffer from leakage issues. At the beginning of each round, we print whether the non-atomic or the atomic version is used.

In this codebase, we will automatically switch to the non-atomic loss if the following criteria are fulfilled:
- proposal is a DirectPosterior with density_estimator mdn, as built with utils.sbi.posterior_nn().
- the density estimator is a mdn, as built with utils.sbi.posterior_nn().
- isinstance(prior, MultivariateNormal) (from torch.distributions) or isinstance(prior, sbi.utils.BoxUniform)

Note that custom implementations of any of these densities (or estimators) will not trigger the non-atomic loss, and the algorithm will fall back onto using the atomic loss.

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them.

None
density_estimator Union[str, Callable]

If it is a string, use a pre-configured network of the provided type (one of nsf, maf, mdn, made). Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations (theta, x), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the density estimator. The density estimator needs to provide the methods .log_prob and .sample().

'maf'
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'WARNING'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during training.

True
Source code in sbi/inference/snpe/snpe_c.py
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
def __init__(
    self,
    prior: Optional[Distribution] = None,
    density_estimator: Union[str, Callable] = "maf",
    device: str = "cpu",
    logging_level: Union[int, str] = "WARNING",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""SNPE-C / APT [1].

    [1] _Automatic Posterior Transformation for Likelihood-free Inference_,
        Greenberg et al., ICML 2019, https://arxiv.org/abs/1905.07488.

    This class implements two loss variants of SNPE-C: the non-atomic and the atomic
    version. The atomic loss of SNPE-C can be used for any density estimator,
    i.e. also for normalizing flows. However, it suffers from leakage issues. On
    the other hand, the non-atomic loss can only be used only if the proposal
    distribution is a mixture of Gaussians, the density estimator is a mixture of
    Gaussians, and the prior is either Gaussian or Uniform. It does not suffer from
    leakage issues. At the beginning of each round, we print whether the non-atomic
    or the atomic version is used.

    In this codebase, we will automatically switch to the non-atomic loss if the
    following criteria are fulfilled:<br/>
    - proposal is a `DirectPosterior` with density_estimator `mdn`, as built
        with `utils.sbi.posterior_nn()`.<br/>
    - the density estimator is a `mdn`, as built with
        `utils.sbi.posterior_nn()`.<br/>
    - `isinstance(prior, MultivariateNormal)` (from `torch.distributions`) or
        `isinstance(prior, sbi.utils.BoxUniform)`

    Note that custom implementations of any of these densities (or estimators) will
    not trigger the non-atomic loss, and the algorithm will fall back onto using
    the atomic loss.

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them.
        density_estimator: If it is a string, use a pre-configured network of the
            provided type (one of nsf, maf, mdn, made). Alternatively, a function
            that builds a custom neural network can be provided. The function will
            be called with the first batch of simulations (theta, x), which can
            thus be used for shape inference and potentially for z-scoring. It
            needs to return a PyTorch `nn.Module` implementing the density
            estimator. The density estimator needs to provide the methods
            `.log_prob` and `.sample()`.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during training.
    """

    kwargs = del_entries(locals(), entries=("self", "__class__"))
    super().__init__(**kwargs)

train(num_atoms=10, training_batch_size=50, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2 ** 31 - 1, clip_max_norm=5.0, calibration_kernel=None, resume_training=False, force_first_round_loss=False, discard_prior_samples=False, use_combined_loss=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)

Return density estimator that approximates the distribution \(p(\theta|x)\).

Parameters:

Name Type Description Default
num_atoms int

Number of atoms to use for classification.

10
training_batch_size int

Training batch size.

50
learning_rate float

Learning rate for Adam optimizer.

0.0005
validation_fraction float

The fraction of data to use for validation.

0.1
stop_after_epochs int

The number of epochs to wait for improvement on the validation set before terminating training.

20
max_num_epochs int

Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).

2 ** 31 - 1
clip_max_norm Optional[float]

Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

5.0
calibration_kernel Optional[Callable]

A function to calibrate the loss with respect to the simulations x. See Lueckmann, Gonçalves et al., NeurIPS 2017.

None
resume_training bool

Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

False
force_first_round_loss bool

If True, train with maximum likelihood, i.e., potentially ignoring the correction for using a proposal distribution different from the prior.

False
discard_prior_samples bool

Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.

False
use_combined_loss bool

Whether to train the neural net also on prior samples using maximum likelihood in addition to training it on all samples using atomic loss. The extra MLE loss helps prevent density leaking with bounded priors.

False
retrain_from_scratch bool

Whether to retrain the conditional density estimator for the posterior from scratch each round.

False
show_train_summary bool

Whether to print the number of epochs and validation loss and leakage after the training.

False
dataloader_kwargs Optional[Dict]

Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)

None

Returns:

Type Description
Module

Density estimator that approximates the distribution \(p(\theta|x)\).

Source code in sbi/inference/snpe/snpe_c.py
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
def train(
    self,
    num_atoms: int = 10,
    training_batch_size: int = 50,
    learning_rate: float = 5e-4,
    validation_fraction: float = 0.1,
    stop_after_epochs: int = 20,
    max_num_epochs: int = 2**31 - 1,
    clip_max_norm: Optional[float] = 5.0,
    calibration_kernel: Optional[Callable] = None,
    resume_training: bool = False,
    force_first_round_loss: bool = False,
    discard_prior_samples: bool = False,
    use_combined_loss: bool = False,
    retrain_from_scratch: bool = False,
    show_train_summary: bool = False,
    dataloader_kwargs: Optional[Dict] = None,
) -> nn.Module:
    r"""Return density estimator that approximates the distribution $p(\theta|x)$.

    Args:
        num_atoms: Number of atoms to use for classification.
        training_batch_size: Training batch size.
        learning_rate: Learning rate for Adam optimizer.
        validation_fraction: The fraction of data to use for validation.
        stop_after_epochs: The number of epochs to wait for improvement on the
            validation set before terminating training.
        max_num_epochs: Maximum number of epochs to run. If reached, we stop
            training even when the validation loss is still decreasing. Otherwise,
            we train until validation loss increases (see also `stop_after_epochs`).
        clip_max_norm: Value at which to clip the total gradient norm in order to
            prevent exploding gradients. Use None for no clipping.
        calibration_kernel: A function to calibrate the loss with respect to the
            simulations `x`. See Lueckmann, Gonçalves et al., NeurIPS 2017.
        resume_training: Can be used in case training time is limited, e.g. on a
            cluster. If `True`, the split between train and validation set, the
            optimizer, the number of epochs, and the best validation log-prob will
            be restored from the last time `.train()` was called.
        force_first_round_loss: If `True`, train with maximum likelihood,
            i.e., potentially ignoring the correction for using a proposal
            distribution different from the prior.
        discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
            from the prior. Training may be sped up by ignoring such less targeted
            samples.
        use_combined_loss: Whether to train the neural net also on prior samples
            using maximum likelihood in addition to training it on all samples using
            atomic loss. The extra MLE loss helps prevent density leaking with
            bounded priors.
        retrain_from_scratch: Whether to retrain the conditional density
            estimator for the posterior from scratch each round.
        show_train_summary: Whether to print the number of epochs and validation
            loss and leakage after the training.
        dataloader_kwargs: Additional or updated kwargs to be passed to the training
            and validation dataloaders (like, e.g., a collate_fn)

    Returns:
        Density estimator that approximates the distribution $p(\theta|x)$.
    """

    # WARNING: sneaky trick ahead. We proxy the parent's `train` here,
    # requiring the signature to have `num_atoms`, save it for use below, and
    # continue. It's sneaky because we are using the object (self) as a namespace
    # to pass arguments between functions, and that's implicit state management.
    self._num_atoms = num_atoms
    self._use_combined_loss = use_combined_loss
    kwargs = del_entries(
        locals(),
        entries=("self", "__class__", "num_atoms", "use_combined_loss"),
    )

    self._round = max(self._data_round_index)

    if self._round > 0:
        # Set the proposal to the last proposal that was passed by the user. For
        # atomic SNPE, it does not matter what the proposal is. For non-atomic
        # SNPE, we only use the latest data that was passed, i.e. the one from the
        # last proposal.
        proposal = self._proposal_roundwise[-1]
        self.use_non_atomic_loss = (
            isinstance(proposal, DirectPosterior)
            and isinstance(proposal.posterior_estimator._distribution, mdn)
            and isinstance(self._neural_net._distribution, mdn)
            and check_dist_class(
                self._prior, class_to_check=(Uniform, MultivariateNormal)
            )[0]
        )

        algorithm = "non-atomic" if self.use_non_atomic_loss else "atomic"
        print(f"Using SNPE-C with {algorithm} loss")

        if self.use_non_atomic_loss:
            # Take care of z-scoring, pre-compute and store prior terms.
            self._set_state_for_mog_proposal()

    return super().train(**kwargs)

Bases: LikelihoodEstimator

Source code in sbi/inference/snle/snle_a.py
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
class SNLE_A(LikelihoodEstimator):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        density_estimator: Union[str, Callable] = "maf",
        device: str = "cpu",
        logging_level: Union[int, str] = "WARNING",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""Sequential Neural Likelihood [1].

        [1] Sequential Neural Likelihood: Fast Likelihood-free Inference with
        Autoregressive Flows_, Papamakarios et al., AISTATS 2019,
        https://arxiv.org/abs/1805.07226

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. If `None`, the
                prior must be passed to `.build_posterior()`.
            density_estimator: If it is a string, use a pre-configured network of the
                provided type (one of nsf, maf, mdn, made). Alternatively, a function
                that builds a custom neural network can be provided. The function will
                be called with the first batch of simulations (theta, x), which can
                thus be used for shape inference and potentially for z-scoring. It
                needs to return a PyTorch `nn.Module` implementing the density
                estimator. The density estimator needs to provide the methods
                `.log_prob` and `.sample()`.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
        """

        kwargs = del_entries(locals(), entries=("self", "__class__"))
        super().__init__(**kwargs)

__init__(prior=None, density_estimator='maf', device='cpu', logging_level='WARNING', summary_writer=None, show_progress_bars=True)

Sequential Neural Likelihood [1].

[1] Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows_, Papamakarios et al., AISTATS 2019, https://arxiv.org/abs/1805.07226

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. If None, the prior must be passed to .build_posterior().

None
density_estimator Union[str, Callable]

If it is a string, use a pre-configured network of the provided type (one of nsf, maf, mdn, made). Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations (theta, x), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the density estimator. The density estimator needs to provide the methods .log_prob and .sample().

'maf'
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'WARNING'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
Source code in sbi/inference/snle/snle_a.py
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
def __init__(
    self,
    prior: Optional[Distribution] = None,
    density_estimator: Union[str, Callable] = "maf",
    device: str = "cpu",
    logging_level: Union[int, str] = "WARNING",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""Sequential Neural Likelihood [1].

    [1] Sequential Neural Likelihood: Fast Likelihood-free Inference with
    Autoregressive Flows_, Papamakarios et al., AISTATS 2019,
    https://arxiv.org/abs/1805.07226

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. If `None`, the
            prior must be passed to `.build_posterior()`.
        density_estimator: If it is a string, use a pre-configured network of the
            provided type (one of nsf, maf, mdn, made). Alternatively, a function
            that builds a custom neural network can be provided. The function will
            be called with the first batch of simulations (theta, x), which can
            thus be used for shape inference and potentially for z-scoring. It
            needs to return a PyTorch `nn.Module` implementing the density
            estimator. The density estimator needs to provide the methods
            `.log_prob` and `.sample()`.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
    """

    kwargs = del_entries(locals(), entries=("self", "__class__"))
    super().__init__(**kwargs)

Bases: RatioEstimator

Source code in sbi/inference/snre/snre_a.py
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
class SNRE_A(RatioEstimator):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        classifier: Union[str, Callable] = "resnet",
        device: str = "cpu",
        logging_level: Union[int, str] = "warning",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""AALR[1], here known as SNRE_A.

        [1] _Likelihood-free MCMC with Amortized Approximate Likelihood Ratios_, Hermans
            et al., ICML 2020, https://arxiv.org/abs/1903.04057

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. If `None`, the
                prior must be passed to `.build_posterior()`.
            classifier: Classifier trained to approximate likelihood ratios. If it is
                a string, use a pre-configured network of the provided type (one of
                linear, mlp, resnet). Alternatively, a function that builds a custom
                neural network can be provided. The function will be called with the
                first batch of simulations (theta, x), which can thus be used for shape
                inference and potentially for z-scoring. It needs to return a PyTorch
                `nn.Module` implementing the classifier.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
        """

        kwargs = del_entries(locals(), entries=("self", "__class__"))
        super().__init__(**kwargs)

    def train(
        self,
        training_batch_size: int = 50,
        learning_rate: float = 5e-4,
        validation_fraction: float = 0.1,
        stop_after_epochs: int = 20,
        max_num_epochs: int = 2**31 - 1,
        clip_max_norm: Optional[float] = 5.0,
        resume_training: bool = False,
        discard_prior_samples: bool = False,
        retrain_from_scratch: bool = False,
        show_train_summary: bool = False,
        dataloader_kwargs: Optional[Dict] = None,
        loss_kwargs: Dict[str, Any] = {},
    ) -> nn.Module:
        r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.

        Args:
            training_batch_size: Training batch size.
            learning_rate: Learning rate for Adam optimizer.
            validation_fraction: The fraction of data to use for validation.
            stop_after_epochs: The number of epochs to wait for improvement on the
                validation set before terminating training.
            max_num_epochs: Maximum number of epochs to run. If reached, we stop
                training even when the validation loss is still decreasing. Otherwise,
                we train until validation loss increases (see also `stop_after_epochs`).
            clip_max_norm: Value at which to clip the total gradient norm in order to
                prevent exploding gradients. Use None for no clipping.
            resume_training: Can be used in case training time is limited, e.g. on a
                cluster. If `True`, the split between train and validation set, the
                optimizer, the number of epochs, and the best validation log-prob will
                be restored from the last time `.train()` was called.
            discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
                from the prior. Training may be sped up by ignoring such less targeted
                samples.
            retrain_from_scratch: Whether to retrain the conditional density
                estimator for the posterior from scratch each round.
            show_train_summary: Whether to print the number of epochs and validation
                loss and leakage after the training.
            dataloader_kwargs: Additional or updated kwargs to be passed to the training
                and validation dataloaders (like, e.g., a collate_fn)
            loss_kwargs: Additional or updated kwargs to be passed to the self._loss fn.

        Returns:
            Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
        """

        # AALR is defined for `num_atoms=2`.
        # Proxy to `super().__call__` to ensure right parameter.
        kwargs = del_entries(locals(), entries=("self", "__class__"))
        return super().train(**kwargs, num_atoms=2)

    def _loss(self, theta: Tensor, x: Tensor, num_atoms: int) -> Tensor:
        """Returns the binary cross-entropy loss for the trained classifier.

        The classifier takes as input a $(\theta,x)$ pair. It is trained to predict 1
        if the pair was sampled from the joint $p(\theta,x)$, and to predict 0 if the
        pair was sampled from the marginals $p(\theta)p(x)$.
        """

        assert theta.shape[0] == x.shape[0], "Batch sizes for theta and x must match."
        batch_size = theta.shape[0]

        logits = self._classifier_logits(theta, x, num_atoms)
        likelihood = torch.sigmoid(logits).squeeze()

        # Alternating pairs where there is one sampled from the joint and one
        # sampled from the marginals. The first element is sampled from the
        # joint p(theta, x) and is labelled 1. The second element is sampled
        # from the marginals p(theta)p(x) and is labelled 0. And so on.
        labels = ones(2 * batch_size, device=self._device)  # two atoms
        labels[1::2] = 0.0

        # Binary cross entropy to learn the likelihood (AALR-specific)
        return nn.BCELoss()(likelihood, labels)

__init__(prior=None, classifier='resnet', device='cpu', logging_level='warning', summary_writer=None, show_progress_bars=True)

AALR[1], here known as SNRE_A.

[1] Likelihood-free MCMC with Amortized Approximate Likelihood Ratios, Hermans et al., ICML 2020, https://arxiv.org/abs/1903.04057

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. If None, the prior must be passed to .build_posterior().

None
classifier Union[str, Callable]

Classifier trained to approximate likelihood ratios. If it is a string, use a pre-configured network of the provided type (one of linear, mlp, resnet). Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations (theta, x), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the classifier.

'resnet'
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'warning'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
Source code in sbi/inference/snre/snre_a.py
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
def __init__(
    self,
    prior: Optional[Distribution] = None,
    classifier: Union[str, Callable] = "resnet",
    device: str = "cpu",
    logging_level: Union[int, str] = "warning",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""AALR[1], here known as SNRE_A.

    [1] _Likelihood-free MCMC with Amortized Approximate Likelihood Ratios_, Hermans
        et al., ICML 2020, https://arxiv.org/abs/1903.04057

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. If `None`, the
            prior must be passed to `.build_posterior()`.
        classifier: Classifier trained to approximate likelihood ratios. If it is
            a string, use a pre-configured network of the provided type (one of
            linear, mlp, resnet). Alternatively, a function that builds a custom
            neural network can be provided. The function will be called with the
            first batch of simulations (theta, x), which can thus be used for shape
            inference and potentially for z-scoring. It needs to return a PyTorch
            `nn.Module` implementing the classifier.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
    """

    kwargs = del_entries(locals(), entries=("self", "__class__"))
    super().__init__(**kwargs)

train(training_batch_size=50, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2 ** 31 - 1, clip_max_norm=5.0, resume_training=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None, loss_kwargs={})

Return classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Parameters:

Name Type Description Default
training_batch_size int

Training batch size.

50
learning_rate float

Learning rate for Adam optimizer.

0.0005
validation_fraction float

The fraction of data to use for validation.

0.1
stop_after_epochs int

The number of epochs to wait for improvement on the validation set before terminating training.

20
max_num_epochs int

Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).

2 ** 31 - 1
clip_max_norm Optional[float]

Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

5.0
resume_training bool

Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

False
discard_prior_samples bool

Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.

False
retrain_from_scratch bool

Whether to retrain the conditional density estimator for the posterior from scratch each round.

False
show_train_summary bool

Whether to print the number of epochs and validation loss and leakage after the training.

False
dataloader_kwargs Optional[Dict]

Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)

None
loss_kwargs Dict[str, Any]

Additional or updated kwargs to be passed to the self._loss fn.

{}

Returns:

Type Description
Module

Classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Source code in sbi/inference/snre/snre_a.py
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
def train(
    self,
    training_batch_size: int = 50,
    learning_rate: float = 5e-4,
    validation_fraction: float = 0.1,
    stop_after_epochs: int = 20,
    max_num_epochs: int = 2**31 - 1,
    clip_max_norm: Optional[float] = 5.0,
    resume_training: bool = False,
    discard_prior_samples: bool = False,
    retrain_from_scratch: bool = False,
    show_train_summary: bool = False,
    dataloader_kwargs: Optional[Dict] = None,
    loss_kwargs: Dict[str, Any] = {},
) -> nn.Module:
    r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.

    Args:
        training_batch_size: Training batch size.
        learning_rate: Learning rate for Adam optimizer.
        validation_fraction: The fraction of data to use for validation.
        stop_after_epochs: The number of epochs to wait for improvement on the
            validation set before terminating training.
        max_num_epochs: Maximum number of epochs to run. If reached, we stop
            training even when the validation loss is still decreasing. Otherwise,
            we train until validation loss increases (see also `stop_after_epochs`).
        clip_max_norm: Value at which to clip the total gradient norm in order to
            prevent exploding gradients. Use None for no clipping.
        resume_training: Can be used in case training time is limited, e.g. on a
            cluster. If `True`, the split between train and validation set, the
            optimizer, the number of epochs, and the best validation log-prob will
            be restored from the last time `.train()` was called.
        discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
            from the prior. Training may be sped up by ignoring such less targeted
            samples.
        retrain_from_scratch: Whether to retrain the conditional density
            estimator for the posterior from scratch each round.
        show_train_summary: Whether to print the number of epochs and validation
            loss and leakage after the training.
        dataloader_kwargs: Additional or updated kwargs to be passed to the training
            and validation dataloaders (like, e.g., a collate_fn)
        loss_kwargs: Additional or updated kwargs to be passed to the self._loss fn.

    Returns:
        Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
    """

    # AALR is defined for `num_atoms=2`.
    # Proxy to `super().__call__` to ensure right parameter.
    kwargs = del_entries(locals(), entries=("self", "__class__"))
    return super().train(**kwargs, num_atoms=2)

Bases: RatioEstimator

Source code in sbi/inference/snre/snre_b.py
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
class SNRE_B(RatioEstimator):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        classifier: Union[str, Callable] = "resnet",
        device: str = "cpu",
        logging_level: Union[int, str] = "warning",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""SRE[1], here known as SNRE_B.

        [1] _On Contrastive Learning for Likelihood-free Inference_, Durkan et al.,
            ICML 2020, https://arxiv.org/pdf/2002.03712

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. If `None`, the
                prior must be passed to `.build_posterior()`.
            classifier: Classifier trained to approximate likelihood ratios. If it is
                a string, use a pre-configured network of the provided type (one of
                linear, mlp, resnet). Alternatively, a function that builds a custom
                neural network can be provided. The function will be called with the
                first batch of simulations (theta, x), which can thus be used for shape
                inference and potentially for z-scoring. It needs to return a PyTorch
                `nn.Module` implementing the classifier.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
        """

        kwargs = del_entries(locals(), entries=("self", "__class__"))
        super().__init__(**kwargs)

    def train(
        self,
        num_atoms: int = 10,
        training_batch_size: int = 50,
        learning_rate: float = 5e-4,
        validation_fraction: float = 0.1,
        stop_after_epochs: int = 20,
        max_num_epochs: int = 2**31 - 1,
        clip_max_norm: Optional[float] = 5.0,
        resume_training: bool = False,
        discard_prior_samples: bool = False,
        retrain_from_scratch: bool = False,
        show_train_summary: bool = False,
        dataloader_kwargs: Optional[Dict] = None,
    ) -> nn.Module:
        r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.

        Args:
            num_atoms: Number of atoms to use for classification.
            training_batch_size: Training batch size.
            learning_rate: Learning rate for Adam optimizer.
            validation_fraction: The fraction of data to use for validation.
            stop_after_epochs: The number of epochs to wait for improvement on the
                validation set before terminating training.
            max_num_epochs: Maximum number of epochs to run. If reached, we stop
                training even when the validation loss is still decreasing. Otherwise,
                we train until validation loss increases (see also `stop_after_epochs`).
            clip_max_norm: Value at which to clip the total gradient norm in order to
                prevent exploding gradients. Use None for no clipping.
            resume_training: Can be used in case training time is limited, e.g. on a
                cluster. If `True`, the split between train and validation set, the
                optimizer, the number of epochs, and the best validation log-prob will
                be restored from the last time `.train()` was called.
            discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
                from the prior. Training may be sped up by ignoring such less targeted
                samples.
            retrain_from_scratch: Whether to retrain the conditional density
                estimator for the posterior from scratch each round.
            show_train_summary: Whether to print the number of epochs and validation
                loss and leakage after the training.
            dataloader_kwargs: Additional or updated kwargs to be passed to the training
                and validation dataloaders (like, e.g., a collate_fn)

        Returns:
            Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
        """
        kwargs = del_entries(locals(), entries=("self", "__class__"))
        return super().train(**kwargs)

    def _loss(self, theta: Tensor, x: Tensor, num_atoms: int) -> Tensor:
        r"""Return cross-entropy (via softmax activation) loss for 1-out-of-`num_atoms`
        classification.

        The classifier takes as input `num_atoms` $(\theta,x)$ pairs. Out of these
        pairs, one pair was sampled from the joint $p(\theta,x)$ and all others from the
        marginals $p(\theta)p(x)$. The classifier is trained to predict which of the
        pairs was sampled from the joint $p(\theta,x)$.
        """

        assert theta.shape[0] == x.shape[0], "Batch sizes for theta and x must match."
        batch_size = theta.shape[0]
        logits = self._classifier_logits(theta, x, num_atoms)

        # For 1-out-of-`num_atoms` classification each datapoint consists
        # of `num_atoms` points, with one of them being the correct one.
        # We have a batch of `batch_size` such datapoints.
        logits = logits.reshape(batch_size, num_atoms)

        # Index 0 is the theta-x-pair sampled from the joint p(theta,x) and hence the
        # "correct" one for the 1-out-of-N classification.
        log_prob = logits[:, 0] - torch.logsumexp(logits, dim=-1)

        return -torch.mean(log_prob)

__init__(prior=None, classifier='resnet', device='cpu', logging_level='warning', summary_writer=None, show_progress_bars=True)

SRE[1], here known as SNRE_B.

[1] On Contrastive Learning for Likelihood-free Inference, Durkan et al., ICML 2020, https://arxiv.org/pdf/2002.03712

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. If None, the prior must be passed to .build_posterior().

None
classifier Union[str, Callable]

Classifier trained to approximate likelihood ratios. If it is a string, use a pre-configured network of the provided type (one of linear, mlp, resnet). Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations (theta, x), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the classifier.

'resnet'
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'warning'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
Source code in sbi/inference/snre/snre_b.py
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
def __init__(
    self,
    prior: Optional[Distribution] = None,
    classifier: Union[str, Callable] = "resnet",
    device: str = "cpu",
    logging_level: Union[int, str] = "warning",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""SRE[1], here known as SNRE_B.

    [1] _On Contrastive Learning for Likelihood-free Inference_, Durkan et al.,
        ICML 2020, https://arxiv.org/pdf/2002.03712

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. If `None`, the
            prior must be passed to `.build_posterior()`.
        classifier: Classifier trained to approximate likelihood ratios. If it is
            a string, use a pre-configured network of the provided type (one of
            linear, mlp, resnet). Alternatively, a function that builds a custom
            neural network can be provided. The function will be called with the
            first batch of simulations (theta, x), which can thus be used for shape
            inference and potentially for z-scoring. It needs to return a PyTorch
            `nn.Module` implementing the classifier.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
    """

    kwargs = del_entries(locals(), entries=("self", "__class__"))
    super().__init__(**kwargs)

train(num_atoms=10, training_batch_size=50, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2 ** 31 - 1, clip_max_norm=5.0, resume_training=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)

Return classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Parameters:

Name Type Description Default
num_atoms int

Number of atoms to use for classification.

10
training_batch_size int

Training batch size.

50
learning_rate float

Learning rate for Adam optimizer.

0.0005
validation_fraction float

The fraction of data to use for validation.

0.1
stop_after_epochs int

The number of epochs to wait for improvement on the validation set before terminating training.

20
max_num_epochs int

Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).

2 ** 31 - 1
clip_max_norm Optional[float]

Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

5.0
resume_training bool

Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

False
discard_prior_samples bool

Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.

False
retrain_from_scratch bool

Whether to retrain the conditional density estimator for the posterior from scratch each round.

False
show_train_summary bool

Whether to print the number of epochs and validation loss and leakage after the training.

False
dataloader_kwargs Optional[Dict]

Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)

None

Returns:

Type Description
Module

Classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Source code in sbi/inference/snre/snre_b.py
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
def train(
    self,
    num_atoms: int = 10,
    training_batch_size: int = 50,
    learning_rate: float = 5e-4,
    validation_fraction: float = 0.1,
    stop_after_epochs: int = 20,
    max_num_epochs: int = 2**31 - 1,
    clip_max_norm: Optional[float] = 5.0,
    resume_training: bool = False,
    discard_prior_samples: bool = False,
    retrain_from_scratch: bool = False,
    show_train_summary: bool = False,
    dataloader_kwargs: Optional[Dict] = None,
) -> nn.Module:
    r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.

    Args:
        num_atoms: Number of atoms to use for classification.
        training_batch_size: Training batch size.
        learning_rate: Learning rate for Adam optimizer.
        validation_fraction: The fraction of data to use for validation.
        stop_after_epochs: The number of epochs to wait for improvement on the
            validation set before terminating training.
        max_num_epochs: Maximum number of epochs to run. If reached, we stop
            training even when the validation loss is still decreasing. Otherwise,
            we train until validation loss increases (see also `stop_after_epochs`).
        clip_max_norm: Value at which to clip the total gradient norm in order to
            prevent exploding gradients. Use None for no clipping.
        resume_training: Can be used in case training time is limited, e.g. on a
            cluster. If `True`, the split between train and validation set, the
            optimizer, the number of epochs, and the best validation log-prob will
            be restored from the last time `.train()` was called.
        discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
            from the prior. Training may be sped up by ignoring such less targeted
            samples.
        retrain_from_scratch: Whether to retrain the conditional density
            estimator for the posterior from scratch each round.
        show_train_summary: Whether to print the number of epochs and validation
            loss and leakage after the training.
        dataloader_kwargs: Additional or updated kwargs to be passed to the training
            and validation dataloaders (like, e.g., a collate_fn)

    Returns:
        Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
    """
    kwargs = del_entries(locals(), entries=("self", "__class__"))
    return super().train(**kwargs)

Bases: RatioEstimator

Source code in sbi/inference/snre/snre_c.py
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
class SNRE_C(RatioEstimator):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        classifier: Union[str, Callable] = "resnet",
        device: str = "cpu",
        logging_level: Union[int, str] = "warning",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""NRE-C[1] is a generalization of the non-sequential (amortized) versions of
        SNRE_A and SNRE_B. We call the algorithm SNRE_C within `sbi`.

        NRE-C:
        (1) like SNRE_B, features a "multiclass" loss function where several marginally
            drawn parameter-data pairs are contrasted against a jointly drawn pair.
        (2) like AALR/NRE_A, i.e., the non-sequential version of SNRE_A, it encourages
            the approximate ratio $p(\theta,x)/p(\theta)p(x)$, accessed through
            `.potential()` within `sbi`, to be exact at optimum. This addresses the
            issue that SNRE_B estimates this ratio only up to an arbitrary function
            (normalizing constant) of the data $x$.

        Just like for all ratio estimation algorithms, the sequential version of SNRE_C
        will be estimated only up to a function (normalizing constant) of the data $x$
        in rounds after the first.

        [1] _Contrastive Neural Ratio Estimation_, Benajmin Kurt Miller, et. al.,
            NeurIPS 2022, https://arxiv.org/abs/2210.06170

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. If `None`, the
                prior must be passed to `.build_posterior()`.
            classifier: Classifier trained to approximate likelihood ratios. If it is
                a string, use a pre-configured network of the provided type (one of
                linear, mlp, resnet). Alternatively, a function that builds a custom
                neural network can be provided. The function will be called with the
                first batch of simulations (theta, x), which can thus be used for shape
                inference and potentially for z-scoring. It needs to return a PyTorch
                `nn.Module` implementing the classifier.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
        """

        kwargs = del_entries(locals(), entries=("self", "__class__"))
        super().__init__(**kwargs)

    def train(
        self,
        num_classes: int = 5,
        gamma: float = 1.0,
        training_batch_size: int = 50,
        learning_rate: float = 5e-4,
        validation_fraction: float = 0.1,
        stop_after_epochs: int = 20,
        max_num_epochs: int = 2**31 - 1,
        clip_max_norm: Optional[float] = 5.0,
        resume_training: bool = False,
        discard_prior_samples: bool = False,
        retrain_from_scratch: bool = False,
        show_train_summary: bool = False,
        dataloader_kwargs: Optional[Dict] = None,
    ) -> nn.Module:
        r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.

        Args:
            num_classes: Number of theta to classify against, corresponds to $K$ in
                _Contrastive Neural Ratio Estimation_. Minimum value is 1. Similar to
                `num_atoms` for SNRE_B except SNRE_C has an additional independently
                drawn sample. The total number of alternative parameters `NRE-C` "sees"
                is $2K-1$ or `2 * num_classes - 1` divided between two loss terms.
            gamma: Determines the relative weight of the sum of all $K$ dependently
                drawn classes against the marginally drawn one. Specifically,
                $p(y=k) :=p_K$, $p(y=0) := p_0$, $p_0 = 1 - K p_K$, and finally
                $\gamma := K p_K / p_0$.
            training_batch_size: Training batch size.
            learning_rate: Learning rate for Adam optimizer.
            validation_fraction: The fraction of data to use for validation.
            stop_after_epochs: The number of epochs to wait for improvement on the
                validation set before terminating training.
            max_num_epochs: Maximum number of epochs to run. If reached, we stop
                training even when the validation loss is still decreasing. Otherwise,
                we train until validation loss increases (see also `stop_after_epochs`).
            clip_max_norm: Value at which to clip the total gradient norm in order to
                prevent exploding gradients. Use None for no clipping.
            exclude_invalid_x: Whether to exclude simulation outputs `x=NaN` or `x=±∞`
                during training. Expect errors, silent or explicit, when `False`.
            resume_training: Can be used in case training time is limited, e.g. on a
                cluster. If `True`, the split between train and validation set, the
                optimizer, the number of epochs, and the best validation log-prob will
                be restored from the last time `.train()` was called.
            discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
                from the prior. Training may be sped up by ignoring such less targeted
                samples.
            retrain_from_scratch: Whether to retrain the conditional density
                estimator for the posterior from scratch each round.
            show_train_summary: Whether to print the number of epochs and validation
                loss and leakage after the training.
            dataloader_kwargs: Additional or updated kwargs to be passed to the training
                and validation dataloaders (like, e.g., a collate_fn)

        Returns:
            Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
        """
        kwargs = del_entries(locals(), entries=("self", "__class__"))
        kwargs["num_atoms"] = kwargs.pop("num_classes") + 1
        kwargs["loss_kwargs"] = {"gamma": kwargs.pop("gamma")}
        return super().train(**kwargs)

    def _loss(
        self, theta: Tensor, x: Tensor, num_atoms: int, gamma: float
    ) -> torch.Tensor:
        r"""Return cross-entropy loss (via ''multi-class sigmoid'' activation) for
        1-out-of-`K + 1` classification.

        At optimum, this loss function returns the exact likelihood-to-evidence ratio
        in the first round.
        Details of loss computation are described in Contrastive Neural Ratio
        Estimation[1]. The paper does not discuss the sequential case.

        [1] _Contrastive Neural Ratio Estimation_, Benajmin Kurt Miller, et. al.,
            NeurIPS 2022, https://arxiv.org/abs/2210.06170
        """

        # Reminder: K = num_classes
        # The algorithm is written with K, so we convert back to K format rather than
        # reasoning in num_atoms.
        num_classes = num_atoms - 1
        assert num_classes >= 1, f"num_classes = {num_classes} must be greater than 1."

        assert theta.shape[0] == x.shape[0], "Batch sizes for theta and x must match."
        batch_size = theta.shape[0]

        # We append an contrastive theta to the marginal case because we will remove
        # the jointly drawn
        # sample in the logits_marginal[:, 0] position. That makes the remaining sample
        # marginally drawn.
        # We have a batch of `batch_size` datapoints.
        logits_marginal = self._classifier_logits(theta, x, num_classes + 1).reshape(
            batch_size, num_classes + 1
        )
        logits_joint = self._classifier_logits(theta, x, num_classes).reshape(
            batch_size, num_classes
        )

        dtype = logits_marginal.dtype
        device = logits_marginal.device

        # Index 0 is the theta-x-pair sampled from the joint p(theta,x) and hence
        # we remove the jointly drawn sample from the logits_marginal
        logits_marginal = logits_marginal[:, 1:]
        # ... and retain it in the logits_joint. Now we have two arrays with K choices.

        # To use logsumexp, we extend the denominator logits with loggamma
        loggamma = torch.tensor(gamma, dtype=dtype, device=device).log()
        logK = torch.tensor(num_classes, dtype=dtype, device=device).log()
        denominator_marginal = torch.concat(
            [loggamma + logits_marginal, logK.expand((batch_size, 1))],
            dim=-1,
        )
        denominator_joint = torch.concat(
            [loggamma + logits_joint, logK.expand((batch_size, 1))],
            dim=-1,
        )

        # Compute the contributions to the loss from each term in the classification.
        log_prob_marginal = logK - torch.logsumexp(denominator_marginal, dim=-1)
        log_prob_joint = (
            loggamma + logits_joint[:, 0] - torch.logsumexp(denominator_joint, dim=-1)
        )

        # relative weights. p_marginal := p_0, and p_joint := p_K * K from the notation.
        p_marginal, p_joint = self._get_prior_probs_marginal_and_joint(gamma)
        return -torch.mean(p_marginal * log_prob_marginal + p_joint * log_prob_joint)

    @staticmethod
    def _get_prior_probs_marginal_and_joint(gamma: float) -> Tuple[float, float]:
        r"""Return a tuple (p_marginal, p_joint) where `p_marginal := `$p_0$,
        `p_joint := `$p_K \cdot K$.

        We let the joint (dependently drawn) class to be equally likely across K
        options. The marginal class is therefore restricted to get the remaining
        probability.
        """
        p_joint = gamma / (1 + gamma)
        p_marginal = 1 / (1 + gamma)
        return p_marginal, p_joint

__init__(prior=None, classifier='resnet', device='cpu', logging_level='warning', summary_writer=None, show_progress_bars=True)

NRE-C[1] is a generalization of the non-sequential (amortized) versions of SNRE_A and SNRE_B. We call the algorithm SNRE_C within sbi.

NRE-C: (1) like SNRE_B, features a “multiclass” loss function where several marginally drawn parameter-data pairs are contrasted against a jointly drawn pair. (2) like AALR/NRE_A, i.e., the non-sequential version of SNRE_A, it encourages the approximate ratio \(p(\theta,x)/p(\theta)p(x)\), accessed through .potential() within sbi, to be exact at optimum. This addresses the issue that SNRE_B estimates this ratio only up to an arbitrary function (normalizing constant) of the data \(x\).

Just like for all ratio estimation algorithms, the sequential version of SNRE_C will be estimated only up to a function (normalizing constant) of the data \(x\) in rounds after the first.

[1] Contrastive Neural Ratio Estimation, Benajmin Kurt Miller, et. al., NeurIPS 2022, https://arxiv.org/abs/2210.06170

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. If None, the prior must be passed to .build_posterior().

None
classifier Union[str, Callable]

Classifier trained to approximate likelihood ratios. If it is a string, use a pre-configured network of the provided type (one of linear, mlp, resnet). Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations (theta, x), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the classifier.

'resnet'
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'warning'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
Source code in sbi/inference/snre/snre_c.py
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
def __init__(
    self,
    prior: Optional[Distribution] = None,
    classifier: Union[str, Callable] = "resnet",
    device: str = "cpu",
    logging_level: Union[int, str] = "warning",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""NRE-C[1] is a generalization of the non-sequential (amortized) versions of
    SNRE_A and SNRE_B. We call the algorithm SNRE_C within `sbi`.

    NRE-C:
    (1) like SNRE_B, features a "multiclass" loss function where several marginally
        drawn parameter-data pairs are contrasted against a jointly drawn pair.
    (2) like AALR/NRE_A, i.e., the non-sequential version of SNRE_A, it encourages
        the approximate ratio $p(\theta,x)/p(\theta)p(x)$, accessed through
        `.potential()` within `sbi`, to be exact at optimum. This addresses the
        issue that SNRE_B estimates this ratio only up to an arbitrary function
        (normalizing constant) of the data $x$.

    Just like for all ratio estimation algorithms, the sequential version of SNRE_C
    will be estimated only up to a function (normalizing constant) of the data $x$
    in rounds after the first.

    [1] _Contrastive Neural Ratio Estimation_, Benajmin Kurt Miller, et. al.,
        NeurIPS 2022, https://arxiv.org/abs/2210.06170

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. If `None`, the
            prior must be passed to `.build_posterior()`.
        classifier: Classifier trained to approximate likelihood ratios. If it is
            a string, use a pre-configured network of the provided type (one of
            linear, mlp, resnet). Alternatively, a function that builds a custom
            neural network can be provided. The function will be called with the
            first batch of simulations (theta, x), which can thus be used for shape
            inference and potentially for z-scoring. It needs to return a PyTorch
            `nn.Module` implementing the classifier.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
    """

    kwargs = del_entries(locals(), entries=("self", "__class__"))
    super().__init__(**kwargs)

train(num_classes=5, gamma=1.0, training_batch_size=50, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2 ** 31 - 1, clip_max_norm=5.0, resume_training=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)

Return classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Parameters:

Name Type Description Default
num_classes int

Number of theta to classify against, corresponds to \(K\) in Contrastive Neural Ratio Estimation. Minimum value is 1. Similar to num_atoms for SNRE_B except SNRE_C has an additional independently drawn sample. The total number of alternative parameters NRE-C “sees” is \(2K-1\) or 2 * num_classes - 1 divided between two loss terms.

5
gamma float

Determines the relative weight of the sum of all \(K\) dependently drawn classes against the marginally drawn one. Specifically, \(p(y=k) :=p_K\), \(p(y=0) := p_0\), \(p_0 = 1 - K p_K\), and finally \(\gamma := K p_K / p_0\).

1.0
training_batch_size int

Training batch size.

50
learning_rate float

Learning rate for Adam optimizer.

0.0005
validation_fraction float

The fraction of data to use for validation.

0.1
stop_after_epochs int

The number of epochs to wait for improvement on the validation set before terminating training.

20
max_num_epochs int

Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).

2 ** 31 - 1
clip_max_norm Optional[float]

Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

5.0
exclude_invalid_x

Whether to exclude simulation outputs x=NaN or x=±∞ during training. Expect errors, silent or explicit, when False.

required
resume_training bool

Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

False
discard_prior_samples bool

Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.

False
retrain_from_scratch bool

Whether to retrain the conditional density estimator for the posterior from scratch each round.

False
show_train_summary bool

Whether to print the number of epochs and validation loss and leakage after the training.

False
dataloader_kwargs Optional[Dict]

Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)

None

Returns:

Type Description
Module

Classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Source code in sbi/inference/snre/snre_c.py
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
def train(
    self,
    num_classes: int = 5,
    gamma: float = 1.0,
    training_batch_size: int = 50,
    learning_rate: float = 5e-4,
    validation_fraction: float = 0.1,
    stop_after_epochs: int = 20,
    max_num_epochs: int = 2**31 - 1,
    clip_max_norm: Optional[float] = 5.0,
    resume_training: bool = False,
    discard_prior_samples: bool = False,
    retrain_from_scratch: bool = False,
    show_train_summary: bool = False,
    dataloader_kwargs: Optional[Dict] = None,
) -> nn.Module:
    r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.

    Args:
        num_classes: Number of theta to classify against, corresponds to $K$ in
            _Contrastive Neural Ratio Estimation_. Minimum value is 1. Similar to
            `num_atoms` for SNRE_B except SNRE_C has an additional independently
            drawn sample. The total number of alternative parameters `NRE-C` "sees"
            is $2K-1$ or `2 * num_classes - 1` divided between two loss terms.
        gamma: Determines the relative weight of the sum of all $K$ dependently
            drawn classes against the marginally drawn one. Specifically,
            $p(y=k) :=p_K$, $p(y=0) := p_0$, $p_0 = 1 - K p_K$, and finally
            $\gamma := K p_K / p_0$.
        training_batch_size: Training batch size.
        learning_rate: Learning rate for Adam optimizer.
        validation_fraction: The fraction of data to use for validation.
        stop_after_epochs: The number of epochs to wait for improvement on the
            validation set before terminating training.
        max_num_epochs: Maximum number of epochs to run. If reached, we stop
            training even when the validation loss is still decreasing. Otherwise,
            we train until validation loss increases (see also `stop_after_epochs`).
        clip_max_norm: Value at which to clip the total gradient norm in order to
            prevent exploding gradients. Use None for no clipping.
        exclude_invalid_x: Whether to exclude simulation outputs `x=NaN` or `x=±∞`
            during training. Expect errors, silent or explicit, when `False`.
        resume_training: Can be used in case training time is limited, e.g. on a
            cluster. If `True`, the split between train and validation set, the
            optimizer, the number of epochs, and the best validation log-prob will
            be restored from the last time `.train()` was called.
        discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
            from the prior. Training may be sped up by ignoring such less targeted
            samples.
        retrain_from_scratch: Whether to retrain the conditional density
            estimator for the posterior from scratch each round.
        show_train_summary: Whether to print the number of epochs and validation
            loss and leakage after the training.
        dataloader_kwargs: Additional or updated kwargs to be passed to the training
            and validation dataloaders (like, e.g., a collate_fn)

    Returns:
        Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
    """
    kwargs = del_entries(locals(), entries=("self", "__class__"))
    kwargs["num_atoms"] = kwargs.pop("num_classes") + 1
    kwargs["loss_kwargs"] = {"gamma": kwargs.pop("gamma")}
    return super().train(**kwargs)

Bases: SNRE_A

Source code in sbi/inference/snre/bnre.py
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
class BNRE(SNRE_A):
    def __init__(
        self,
        prior: Optional[Distribution] = None,
        classifier: Union[str, Callable] = "resnet",
        device: str = "cpu",
        logging_level: Union[int, str] = "warning",
        summary_writer: Optional[TensorboardSummaryWriter] = None,
        show_progress_bars: bool = True,
    ):
        r"""Balanced neural ratio estimation (BNRE)[1]. BNRE is a variation of NRE
        aiming to produce more conservative posterior approximations

        [1] Delaunoy, A., Hermans, J., Rozet, F., Wehenkel, A., & Louppe, G..
        Towards Reliable Simulation-Based Inference with Balanced Neural Ratio
        Estimation.
        NeurIPS 2022. https://arxiv.org/abs/2208.13624

        Args:
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. If `None`, the
                prior must be passed to `.build_posterior()`.
            classifier: Classifier trained to approximate likelihood ratios. If it is
                a string, use a pre-configured network of the provided type (one of
                linear, mlp, resnet). Alternatively, a function that builds a custom
                neural network can be provided. The function will be called with the
                first batch of simulations $(\theta, x)$, which can thus be used for
                shape inference and potentially for z-scoring. It needs to return a
                PyTorch `nn.Module` implementing the classifier.
            device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
            logging_level: Minimum severity of messages to log. One of the strings
                INFO, WARNING, DEBUG, ERROR and CRITICAL.
            summary_writer: A tensorboard `SummaryWriter` to control, among others, log
                file location (default is `<current working directory>/logs`.)
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
        """

        kwargs = del_entries(locals(), entries=("self", "__class__"))
        super().__init__(**kwargs)

    def train(
        self,
        regularization_strength: float = 100.0,
        training_batch_size: int = 50,
        learning_rate: float = 5e-4,
        validation_fraction: float = 0.1,
        stop_after_epochs: int = 20,
        max_num_epochs: int = 2**31 - 1,
        clip_max_norm: Optional[float] = 5.0,
        resume_training: bool = False,
        discard_prior_samples: bool = False,
        retrain_from_scratch: bool = False,
        show_train_summary: bool = False,
        dataloader_kwargs: Optional[Dict] = None,
    ) -> nn.Module:
        r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
        Args:

            regularization_strength: The multiplicative coefficient applied to the
                balancing regularizer ($\lambda$).
            training_batch_size: Training batch size.
            learning_rate: Learning rate for Adam optimizer.
            validation_fraction: The fraction of data to use for validation.
            stop_after_epochs: The number of epochs to wait for improvement on the
                validation set before terminating training.
            max_num_epochs: Maximum number of epochs to run. If reached, we stop
                training even when the validation loss is still decreasing. Otherwise,
                we train until validation loss increases (see also `stop_after_epochs`).
            clip_max_norm: Value at which to clip the total gradient norm in order to
                prevent exploding gradients. Use None for no clipping.
            exclude_invalid_x: Whether to exclude simulation outputs `x=NaN` or `x=±∞`
                during training. Expect errors, silent or explicit, when `False`.
            resume_training: Can be used in case training time is limited, e.g. on a
                cluster. If `True`, the split between train and validation set, the
                optimizer, the number of epochs, and the best validation log-prob will
                be restored from the last time `.train()` was called.
            discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
                from the prior. Training may be sped up by ignoring such less targeted
                samples.
            retrain_from_scratch: Whether to retrain the conditional density
                estimator for the posterior from scratch each round.
            show_train_summary: Whether to print the number of epochs and validation
                loss and leakage after the training.
            dataloader_kwargs: Additional or updated kwargs to be passed to the training
                and validation dataloaders (like, e.g., a collate_fn)
        Returns:
            Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
        """
        kwargs = del_entries(locals(), entries=("self", "__class__"))
        kwargs["loss_kwargs"] = {
            "regularization_strength": kwargs.pop("regularization_strength")
        }
        return super().train(**kwargs)

    def _loss(
        self, theta: Tensor, x: Tensor, num_atoms: int, regularization_strength: float
    ) -> Tensor:
        """Returns the binary cross-entropy loss for the trained classifier.

        The classifier takes as input a $(\theta,x)$ pair. It is trained to predict 1
        if the pair was sampled from the joint $p(\theta,x)$, and to predict 0 if the
        pair was sampled from the marginals $p(\theta)p(x)$.
        """

        assert theta.shape[0] == x.shape[0], "Batch sizes for theta and x must match."
        batch_size = theta.shape[0]

        logits = self._classifier_logits(theta, x, num_atoms)
        likelihood = torch.sigmoid(logits).squeeze()

        # Alternating pairs where there is one sampled from the joint and one
        # sampled from the marginals. The first element is sampled from the
        # joint p(theta, x) and is labelled 1. The second element is sampled
        # from the marginals p(theta)p(x) and is labelled 0. And so on.
        labels = ones(2 * batch_size, device=self._device)  # two atoms
        labels[1::2] = 0.0

        # Binary cross entropy to learn the likelihood (AALR-specific)
        bce = nn.BCELoss()(likelihood, labels)

        # Balancing regularizer
        regularizer = (
            (torch.sigmoid(logits[0::2]) + torch.sigmoid(logits[1::2]) - 1)
            .mean()
            .square()
        )

        return bce + regularization_strength * regularizer

__init__(prior=None, classifier='resnet', device='cpu', logging_level='warning', summary_writer=None, show_progress_bars=True)

Balanced neural ratio estimation (BNRE)[1]. BNRE is a variation of NRE aiming to produce more conservative posterior approximations

[1] Delaunoy, A., Hermans, J., Rozet, F., Wehenkel, A., & Louppe, G.. Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation. NeurIPS 2022. https://arxiv.org/abs/2208.13624

Parameters:

Name Type Description Default
prior Optional[Distribution]

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. If None, the prior must be passed to .build_posterior().

None
classifier Union[str, Callable]

Classifier trained to approximate likelihood ratios. If it is a string, use a pre-configured network of the provided type (one of linear, mlp, resnet). Alternatively, a function that builds a custom neural network can be provided. The function will be called with the first batch of simulations \((\theta, x)\), which can thus be used for shape inference and potentially for z-scoring. It needs to return a PyTorch nn.Module implementing the classifier.

'resnet'
device str

Training device, e.g., “cpu”, “cuda” or “cuda:{0, 1, …}”.

'cpu'
logging_level Union[int, str]

Minimum severity of messages to log. One of the strings INFO, WARNING, DEBUG, ERROR and CRITICAL.

'warning'
summary_writer Optional[TensorboardSummaryWriter]

A tensorboard SummaryWriter to control, among others, log file location (default is <current working directory>/logs.)

None
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
Source code in sbi/inference/snre/bnre.py
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
def __init__(
    self,
    prior: Optional[Distribution] = None,
    classifier: Union[str, Callable] = "resnet",
    device: str = "cpu",
    logging_level: Union[int, str] = "warning",
    summary_writer: Optional[TensorboardSummaryWriter] = None,
    show_progress_bars: bool = True,
):
    r"""Balanced neural ratio estimation (BNRE)[1]. BNRE is a variation of NRE
    aiming to produce more conservative posterior approximations

    [1] Delaunoy, A., Hermans, J., Rozet, F., Wehenkel, A., & Louppe, G..
    Towards Reliable Simulation-Based Inference with Balanced Neural Ratio
    Estimation.
    NeurIPS 2022. https://arxiv.org/abs/2208.13624

    Args:
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. If `None`, the
            prior must be passed to `.build_posterior()`.
        classifier: Classifier trained to approximate likelihood ratios. If it is
            a string, use a pre-configured network of the provided type (one of
            linear, mlp, resnet). Alternatively, a function that builds a custom
            neural network can be provided. The function will be called with the
            first batch of simulations $(\theta, x)$, which can thus be used for
            shape inference and potentially for z-scoring. It needs to return a
            PyTorch `nn.Module` implementing the classifier.
        device: Training device, e.g., "cpu", "cuda" or "cuda:{0, 1, ...}".
        logging_level: Minimum severity of messages to log. One of the strings
            INFO, WARNING, DEBUG, ERROR and CRITICAL.
        summary_writer: A tensorboard `SummaryWriter` to control, among others, log
            file location (default is `<current working directory>/logs`.)
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
    """

    kwargs = del_entries(locals(), entries=("self", "__class__"))
    super().__init__(**kwargs)

train(regularization_strength=100.0, training_batch_size=50, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2 ** 31 - 1, clip_max_norm=5.0, resume_training=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)

Return classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\). Args:

regularization_strength: The multiplicative coefficient applied to the
    balancing regularizer ($\lambda$).
training_batch_size: Training batch size.
learning_rate: Learning rate for Adam optimizer.
validation_fraction: The fraction of data to use for validation.
stop_after_epochs: The number of epochs to wait for improvement on the
    validation set before terminating training.
max_num_epochs: Maximum number of epochs to run. If reached, we stop
    training even when the validation loss is still decreasing. Otherwise,
    we train until validation loss increases (see also `stop_after_epochs`).
clip_max_norm: Value at which to clip the total gradient norm in order to
    prevent exploding gradients. Use None for no clipping.
exclude_invalid_x: Whether to exclude simulation outputs `x=NaN` or `x=±∞`
    during training. Expect errors, silent or explicit, when `False`.
resume_training: Can be used in case training time is limited, e.g. on a
    cluster. If `True`, the split between train and validation set, the
    optimizer, the number of epochs, and the best validation log-prob will
    be restored from the last time `.train()` was called.
discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
    from the prior. Training may be sped up by ignoring such less targeted
    samples.
retrain_from_scratch: Whether to retrain the conditional density
    estimator for the posterior from scratch each round.
show_train_summary: Whether to print the number of epochs and validation
    loss and leakage after the training.
dataloader_kwargs: Additional or updated kwargs to be passed to the training
    and validation dataloaders (like, e.g., a collate_fn)

Returns: Classifier that approximates the ratio \(p(\theta,x)/p(\theta)p(x)\).

Source code in sbi/inference/snre/bnre.py
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
def train(
    self,
    regularization_strength: float = 100.0,
    training_batch_size: int = 50,
    learning_rate: float = 5e-4,
    validation_fraction: float = 0.1,
    stop_after_epochs: int = 20,
    max_num_epochs: int = 2**31 - 1,
    clip_max_norm: Optional[float] = 5.0,
    resume_training: bool = False,
    discard_prior_samples: bool = False,
    retrain_from_scratch: bool = False,
    show_train_summary: bool = False,
    dataloader_kwargs: Optional[Dict] = None,
) -> nn.Module:
    r"""Return classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
    Args:

        regularization_strength: The multiplicative coefficient applied to the
            balancing regularizer ($\lambda$).
        training_batch_size: Training batch size.
        learning_rate: Learning rate for Adam optimizer.
        validation_fraction: The fraction of data to use for validation.
        stop_after_epochs: The number of epochs to wait for improvement on the
            validation set before terminating training.
        max_num_epochs: Maximum number of epochs to run. If reached, we stop
            training even when the validation loss is still decreasing. Otherwise,
            we train until validation loss increases (see also `stop_after_epochs`).
        clip_max_norm: Value at which to clip the total gradient norm in order to
            prevent exploding gradients. Use None for no clipping.
        exclude_invalid_x: Whether to exclude simulation outputs `x=NaN` or `x=±∞`
            during training. Expect errors, silent or explicit, when `False`.
        resume_training: Can be used in case training time is limited, e.g. on a
            cluster. If `True`, the split between train and validation set, the
            optimizer, the number of epochs, and the best validation log-prob will
            be restored from the last time `.train()` was called.
        discard_prior_samples: Whether to discard samples simulated in round 1, i.e.
            from the prior. Training may be sped up by ignoring such less targeted
            samples.
        retrain_from_scratch: Whether to retrain the conditional density
            estimator for the posterior from scratch each round.
        show_train_summary: Whether to print the number of epochs and validation
            loss and leakage after the training.
        dataloader_kwargs: Additional or updated kwargs to be passed to the training
            and validation dataloaders (like, e.g., a collate_fn)
    Returns:
        Classifier that approximates the ratio $p(\theta,x)/p(\theta)p(x)$.
    """
    kwargs = del_entries(locals(), entries=("self", "__class__"))
    kwargs["loss_kwargs"] = {
        "regularization_strength": kwargs.pop("regularization_strength")
    }
    return super().train(**kwargs)

Bases: ABCBASE

Source code in sbi/inference/abc/mcabc.py
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
class MCABC(ABCBASE):
    def __init__(
        self,
        simulator: Callable,
        prior,
        distance: Union[str, Callable] = "l2",
        num_workers: int = 1,
        simulation_batch_size: int = 1,
        show_progress_bars: bool = True,
    ):
        r"""Monte-Carlo Approximate Bayesian Computation (Rejection ABC) [1].

        [1] Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., & Feldman, M. W.
        (1999). Population growth of human Y chromosomes: a study of Y chromosome
        microsatellites. Molecular biology and evolution, 16(12), 1791-1798.

        Args:
            simulator: A function that takes parameters $\theta$ and maps them to
                simulations, or observations, `x`, $\mathrm{sim}(\theta)\to x$. Any
                regular Python callable (i.e. function or class with `__call__` method)
                can be used.
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. Any
                object with `.log_prob()`and `.sample()` (for example, a PyTorch
                distribution) can be used.
            distance: Distance function to compare observed and simulated data. Can be
                a custom function or one of `l1`, `l2`, `mse`.
            num_workers: Number of parallel workers to use for simulations.
            simulation_batch_size: Number of parameter sets that the simulator
                maps to data x at once. If None, we simulate all parameter sets at the
                same time. If >= 1, the simulator has to process data of shape
                (simulation_batch_size, parameter_dimension).
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
        """

        super().__init__(
            simulator=simulator,
            prior=prior,
            distance=distance,
            num_workers=num_workers,
            simulation_batch_size=simulation_batch_size,
            show_progress_bars=show_progress_bars,
        )

    def __call__(
        self,
        x_o: Union[Tensor, ndarray],
        num_simulations: int,
        eps: Optional[float] = None,
        quantile: Optional[float] = None,
        lra: bool = False,
        sass: bool = False,
        sass_fraction: float = 0.25,
        sass_expansion_degree: int = 1,
        kde: bool = False,
        kde_kwargs: Dict[str, Any] = {},
        return_summary: bool = False,
    ) -> Union[Tuple[Tensor, dict], Tuple[KDEWrapper, dict], Tensor, KDEWrapper]:
        r"""Run MCABC and return accepted parameters or KDE object fitted on them.

        Args:
            x_o: Observed data.
            num_simulations: Number of simulations to run.
            eps: Acceptance threshold $\epsilon$ for distance between observed and
                simulated data.
            quantile: Upper quantile of smallest distances for which the corresponding
                parameters are returned, e.g, q=0.01 will return the top 1%. Exactly
                one of quantile or `eps` have to be passed.
            lra: Whether to run linear regression adjustment as in Beaumont et al. 2002
            sass: Whether to determine semi-automatic summary statistics as in
                Fearnhead & Prangle 2012.
            sass_fraction: Fraction of simulation budget used for the initial sass run.
            sass_expansion_degree: Degree of the polynomial feature expansion for the
                sass regression, default 1 - no expansion.
            kde: Whether to run KDE on the accepted parameters to return a KDE
                object from which one can sample.
            kde_kwargs: kwargs for performing KDE:
                'bandwidth='; either a float, or a string naming a bandwidth
                heuristics, e.g., 'cv' (cross validation), 'silvermann' or 'scott',
                default 'cv'.
                'transform': transform applied to the parameters before doing KDE.
                'sample_weights': weights associated with samples. See 'get_kde' for
                more details
            return_summary: Whether to return the distances and data corresponding to
                the accepted parameters.

        Returns:
            theta (if kde False): accepted parameters
            kde (if kde True): KDE object based on accepted parameters from which one
                can .sample() and .log_prob().
            summary (if summary True): dictionary containing the accepted paramters (if
                kde True), distances and simulated data x.
        """

        # Exactly one of eps or quantile need to be passed.
        assert (eps is not None) ^ (
            quantile is not None
        ), "Eps or quantile must be passed, but not both."

        # Run SASS and change the simulator and x_o accordingly.
        if sass:
            num_pilot_simulations = int(sass_fraction * num_simulations)
            self.logger.info(
                f"Running SASS with {num_pilot_simulations} pilot samples."
            )
            num_simulations -= num_pilot_simulations

            pilot_theta = self.prior.sample((num_pilot_simulations,))
            pilot_x = self._batched_simulator(pilot_theta)

            sass_transform = self.get_sass_transform(
                pilot_theta, pilot_x, sass_expansion_degree
            )

            simulator = lambda theta: sass_transform(self._batched_simulator(theta))
            x_o = sass_transform(x_o)
        else:
            simulator = self._batched_simulator

        # Simulate and calculate distances.
        theta = self.prior.sample((num_simulations,))
        x = simulator(theta)

        # Infer shape of x to test and set x_o.
        self.x_shape = x[0].unsqueeze(0).shape
        self.x_o = process_x(x_o, self.x_shape)

        distances = self.distance(self.x_o, x)

        # Select based on acceptance threshold epsilon.
        if eps is not None:
            is_accepted = distances < eps
            num_accepted = is_accepted.sum().item()
            assert num_accepted > 0, f"No parameters accepted, eps={eps} too small"

            theta_accepted = theta[is_accepted]
            distances_accepted = distances[is_accepted]
            x_accepted = x[is_accepted]

        # Select based on quantile on sorted distances.
        elif quantile is not None:
            num_top_samples = int(num_simulations * quantile)
            sort_idx = torch.argsort(distances)
            theta_accepted = theta[sort_idx][:num_top_samples]
            distances_accepted = distances[sort_idx][:num_top_samples]
            x_accepted = x[sort_idx][:num_top_samples]

        else:
            raise ValueError("One of epsilon or quantile has to be passed.")

        # Maybe adjust theta with LRA.
        if lra:
            self.logger.info("Running Linear regression adjustment.")
            final_theta = self.run_lra(theta_accepted, x_accepted, observation=self.x_o)
        else:
            final_theta = theta_accepted

        if kde:
            self.logger.info(
                f"""KDE on {final_theta.shape[0]} samples with bandwidth option
                {kde_kwargs["bandwidth"] if "bandwidth" in kde_kwargs else "cv"}.
                Beware that KDE can give unreliable results when used with too few
                samples and in high dimensions."""
            )

            kde_dist = get_kde(final_theta, **kde_kwargs)

            if return_summary:
                return (
                    kde_dist,
                    dict(theta=final_theta, distances=distances_accepted, x=x_accepted),
                )
            else:
                return kde_dist
        elif return_summary:
            return final_theta, dict(distances=distances_accepted, x=x_accepted)
        else:
            return final_theta

__call__(x_o, num_simulations, eps=None, quantile=None, lra=False, sass=False, sass_fraction=0.25, sass_expansion_degree=1, kde=False, kde_kwargs={}, return_summary=False)

Run MCABC and return accepted parameters or KDE object fitted on them.

Parameters:

Name Type Description Default
x_o Union[Tensor, ndarray]

Observed data.

required
num_simulations int

Number of simulations to run.

required
eps Optional[float]

Acceptance threshold \(\epsilon\) for distance between observed and simulated data.

None
quantile Optional[float]

Upper quantile of smallest distances for which the corresponding parameters are returned, e.g, q=0.01 will return the top 1%. Exactly one of quantile or eps have to be passed.

None
lra bool

Whether to run linear regression adjustment as in Beaumont et al. 2002

False
sass bool

Whether to determine semi-automatic summary statistics as in Fearnhead & Prangle 2012.

False
sass_fraction float

Fraction of simulation budget used for the initial sass run.

0.25
sass_expansion_degree int

Degree of the polynomial feature expansion for the sass regression, default 1 - no expansion.

1
kde bool

Whether to run KDE on the accepted parameters to return a KDE object from which one can sample.

False
kde_kwargs Dict[str, Any]

kwargs for performing KDE: ‘bandwidth=’; either a float, or a string naming a bandwidth heuristics, e.g., ‘cv’ (cross validation), ‘silvermann’ or ‘scott’, default ‘cv’. ‘transform’: transform applied to the parameters before doing KDE. ‘sample_weights’: weights associated with samples. See ‘get_kde’ for more details

{}
return_summary bool

Whether to return the distances and data corresponding to the accepted parameters.

False

Returns:

Name Type Description
theta if kde False

accepted parameters

kde if kde True

KDE object based on accepted parameters from which one can .sample() and .log_prob().

summary if summary True

dictionary containing the accepted paramters (if kde True), distances and simulated data x.

Source code in sbi/inference/abc/mcabc.py
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
def __call__(
    self,
    x_o: Union[Tensor, ndarray],
    num_simulations: int,
    eps: Optional[float] = None,
    quantile: Optional[float] = None,
    lra: bool = False,
    sass: bool = False,
    sass_fraction: float = 0.25,
    sass_expansion_degree: int = 1,
    kde: bool = False,
    kde_kwargs: Dict[str, Any] = {},
    return_summary: bool = False,
) -> Union[Tuple[Tensor, dict], Tuple[KDEWrapper, dict], Tensor, KDEWrapper]:
    r"""Run MCABC and return accepted parameters or KDE object fitted on them.

    Args:
        x_o: Observed data.
        num_simulations: Number of simulations to run.
        eps: Acceptance threshold $\epsilon$ for distance between observed and
            simulated data.
        quantile: Upper quantile of smallest distances for which the corresponding
            parameters are returned, e.g, q=0.01 will return the top 1%. Exactly
            one of quantile or `eps` have to be passed.
        lra: Whether to run linear regression adjustment as in Beaumont et al. 2002
        sass: Whether to determine semi-automatic summary statistics as in
            Fearnhead & Prangle 2012.
        sass_fraction: Fraction of simulation budget used for the initial sass run.
        sass_expansion_degree: Degree of the polynomial feature expansion for the
            sass regression, default 1 - no expansion.
        kde: Whether to run KDE on the accepted parameters to return a KDE
            object from which one can sample.
        kde_kwargs: kwargs for performing KDE:
            'bandwidth='; either a float, or a string naming a bandwidth
            heuristics, e.g., 'cv' (cross validation), 'silvermann' or 'scott',
            default 'cv'.
            'transform': transform applied to the parameters before doing KDE.
            'sample_weights': weights associated with samples. See 'get_kde' for
            more details
        return_summary: Whether to return the distances and data corresponding to
            the accepted parameters.

    Returns:
        theta (if kde False): accepted parameters
        kde (if kde True): KDE object based on accepted parameters from which one
            can .sample() and .log_prob().
        summary (if summary True): dictionary containing the accepted paramters (if
            kde True), distances and simulated data x.
    """

    # Exactly one of eps or quantile need to be passed.
    assert (eps is not None) ^ (
        quantile is not None
    ), "Eps or quantile must be passed, but not both."

    # Run SASS and change the simulator and x_o accordingly.
    if sass:
        num_pilot_simulations = int(sass_fraction * num_simulations)
        self.logger.info(
            f"Running SASS with {num_pilot_simulations} pilot samples."
        )
        num_simulations -= num_pilot_simulations

        pilot_theta = self.prior.sample((num_pilot_simulations,))
        pilot_x = self._batched_simulator(pilot_theta)

        sass_transform = self.get_sass_transform(
            pilot_theta, pilot_x, sass_expansion_degree
        )

        simulator = lambda theta: sass_transform(self._batched_simulator(theta))
        x_o = sass_transform(x_o)
    else:
        simulator = self._batched_simulator

    # Simulate and calculate distances.
    theta = self.prior.sample((num_simulations,))
    x = simulator(theta)

    # Infer shape of x to test and set x_o.
    self.x_shape = x[0].unsqueeze(0).shape
    self.x_o = process_x(x_o, self.x_shape)

    distances = self.distance(self.x_o, x)

    # Select based on acceptance threshold epsilon.
    if eps is not None:
        is_accepted = distances < eps
        num_accepted = is_accepted.sum().item()
        assert num_accepted > 0, f"No parameters accepted, eps={eps} too small"

        theta_accepted = theta[is_accepted]
        distances_accepted = distances[is_accepted]
        x_accepted = x[is_accepted]

    # Select based on quantile on sorted distances.
    elif quantile is not None:
        num_top_samples = int(num_simulations * quantile)
        sort_idx = torch.argsort(distances)
        theta_accepted = theta[sort_idx][:num_top_samples]
        distances_accepted = distances[sort_idx][:num_top_samples]
        x_accepted = x[sort_idx][:num_top_samples]

    else:
        raise ValueError("One of epsilon or quantile has to be passed.")

    # Maybe adjust theta with LRA.
    if lra:
        self.logger.info("Running Linear regression adjustment.")
        final_theta = self.run_lra(theta_accepted, x_accepted, observation=self.x_o)
    else:
        final_theta = theta_accepted

    if kde:
        self.logger.info(
            f"""KDE on {final_theta.shape[0]} samples with bandwidth option
            {kde_kwargs["bandwidth"] if "bandwidth" in kde_kwargs else "cv"}.
            Beware that KDE can give unreliable results when used with too few
            samples and in high dimensions."""
        )

        kde_dist = get_kde(final_theta, **kde_kwargs)

        if return_summary:
            return (
                kde_dist,
                dict(theta=final_theta, distances=distances_accepted, x=x_accepted),
            )
        else:
            return kde_dist
    elif return_summary:
        return final_theta, dict(distances=distances_accepted, x=x_accepted)
    else:
        return final_theta

__init__(simulator, prior, distance='l2', num_workers=1, simulation_batch_size=1, show_progress_bars=True)

Monte-Carlo Approximate Bayesian Computation (Rejection ABC) [1].

[1] Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., & Feldman, M. W. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular biology and evolution, 16(12), 1791-1798.

Parameters:

Name Type Description Default
simulator Callable

A function that takes parameters \(\theta\) and maps them to simulations, or observations, x, \(\mathrm{sim}(\theta)\to x\). Any regular Python callable (i.e. function or class with __call__ method) can be used.

required
prior

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. Any object with .log_prob()and .sample() (for example, a PyTorch distribution) can be used.

required
distance Union[str, Callable]

Distance function to compare observed and simulated data. Can be a custom function or one of l1, l2, mse.

'l2'
num_workers int

Number of parallel workers to use for simulations.

1
simulation_batch_size int

Number of parameter sets that the simulator maps to data x at once. If None, we simulate all parameter sets at the same time. If >= 1, the simulator has to process data of shape (simulation_batch_size, parameter_dimension).

1
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
Source code in sbi/inference/abc/mcabc.py
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def __init__(
    self,
    simulator: Callable,
    prior,
    distance: Union[str, Callable] = "l2",
    num_workers: int = 1,
    simulation_batch_size: int = 1,
    show_progress_bars: bool = True,
):
    r"""Monte-Carlo Approximate Bayesian Computation (Rejection ABC) [1].

    [1] Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., & Feldman, M. W.
    (1999). Population growth of human Y chromosomes: a study of Y chromosome
    microsatellites. Molecular biology and evolution, 16(12), 1791-1798.

    Args:
        simulator: A function that takes parameters $\theta$ and maps them to
            simulations, or observations, `x`, $\mathrm{sim}(\theta)\to x$. Any
            regular Python callable (i.e. function or class with `__call__` method)
            can be used.
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. Any
            object with `.log_prob()`and `.sample()` (for example, a PyTorch
            distribution) can be used.
        distance: Distance function to compare observed and simulated data. Can be
            a custom function or one of `l1`, `l2`, `mse`.
        num_workers: Number of parallel workers to use for simulations.
        simulation_batch_size: Number of parameter sets that the simulator
            maps to data x at once. If None, we simulate all parameter sets at the
            same time. If >= 1, the simulator has to process data of shape
            (simulation_batch_size, parameter_dimension).
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
    """

    super().__init__(
        simulator=simulator,
        prior=prior,
        distance=distance,
        num_workers=num_workers,
        simulation_batch_size=simulation_batch_size,
        show_progress_bars=show_progress_bars,
    )

Bases: ABCBASE

Source code in sbi/inference/abc/smcabc.py
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
class SMCABC(ABCBASE):
    def __init__(
        self,
        simulator: Callable,
        prior: Distribution,
        distance: Union[str, Callable] = "l2",
        num_workers: int = 1,
        simulation_batch_size: int = 1,
        show_progress_bars: bool = True,
        kernel: Optional[str] = "gaussian",
        algorithm_variant: str = "C",
    ):
        r"""Sequential Monte Carlo Approximate Bayesian Computation.

        We distinguish between three different SMC methods here:
            - A: Toni et al. 2010 (Phd Thesis)
            - B: Sisson et al. 2007 (with correction from 2009)
            - C: Beaumont et al. 2009

        In Toni et al. 2010 we find an overview of the differences on page 34:
            - B: same as A except for resampling of weights if the effective sampling
                size is too small.
            - C: same as A except for calculation of the covariance of the perturbation
                kernel: the kernel covariance is a scaled version of the covariance of
                the previous population.

        Args:
            simulator: A function that takes parameters $\theta$ and maps them to
                simulations, or observations, `x`, $\mathrm{sim}(\theta)\to x$. Any
                regular Python callable (i.e. function or class with `__call__` method)
                can be used.
            prior: A probability distribution that expresses prior knowledge about the
                parameters, e.g. which ranges are meaningful for them. Any
                object with `.log_prob()`and `.sample()` (for example, a PyTorch
                distribution) can be used.
            distance: Distance function to compare observed and simulated data. Can be
                a custom function or one of `l1`, `l2`, `mse`.
            num_workers: Number of parallel workers to use for simulations.
            simulation_batch_size: Number of parameter sets that the simulator
                maps to data x at once. If None, we simulate all parameter sets at the
                same time. If >= 1, the simulator has to process data of shape
                (simulation_batch_size, parameter_dimension).
            show_progress_bars: Whether to show a progressbar during simulation and
                sampling.
            kernel: Perturbation kernel.
            algorithm_variant: Indicating the choice of algorithm variant, A, B, or C.

        """

        super().__init__(
            simulator=simulator,
            prior=prior,
            distance=distance,
            num_workers=num_workers,
            simulation_batch_size=simulation_batch_size,
            show_progress_bars=show_progress_bars,
        )

        kernels = ("gaussian", "uniform")
        assert (
            kernel in kernels
        ), f"Kernel '{kernel}' not supported. Choose one from {kernels}."
        self.kernel = kernel

        algorithm_variants = ("A", "B", "C")
        assert algorithm_variant in algorithm_variants, (
            f"SMCABC variant '{algorithm_variant}' not supported, choose one from"
            " {algorithm_variants}."
        )
        self.algorithm_variant = algorithm_variant
        self.distance_to_x0 = None
        self.simulation_counter = 0
        self.num_simulations = 0

        # Define simulator that keeps track of budget.
        def simulate_with_budget(theta):
            self.simulation_counter += theta.shape[0]
            return self._batched_simulator(theta)

        self._simulate_with_budget = simulate_with_budget

    def __call__(
        self,
        x_o: Union[Tensor, ndarray],
        num_particles: int,
        num_initial_pop: int,
        num_simulations: int,
        epsilon_decay: float,
        distance_based_decay: bool = False,
        ess_min: Optional[float] = None,
        kernel_variance_scale: float = 1.0,
        use_last_pop_samples: bool = True,
        return_summary: bool = False,
        kde: bool = False,
        kde_kwargs: Dict[str, Any] = {},
        kde_sample_weights: bool = False,
        lra: bool = False,
        lra_with_weights: bool = False,
        sass: bool = False,
        sass_fraction: float = 0.25,
        sass_expansion_degree: int = 1,
    ) -> Union[Tensor, KDEWrapper, Tuple[Tensor, dict], Tuple[KDEWrapper, dict]]:
        r"""Run SMCABC and return accepted parameters or KDE object fitted on them.

        Args:
            x_o: Observed data.
            num_particles: Number of particles in each population.
            num_initial_pop: Number of simulations used for initial population.
            num_simulations: Total number of possible simulations.
            epsilon_decay: Factor with which the acceptance threshold $\epsilon$ decays.
            distance_based_decay: Whether the $\epsilon$ decay is constant over
                populations or calculated from the previous populations distribution of
                distances.
            ess_min: Threshold of effective sampling size for resampling weights. Not
                used when None (default).
            kernel_variance_scale: Factor for scaling the perturbation kernel variance.
            use_last_pop_samples: Whether to fill up the current population with
                samples from the previous population when the budget is used up. If
                False, the current population is discarded and the previous population
                is returned.
            lra: Whether to run linear regression adjustment as in Beaumont et al. 2002
            lra_with_weights: Whether to run lra as weighted linear regression with SMC
                weights
            sass: Whether to determine semi-automatic summary statistics as in
                Fearnhead & Prangle 2012.
            sass_fraction: Fraction of simulation budget used for the initial sass run.
            sass_expansion_degree: Degree of the polynomial feature expansion for the
                sass regression, default 1 - no expansion.
            kde: Whether to run KDE on the accepted parameters to return a KDE
                object from which one can sample.
            kde_kwargs: kwargs for performing KDE:
                'bandwidth='; either a float, or a string naming a bandwidth
                heuristics, e.g., 'cv' (cross validation), 'silvermann' or 'scott',
                default 'cv'.
                'transform': transform applied to the parameters before doing KDE.
                'sample_weights': weights associated with samples. See 'get_kde' for
                more details
            kde_sample_weights: Whether perform weighted KDE with SMC weights or on raw
                particles.
            return_summary: Whether to return a dictionary with all accepted particles,
                weights, etc. at the end.

        Returns:
            theta (if kde False): accepted parameters of the last population.
            kde (if kde True): KDE object fitted on accepted parameters, from which one
                can .sample() and .log_prob().
            summary (if return_summary True): dictionary containing the accepted
                paramters (if kde True), distances and simulated data x of all
                populations.
        """

        pop_idx = 0
        self.num_simulations = num_simulations

        # Pilot run for SASS.
        if sass:
            num_pilot_simulations = int(sass_fraction * num_simulations)
            self.logger.info(
                f"Running SASS with {num_pilot_simulations} pilot samples."
            )
            sass_transform = self.run_sass_set_xo(
                num_particles, num_pilot_simulations, x_o, lra, sass_expansion_degree
            )
            # Udpate simulator and xo
            x_o = sass_transform(self.x_o)

            def sass_simulator(theta):
                self.simulation_counter += theta.shape[0]
                return sass_transform(self._batched_simulator(theta))

            self._simulate_with_budget = sass_simulator

        # run initial population
        particles, epsilon, distances, x = self._set_xo_and_sample_initial_population(
            x_o, num_particles, num_initial_pop
        )
        log_weights = torch.log(1 / num_particles * ones(num_particles))

        self.logger.info(
            (
                f"population={pop_idx}, eps={epsilon}, ess={1.0}, "
                f"num_sims={num_initial_pop}"
            )
        )

        all_particles = [particles]
        all_log_weights = [log_weights]
        all_distances = [distances]
        all_epsilons = [epsilon]
        all_x = [x]

        while self.simulation_counter < self.num_simulations:
            pop_idx += 1
            # Decay based on quantile of distances from previous pop.
            if distance_based_decay:
                epsilon = self._get_next_epsilon(
                    all_distances[pop_idx - 1], epsilon_decay
                )
            # Constant decay.
            else:
                epsilon *= epsilon_decay

            # Get kernel variance from previous pop.
            self.kernel_variance = self.get_kernel_variance(
                all_particles[pop_idx - 1],
                torch.exp(all_log_weights[pop_idx - 1]),
                samples_per_dim=500,
                kernel_variance_scale=kernel_variance_scale,
            )
            particles, log_weights, distances, x = self._sample_next_population(
                particles=all_particles[pop_idx - 1],
                log_weights=all_log_weights[pop_idx - 1],
                distances=all_distances[pop_idx - 1],
                epsilon=epsilon,
                x=all_x[pop_idx - 1],
                use_last_pop_samples=use_last_pop_samples,
            )

            # Resample population if effective sampling size is too small.
            if ess_min is not None:
                particles, log_weights = self.resample_if_ess_too_small(
                    particles, log_weights, ess_min, pop_idx
                )

            self.logger.info(
                (
                    f"population={pop_idx} done: eps={epsilon:.6f},"
                    f" num_sims={self.simulation_counter}."
                )
            )

            # collect results
            all_particles.append(particles)
            all_log_weights.append(log_weights)
            all_distances.append(distances)
            all_epsilons.append(epsilon)
            all_x.append(x)

        # Maybe run LRA and adjust weights.
        if lra:
            self.logger.info("Running Linear regression adjustment.")
            adjusted_particles, adjusted_weights = self.run_lra_update_weights(
                particles=all_particles[-1],
                xs=all_x[-1],
                observation=process_x(x_o),
                log_weights=all_log_weights[-1],
                lra_with_weights=lra_with_weights,
            )
            final_particles = adjusted_particles
        else:
            final_particles = all_particles[-1]

        if kde:
            self.logger.info(
                f"""KDE on {final_particles.shape[0]} samples with bandwidth option
                {kde_kwargs["bandwidth"] if "bandwidth" in kde_kwargs else "cv"}.
                Beware that KDE can give unreliable results when used with too few
                samples and in high dimensions."""
            )
            # Maybe get particles weights from last population for weighted KDE.
            if kde_sample_weights:
                kde_kwargs["sample_weights"] = all_log_weights[-1].exp()

            kde_dist = get_kde(final_particles, **kde_kwargs)

            if return_summary:
                return (
                    kde_dist,
                    dict(
                        particles=all_particles,
                        weights=all_log_weights,
                        epsilons=all_epsilons,
                        distances=all_distances,
                        xs=all_x,
                    ),
                )
            else:
                return kde_dist

        if return_summary:
            return (
                final_particles,
                dict(
                    particles=all_particles,
                    weights=all_log_weights,
                    epsilons=all_epsilons,
                    distances=all_distances,
                    xs=all_x,
                ),
            )
        else:
            return final_particles

    def _set_xo_and_sample_initial_population(
        self,
        x_o: Array,
        num_particles: int,
        num_initial_pop: int,
    ) -> Tuple[Tensor, float, Tensor, Tensor]:
        """Return particles, epsilon and distances of initial population."""

        assert (
            num_particles <= num_initial_pop
        ), "number of initial round simulations must be greater than population size"

        theta = self.prior.sample((num_initial_pop,))
        x = self._simulate_with_budget(theta)

        # Infer x shape to test and set x_o.
        self.x_shape = x[0].unsqueeze(0).shape
        self.x_o = process_x(x_o, self.x_shape)

        distances = self.distance(self.x_o, x)
        sortidx = torch.argsort(distances)
        particles = theta[sortidx][:num_particles]
        # Take last accepted distance as epsilon.
        initial_epsilon = distances[sortidx][num_particles - 1]

        if not torch.isfinite(initial_epsilon):
            initial_epsilon = 1e8

        return (
            particles,
            initial_epsilon,
            distances[sortidx][:num_particles],
            x[sortidx][:num_particles],
        )

    def _sample_next_population(
        self,
        particles: Tensor,
        log_weights: Tensor,
        distances: Tensor,
        epsilon: float,
        x: Tensor,
        use_last_pop_samples: bool = True,
    ) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
        """Return particles, weights and distances of new population."""

        new_particles = []
        new_log_weights = []
        new_distances = []
        new_x = []

        num_accepted_particles = 0
        num_particles = particles.shape[0]

        while num_accepted_particles < num_particles:
            # Upperbound for batch size to not exceed simulation budget.
            num_batch = min(
                num_particles - num_accepted_particles,
                self.num_simulations - self.simulation_counter,
            )

            # Sample from previous population and perturb.
            particle_candidates = self._sample_and_perturb(
                particles, torch.exp(log_weights), num_samples=num_batch
            )
            # Simulate and select based on distance.
            x_candidates = self._simulate_with_budget(particle_candidates)
            dists = self.distance(self.x_o, x_candidates)
            is_accepted = dists <= epsilon
            num_accepted_batch = is_accepted.sum().item()

            if num_accepted_batch > 0:
                new_particles.append(particle_candidates[is_accepted])
                new_log_weights.append(
                    self._calculate_new_log_weights(
                        particle_candidates[is_accepted],
                        particles,
                        log_weights,
                    )
                )
                new_distances.append(dists[is_accepted])
                new_x.append(x_candidates[is_accepted])
                num_accepted_particles += num_accepted_batch

            # If simulation budget was exceeded and we still need particles, take
            # previous population or fill up with previous population.
            if (
                self.simulation_counter >= self.num_simulations
                and num_accepted_particles < num_particles
            ):
                if use_last_pop_samples:
                    num_remaining = num_particles - num_accepted_particles
                    self.logger.info(
                        f"""Simulation Budget exceeded, filling up with {num_remaining}
                        samples from last population."""
                    )
                    # Some new particles have been accepted already, therefore
                    # fill up the remaining once with old particles and weights.
                    new_particles.append(particles[:num_remaining, :])
                    # Recalculate weights with new particles.
                    new_log_weights = [
                        self._calculate_new_log_weights(
                            torch.cat(new_particles),
                            particles,
                            log_weights,
                        )
                    ]
                    new_distances.append(distances[:num_remaining])
                    new_x.append(x[:num_remaining])
                else:
                    self.logger.info(
                        "Simulation Budget exceeded, returning previous population."
                    )
                    new_particles = [particles]
                    new_log_weights = [log_weights]
                    new_distances = [distances]
                    new_x = [x]

                break

        # collect lists of tensors into tensors
        new_particles = torch.cat(new_particles)
        new_log_weights = torch.cat(new_log_weights)
        new_distances = torch.cat(new_distances)
        new_x = torch.cat(new_x)

        # normalize the new weights
        new_log_weights -= torch.logsumexp(new_log_weights, dim=0)

        # Return sorted wrt distances.
        sort_idx = torch.argsort(new_distances)

        return (
            new_particles[sort_idx],
            new_log_weights[sort_idx],
            new_distances[sort_idx],
            new_x[sort_idx],
        )

    def _get_next_epsilon(self, distances: Tensor, quantile: float) -> float:
        """Return epsilon for next round based on quantile of this round's distances.

        Note: distances are made unique to avoid repeated distances from simulations
        that result in the same observation.

        Args:
            distances: The distances accepted in this round.
            quantile: Quantile in the distance distribution to determine new epsilon.

        Returns:
            epsilon: Epsilon for the next population.
        """
        # Take unique distances to skip same distances simulations (return is sorted).
        distances = torch.unique(distances)
        # Cumsum as cdf proxy.
        distances_cdf = torch.cumsum(distances, dim=0) / distances.sum()
        # Take the q quantile of distances.
        try:
            qidx = torch.where(distances_cdf >= quantile)[0][0]
        except IndexError:
            self.logger.warning(
                (
                    f"Accepted unique distances={distances} don't match "
                    f"quantile={quantile:.2f}. Selecting last distance."
                )
            )
            qidx = -1

        # The new epsilon is given by that distance.
        return distances[qidx].item()

    def _calculate_new_log_weights(
        self,
        new_particles: Tensor,
        old_particles: Tensor,
        old_log_weights: Tensor,
    ) -> Tensor:
        """Return new log weights following formulas in publications A,B anc C."""

        # Prior can be batched across new particles.
        prior_log_probs = self.prior.log_prob(new_particles)

        # Contstruct function to get kernel log prob for given old particle.
        # The kernel is centered on each old particle as in all three variants (A,B,C).
        def kernel_log_prob(new_particle):
            return self.get_new_kernel(old_particles).log_prob(new_particle)

        # We still have to loop over particles here because
        # the kernel log probs are already batched across old particles.
        log_weighted_sum = tensor(
            [
                torch.logsumexp(old_log_weights + kernel_log_prob(new_particle), dim=0)
                for new_particle in new_particles
            ],
            dtype=torch.float32,
        )
        # new weights are prior probs over weighted sum:
        return prior_log_probs - log_weighted_sum

    @staticmethod
    def sample_from_population_with_weights(
        particles: Tensor, weights: Tensor, num_samples: int = 1
    ) -> Tensor:
        """Return samples from particles sampled with weights."""

        # define multinomial with weights as probs
        multi = Multinomial(probs=weights)
        # sample num samples, with replacement
        samples = multi.sample(sample_shape=torch.Size((num_samples,)))
        # get indices of success trials
        indices = torch.where(samples)[1]
        # return those indices from trace
        return particles[indices]

    def _sample_and_perturb(
        self, particles: Tensor, weights: Tensor, num_samples: int = 1
    ) -> Tensor:
        """Sample and perturb batch of new parameters from trace.

        Reject sampled and perturbed parameters outside of prior.
        """

        num_accepted = 0
        parameters = []
        while num_accepted < num_samples:
            parms = self.sample_from_population_with_weights(
                particles, weights, num_samples=num_samples - num_accepted
            )

            # Create kernel on params and perturb.
            parms_perturbed = self.get_new_kernel(parms).sample()

            is_within_prior = within_support(self.prior, parms_perturbed)
            num_accepted += int(is_within_prior.sum().item())

            if num_accepted > 0:
                parameters.append(parms_perturbed[is_within_prior])

        return torch.cat(parameters)

    def get_kernel_variance(
        self,
        particles: Tensor,
        weights: Tensor,
        samples_per_dim: int = 100,
        kernel_variance_scale: float = 1.0,
    ) -> Tensor:
        if self.kernel == "gaussian":
            # For variant C, Beaumont et al. 2009, the kernel variance comes from the
            # previous population.
            if self.algorithm_variant == "C":
                # Calculate weighted covariance of particles.
                population_cov = torch.tensor(
                    np.atleast_2d(np.cov(particles, rowvar=False, aweights=weights)),
                    dtype=torch.float32,
                )
                # Make sure variance is nonsingular.
                try:
                    torch.cholesky(kernel_variance_scale * population_cov)
                except RuntimeError:
                    self.logger.warning(
                        """"Singular particle covariance, using unit covariance."""
                    )
                    population_cov = torch.eye(particles.shape[1])
                return kernel_variance_scale * population_cov
            # While for Toni et al. and Sisson et al. it comes from the parameter
            # ranges.
            elif self.algorithm_variant in ("A", "B"):
                particle_ranges = self.get_particle_ranges(
                    particles, weights, samples_per_dim=samples_per_dim
                )
                return kernel_variance_scale * torch.diag(particle_ranges)
            else:
                raise ValueError(f"Variant, '{self.algorithm_variant}' not supported.")
        elif self.kernel == "uniform":
            # Variance spans the range of parameters for every dimension.
            return kernel_variance_scale * self.get_particle_ranges(
                particles, weights, samples_per_dim=samples_per_dim
            )
        else:
            raise ValueError(f"Kernel, '{self.kernel}' not supported.")

    def get_new_kernel(self, thetas: Tensor) -> Distribution:
        """Return new kernel distribution for a given set of paramters."""

        if self.kernel == "gaussian":
            assert self.kernel_variance.ndim == 2
            return MultivariateNormal(
                loc=thetas, covariance_matrix=self.kernel_variance
            )

        elif self.kernel == "uniform":
            low = thetas - self.kernel_variance
            high = thetas + self.kernel_variance
            # Move batch shape to event shape to get Uniform that is multivariate in
            # parameter dimension.
            return Uniform(low=low, high=high).to_event(1)
        else:
            raise ValueError(f"Kernel, '{self.kernel}' not supported.")

    def resample_if_ess_too_small(
        self,
        particles: Tensor,
        log_weights: Tensor,
        ess_min: float,
        pop_idx: int,
    ) -> Tuple[Tensor, Tensor]:
        """Return resampled particles and uniform weights if effectice sampling size is
        too small.
        """

        num_particles = particles.shape[0]
        ess = (1 / torch.sum(torch.exp(2.0 * log_weights), dim=0)) / num_particles
        # Resampling of weights for low ESS only for Sisson et al. 2007.
        if ess < ess_min:
            self.logger.info(f"ESS={ess:.2f} too low, resampling pop {pop_idx}...")
            # First resample, then set to uniform weights as in Sisson et al. 2007.
            particles = self.sample_from_population_with_weights(
                particles, torch.exp(log_weights), num_samples=num_particles
            )
            log_weights = torch.log(1 / num_particles * ones(num_particles))

        return particles, log_weights

    def run_lra_update_weights(
        self,
        particles: Tensor,
        xs: Tensor,
        observation: Tensor,
        log_weights: Tensor,
        lra_with_weights: bool,
    ) -> Tuple[Tensor, Tensor]:
        """Return particles and weights adjusted with LRA.

        Runs (weighted) linear regression from xs onto particles to adjust the
        particles.

        Updates the SMC weights according to the new particles.
        """

        adjusted_particels = self.run_lra(
            theta=particles,
            x=xs,
            observation=observation,
            sample_weight=log_weights.exp() if lra_with_weights else None,
        )

        # Update SMC weights with LRA adjusted weights
        adjusted_log_weights = self._calculate_new_log_weights(
            new_particles=adjusted_particels,
            old_particles=particles,
            old_log_weights=log_weights,
        )

        return adjusted_particels, adjusted_log_weights

    def run_sass_set_xo(
        self,
        num_particles: int,
        num_pilot_simulations: int,
        x_o,
        lra: bool = False,
        sass_expansion_degree: int = 1,
    ) -> Callable:
        """Return transform for semi-automatic summary statistics.

        Runs an single round of rejection abc with fixed budget and accepts
        num_particles simulations to run the regression for sass.

        Sets self.x_o once the x_shape can be derived from simulations.
        """
        (
            pilot_particles,
            _,
            _,
            pilot_xs,
        ) = self._set_xo_and_sample_initial_population(
            x_o, num_particles, num_pilot_simulations
        )
        # Adjust with LRA.
        if lra:
            pilot_particles = self.run_lra(pilot_particles, pilot_xs, self.x_o)
        sass_transform = self.get_sass_transform(
            pilot_particles,
            pilot_xs,
            expansion_degree=sass_expansion_degree,
            sample_weight=None,
        )
        return sass_transform

    def get_particle_ranges(
        self, particles: Tensor, weights: Tensor, samples_per_dim: int = 100
    ) -> Tensor:
        """Return range of particles in each parameter dimension."""

        # get weighted samples
        samples = self.sample_from_population_with_weights(
            particles,
            weights,
            num_samples=samples_per_dim * particles.shape[1],
        )

        # Variance spans the range of particles for every dimension.
        particle_ranges = samples.max(0).values - samples.min(0).values
        assert particle_ranges.ndim < 2
        return particle_ranges

__call__(x_o, num_particles, num_initial_pop, num_simulations, epsilon_decay, distance_based_decay=False, ess_min=None, kernel_variance_scale=1.0, use_last_pop_samples=True, return_summary=False, kde=False, kde_kwargs={}, kde_sample_weights=False, lra=False, lra_with_weights=False, sass=False, sass_fraction=0.25, sass_expansion_degree=1)

Run SMCABC and return accepted parameters or KDE object fitted on them.

Parameters:

Name Type Description Default
x_o Union[Tensor, ndarray]

Observed data.

required
num_particles int

Number of particles in each population.

required
num_initial_pop int

Number of simulations used for initial population.

required
num_simulations int

Total number of possible simulations.

required
epsilon_decay float

Factor with which the acceptance threshold \(\epsilon\) decays.

required
distance_based_decay bool

Whether the \(\epsilon\) decay is constant over populations or calculated from the previous populations distribution of distances.

False
ess_min Optional[float]

Threshold of effective sampling size for resampling weights. Not used when None (default).

None
kernel_variance_scale float

Factor for scaling the perturbation kernel variance.

1.0
use_last_pop_samples bool

Whether to fill up the current population with samples from the previous population when the budget is used up. If False, the current population is discarded and the previous population is returned.

True
lra bool

Whether to run linear regression adjustment as in Beaumont et al. 2002

False
lra_with_weights bool

Whether to run lra as weighted linear regression with SMC weights

False
sass bool

Whether to determine semi-automatic summary statistics as in Fearnhead & Prangle 2012.

False
sass_fraction float

Fraction of simulation budget used for the initial sass run.

0.25
sass_expansion_degree int

Degree of the polynomial feature expansion for the sass regression, default 1 - no expansion.

1
kde bool

Whether to run KDE on the accepted parameters to return a KDE object from which one can sample.

False
kde_kwargs Dict[str, Any]

kwargs for performing KDE: ‘bandwidth=’; either a float, or a string naming a bandwidth heuristics, e.g., ‘cv’ (cross validation), ‘silvermann’ or ‘scott’, default ‘cv’. ‘transform’: transform applied to the parameters before doing KDE. ‘sample_weights’: weights associated with samples. See ‘get_kde’ for more details

{}
kde_sample_weights bool

Whether perform weighted KDE with SMC weights or on raw particles.

False
return_summary bool

Whether to return a dictionary with all accepted particles, weights, etc. at the end.

False

Returns:

Name Type Description
theta if kde False

accepted parameters of the last population.

kde if kde True

KDE object fitted on accepted parameters, from which one can .sample() and .log_prob().

summary if return_summary True

dictionary containing the accepted paramters (if kde True), distances and simulated data x of all populations.

Source code in sbi/inference/abc/smcabc.py
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
def __call__(
    self,
    x_o: Union[Tensor, ndarray],
    num_particles: int,
    num_initial_pop: int,
    num_simulations: int,
    epsilon_decay: float,
    distance_based_decay: bool = False,
    ess_min: Optional[float] = None,
    kernel_variance_scale: float = 1.0,
    use_last_pop_samples: bool = True,
    return_summary: bool = False,
    kde: bool = False,
    kde_kwargs: Dict[str, Any] = {},
    kde_sample_weights: bool = False,
    lra: bool = False,
    lra_with_weights: bool = False,
    sass: bool = False,
    sass_fraction: float = 0.25,
    sass_expansion_degree: int = 1,
) -> Union[Tensor, KDEWrapper, Tuple[Tensor, dict], Tuple[KDEWrapper, dict]]:
    r"""Run SMCABC and return accepted parameters or KDE object fitted on them.

    Args:
        x_o: Observed data.
        num_particles: Number of particles in each population.
        num_initial_pop: Number of simulations used for initial population.
        num_simulations: Total number of possible simulations.
        epsilon_decay: Factor with which the acceptance threshold $\epsilon$ decays.
        distance_based_decay: Whether the $\epsilon$ decay is constant over
            populations or calculated from the previous populations distribution of
            distances.
        ess_min: Threshold of effective sampling size for resampling weights. Not
            used when None (default).
        kernel_variance_scale: Factor for scaling the perturbation kernel variance.
        use_last_pop_samples: Whether to fill up the current population with
            samples from the previous population when the budget is used up. If
            False, the current population is discarded and the previous population
            is returned.
        lra: Whether to run linear regression adjustment as in Beaumont et al. 2002
        lra_with_weights: Whether to run lra as weighted linear regression with SMC
            weights
        sass: Whether to determine semi-automatic summary statistics as in
            Fearnhead & Prangle 2012.
        sass_fraction: Fraction of simulation budget used for the initial sass run.
        sass_expansion_degree: Degree of the polynomial feature expansion for the
            sass regression, default 1 - no expansion.
        kde: Whether to run KDE on the accepted parameters to return a KDE
            object from which one can sample.
        kde_kwargs: kwargs for performing KDE:
            'bandwidth='; either a float, or a string naming a bandwidth
            heuristics, e.g., 'cv' (cross validation), 'silvermann' or 'scott',
            default 'cv'.
            'transform': transform applied to the parameters before doing KDE.
            'sample_weights': weights associated with samples. See 'get_kde' for
            more details
        kde_sample_weights: Whether perform weighted KDE with SMC weights or on raw
            particles.
        return_summary: Whether to return a dictionary with all accepted particles,
            weights, etc. at the end.

    Returns:
        theta (if kde False): accepted parameters of the last population.
        kde (if kde True): KDE object fitted on accepted parameters, from which one
            can .sample() and .log_prob().
        summary (if return_summary True): dictionary containing the accepted
            paramters (if kde True), distances and simulated data x of all
            populations.
    """

    pop_idx = 0
    self.num_simulations = num_simulations

    # Pilot run for SASS.
    if sass:
        num_pilot_simulations = int(sass_fraction * num_simulations)
        self.logger.info(
            f"Running SASS with {num_pilot_simulations} pilot samples."
        )
        sass_transform = self.run_sass_set_xo(
            num_particles, num_pilot_simulations, x_o, lra, sass_expansion_degree
        )
        # Udpate simulator and xo
        x_o = sass_transform(self.x_o)

        def sass_simulator(theta):
            self.simulation_counter += theta.shape[0]
            return sass_transform(self._batched_simulator(theta))

        self._simulate_with_budget = sass_simulator

    # run initial population
    particles, epsilon, distances, x = self._set_xo_and_sample_initial_population(
        x_o, num_particles, num_initial_pop
    )
    log_weights = torch.log(1 / num_particles * ones(num_particles))

    self.logger.info(
        (
            f"population={pop_idx}, eps={epsilon}, ess={1.0}, "
            f"num_sims={num_initial_pop}"
        )
    )

    all_particles = [particles]
    all_log_weights = [log_weights]
    all_distances = [distances]
    all_epsilons = [epsilon]
    all_x = [x]

    while self.simulation_counter < self.num_simulations:
        pop_idx += 1
        # Decay based on quantile of distances from previous pop.
        if distance_based_decay:
            epsilon = self._get_next_epsilon(
                all_distances[pop_idx - 1], epsilon_decay
            )
        # Constant decay.
        else:
            epsilon *= epsilon_decay

        # Get kernel variance from previous pop.
        self.kernel_variance = self.get_kernel_variance(
            all_particles[pop_idx - 1],
            torch.exp(all_log_weights[pop_idx - 1]),
            samples_per_dim=500,
            kernel_variance_scale=kernel_variance_scale,
        )
        particles, log_weights, distances, x = self._sample_next_population(
            particles=all_particles[pop_idx - 1],
            log_weights=all_log_weights[pop_idx - 1],
            distances=all_distances[pop_idx - 1],
            epsilon=epsilon,
            x=all_x[pop_idx - 1],
            use_last_pop_samples=use_last_pop_samples,
        )

        # Resample population if effective sampling size is too small.
        if ess_min is not None:
            particles, log_weights = self.resample_if_ess_too_small(
                particles, log_weights, ess_min, pop_idx
            )

        self.logger.info(
            (
                f"population={pop_idx} done: eps={epsilon:.6f},"
                f" num_sims={self.simulation_counter}."
            )
        )

        # collect results
        all_particles.append(particles)
        all_log_weights.append(log_weights)
        all_distances.append(distances)
        all_epsilons.append(epsilon)
        all_x.append(x)

    # Maybe run LRA and adjust weights.
    if lra:
        self.logger.info("Running Linear regression adjustment.")
        adjusted_particles, adjusted_weights = self.run_lra_update_weights(
            particles=all_particles[-1],
            xs=all_x[-1],
            observation=process_x(x_o),
            log_weights=all_log_weights[-1],
            lra_with_weights=lra_with_weights,
        )
        final_particles = adjusted_particles
    else:
        final_particles = all_particles[-1]

    if kde:
        self.logger.info(
            f"""KDE on {final_particles.shape[0]} samples with bandwidth option
            {kde_kwargs["bandwidth"] if "bandwidth" in kde_kwargs else "cv"}.
            Beware that KDE can give unreliable results when used with too few
            samples and in high dimensions."""
        )
        # Maybe get particles weights from last population for weighted KDE.
        if kde_sample_weights:
            kde_kwargs["sample_weights"] = all_log_weights[-1].exp()

        kde_dist = get_kde(final_particles, **kde_kwargs)

        if return_summary:
            return (
                kde_dist,
                dict(
                    particles=all_particles,
                    weights=all_log_weights,
                    epsilons=all_epsilons,
                    distances=all_distances,
                    xs=all_x,
                ),
            )
        else:
            return kde_dist

    if return_summary:
        return (
            final_particles,
            dict(
                particles=all_particles,
                weights=all_log_weights,
                epsilons=all_epsilons,
                distances=all_distances,
                xs=all_x,
            ),
        )
    else:
        return final_particles

__init__(simulator, prior, distance='l2', num_workers=1, simulation_batch_size=1, show_progress_bars=True, kernel='gaussian', algorithm_variant='C')

Sequential Monte Carlo Approximate Bayesian Computation.

We distinguish between three different SMC methods here
  • A: Toni et al. 2010 (Phd Thesis)
  • B: Sisson et al. 2007 (with correction from 2009)
  • C: Beaumont et al. 2009

In Toni et al. 2010 we find an overview of the differences on page 34: - B: same as A except for resampling of weights if the effective sampling size is too small. - C: same as A except for calculation of the covariance of the perturbation kernel: the kernel covariance is a scaled version of the covariance of the previous population.

Parameters:

Name Type Description Default
simulator Callable

A function that takes parameters \(\theta\) and maps them to simulations, or observations, x, \(\mathrm{sim}(\theta)\to x\). Any regular Python callable (i.e. function or class with __call__ method) can be used.

required
prior Distribution

A probability distribution that expresses prior knowledge about the parameters, e.g. which ranges are meaningful for them. Any object with .log_prob()and .sample() (for example, a PyTorch distribution) can be used.

required
distance Union[str, Callable]

Distance function to compare observed and simulated data. Can be a custom function or one of l1, l2, mse.

'l2'
num_workers int

Number of parallel workers to use for simulations.

1
simulation_batch_size int

Number of parameter sets that the simulator maps to data x at once. If None, we simulate all parameter sets at the same time. If >= 1, the simulator has to process data of shape (simulation_batch_size, parameter_dimension).

1
show_progress_bars bool

Whether to show a progressbar during simulation and sampling.

True
kernel Optional[str]

Perturbation kernel.

'gaussian'
algorithm_variant str

Indicating the choice of algorithm variant, A, B, or C.

'C'
Source code in sbi/inference/abc/smcabc.py
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
def __init__(
    self,
    simulator: Callable,
    prior: Distribution,
    distance: Union[str, Callable] = "l2",
    num_workers: int = 1,
    simulation_batch_size: int = 1,
    show_progress_bars: bool = True,
    kernel: Optional[str] = "gaussian",
    algorithm_variant: str = "C",
):
    r"""Sequential Monte Carlo Approximate Bayesian Computation.

    We distinguish between three different SMC methods here:
        - A: Toni et al. 2010 (Phd Thesis)
        - B: Sisson et al. 2007 (with correction from 2009)
        - C: Beaumont et al. 2009

    In Toni et al. 2010 we find an overview of the differences on page 34:
        - B: same as A except for resampling of weights if the effective sampling
            size is too small.
        - C: same as A except for calculation of the covariance of the perturbation
            kernel: the kernel covariance is a scaled version of the covariance of
            the previous population.

    Args:
        simulator: A function that takes parameters $\theta$ and maps them to
            simulations, or observations, `x`, $\mathrm{sim}(\theta)\to x$. Any
            regular Python callable (i.e. function or class with `__call__` method)
            can be used.
        prior: A probability distribution that expresses prior knowledge about the
            parameters, e.g. which ranges are meaningful for them. Any
            object with `.log_prob()`and `.sample()` (for example, a PyTorch
            distribution) can be used.
        distance: Distance function to compare observed and simulated data. Can be
            a custom function or one of `l1`, `l2`, `mse`.
        num_workers: Number of parallel workers to use for simulations.
        simulation_batch_size: Number of parameter sets that the simulator
            maps to data x at once. If None, we simulate all parameter sets at the
            same time. If >= 1, the simulator has to process data of shape
            (simulation_batch_size, parameter_dimension).
        show_progress_bars: Whether to show a progressbar during simulation and
            sampling.
        kernel: Perturbation kernel.
        algorithm_variant: Indicating the choice of algorithm variant, A, B, or C.

    """

    super().__init__(
        simulator=simulator,
        prior=prior,
        distance=distance,
        num_workers=num_workers,
        simulation_batch_size=simulation_batch_size,
        show_progress_bars=show_progress_bars,
    )

    kernels = ("gaussian", "uniform")
    assert (
        kernel in kernels
    ), f"Kernel '{kernel}' not supported. Choose one from {kernels}."
    self.kernel = kernel

    algorithm_variants = ("A", "B", "C")
    assert algorithm_variant in algorithm_variants, (
        f"SMCABC variant '{algorithm_variant}' not supported, choose one from"
        " {algorithm_variants}."
    )
    self.algorithm_variant = algorithm_variant
    self.distance_to_x0 = None
    self.simulation_counter = 0
    self.num_simulations = 0

    # Define simulator that keeps track of budget.
    def simulate_with_budget(theta):
        self.simulation_counter += theta.shape[0]
        return self._batched_simulator(theta)

    self._simulate_with_budget = simulate_with_budget

get_new_kernel(thetas)

Return new kernel distribution for a given set of paramters.

Source code in sbi/inference/abc/smcabc.py
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
def get_new_kernel(self, thetas: Tensor) -> Distribution:
    """Return new kernel distribution for a given set of paramters."""

    if self.kernel == "gaussian":
        assert self.kernel_variance.ndim == 2
        return MultivariateNormal(
            loc=thetas, covariance_matrix=self.kernel_variance
        )

    elif self.kernel == "uniform":
        low = thetas - self.kernel_variance
        high = thetas + self.kernel_variance
        # Move batch shape to event shape to get Uniform that is multivariate in
        # parameter dimension.
        return Uniform(low=low, high=high).to_event(1)
    else:
        raise ValueError(f"Kernel, '{self.kernel}' not supported.")

get_particle_ranges(particles, weights, samples_per_dim=100)

Return range of particles in each parameter dimension.

Source code in sbi/inference/abc/smcabc.py
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
def get_particle_ranges(
    self, particles: Tensor, weights: Tensor, samples_per_dim: int = 100
) -> Tensor:
    """Return range of particles in each parameter dimension."""

    # get weighted samples
    samples = self.sample_from_population_with_weights(
        particles,
        weights,
        num_samples=samples_per_dim * particles.shape[1],
    )

    # Variance spans the range of particles for every dimension.
    particle_ranges = samples.max(0).values - samples.min(0).values
    assert particle_ranges.ndim < 2
    return particle_ranges

resample_if_ess_too_small(particles, log_weights, ess_min, pop_idx)

Return resampled particles and uniform weights if effectice sampling size is too small.

Source code in sbi/inference/abc/smcabc.py
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
def resample_if_ess_too_small(
    self,
    particles: Tensor,
    log_weights: Tensor,
    ess_min: float,
    pop_idx: int,
) -> Tuple[Tensor, Tensor]:
    """Return resampled particles and uniform weights if effectice sampling size is
    too small.
    """

    num_particles = particles.shape[0]
    ess = (1 / torch.sum(torch.exp(2.0 * log_weights), dim=0)) / num_particles
    # Resampling of weights for low ESS only for Sisson et al. 2007.
    if ess < ess_min:
        self.logger.info(f"ESS={ess:.2f} too low, resampling pop {pop_idx}...")
        # First resample, then set to uniform weights as in Sisson et al. 2007.
        particles = self.sample_from_population_with_weights(
            particles, torch.exp(log_weights), num_samples=num_particles
        )
        log_weights = torch.log(1 / num_particles * ones(num_particles))

    return particles, log_weights

run_lra_update_weights(particles, xs, observation, log_weights, lra_with_weights)

Return particles and weights adjusted with LRA.

Runs (weighted) linear regression from xs onto particles to adjust the particles.

Updates the SMC weights according to the new particles.

Source code in sbi/inference/abc/smcabc.py
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
def run_lra_update_weights(
    self,
    particles: Tensor,
    xs: Tensor,
    observation: Tensor,
    log_weights: Tensor,
    lra_with_weights: bool,
) -> Tuple[Tensor, Tensor]:
    """Return particles and weights adjusted with LRA.

    Runs (weighted) linear regression from xs onto particles to adjust the
    particles.

    Updates the SMC weights according to the new particles.
    """

    adjusted_particels = self.run_lra(
        theta=particles,
        x=xs,
        observation=observation,
        sample_weight=log_weights.exp() if lra_with_weights else None,
    )

    # Update SMC weights with LRA adjusted weights
    adjusted_log_weights = self._calculate_new_log_weights(
        new_particles=adjusted_particels,
        old_particles=particles,
        old_log_weights=log_weights,
    )

    return adjusted_particels, adjusted_log_weights

run_sass_set_xo(num_particles, num_pilot_simulations, x_o, lra=False, sass_expansion_degree=1)

Return transform for semi-automatic summary statistics.

Runs an single round of rejection abc with fixed budget and accepts num_particles simulations to run the regression for sass.

Sets self.x_o once the x_shape can be derived from simulations.

Source code in sbi/inference/abc/smcabc.py
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
def run_sass_set_xo(
    self,
    num_particles: int,
    num_pilot_simulations: int,
    x_o,
    lra: bool = False,
    sass_expansion_degree: int = 1,
) -> Callable:
    """Return transform for semi-automatic summary statistics.

    Runs an single round of rejection abc with fixed budget and accepts
    num_particles simulations to run the regression for sass.

    Sets self.x_o once the x_shape can be derived from simulations.
    """
    (
        pilot_particles,
        _,
        _,
        pilot_xs,
    ) = self._set_xo_and_sample_initial_population(
        x_o, num_particles, num_pilot_simulations
    )
    # Adjust with LRA.
    if lra:
        pilot_particles = self.run_lra(pilot_particles, pilot_xs, self.x_o)
    sass_transform = self.get_sass_transform(
        pilot_particles,
        pilot_xs,
        expansion_degree=sass_expansion_degree,
        sample_weight=None,
    )
    return sass_transform

sample_from_population_with_weights(particles, weights, num_samples=1) staticmethod

Return samples from particles sampled with weights.

Source code in sbi/inference/abc/smcabc.py
507
508
509
510
511
512
513
514
515
516
517
518
519
520
@staticmethod
def sample_from_population_with_weights(
    particles: Tensor, weights: Tensor, num_samples: int = 1
) -> Tensor:
    """Return samples from particles sampled with weights."""

    # define multinomial with weights as probs
    multi = Multinomial(probs=weights)
    # sample num samples, with replacement
    samples = multi.sample(sample_shape=torch.Size((num_samples,)))
    # get indices of success trials
    indices = torch.where(samples)[1]
    # return those indices from trace
    return particles[indices]

Posteriors

Bases: NeuralPosterior

Posterior \(p(\theta|x_o)\) with log_prob() and sample() methods, only applicable to SNPE.

SNPE trains a neural network to directly approximate the posterior distribution. However, for bounded priors, the neural network can have leakage: it puts non-zero mass in regions where the prior is zero. The DirectPosterior class wraps the trained network to deal with these cases.

Specifically, this class offers the following functionality:
- correct the calculation of the log probability such that it compensates for the leakage.
- reject samples that lie outside of the prior bounds.

This class can not be used in combination with SNLE or SNRE.

Source code in sbi/inference/posteriors/direct_posterior.py
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
class DirectPosterior(NeuralPosterior):
    r"""Posterior $p(\theta|x_o)$ with `log_prob()` and `sample()` methods, only
    applicable to SNPE.<br/><br/>
    SNPE trains a neural network to directly approximate the posterior distribution.
    However, for bounded priors, the neural network can have leakage: it puts non-zero
    mass in regions where the prior is zero. The `DirectPosterior` class wraps the
    trained network to deal with these cases.<br/><br/>
    Specifically, this class offers the following functionality:<br/>
    - correct the calculation of the log probability such that it compensates for the
      leakage.<br/>
    - reject samples that lie outside of the prior bounds.<br/><br/>
    This class can not be used in combination with SNLE or SNRE.
    """

    def __init__(
        self,
        posterior_estimator: flows.Flow,
        prior: Distribution,
        max_sampling_batch_size: int = 10_000,
        device: Optional[str] = None,
        x_shape: Optional[torch.Size] = None,
        enable_transform: bool = True,
    ):
        """
        Args:
            prior: Prior distribution with `.log_prob()` and `.sample()`.
            posterior_estimator: The trained neural posterior.
            max_sampling_batch_size: Batchsize of samples being drawn from
                the proposal at every iteration.
            device: Training device, e.g., "cpu", "cuda" or "cuda:0". If None,
                `potential_fn.device` is used.
            x_shape: Shape of a single simulator output. If passed, it is used to check
                the shape of the observed data and give a descriptive error.
            enable_transform: Whether to transform parameters to unconstrained space
                during MAP optimization. When False, an identity transform will be
                returned for `theta_transform`.
        """
        # Because `DirectPosterior` does not take the `potential_fn` as input, it
        # builds it itself. The `potential_fn` and `theta_transform` are used only for
        # obtaining the MAP.
        check_prior(prior)
        potential_fn, theta_transform = posterior_estimator_based_potential(
            posterior_estimator,
            prior,
            x_o=None,
            enable_transform=enable_transform,
        )

        super().__init__(
            potential_fn=potential_fn,
            theta_transform=theta_transform,
            device=device,
            x_shape=x_shape,
        )

        self.prior = prior
        self.posterior_estimator = posterior_estimator

        self.max_sampling_batch_size = max_sampling_batch_size
        self._leakage_density_correction_factor = None

        self._purpose = """It samples the posterior network and rejects samples that
            lie outside of the prior bounds."""

    def sample(
        self,
        sample_shape: Shape = torch.Size(),
        x: Optional[Tensor] = None,
        max_sampling_batch_size: int = 10_000,
        sample_with: Optional[str] = None,
        show_progress_bars: bool = True,
    ) -> Tensor:
        r"""Return samples from posterior distribution $p(\theta|x)$.

        Args:
            sample_shape: Desired shape of samples that are drawn from posterior. If
                sample_shape is multidimensional we simply draw `sample_shape.numel()`
                samples and then reshape into the desired shape.
            sample_with: This argument only exists to keep backward-compatibility with
                `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
            show_progress_bars: Whether to show sampling progress monitor.
        """

        num_samples = torch.Size(sample_shape).numel()
        x = self._x_else_default_x(x)
        max_sampling_batch_size = (
            self.max_sampling_batch_size
            if max_sampling_batch_size is None
            else max_sampling_batch_size
        )

        if sample_with is not None:
            raise ValueError(
                f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
                f"`sample_with` is no longer supported. You have to rerun "
                f"`.build_posterior(sample_with={sample_with}).`"
            )

        samples = accept_reject_sample(
            proposal=self.posterior_estimator,
            accept_reject_fn=lambda theta: within_support(self.prior, theta),
            num_samples=num_samples,
            show_progress_bars=show_progress_bars,
            max_sampling_batch_size=max_sampling_batch_size,
            proposal_sampling_kwargs={"context": x},
            alternative_method="build_posterior(..., sample_with='mcmc')",
        )[0]
        return samples

    def log_prob(
        self,
        theta: Tensor,
        x: Optional[Tensor] = None,
        norm_posterior: bool = True,
        track_gradients: bool = False,
        leakage_correction_params: Optional[dict] = None,
    ) -> Tensor:
        r"""Returns the log-probability of the posterior $p(\theta|x)$.

        Args:
            theta: Parameters $\theta$.
            norm_posterior: Whether to enforce a normalized posterior density.
                Renormalization of the posterior is useful when some
                probability falls out or leaks out of the prescribed prior support.
                The normalizing factor is calculated via rejection sampling, so if you
                need speedier but unnormalized log posterior estimates set here
                `norm_posterior=False`. The returned log posterior is set to
                -∞ outside of the prior support regardless of this setting.
            track_gradients: Whether the returned tensor supports tracking gradients.
                This can be helpful for e.g. sensitivity analysis, but increases memory
                consumption.
            leakage_correction_params: A `dict` of keyword arguments to override the
                default values of `leakage_correction()`. Possible options are:
                `num_rejection_samples`, `force_update`, `show_progress_bars`, and
                `rejection_sampling_batch_size`.
                These parameters only have an effect if `norm_posterior=True`.

        Returns:
            `(len(θ),)`-shaped log posterior probability $\log p(\theta|x)$ for θ in the
            support of the prior, -∞ (corresponding to 0 probability) outside.
        """
        x = self._x_else_default_x(x)

        # TODO Train exited here, entered after sampling?
        self.posterior_estimator.eval()

        theta = ensure_theta_batched(torch.as_tensor(theta))
        theta_repeated, x_repeated = match_theta_and_x_batch_shapes(theta, x)

        with torch.set_grad_enabled(track_gradients):
            # Evaluate on device, move back to cpu for comparison with prior.
            unnorm_log_prob = self.posterior_estimator.log_prob(
                theta_repeated, context=x_repeated
            )

            # Force probability to be zero outside prior support.
            in_prior_support = within_support(self.prior, theta_repeated)

            masked_log_prob = torch.where(
                in_prior_support,
                unnorm_log_prob,
                torch.tensor(float("-inf"), dtype=torch.float32, device=self._device),
            )

            if leakage_correction_params is None:
                leakage_correction_params = dict()  # use defaults
            log_factor = (
                log(self.leakage_correction(x=x, **leakage_correction_params))
                if norm_posterior
                else 0
            )

            return masked_log_prob - log_factor

    @torch.no_grad()
    def leakage_correction(
        self,
        x: Tensor,
        num_rejection_samples: int = 10_000,
        force_update: bool = False,
        show_progress_bars: bool = False,
        rejection_sampling_batch_size: int = 10_000,
    ) -> Tensor:
        r"""Return leakage correction factor for a leaky posterior density estimate.

        The factor is estimated from the acceptance probability during rejection
        sampling from the posterior.

        This is to avoid re-estimating the acceptance probability from scratch
        whenever `log_prob` is called and `norm_posterior=True`. Here, it
        is estimated only once for `self.default_x` and saved for later. We
        re-evaluate only whenever a new `x` is passed.

        Arguments:
            num_rejection_samples: Number of samples used to estimate correction factor.
            show_progress_bars: Whether to show a progress bar during sampling.
            rejection_sampling_batch_size: Batch size for rejection sampling.

        Returns:
            Saved or newly-estimated correction factor (as a scalar `Tensor`).
        """

        def acceptance_at(x: Tensor) -> Tensor:
            return accept_reject_sample(
                proposal=self.posterior_estimator,
                accept_reject_fn=lambda theta: within_support(self.prior, theta),
                num_samples=num_rejection_samples,
                show_progress_bars=show_progress_bars,
                sample_for_correction_factor=True,
                max_sampling_batch_size=rejection_sampling_batch_size,
                proposal_sampling_kwargs={"context": x},
            )[1]

        # Check if the provided x matches the default x (short-circuit on identity).
        is_new_x = self.default_x is None or (
            x is not self.default_x and (x != self.default_x).any()
        )

        not_saved_at_default_x = self._leakage_density_correction_factor is None

        if is_new_x:  # Calculate at x; don't save.
            return acceptance_at(x)
        elif not_saved_at_default_x or force_update:  # Calculate at default_x; save.
            assert self.default_x is not None
            self._leakage_density_correction_factor = acceptance_at(self.default_x)

        return self._leakage_density_correction_factor  # type: ignore

    def map(
        self,
        x: Optional[Tensor] = None,
        num_iter: int = 1_000,
        num_to_optimize: int = 100,
        learning_rate: float = 0.01,
        init_method: Union[str, Tensor] = "posterior",
        num_init_samples: int = 1_000,
        save_best_every: int = 10,
        show_progress_bars: bool = False,
        force_update: bool = False,
    ) -> Tensor:
        r"""Returns the maximum-a-posteriori estimate (MAP).

        The method can be interrupted (Ctrl-C) when the user sees that the
        log-probability converges. The best estimate will be saved in `self._map` and
        can be accessed with `self.map()`. The MAP is obtained by running gradient
        ascent from a given number of starting positions (samples from the posterior
        with the highest log-probability). After the optimization is done, we select the
        parameter set that has the highest log-probability after the optimization.

        Warning: The default values used by this function are not well-tested. They
        might require hand-tuning for the problem at hand.

        For developers: if the prior is a `BoxUniform`, we carry out the optimization
        in unbounded space and transform the result back into bounded space.

        Args:
            x: Deprecated - use `.set_default_x()` prior to `.map()`.
            num_iter: Number of optimization steps that the algorithm takes
                to find the MAP.
            learning_rate: Learning rate of the optimizer.
            init_method: How to select the starting parameters for the optimization. If
                it is a string, it can be either [`posterior`, `prior`], which samples
                the respective distribution `num_init_samples` times. If it is a
                tensor, the tensor will be used as init locations.
            num_init_samples: Draw this number of samples from the posterior and
                evaluate the log-probability of all of them.
            num_to_optimize: From the drawn `num_init_samples`, use the
                `num_to_optimize` with highest log-probability as the initial points
                for the optimization.
            save_best_every: The best log-probability is computed, saved in the
                `map`-attribute, and printed every `save_best_every`-th iteration.
                Computing the best log-probability creates a significant overhead
                (thus, the default is `10`.)
            show_progress_bars: Whether to show a progressbar during sampling from the
                posterior.
            force_update: Whether to re-calculate the MAP when x is unchanged and
                have a cached value.
            log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
                {'norm_posterior': True} for SNPE.

        Returns:
            The MAP estimate.
        """
        return super().map(
            x=x,
            num_iter=num_iter,
            num_to_optimize=num_to_optimize,
            learning_rate=learning_rate,
            init_method=init_method,
            num_init_samples=num_init_samples,
            save_best_every=save_best_every,
            show_progress_bars=show_progress_bars,
            force_update=force_update,
        )

__init__(posterior_estimator, prior, max_sampling_batch_size=10000, device=None, x_shape=None, enable_transform=True)

Parameters:

Name Type Description Default
prior Distribution

Prior distribution with .log_prob() and .sample().

required
posterior_estimator Flow

The trained neural posterior.

required
max_sampling_batch_size int

Batchsize of samples being drawn from the proposal at every iteration.

10000
device Optional[str]

Training device, e.g., “cpu”, “cuda” or “cuda:0”. If None, potential_fn.device is used.

None
x_shape Optional[Size]

Shape of a single simulator output. If passed, it is used to check the shape of the observed data and give a descriptive error.

None
enable_transform bool

Whether to transform parameters to unconstrained space during MAP optimization. When False, an identity transform will be returned for theta_transform.

True
Source code in sbi/inference/posteriors/direct_posterior.py
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
def __init__(
    self,
    posterior_estimator: flows.Flow,
    prior: Distribution,
    max_sampling_batch_size: int = 10_000,
    device: Optional[str] = None,
    x_shape: Optional[torch.Size] = None,
    enable_transform: bool = True,
):
    """
    Args:
        prior: Prior distribution with `.log_prob()` and `.sample()`.
        posterior_estimator: The trained neural posterior.
        max_sampling_batch_size: Batchsize of samples being drawn from
            the proposal at every iteration.
        device: Training device, e.g., "cpu", "cuda" or "cuda:0". If None,
            `potential_fn.device` is used.
        x_shape: Shape of a single simulator output. If passed, it is used to check
            the shape of the observed data and give a descriptive error.
        enable_transform: Whether to transform parameters to unconstrained space
            during MAP optimization. When False, an identity transform will be
            returned for `theta_transform`.
    """
    # Because `DirectPosterior` does not take the `potential_fn` as input, it
    # builds it itself. The `potential_fn` and `theta_transform` are used only for
    # obtaining the MAP.
    check_prior(prior)
    potential_fn, theta_transform = posterior_estimator_based_potential(
        posterior_estimator,
        prior,
        x_o=None,
        enable_transform=enable_transform,
    )

    super().__init__(
        potential_fn=potential_fn,
        theta_transform=theta_transform,
        device=device,
        x_shape=x_shape,
    )

    self.prior = prior
    self.posterior_estimator = posterior_estimator

    self.max_sampling_batch_size = max_sampling_batch_size
    self._leakage_density_correction_factor = None

    self._purpose = """It samples the posterior network and rejects samples that
        lie outside of the prior bounds."""

leakage_correction(x, num_rejection_samples=10000, force_update=False, show_progress_bars=False, rejection_sampling_batch_size=10000)

Return leakage correction factor for a leaky posterior density estimate.

The factor is estimated from the acceptance probability during rejection sampling from the posterior.

This is to avoid re-estimating the acceptance probability from scratch whenever log_prob is called and norm_posterior=True. Here, it is estimated only once for self.default_x and saved for later. We re-evaluate only whenever a new x is passed.

Parameters:

Name Type Description Default
num_rejection_samples int

Number of samples used to estimate correction factor.

10000
show_progress_bars bool

Whether to show a progress bar during sampling.

False
rejection_sampling_batch_size int

Batch size for rejection sampling.

10000

Returns:

Type Description
Tensor

Saved or newly-estimated correction factor (as a scalar Tensor).

Source code in sbi/inference/posteriors/direct_posterior.py
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
@torch.no_grad()
def leakage_correction(
    self,
    x: Tensor,
    num_rejection_samples: int = 10_000,
    force_update: bool = False,
    show_progress_bars: bool = False,
    rejection_sampling_batch_size: int = 10_000,
) -> Tensor:
    r"""Return leakage correction factor for a leaky posterior density estimate.

    The factor is estimated from the acceptance probability during rejection
    sampling from the posterior.

    This is to avoid re-estimating the acceptance probability from scratch
    whenever `log_prob` is called and `norm_posterior=True`. Here, it
    is estimated only once for `self.default_x` and saved for later. We
    re-evaluate only whenever a new `x` is passed.

    Arguments:
        num_rejection_samples: Number of samples used to estimate correction factor.
        show_progress_bars: Whether to show a progress bar during sampling.
        rejection_sampling_batch_size: Batch size for rejection sampling.

    Returns:
        Saved or newly-estimated correction factor (as a scalar `Tensor`).
    """

    def acceptance_at(x: Tensor) -> Tensor:
        return accept_reject_sample(
            proposal=self.posterior_estimator,
            accept_reject_fn=lambda theta: within_support(self.prior, theta),
            num_samples=num_rejection_samples,
            show_progress_bars=show_progress_bars,
            sample_for_correction_factor=True,
            max_sampling_batch_size=rejection_sampling_batch_size,
            proposal_sampling_kwargs={"context": x},
        )[1]

    # Check if the provided x matches the default x (short-circuit on identity).
    is_new_x = self.default_x is None or (
        x is not self.default_x and (x != self.default_x).any()
    )

    not_saved_at_default_x = self._leakage_density_correction_factor is None

    if is_new_x:  # Calculate at x; don't save.
        return acceptance_at(x)
    elif not_saved_at_default_x or force_update:  # Calculate at default_x; save.
        assert self.default_x is not None
        self._leakage_density_correction_factor = acceptance_at(self.default_x)

    return self._leakage_density_correction_factor  # type: ignore

log_prob(theta, x=None, norm_posterior=True, track_gradients=False, leakage_correction_params=None)

Returns the log-probability of the posterior \(p(\theta|x)\).

Parameters:

Name Type Description Default
theta Tensor

Parameters \(\theta\).

required
norm_posterior bool

Whether to enforce a normalized posterior density. Renormalization of the posterior is useful when some probability falls out or leaks out of the prescribed prior support. The normalizing factor is calculated via rejection sampling, so if you need speedier but unnormalized log posterior estimates set here norm_posterior=False. The returned log posterior is set to -∞ outside of the prior support regardless of this setting.

True
track_gradients bool

Whether the returned tensor supports tracking gradients. This can be helpful for e.g. sensitivity analysis, but increases memory consumption.

False
leakage_correction_params Optional[dict]

A dict of keyword arguments to override the default values of leakage_correction(). Possible options are: num_rejection_samples, force_update, show_progress_bars, and rejection_sampling_batch_size. These parameters only have an effect if norm_posterior=True.

None

Returns:

Type Description
Tensor

(len(θ),)-shaped log posterior probability \(\log p(\theta|x)\) for θ in the

Tensor

support of the prior, -∞ (corresponding to 0 probability) outside.

Source code in sbi/inference/posteriors/direct_posterior.py
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
def log_prob(
    self,
    theta: Tensor,
    x: Optional[Tensor] = None,
    norm_posterior: bool = True,
    track_gradients: bool = False,
    leakage_correction_params: Optional[dict] = None,
) -> Tensor:
    r"""Returns the log-probability of the posterior $p(\theta|x)$.

    Args:
        theta: Parameters $\theta$.
        norm_posterior: Whether to enforce a normalized posterior density.
            Renormalization of the posterior is useful when some
            probability falls out or leaks out of the prescribed prior support.
            The normalizing factor is calculated via rejection sampling, so if you
            need speedier but unnormalized log posterior estimates set here
            `norm_posterior=False`. The returned log posterior is set to
            -∞ outside of the prior support regardless of this setting.
        track_gradients: Whether the returned tensor supports tracking gradients.
            This can be helpful for e.g. sensitivity analysis, but increases memory
            consumption.
        leakage_correction_params: A `dict` of keyword arguments to override the
            default values of `leakage_correction()`. Possible options are:
            `num_rejection_samples`, `force_update`, `show_progress_bars`, and
            `rejection_sampling_batch_size`.
            These parameters only have an effect if `norm_posterior=True`.

    Returns:
        `(len(θ),)`-shaped log posterior probability $\log p(\theta|x)$ for θ in the
        support of the prior, -∞ (corresponding to 0 probability) outside.
    """
    x = self._x_else_default_x(x)

    # TODO Train exited here, entered after sampling?
    self.posterior_estimator.eval()

    theta = ensure_theta_batched(torch.as_tensor(theta))
    theta_repeated, x_repeated = match_theta_and_x_batch_shapes(theta, x)

    with torch.set_grad_enabled(track_gradients):
        # Evaluate on device, move back to cpu for comparison with prior.
        unnorm_log_prob = self.posterior_estimator.log_prob(
            theta_repeated, context=x_repeated
        )

        # Force probability to be zero outside prior support.
        in_prior_support = within_support(self.prior, theta_repeated)

        masked_log_prob = torch.where(
            in_prior_support,
            unnorm_log_prob,
            torch.tensor(float("-inf"), dtype=torch.float32, device=self._device),
        )

        if leakage_correction_params is None:
            leakage_correction_params = dict()  # use defaults
        log_factor = (
            log(self.leakage_correction(x=x, **leakage_correction_params))
            if norm_posterior
            else 0
        )

        return masked_log_prob - log_factor

map(x=None, num_iter=1000, num_to_optimize=100, learning_rate=0.01, init_method='posterior', num_init_samples=1000, save_best_every=10, show_progress_bars=False, force_update=False)

Returns the maximum-a-posteriori estimate (MAP).

The method can be interrupted (Ctrl-C) when the user sees that the log-probability converges. The best estimate will be saved in self._map and can be accessed with self.map(). The MAP is obtained by running gradient ascent from a given number of starting positions (samples from the posterior with the highest log-probability). After the optimization is done, we select the parameter set that has the highest log-probability after the optimization.

Warning: The default values used by this function are not well-tested. They might require hand-tuning for the problem at hand.

For developers: if the prior is a BoxUniform, we carry out the optimization in unbounded space and transform the result back into bounded space.

Parameters:

Name Type Description Default
x Optional[Tensor]

Deprecated - use .set_default_x() prior to .map().

None
num_iter int

Number of optimization steps that the algorithm takes to find the MAP.

1000
learning_rate float

Learning rate of the optimizer.

0.01
init_method Union[str, Tensor]

How to select the starting parameters for the optimization. If it is a string, it can be either [posterior, prior], which samples the respective distribution num_init_samples times. If it is a tensor, the tensor will be used as init locations.

'posterior'
num_init_samples int

Draw this number of samples from the posterior and evaluate the log-probability of all of them.

1000
num_to_optimize int

From the drawn num_init_samples, use the num_to_optimize with highest log-probability as the initial points for the optimization.

100
save_best_every int

The best log-probability is computed, saved in the map-attribute, and printed every save_best_every-th iteration. Computing the best log-probability creates a significant overhead (thus, the default is 10.)

10
show_progress_bars bool

Whether to show a progressbar during sampling from the posterior.

False
force_update bool

Whether to re-calculate the MAP when x is unchanged and have a cached value.

False
log_prob_kwargs

Will be empty for SNLE and SNRE. Will contain {‘norm_posterior’: True} for SNPE.

required

Returns:

Type Description
Tensor

The MAP estimate.

Source code in sbi/inference/posteriors/direct_posterior.py
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
def map(
    self,
    x: Optional[Tensor] = None,
    num_iter: int = 1_000,
    num_to_optimize: int = 100,
    learning_rate: float = 0.01,
    init_method: Union[str, Tensor] = "posterior",
    num_init_samples: int = 1_000,
    save_best_every: int = 10,
    show_progress_bars: bool = False,
    force_update: bool = False,
) -> Tensor:
    r"""Returns the maximum-a-posteriori estimate (MAP).

    The method can be interrupted (Ctrl-C) when the user sees that the
    log-probability converges. The best estimate will be saved in `self._map` and
    can be accessed with `self.map()`. The MAP is obtained by running gradient
    ascent from a given number of starting positions (samples from the posterior
    with the highest log-probability). After the optimization is done, we select the
    parameter set that has the highest log-probability after the optimization.

    Warning: The default values used by this function are not well-tested. They
    might require hand-tuning for the problem at hand.

    For developers: if the prior is a `BoxUniform`, we carry out the optimization
    in unbounded space and transform the result back into bounded space.

    Args:
        x: Deprecated - use `.set_default_x()` prior to `.map()`.
        num_iter: Number of optimization steps that the algorithm takes
            to find the MAP.
        learning_rate: Learning rate of the optimizer.
        init_method: How to select the starting parameters for the optimization. If
            it is a string, it can be either [`posterior`, `prior`], which samples
            the respective distribution `num_init_samples` times. If it is a
            tensor, the tensor will be used as init locations.
        num_init_samples: Draw this number of samples from the posterior and
            evaluate the log-probability of all of them.
        num_to_optimize: From the drawn `num_init_samples`, use the
            `num_to_optimize` with highest log-probability as the initial points
            for the optimization.
        save_best_every: The best log-probability is computed, saved in the
            `map`-attribute, and printed every `save_best_every`-th iteration.
            Computing the best log-probability creates a significant overhead
            (thus, the default is `10`.)
        show_progress_bars: Whether to show a progressbar during sampling from the
            posterior.
        force_update: Whether to re-calculate the MAP when x is unchanged and
            have a cached value.
        log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
            {'norm_posterior': True} for SNPE.

    Returns:
        The MAP estimate.
    """
    return super().map(
        x=x,
        num_iter=num_iter,
        num_to_optimize=num_to_optimize,
        learning_rate=learning_rate,
        init_method=init_method,
        num_init_samples=num_init_samples,
        save_best_every=save_best_every,
        show_progress_bars=show_progress_bars,
        force_update=force_update,
    )

sample(sample_shape=torch.Size(), x=None, max_sampling_batch_size=10000, sample_with=None, show_progress_bars=True)

Return samples from posterior distribution \(p(\theta|x)\).

Parameters:

Name Type Description Default
sample_shape Shape

Desired shape of samples that are drawn from posterior. If sample_shape is multidimensional we simply draw sample_shape.numel() samples and then reshape into the desired shape.

Size()
sample_with Optional[str]

This argument only exists to keep backward-compatibility with sbi v0.17.2 or older. If it is set, we instantly raise an error.

None
show_progress_bars bool

Whether to show sampling progress monitor.

True
Source code in sbi/inference/posteriors/direct_posterior.py
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
def sample(
    self,
    sample_shape: Shape = torch.Size(),
    x: Optional[Tensor] = None,
    max_sampling_batch_size: int = 10_000,
    sample_with: Optional[str] = None,
    show_progress_bars: bool = True,
) -> Tensor:
    r"""Return samples from posterior distribution $p(\theta|x)$.

    Args:
        sample_shape: Desired shape of samples that are drawn from posterior. If
            sample_shape is multidimensional we simply draw `sample_shape.numel()`
            samples and then reshape into the desired shape.
        sample_with: This argument only exists to keep backward-compatibility with
            `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
        show_progress_bars: Whether to show sampling progress monitor.
    """

    num_samples = torch.Size(sample_shape).numel()
    x = self._x_else_default_x(x)
    max_sampling_batch_size = (
        self.max_sampling_batch_size
        if max_sampling_batch_size is None
        else max_sampling_batch_size
    )

    if sample_with is not None:
        raise ValueError(
            f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
            f"`sample_with` is no longer supported. You have to rerun "
            f"`.build_posterior(sample_with={sample_with}).`"
        )

    samples = accept_reject_sample(
        proposal=self.posterior_estimator,
        accept_reject_fn=lambda theta: within_support(self.prior, theta),
        num_samples=num_samples,
        show_progress_bars=show_progress_bars,
        max_sampling_batch_size=max_sampling_batch_size,
        proposal_sampling_kwargs={"context": x},
        alternative_method="build_posterior(..., sample_with='mcmc')",
    )[0]
    return samples

Bases: NeuralPosterior

Provides importance sampling to sample from the posterior.

SNLE or SNRE train neural networks to approximate the likelihood(-ratios). ImportanceSamplingPosterior allows to estimate the posterior log-probability by estimating the normlalization constant with importance sampling. It also allows to perform importance sampling (with .sample()) and to draw approximate samples with sampling-importance-resampling (SIR) (with .sir_sample())

Source code in sbi/inference/posteriors/importance_posterior.py
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
class ImportanceSamplingPosterior(NeuralPosterior):
    r"""Provides importance sampling to sample from the posterior.<br/><br/>
    SNLE or SNRE train neural networks to approximate the likelihood(-ratios).
    `ImportanceSamplingPosterior` allows to estimate the posterior log-probability by
    estimating the normlalization constant with importance sampling. It also allows to
    perform importance sampling (with `.sample()`) and to draw approximate samples with
    sampling-importance-resampling (SIR) (with `.sir_sample()`)
    """

    def __init__(
        self,
        potential_fn: Callable,
        proposal: Any,
        theta_transform: Optional[TorchTransform] = None,
        method: str = "sir",
        oversampling_factor: int = 32,
        max_sampling_batch_size: int = 10_000,
        device: Optional[str] = None,
        x_shape: Optional[torch.Size] = None,
    ):
        """
        Args:
            potential_fn: The potential function from which to draw samples.
            proposal: The proposal distribution.
            theta_transform: Transformation that is applied to parameters. Is not used
                during but only when calling `.map()`.
            method: Either of [`sir`|`importance`]. This sets the behavior of the
                `.sample()` method. With `sir`, approximate posterior samples are
                generated with sampling importance resampling (SIR). With
                `importance`, the `.sample()` method returns a tuple of samples and
                corresponding importance weights.
            oversampling_factor: Number of proposed samples from which only one is
                selected based on its importance weight.
            max_sampling_batch_size: The batch size of samples being drawn from the
                proposal at every iteration.
            device: Device on which to sample, e.g., "cpu", "cuda" or "cuda:0". If
                None, `potential_fn.device` is used.
            x_shape: Shape of a single simulator output. If passed, it is used to check
                the shape of the observed data and give a descriptive error.
        """
        super().__init__(
            potential_fn,
            theta_transform=theta_transform,
            device=device,
            x_shape=x_shape,
        )

        self.proposal = proposal
        self._normalization_constant = None
        self.method = method

        self.oversampling_factor = oversampling_factor
        self.max_sampling_batch_size = max_sampling_batch_size

        self._purpose = (
            "It provides sampling-importance resampling (SIR) to .sample() from the "
            "posterior and can evaluate the _unnormalized_ posterior density with "
            ".log_prob()."
        )

    def log_prob(
        self,
        theta: Tensor,
        x: Optional[Tensor] = None,
        track_gradients: bool = False,
        normalization_constant_params: Optional[dict] = None,
    ) -> Tensor:
        r"""Returns the log-probability of theta under the posterior.

        The normalization constant is estimated with importance sampling.

        Args:
            theta: Parameters $\theta$.
            track_gradients: Whether the returned tensor supports tracking gradients.
                This can be helpful for e.g. sensitivity analysis, but increases memory
                consumption.
            normalization_constant_params: Parameters passed on to
                `estimate_normalization_constant()`.

        Returns:
            `len($\theta$)`-shaped log-probability.
        """
        x = self._x_else_default_x(x)
        self.potential_fn.set_x(x)

        theta = ensure_theta_batched(torch.as_tensor(theta))

        with torch.set_grad_enabled(track_gradients):
            potential_values = self.potential_fn(
                theta.to(self._device), track_gradients=track_gradients
            )

            if normalization_constant_params is None:
                normalization_constant_params = dict()  # use defaults
            normalization_constant = self.estimate_normalization_constant(
                x, **normalization_constant_params
            )

            return (potential_values - torch.log(normalization_constant)).to(
                self._device
            )

    @torch.no_grad()
    def estimate_normalization_constant(
        self, x: Tensor, num_samples: int = 10_000, force_update: bool = False
    ) -> Tensor:
        """Returns the normalization constant via importance sampling.

        Args:
            num_samples: Number of importance samples used for the estimate.
            force_update: Whether to re-calculate the normlization constant when x is
                unchanged and have a cached value.
        """
        # Check if the provided x matches the default x (short-circuit on identity).
        is_new_x = self.default_x is None or (
            x is not self.default_x and (x != self.default_x).any()
        )

        not_saved_at_default_x = self._normalization_constant is None

        if is_new_x:  # Calculate at x; don't save.
            _, log_importance_weights = importance_sample(
                self.potential_fn,
                proposal=self.proposal,
                num_samples=num_samples,
            )
            return torch.mean(torch.exp(log_importance_weights))
        elif not_saved_at_default_x or force_update:  # Calculate at default_x; save.
            assert self.default_x is not None
            _, log_importance_weights = importance_sample(
                self.potential_fn,
                proposal=self.proposal,
                num_samples=num_samples,
            )
            self._normalization_constant = torch.mean(torch.exp(log_importance_weights))

        return self._normalization_constant.to(self._device)  # type: ignore

    def sample(
        self,
        sample_shape: Shape = torch.Size(),
        x: Optional[Tensor] = None,
        oversampling_factor: int = 32,
        max_sampling_batch_size: int = 10_000,
        sample_with: Optional[str] = None,
    ) -> Union[Tensor, Tuple[Tensor, Tensor]]:
        """Return samples from the approximate posterior distribution.

        Args:
            sample_shape: _description_
            x: _description_
        """
        if sample_with is not None:
            raise ValueError(
                f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
                f"`sample_with` is no longer supported. You have to rerun "
                f"`.build_posterior(sample_with={sample_with}).`"
            )

        self.potential_fn.set_x(self._x_else_default_x(x))

        if self.method == "sir":
            return self._sir_sample(
                sample_shape,
                oversampling_factor=oversampling_factor,
                max_sampling_batch_size=max_sampling_batch_size,
            )
        elif self.method == "importance":
            return self._importance_sample(sample_shape)
        else:
            raise NameError

    def _importance_sample(
        self,
        sample_shape: Shape = torch.Size(),
    ) -> Tuple[Tensor, Tensor]:
        """Returns samples from the proposal and log of their importance weights.

        Args:
            sample_shape: Desired shape of samples that are drawn from posterior.
            sample_with: This argument only exists to keep backward-compatibility with
                `sbi` v0.17.2 or older. If it is set, we instantly raise an error.

        Returns:
            Samples and logarithm of corresponding importance weights.
        """
        num_samples = torch.Size(sample_shape).numel()
        samples, log_importance_weights = importance_sample(
            self.potential_fn,
            proposal=self.proposal,
            num_samples=num_samples,
        )

        samples = samples.reshape((*sample_shape, -1)).to(self._device)
        return samples, log_importance_weights.to(self._device)

    def _sir_sample(
        self,
        sample_shape: Shape = torch.Size(),
        oversampling_factor: int = 32,
        max_sampling_batch_size: int = 10_000,
        show_progress_bars: bool = True,
    ):
        r"""Returns approximate samples from posterior $p(\theta|x)$ via SIR.

        Args:
            sample_shape: Desired shape of samples that are drawn from posterior. If
                sample_shape is multidimensional we simply draw `sample_shape.numel()`
                samples and then reshape into the desired shape.
            x: Observed data.
            sample_with: This argument only exists to keep backward-compatibility with
                `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
            oversampling_factor: Number of proposed samples form which only one is
                selected based on its importance weight.
            max_sampling_batch_size: The batchsize of samples being drawn from
                the proposal at every iteration. Used only in `sir_sample()`.
            show_progress_bars: Whether to show sampling progress monitor.

        Returns:
            Samples from posterior.
        """
        # Replace arguments that were not passed with their default.
        oversampling_factor = (
            self.oversampling_factor
            if oversampling_factor is None
            else oversampling_factor
        )
        max_sampling_batch_size = (
            self.max_sampling_batch_size
            if max_sampling_batch_size is None
            else max_sampling_batch_size
        )

        num_samples = torch.Size(sample_shape).numel()
        samples = sampling_importance_resampling(
            self.potential_fn,
            proposal=self.proposal,
            num_samples=num_samples,
            oversampling_factor=oversampling_factor,
            show_progress_bars=show_progress_bars,
            max_sampling_batch_size=max_sampling_batch_size,
            device=self._device,
        )

        return samples.reshape((*sample_shape, -1)).to(self._device)

    def map(
        self,
        x: Optional[Tensor] = None,
        num_iter: int = 1_000,
        num_to_optimize: int = 100,
        learning_rate: float = 0.01,
        init_method: Union[str, Tensor] = "proposal",
        num_init_samples: int = 1_000,
        save_best_every: int = 10,
        show_progress_bars: bool = False,
        force_update: bool = False,
    ) -> Tensor:
        r"""Returns the maximum-a-posteriori estimate (MAP).

        The method can be interrupted (Ctrl-C) when the user sees that the
        log-probability converges. The best estimate will be saved in `self._map` and
        can be accessed with `self.map()`. The MAP is obtained by running gradient
        ascent from a given number of starting positions (samples from the posterior
        with the highest log-probability). After the optimization is done, we select the
        parameter set that has the highest log-probability after the optimization.

        Warning: The default values used by this function are not well-tested. They
        might require hand-tuning for the problem at hand.

        For developers: if the prior is a `BoxUniform`, we carry out the optimization
        in unbounded space and transform the result back into bounded space.

        Args:
            x: Deprecated - use `.set_default_x()` prior to `.map()`.
            num_iter: Number of optimization steps that the algorithm takes
                to find the MAP.
            learning_rate: Learning rate of the optimizer.
            init_method: How to select the starting parameters for the optimization. If
                it is a string, it can be either [`posterior`, `prior`], which samples
                the respective distribution `num_init_samples` times. If it is a
                tensor, the tensor will be used as init locations.
            num_init_samples: Draw this number of samples from the posterior and
                evaluate the log-probability of all of them.
            num_to_optimize: From the drawn `num_init_samples`, use the
                `num_to_optimize` with highest log-probability as the initial points
                for the optimization.
            save_best_every: The best log-probability is computed, saved in the
                `map`-attribute, and printed every `save_best_every`-th iteration.
                Computing the best log-probability creates a significant overhead
                (thus, the default is `10`.)
            show_progress_bars: Whether to show a progressbar during sampling from the
                posterior.
            force_update: Whether to re-calculate the MAP when x is unchanged and
                have a cached value.
            log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
                {'norm_posterior': True} for SNPE.

        Returns:
            The MAP estimate.
        """
        return super().map(
            x=x,
            num_iter=num_iter,
            num_to_optimize=num_to_optimize,
            learning_rate=learning_rate,
            init_method=init_method,
            num_init_samples=num_init_samples,
            save_best_every=save_best_every,
            show_progress_bars=show_progress_bars,
            force_update=force_update,
        )

__init__(potential_fn, proposal, theta_transform=None, method='sir', oversampling_factor=32, max_sampling_batch_size=10000, device=None, x_shape=None)

Parameters:

Name Type Description Default
potential_fn Callable

The potential function from which to draw samples.

required
proposal Any

The proposal distribution.

required
theta_transform Optional[TorchTransform]

Transformation that is applied to parameters. Is not used during but only when calling .map().

None
method str

Either of [sir|importance]. This sets the behavior of the .sample() method. With sir, approximate posterior samples are generated with sampling importance resampling (SIR). With importance, the .sample() method returns a tuple of samples and corresponding importance weights.

'sir'
oversampling_factor int

Number of proposed samples from which only one is selected based on its importance weight.

32
max_sampling_batch_size int

The batch size of samples being drawn from the proposal at every iteration.

10000
device Optional[str]

Device on which to sample, e.g., “cpu”, “cuda” or “cuda:0”. If None, potential_fn.device is used.

None
x_shape Optional[Size]

Shape of a single simulator output. If passed, it is used to check the shape of the observed data and give a descriptive error.

None
Source code in sbi/inference/posteriors/importance_posterior.py
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
def __init__(
    self,
    potential_fn: Callable,
    proposal: Any,
    theta_transform: Optional[TorchTransform] = None,
    method: str = "sir",
    oversampling_factor: int = 32,
    max_sampling_batch_size: int = 10_000,
    device: Optional[str] = None,
    x_shape: Optional[torch.Size] = None,
):
    """
    Args:
        potential_fn: The potential function from which to draw samples.
        proposal: The proposal distribution.
        theta_transform: Transformation that is applied to parameters. Is not used
            during but only when calling `.map()`.
        method: Either of [`sir`|`importance`]. This sets the behavior of the
            `.sample()` method. With `sir`, approximate posterior samples are
            generated with sampling importance resampling (SIR). With
            `importance`, the `.sample()` method returns a tuple of samples and
            corresponding importance weights.
        oversampling_factor: Number of proposed samples from which only one is
            selected based on its importance weight.
        max_sampling_batch_size: The batch size of samples being drawn from the
            proposal at every iteration.
        device: Device on which to sample, e.g., "cpu", "cuda" or "cuda:0". If
            None, `potential_fn.device` is used.
        x_shape: Shape of a single simulator output. If passed, it is used to check
            the shape of the observed data and give a descriptive error.
    """
    super().__init__(
        potential_fn,
        theta_transform=theta_transform,
        device=device,
        x_shape=x_shape,
    )

    self.proposal = proposal
    self._normalization_constant = None
    self.method = method

    self.oversampling_factor = oversampling_factor
    self.max_sampling_batch_size = max_sampling_batch_size

    self._purpose = (
        "It provides sampling-importance resampling (SIR) to .sample() from the "
        "posterior and can evaluate the _unnormalized_ posterior density with "
        ".log_prob()."
    )

estimate_normalization_constant(x, num_samples=10000, force_update=False)

Returns the normalization constant via importance sampling.

Parameters:

Name Type Description Default
num_samples int

Number of importance samples used for the estimate.

10000
force_update bool

Whether to re-calculate the normlization constant when x is unchanged and have a cached value.

False
Source code in sbi/inference/posteriors/importance_posterior.py
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
@torch.no_grad()
def estimate_normalization_constant(
    self, x: Tensor, num_samples: int = 10_000, force_update: bool = False
) -> Tensor:
    """Returns the normalization constant via importance sampling.

    Args:
        num_samples: Number of importance samples used for the estimate.
        force_update: Whether to re-calculate the normlization constant when x is
            unchanged and have a cached value.
    """
    # Check if the provided x matches the default x (short-circuit on identity).
    is_new_x = self.default_x is None or (
        x is not self.default_x and (x != self.default_x).any()
    )

    not_saved_at_default_x = self._normalization_constant is None

    if is_new_x:  # Calculate at x; don't save.
        _, log_importance_weights = importance_sample(
            self.potential_fn,
            proposal=self.proposal,
            num_samples=num_samples,
        )
        return torch.mean(torch.exp(log_importance_weights))
    elif not_saved_at_default_x or force_update:  # Calculate at default_x; save.
        assert self.default_x is not None
        _, log_importance_weights = importance_sample(
            self.potential_fn,
            proposal=self.proposal,
            num_samples=num_samples,
        )
        self._normalization_constant = torch.mean(torch.exp(log_importance_weights))

    return self._normalization_constant.to(self._device)  # type: ignore

log_prob(theta, x=None, track_gradients=False, normalization_constant_params=None)

Returns the log-probability of theta under the posterior.

The normalization constant is estimated with importance sampling.

Parameters:

Name Type Description Default
theta Tensor

Parameters \(\theta\).

required
track_gradients bool

Whether the returned tensor supports tracking gradients. This can be helpful for e.g. sensitivity analysis, but increases memory consumption.

False
normalization_constant_params Optional[dict]

Parameters passed on to estimate_normalization_constant().

None

Returns:

Type Description
Tensor

len($\theta$)-shaped log-probability.

Source code in sbi/inference/posteriors/importance_posterior.py
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
def log_prob(
    self,
    theta: Tensor,
    x: Optional[Tensor] = None,
    track_gradients: bool = False,
    normalization_constant_params: Optional[dict] = None,
) -> Tensor:
    r"""Returns the log-probability of theta under the posterior.

    The normalization constant is estimated with importance sampling.

    Args:
        theta: Parameters $\theta$.
        track_gradients: Whether the returned tensor supports tracking gradients.
            This can be helpful for e.g. sensitivity analysis, but increases memory
            consumption.
        normalization_constant_params: Parameters passed on to
            `estimate_normalization_constant()`.

    Returns:
        `len($\theta$)`-shaped log-probability.
    """
    x = self._x_else_default_x(x)
    self.potential_fn.set_x(x)

    theta = ensure_theta_batched(torch.as_tensor(theta))

    with torch.set_grad_enabled(track_gradients):
        potential_values = self.potential_fn(
            theta.to(self._device), track_gradients=track_gradients
        )

        if normalization_constant_params is None:
            normalization_constant_params = dict()  # use defaults
        normalization_constant = self.estimate_normalization_constant(
            x, **normalization_constant_params
        )

        return (potential_values - torch.log(normalization_constant)).to(
            self._device
        )

map(x=None, num_iter=1000, num_to_optimize=100, learning_rate=0.01, init_method='proposal', num_init_samples=1000, save_best_every=10, show_progress_bars=False, force_update=False)

Returns the maximum-a-posteriori estimate (MAP).

The method can be interrupted (Ctrl-C) when the user sees that the log-probability converges. The best estimate will be saved in self._map and can be accessed with self.map(). The MAP is obtained by running gradient ascent from a given number of starting positions (samples from the posterior with the highest log-probability). After the optimization is done, we select the parameter set that has the highest log-probability after the optimization.

Warning: The default values used by this function are not well-tested. They might require hand-tuning for the problem at hand.

For developers: if the prior is a BoxUniform, we carry out the optimization in unbounded space and transform the result back into bounded space.

Parameters:

Name Type Description Default
x Optional[Tensor]

Deprecated - use .set_default_x() prior to .map().

None
num_iter int

Number of optimization steps that the algorithm takes to find the MAP.

1000
learning_rate float

Learning rate of the optimizer.

0.01
init_method Union[str, Tensor]

How to select the starting parameters for the optimization. If it is a string, it can be either [posterior, prior], which samples the respective distribution num_init_samples times. If it is a tensor, the tensor will be used as init locations.

'proposal'
num_init_samples int

Draw this number of samples from the posterior and evaluate the log-probability of all of them.

1000
num_to_optimize int

From the drawn num_init_samples, use the num_to_optimize with highest log-probability as the initial points for the optimization.

100
save_best_every int

The best log-probability is computed, saved in the map-attribute, and printed every save_best_every-th iteration. Computing the best log-probability creates a significant overhead (thus, the default is 10.)

10
show_progress_bars bool

Whether to show a progressbar during sampling from the posterior.

False
force_update bool

Whether to re-calculate the MAP when x is unchanged and have a cached value.

False
log_prob_kwargs

Will be empty for SNLE and SNRE. Will contain {‘norm_posterior’: True} for SNPE.

required

Returns:

Type Description
Tensor

The MAP estimate.

Source code in sbi/inference/posteriors/importance_posterior.py
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
def map(
    self,
    x: Optional[Tensor] = None,
    num_iter: int = 1_000,
    num_to_optimize: int = 100,
    learning_rate: float = 0.01,
    init_method: Union[str, Tensor] = "proposal",
    num_init_samples: int = 1_000,
    save_best_every: int = 10,
    show_progress_bars: bool = False,
    force_update: bool = False,
) -> Tensor:
    r"""Returns the maximum-a-posteriori estimate (MAP).

    The method can be interrupted (Ctrl-C) when the user sees that the
    log-probability converges. The best estimate will be saved in `self._map` and
    can be accessed with `self.map()`. The MAP is obtained by running gradient
    ascent from a given number of starting positions (samples from the posterior
    with the highest log-probability). After the optimization is done, we select the
    parameter set that has the highest log-probability after the optimization.

    Warning: The default values used by this function are not well-tested. They
    might require hand-tuning for the problem at hand.

    For developers: if the prior is a `BoxUniform`, we carry out the optimization
    in unbounded space and transform the result back into bounded space.

    Args:
        x: Deprecated - use `.set_default_x()` prior to `.map()`.
        num_iter: Number of optimization steps that the algorithm takes
            to find the MAP.
        learning_rate: Learning rate of the optimizer.
        init_method: How to select the starting parameters for the optimization. If
            it is a string, it can be either [`posterior`, `prior`], which samples
            the respective distribution `num_init_samples` times. If it is a
            tensor, the tensor will be used as init locations.
        num_init_samples: Draw this number of samples from the posterior and
            evaluate the log-probability of all of them.
        num_to_optimize: From the drawn `num_init_samples`, use the
            `num_to_optimize` with highest log-probability as the initial points
            for the optimization.
        save_best_every: The best log-probability is computed, saved in the
            `map`-attribute, and printed every `save_best_every`-th iteration.
            Computing the best log-probability creates a significant overhead
            (thus, the default is `10`.)
        show_progress_bars: Whether to show a progressbar during sampling from the
            posterior.
        force_update: Whether to re-calculate the MAP when x is unchanged and
            have a cached value.
        log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
            {'norm_posterior': True} for SNPE.

    Returns:
        The MAP estimate.
    """
    return super().map(
        x=x,
        num_iter=num_iter,
        num_to_optimize=num_to_optimize,
        learning_rate=learning_rate,
        init_method=init_method,
        num_init_samples=num_init_samples,
        save_best_every=save_best_every,
        show_progress_bars=show_progress_bars,
        force_update=force_update,
    )

sample(sample_shape=torch.Size(), x=None, oversampling_factor=32, max_sampling_batch_size=10000, sample_with=None)

Return samples from the approximate posterior distribution.

Parameters:

Name Type Description Default
sample_shape Shape

description

Size()
x Optional[Tensor]

description

None
Source code in sbi/inference/posteriors/importance_posterior.py
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
def sample(
    self,
    sample_shape: Shape = torch.Size(),
    x: Optional[Tensor] = None,
    oversampling_factor: int = 32,
    max_sampling_batch_size: int = 10_000,
    sample_with: Optional[str] = None,
) -> Union[Tensor, Tuple[Tensor, Tensor]]:
    """Return samples from the approximate posterior distribution.

    Args:
        sample_shape: _description_
        x: _description_
    """
    if sample_with is not None:
        raise ValueError(
            f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
            f"`sample_with` is no longer supported. You have to rerun "
            f"`.build_posterior(sample_with={sample_with}).`"
        )

    self.potential_fn.set_x(self._x_else_default_x(x))

    if self.method == "sir":
        return self._sir_sample(
            sample_shape,
            oversampling_factor=oversampling_factor,
            max_sampling_batch_size=max_sampling_batch_size,
        )
    elif self.method == "importance":
        return self._importance_sample(sample_shape)
    else:
        raise NameError

Bases: NeuralPosterior

Provides MCMC to sample from the posterior.

SNLE or SNRE train neural networks to approximate the likelihood(-ratios). MCMCPosterior allows to sample from the posterior with MCMC.

Source code in sbi/inference/posteriors/mcmc_posterior.py
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
class MCMCPosterior(NeuralPosterior):
    r"""Provides MCMC to sample from the posterior.<br/><br/>
    SNLE or SNRE train neural networks to approximate the likelihood(-ratios).
    `MCMCPosterior` allows to sample from the posterior with MCMC.
    """

    def __init__(
        self,
        potential_fn: Callable,
        proposal: Any,
        theta_transform: Optional[TorchTransform] = None,
        method: str = "slice_np",
        thin: int = 10,
        warmup_steps: int = 10,
        num_chains: int = 1,
        init_strategy: str = "resample",
        init_strategy_parameters: Dict[str, Any] = {},
        init_strategy_num_candidates: Optional[int] = None,
        num_workers: int = 1,
        device: Optional[str] = None,
        x_shape: Optional[torch.Size] = None,
    ):
        """
        Args:
            potential_fn: The potential function from which to draw samples.
            proposal: Proposal distribution that is used to initialize the MCMC chain.
            theta_transform: Transformation that will be applied during sampling.
                Allows to perform MCMC in unconstrained space.
            method: Method used for MCMC sampling, one of `slice_np`,
                `slice_np_vectorized`, `slice`, `hmc`, `nuts`. `slice_np` is a custom
                numpy implementation of slice sampling. `slice_np_vectorized` is
                identical to `slice_np`, but if `num_chains>1`, the chains are
                vectorized for `slice_np_vectorized` whereas they are run sequentially
                for `slice_np`. The samplers `hmc`, `nuts` or `slice` sample with Pyro.
            thin: The thinning factor for the chain.
            warmup_steps: The initial number of samples to discard.
            num_chains: The number of chains.
            init_strategy: The initialisation strategy for chains; `proposal` will draw
                init locations from `proposal`, whereas `sir` will use Sequential-
                Importance-Resampling (SIR). SIR initially samples
                `init_strategy_num_candidates` from the `proposal`, evaluates all of
                them under the `potential_fn` and `proposal`, and then resamples the
                initial locations with weights proportional to `exp(potential_fn -
                proposal.log_prob`. `resample` is the same as `sir` but
                uses `exp(potential_fn)` as weights.
            init_strategy_parameters: Dictionary of keyword arguments passed to the
                init strategy, e.g., for `init_strategy=sir` this could be
                `num_candidate_samples`, i.e., the number of candidates to to find init
                locations (internal default is `1000`), or `device`.
            init_strategy_num_candidates: Number of candidates to to find init
                 locations in `init_strategy=sir` (deprecated, use
                 init_strategy_parameters instead).
            num_workers: number of cpu cores used to parallelize mcmc
            device: Training device, e.g., "cpu", "cuda" or "cuda:0". If None,
                `potential_fn.device` is used.
            x_shape: Shape of a single simulator output. If passed, it is used to check
                the shape of the observed data and give a descriptive error.
        """

        super().__init__(
            potential_fn,
            theta_transform=theta_transform,
            device=device,
            x_shape=x_shape,
        )

        self.proposal = proposal
        self.method = method
        self.thin = thin
        self.warmup_steps = warmup_steps
        self.num_chains = num_chains
        self.init_strategy = init_strategy
        self.init_strategy_parameters = init_strategy_parameters
        self.num_workers = num_workers
        self._posterior_sampler = None
        # Hardcode parameter name to reduce clutter kwargs.
        self.param_name = "theta"

        if init_strategy_num_candidates is not None:
            warn(
                """Passing `init_strategy_num_candidates` is deprecated as of sbi
                v0.19.0. Instead, use e.g.,
                `init_strategy_parameters={"num_candidate_samples": 1000}`"""
            )
            self.init_strategy_parameters[
                "num_candidate_samples"
            ] = init_strategy_num_candidates

        self.potential_ = self._prepare_potential(method)

        self._purpose = (
            "It provides MCMC to .sample() from the posterior and "
            "can evaluate the _unnormalized_ posterior density with .log_prob()."
        )

    @property
    def mcmc_method(self) -> str:
        """Returns MCMC method."""
        return self._mcmc_method

    @mcmc_method.setter
    def mcmc_method(self, method: str) -> None:
        """See `set_mcmc_method`."""
        self.set_mcmc_method(method)

    @property
    def posterior_sampler(self):
        """Returns sampler created by `sample`."""
        return self._posterior_sampler

    def set_mcmc_method(self, method: str) -> "NeuralPosterior":
        """Sets sampling method to for MCMC and returns `NeuralPosterior`.

        Args:
            method: Method to use.

        Returns:
            `NeuralPosterior` for chainable calls.
        """
        self._mcmc_method = method
        return self

    def log_prob(
        self, theta: Tensor, x: Optional[Tensor] = None, track_gradients: bool = False
    ) -> Tensor:
        r"""Returns the log-probability of theta under the posterior.

        Args:
            theta: Parameters $\theta$.
            track_gradients: Whether the returned tensor supports tracking gradients.
                This can be helpful for e.g. sensitivity analysis, but increases memory
                consumption.

        Returns:
            `len($\theta$)`-shaped log-probability.
        """
        warn(
            """`.log_prob()` is deprecated for methods that can only evaluate the
            log-probability up to a normalizing constant. Use `.potential()` instead."""
        )
        warn("The log-probability is unnormalized!")

        self.potential_fn.set_x(self._x_else_default_x(x))

        theta = ensure_theta_batched(torch.as_tensor(theta))
        return self.potential_fn(
            theta.to(self._device), track_gradients=track_gradients
        )

    def sample(
        self,
        sample_shape: Shape = torch.Size(),
        x: Optional[Tensor] = None,
        method: Optional[str] = None,
        thin: Optional[int] = None,
        warmup_steps: Optional[int] = None,
        num_chains: Optional[int] = None,
        init_strategy: Optional[str] = None,
        init_strategy_parameters: Optional[Dict[str, Any]] = None,
        init_strategy_num_candidates: Optional[int] = None,
        mcmc_parameters: Dict = {},
        mcmc_method: Optional[str] = None,
        sample_with: Optional[str] = None,
        num_workers: Optional[int] = None,
        show_progress_bars: bool = True,
    ) -> Tensor:
        r"""Return samples from posterior distribution $p(\theta|x)$ with MCMC.

        Check the `__init__()` method for a description of all arguments as well as
        their default values.

        Args:
            sample_shape: Desired shape of samples that are drawn from posterior. If
                sample_shape is multidimensional we simply draw `sample_shape.numel()`
                samples and then reshape into the desired shape.
            mcmc_parameters: Dictionary that is passed only to support the API of
                `sbi` v0.17.2 or older.
            mcmc_method: This argument only exists to keep backward-compatibility with
                `sbi` v0.17.2 or older. Please use `method` instead.
            sample_with: This argument only exists to keep backward-compatibility with
                `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
            show_progress_bars: Whether to show sampling progress monitor.

        Returns:
            Samples from posterior.
        """
        self.potential_fn.set_x(self._x_else_default_x(x))

        # Replace arguments that were not passed with their default.
        method = self.method if method is None else method
        thin = self.thin if thin is None else thin
        warmup_steps = self.warmup_steps if warmup_steps is None else warmup_steps
        num_chains = self.num_chains if num_chains is None else num_chains
        init_strategy = self.init_strategy if init_strategy is None else init_strategy
        num_workers = self.num_workers if num_workers is None else num_workers
        init_strategy_parameters = (
            self.init_strategy_parameters
            if init_strategy_parameters is None
            else init_strategy_parameters
        )
        if init_strategy_num_candidates is not None:
            warn(
                """Passing `init_strategy_num_candidates` is deprecated as of sbi
                v0.19.0. Instead, use e.g.,
                `init_strategy_parameters={"num_candidate_samples": 1000}`"""
            )
            self.init_strategy_parameters[
                "num_candidate_samples"
            ] = init_strategy_num_candidates
        if sample_with is not None:
            raise ValueError(
                f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
                f"`sample_with` is no longer supported. You have to rerun "
                f"`.build_posterior(sample_with={sample_with}).`"
            )
        if mcmc_method is not None:
            warn(
                "You passed `mcmc_method` to `.sample()`. As of sbi v0.18.0, this "
                "is deprecated and will be removed in a future release. Use `method` "
                "instead of `mcmc_method`."
            )
            method = mcmc_method
        if mcmc_parameters:
            warn(
                "You passed `mcmc_parameters` to `.sample()`. As of sbi v0.18.0, this "
                "is deprecated and will be removed in a future release. Instead, pass "
                "the variable to `.sample()` directly, e.g. "
                "`posterior.sample((1,), num_chains=5)`."
            )
        # The following lines are only for backwards compatibility with sbi v0.17.2 or
        # older.
        m_p = mcmc_parameters  # define to shorten the variable name
        method = _maybe_use_dict_entry(method, "mcmc_method", m_p)
        thin = _maybe_use_dict_entry(thin, "thin", m_p)
        warmup_steps = _maybe_use_dict_entry(warmup_steps, "warmup_steps", m_p)
        num_chains = _maybe_use_dict_entry(num_chains, "num_chains", m_p)
        init_strategy = _maybe_use_dict_entry(init_strategy, "init_strategy", m_p)
        self.potential_ = self._prepare_potential(method)  # type: ignore

        initial_params = self._get_initial_params(
            init_strategy,  # type: ignore
            num_chains,  # type: ignore
            num_workers,
            show_progress_bars,
            **init_strategy_parameters,
        )
        num_samples = torch.Size(sample_shape).numel()

        track_gradients = method in ("hmc", "nuts")
        with torch.set_grad_enabled(track_gradients):
            if method in ("slice_np", "slice_np_vectorized"):
                transformed_samples = self._slice_np_mcmc(
                    num_samples=num_samples,
                    potential_function=self.potential_,
                    initial_params=initial_params,
                    thin=thin,  # type: ignore
                    warmup_steps=warmup_steps,  # type: ignore
                    vectorized=(method == "slice_np_vectorized"),
                    num_workers=num_workers,
                    show_progress_bars=show_progress_bars,
                )
            elif method in ("hmc", "nuts", "slice"):
                transformed_samples = self._pyro_mcmc(
                    num_samples=num_samples,
                    potential_function=self.potential_,
                    initial_params=initial_params,
                    mcmc_method=method,  # type: ignore
                    thin=thin,  # type: ignore
                    warmup_steps=warmup_steps,  # type: ignore
                    num_chains=num_chains,
                    show_progress_bars=show_progress_bars,
                )
            else:
                raise NameError

        samples = self.theta_transform.inv(transformed_samples)

        return samples.reshape((*sample_shape, -1))  # type: ignore

    def _build_mcmc_init_fn(
        self,
        proposal: Any,
        potential_fn: Callable,
        transform: torch_tf.Transform,
        init_strategy: str,
        **kwargs,
    ) -> Callable:
        """Return function that, when called, creates an initial parameter set for MCMC.

        Args:
            proposal: Proposal distribution.
            potential_fn: Potential function that the candidate samples are weighted
                with.
            init_strategy: Specifies the initialization method. Either of
                [`proposal`|`sir`|`resample`|`latest_sample`].
            kwargs: Passed on to init function. This way, init specific keywords can
                be set through `mcmc_parameters`. Unused arguments will be absorbed by
                the intitialization method.

        Returns: Initialization function.
        """
        if init_strategy == "proposal" or init_strategy == "prior":
            if init_strategy == "prior":
                warn(
                    "You set `init_strategy=prior`. As of sbi v0.18.0, this is "
                    "deprecated and it will be removed in a future release. Use "
                    "`init_strategy=proposal` instead."
                )
            return lambda: proposal_init(proposal, transform=transform, **kwargs)
        elif init_strategy == "sir":
            warn(
                "As of sbi v0.19.0, the behavior of the SIR initialization for MCMC "
                "has changed. If you wish to restore the behavior of sbi v0.18.0, set "
                "`init_strategy='resample'.`"
            )
            return lambda: sir_init(
                proposal, potential_fn, transform=transform, **kwargs
            )
        elif init_strategy == "resample":
            return lambda: resample_given_potential_fn(
                proposal, potential_fn, transform=transform, **kwargs
            )
        elif init_strategy == "latest_sample":
            latest_sample = IterateParameters(self._mcmc_init_params, **kwargs)
            return latest_sample
        else:
            raise NotImplementedError

    def _get_initial_params(
        self,
        init_strategy: str,
        num_chains: int,
        num_workers: int,
        show_progress_bars: bool,
        **kwargs,
    ) -> Tensor:
        """Return initial parameters for MCMC obtained with given init strategy.

        Parallelizes across CPU cores only for SIR.

        Args:
            init_strategy: Specifies the initialization method. Either of
                [`proposal`|`sir`|`resample`|`latest_sample`].
            num_chains: number of MCMC chains, generates initial params for each
            num_workers: number of CPU cores for parallization
            show_progress_bars: whether to show progress bars for SIR init
            kwargs: Passed on to `_build_mcmc_init_fn`.

        Returns:
            Tensor: initial parameters, one for each chain
        """
        # Build init function
        init_fn = self._build_mcmc_init_fn(
            self.proposal,
            self.potential_fn,
            transform=self.theta_transform,
            init_strategy=init_strategy,  # type: ignore
            **kwargs,
        )

        # Parallelize inits for resampling only.
        if num_workers > 1 and (init_strategy == "resample" or init_strategy == "sir"):

            def seeded_init_fn(seed):
                torch.manual_seed(seed)
                return init_fn()

            seeds = torch.randint(high=2**31, size=(num_chains,))

            # Generate initial params parallelized over num_workers.
            with tqdm_joblib(
                tqdm(
                    range(num_chains),  # type: ignore
                    disable=not show_progress_bars,
                    desc=f"""Generating {num_chains} MCMC inits with {num_workers}
                         workers.""",
                    total=num_chains,
                )
            ):
                initial_params = torch.cat(
                    Parallel(n_jobs=num_workers)(
                        delayed(seeded_init_fn)(seed) for seed in seeds
                    )
                )
        else:
            initial_params = torch.cat(
                [init_fn() for _ in range(num_chains)]  # type: ignore
            )

        return initial_params

    def _slice_np_mcmc(
        self,
        num_samples: int,
        potential_function: Callable,
        initial_params: Tensor,
        thin: int,
        warmup_steps: int,
        vectorized: bool = False,
        num_workers: int = 1,
        init_width: Union[float, ndarray] = 0.01,
        show_progress_bars: bool = True,
    ) -> Tensor:
        """Custom implementation of slice sampling using Numpy.

        Args:
            num_samples: Desired number of samples.
            potential_function: A callable **class**.
            initial_params: Initial parameters for MCMC chain.
            thin: Thinning (subsampling) factor.
            warmup_steps: Initial number of samples to discard.
            vectorized: Whether to use a vectorized implementation of the Slice sampler.
            num_workers: Number of CPU cores to use.
            init_width: Inital width of brackets.
            show_progress_bars: Whether to show a progressbar during sampling;
                can only be turned off for vectorized sampler.

        Returns:
            Tensor of shape (num_samples, shape_of_single_theta).
        """

        num_chains, dim_samples = initial_params.shape

        if not vectorized:
            SliceSamplerMultiChain = SliceSamplerSerial
        else:
            SliceSamplerMultiChain = SliceSamplerVectorized

        posterior_sampler = SliceSamplerMultiChain(
            init_params=tensor2numpy(initial_params),
            log_prob_fn=potential_function,
            num_chains=num_chains,
            thin=thin,
            verbose=show_progress_bars,
            num_workers=num_workers,
            init_width=init_width,
        )
        warmup_ = warmup_steps * thin
        num_samples_ = ceil((num_samples * thin) / num_chains)
        # Run mcmc including warmup
        samples = posterior_sampler.run(warmup_ + num_samples_)
        samples = samples[:, warmup_steps:, :]  # discard warmup steps
        samples = torch.from_numpy(samples)  # chains x samples x dim

        # Save posterior sampler.
        self._posterior_sampler = posterior_sampler

        # Save sample as potential next init (if init_strategy == 'latest_sample').
        self._mcmc_init_params = samples[:, -1, :].reshape(num_chains, dim_samples)

        # Collect samples from all chains.
        samples = samples.reshape(-1, dim_samples)[:num_samples, :]
        assert samples.shape[0] == num_samples

        return samples.type(torch.float32).to(self._device)

    def _pyro_mcmc(
        self,
        num_samples: int,
        potential_function: Callable,
        initial_params: Tensor,
        mcmc_method: str = "slice",
        thin: int = 10,
        warmup_steps: int = 200,
        num_chains: Optional[int] = 1,
        show_progress_bars: bool = True,
    ) -> Tensor:
        r"""Return samples obtained using Pyro HMC, NUTS for slice kernels.

        Args:
            num_samples: Desired number of samples.
            potential_function: A callable **class**. A class, but not a function,
                is picklable for Pyro MCMC to use it across chains in parallel,
                even when the potential function requires evaluating a neural network.
            mcmc_method: One of `hmc`, `nuts` or `slice`.
            thin: Thinning (subsampling) factor.
            warmup_steps: Initial number of samples to discard.
            num_chains: Whether to sample in parallel. If None, use all but one CPU.
            show_progress_bars: Whether to show a progressbar during sampling.

        Returns:
            Tensor of shape (num_samples, shape_of_single_theta).
        """
        num_chains = mp.cpu_count() - 1 if num_chains is None else num_chains

        kernels = dict(slice=Slice, hmc=HMC, nuts=NUTS)

        sampler = MCMC(
            kernel=kernels[mcmc_method](potential_fn=potential_function),
            num_samples=(thin * num_samples) // num_chains + num_chains,
            warmup_steps=warmup_steps,
            initial_params={self.param_name: initial_params},
            num_chains=num_chains,
            mp_context="spawn",
            disable_progbar=not show_progress_bars,
            transforms={},
        )
        sampler.run()
        samples = next(iter(sampler.get_samples().values())).reshape(
            -1, initial_params.shape[1]  # .shape[1] = dim of theta
        )

        # Save posterior sampler.
        self._posterior_sampler = sampler

        samples = samples[::thin][:num_samples]
        assert samples.shape[0] == num_samples

        return samples.detach()

    def _prepare_potential(self, method: str) -> Callable:
        """Combines potential and transform and takes care of gradients and pyro.

        Args:
            method: Which MCMC method to use.

        Returns:
            A potential function that is ready to be used in MCMC.
        """
        if method == "slice":
            track_gradients = False
            pyro = True
        elif method in ("hmc", "nuts"):
            track_gradients = True
            pyro = True
        elif "slice_np" in method:
            track_gradients = False
            pyro = False
        else:
            raise NotImplementedError

        prepared_potential = partial(
            transformed_potential,
            potential_fn=self.potential_fn,
            theta_transform=self.theta_transform,
            device=self._device,
            track_gradients=track_gradients,
        )
        if pyro:
            prepared_potential = partial(
                pyro_potential_wrapper, potential=prepared_potential
            )

        return prepared_potential

    def map(
        self,
        x: Optional[Tensor] = None,
        num_iter: int = 1_000,
        num_to_optimize: int = 100,
        learning_rate: float = 0.01,
        init_method: Union[str, Tensor] = "proposal",
        num_init_samples: int = 1_000,
        save_best_every: int = 10,
        show_progress_bars: bool = False,
        force_update: bool = False,
    ) -> Tensor:
        r"""Returns the maximum-a-posteriori estimate (MAP).

        The method can be interrupted (Ctrl-C) when the user sees that the
        log-probability converges. The best estimate will be saved in `self._map` and
        can be accessed with `self.map()`. The MAP is obtained by running gradient
        ascent from a given number of starting positions (samples from the posterior
        with the highest log-probability). After the optimization is done, we select the
        parameter set that has the highest log-probability after the optimization.

        Warning: The default values used by this function are not well-tested. They
        might require hand-tuning for the problem at hand.

        For developers: if the prior is a `BoxUniform`, we carry out the optimization
        in unbounded space and transform the result back into bounded space.

        Args:
            x: Deprecated - use `.set_default_x()` prior to `.map()`.
            num_iter: Number of optimization steps that the algorithm takes
                to find the MAP.
            learning_rate: Learning rate of the optimizer.
            init_method: How to select the starting parameters for the optimization. If
                it is a string, it can be either [`posterior`, `prior`], which samples
                the respective distribution `num_init_samples` times. If it is a
                tensor, the tensor will be used as init locations.
            num_init_samples: Draw this number of samples from the posterior and
                evaluate the log-probability of all of them.
            num_to_optimize: From the drawn `num_init_samples`, use the
                `num_to_optimize` with highest log-probability as the initial points
                for the optimization.
            save_best_every: The best log-probability is computed, saved in the
                `map`-attribute, and printed every `save_best_every`-th iteration.
                Computing the best log-probability creates a significant overhead
                (thus, the default is `10`.)
            show_progress_bars: Whether to show a progressbar during sampling from
                the posterior.
            force_update: Whether to re-calculate the MAP when x is unchanged and
                have a cached value.
            log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
                {'norm_posterior': True} for SNPE.

        Returns:
            The MAP estimate.
        """
        return super().map(
            x=x,
            num_iter=num_iter,
            num_to_optimize=num_to_optimize,
            learning_rate=learning_rate,
            init_method=init_method,
            num_init_samples=num_init_samples,
            save_best_every=save_best_every,
            show_progress_bars=show_progress_bars,
            force_update=force_update,
        )

    def get_arviz_inference_data(self) -> InferenceData:
        """Returns arviz InferenceData object constructed most recent samples.

        Note: the InferenceData is constructed using the posterior samples generated in
        most recent call to `.sample(...)`.

        For Pyro HMC and NUTS kernels InferenceData will contain diagnostics, for Pyro
        Slice or sbi slice sampling samples, only the samples are added.

        Returns:
            inference_data: Arviz InferenceData object.
        """
        assert (
            self._posterior_sampler is not None
        ), """No samples have been generated, call .sample() first."""

        sampler: Union[
            MCMC, SliceSamplerSerial, SliceSamplerVectorized
        ] = self._posterior_sampler

        # If Pyro sampler and samples not transformed, use arviz' from_pyro.
        # Exclude 'slice' kernel as it lacks the 'divergence' diagnostics key.
        if isinstance(self._posterior_sampler, (HMC, NUTS)) and isinstance(
            self.theta_transform, torch_tf.IndependentTransform
        ):
            inference_data = az.from_pyro(sampler)

        # otherwise get samples from sampler and transform to original space.
        else:
            transformed_samples = sampler.get_samples(group_by_chain=True)
            # Pyro samplers returns dicts, get values.
            if isinstance(transformed_samples, Dict):
                # popitem gets last items, [1] get the values as tensor.
                transformed_samples = transformed_samples.popitem()[1]
            # Our slice samplers return numpy arrays.
            elif isinstance(transformed_samples, ndarray):
                transformed_samples = torch.from_numpy(transformed_samples).type(
                    torch.float32
                )
            # For MultipleIndependent priors transforms first dim must be batch dim.
            # thus, reshape back and forth to have batch dim in front.
            samples_shape = transformed_samples.shape
            samples = self.theta_transform.inv(  # type: ignore
                transformed_samples.reshape(-1, samples_shape[-1])
            ).reshape(  # type: ignore
                *samples_shape
            )

            inference_data = az.convert_to_inference_data(
                {f"{self.param_name}": samples}
            )

        return inference_data

mcmc_method: str property writable

Returns MCMC method.

posterior_sampler property

Returns sampler created by sample.

__init__(potential_fn, proposal, theta_transform=None, method='slice_np', thin=10, warmup_steps=10, num_chains=1, init_strategy='resample', init_strategy_parameters={}, init_strategy_num_candidates=None, num_workers=1, device=None, x_shape=None)

Parameters:

Name Type Description Default
potential_fn Callable

The potential function from which to draw samples.

required
proposal Any

Proposal distribution that is used to initialize the MCMC chain.

required
theta_transform Optional[TorchTransform]

Transformation that will be applied during sampling. Allows to perform MCMC in unconstrained space.

None
method str

Method used for MCMC sampling, one of slice_np, slice_np_vectorized, slice, hmc, nuts. slice_np is a custom numpy implementation of slice sampling. slice_np_vectorized is identical to slice_np, but if num_chains>1, the chains are vectorized for slice_np_vectorized whereas they are run sequentially for slice_np. The samplers hmc, nuts or slice sample with Pyro.

'slice_np'
thin int

The thinning factor for the chain.

10
warmup_steps int

The initial number of samples to discard.

10
num_chains int

The number of chains.

1
init_strategy str

The initialisation strategy for chains; proposal will draw init locations from proposal, whereas sir will use Sequential- Importance-Resampling (SIR). SIR initially samples init_strategy_num_candidates from the proposal, evaluates all of them under the potential_fn and proposal, and then resamples the initial locations with weights proportional to exp(potential_fn - proposal.log_prob. resample is the same as sir but uses exp(potential_fn) as weights.

'resample'
init_strategy_parameters Dict[str, Any]

Dictionary of keyword arguments passed to the init strategy, e.g., for init_strategy=sir this could be num_candidate_samples, i.e., the number of candidates to to find init locations (internal default is 1000), or device.

{}
init_strategy_num_candidates Optional[int]

Number of candidates to to find init locations in init_strategy=sir (deprecated, use init_strategy_parameters instead).

None
num_workers int

number of cpu cores used to parallelize mcmc

1
device Optional[str]

Training device, e.g., “cpu”, “cuda” or “cuda:0”. If None, potential_fn.device is used.

None
x_shape Optional[Size]

Shape of a single simulator output. If passed, it is used to check the shape of the observed data and give a descriptive error.

None
Source code in sbi/inference/posteriors/mcmc_posterior.py
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
def __init__(
    self,
    potential_fn: Callable,
    proposal: Any,
    theta_transform: Optional[TorchTransform] = None,
    method: str = "slice_np",
    thin: int = 10,
    warmup_steps: int = 10,
    num_chains: int = 1,
    init_strategy: str = "resample",
    init_strategy_parameters: Dict[str, Any] = {},
    init_strategy_num_candidates: Optional[int] = None,
    num_workers: int = 1,
    device: Optional[str] = None,
    x_shape: Optional[torch.Size] = None,
):
    """
    Args:
        potential_fn: The potential function from which to draw samples.
        proposal: Proposal distribution that is used to initialize the MCMC chain.
        theta_transform: Transformation that will be applied during sampling.
            Allows to perform MCMC in unconstrained space.
        method: Method used for MCMC sampling, one of `slice_np`,
            `slice_np_vectorized`, `slice`, `hmc`, `nuts`. `slice_np` is a custom
            numpy implementation of slice sampling. `slice_np_vectorized` is
            identical to `slice_np`, but if `num_chains>1`, the chains are
            vectorized for `slice_np_vectorized` whereas they are run sequentially
            for `slice_np`. The samplers `hmc`, `nuts` or `slice` sample with Pyro.
        thin: The thinning factor for the chain.
        warmup_steps: The initial number of samples to discard.
        num_chains: The number of chains.
        init_strategy: The initialisation strategy for chains; `proposal` will draw
            init locations from `proposal`, whereas `sir` will use Sequential-
            Importance-Resampling (SIR). SIR initially samples
            `init_strategy_num_candidates` from the `proposal`, evaluates all of
            them under the `potential_fn` and `proposal`, and then resamples the
            initial locations with weights proportional to `exp(potential_fn -
            proposal.log_prob`. `resample` is the same as `sir` but
            uses `exp(potential_fn)` as weights.
        init_strategy_parameters: Dictionary of keyword arguments passed to the
            init strategy, e.g., for `init_strategy=sir` this could be
            `num_candidate_samples`, i.e., the number of candidates to to find init
            locations (internal default is `1000`), or `device`.
        init_strategy_num_candidates: Number of candidates to to find init
             locations in `init_strategy=sir` (deprecated, use
             init_strategy_parameters instead).
        num_workers: number of cpu cores used to parallelize mcmc
        device: Training device, e.g., "cpu", "cuda" or "cuda:0". If None,
            `potential_fn.device` is used.
        x_shape: Shape of a single simulator output. If passed, it is used to check
            the shape of the observed data and give a descriptive error.
    """

    super().__init__(
        potential_fn,
        theta_transform=theta_transform,
        device=device,
        x_shape=x_shape,
    )

    self.proposal = proposal
    self.method = method
    self.thin = thin
    self.warmup_steps = warmup_steps
    self.num_chains = num_chains
    self.init_strategy = init_strategy
    self.init_strategy_parameters = init_strategy_parameters
    self.num_workers = num_workers
    self._posterior_sampler = None
    # Hardcode parameter name to reduce clutter kwargs.
    self.param_name = "theta"

    if init_strategy_num_candidates is not None:
        warn(
            """Passing `init_strategy_num_candidates` is deprecated as of sbi
            v0.19.0. Instead, use e.g.,
            `init_strategy_parameters={"num_candidate_samples": 1000}`"""
        )
        self.init_strategy_parameters[
            "num_candidate_samples"
        ] = init_strategy_num_candidates

    self.potential_ = self._prepare_potential(method)

    self._purpose = (
        "It provides MCMC to .sample() from the posterior and "
        "can evaluate the _unnormalized_ posterior density with .log_prob()."
    )

get_arviz_inference_data()

Returns arviz InferenceData object constructed most recent samples.

Note: the InferenceData is constructed using the posterior samples generated in most recent call to .sample(...).

For Pyro HMC and NUTS kernels InferenceData will contain diagnostics, for Pyro Slice or sbi slice sampling samples, only the samples are added.

Returns:

Name Type Description
inference_data InferenceData

Arviz InferenceData object.

Source code in sbi/inference/posteriors/mcmc_posterior.py
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
def get_arviz_inference_data(self) -> InferenceData:
    """Returns arviz InferenceData object constructed most recent samples.

    Note: the InferenceData is constructed using the posterior samples generated in
    most recent call to `.sample(...)`.

    For Pyro HMC and NUTS kernels InferenceData will contain diagnostics, for Pyro
    Slice or sbi slice sampling samples, only the samples are added.

    Returns:
        inference_data: Arviz InferenceData object.
    """
    assert (
        self._posterior_sampler is not None
    ), """No samples have been generated, call .sample() first."""

    sampler: Union[
        MCMC, SliceSamplerSerial, SliceSamplerVectorized
    ] = self._posterior_sampler

    # If Pyro sampler and samples not transformed, use arviz' from_pyro.
    # Exclude 'slice' kernel as it lacks the 'divergence' diagnostics key.
    if isinstance(self._posterior_sampler, (HMC, NUTS)) and isinstance(
        self.theta_transform, torch_tf.IndependentTransform
    ):
        inference_data = az.from_pyro(sampler)

    # otherwise get samples from sampler and transform to original space.
    else:
        transformed_samples = sampler.get_samples(group_by_chain=True)
        # Pyro samplers returns dicts, get values.
        if isinstance(transformed_samples, Dict):
            # popitem gets last items, [1] get the values as tensor.
            transformed_samples = transformed_samples.popitem()[1]
        # Our slice samplers return numpy arrays.
        elif isinstance(transformed_samples, ndarray):
            transformed_samples = torch.from_numpy(transformed_samples).type(
                torch.float32
            )
        # For MultipleIndependent priors transforms first dim must be batch dim.
        # thus, reshape back and forth to have batch dim in front.
        samples_shape = transformed_samples.shape
        samples = self.theta_transform.inv(  # type: ignore
            transformed_samples.reshape(-1, samples_shape[-1])
        ).reshape(  # type: ignore
            *samples_shape
        )

        inference_data = az.convert_to_inference_data(
            {f"{self.param_name}": samples}
        )

    return inference_data

log_prob(theta, x=None, track_gradients=False)

Returns the log-probability of theta under the posterior.

Parameters:

Name Type Description Default
theta Tensor

Parameters \(\theta\).

required
track_gradients bool

Whether the returned tensor supports tracking gradients. This can be helpful for e.g. sensitivity analysis, but increases memory consumption.

False

Returns:

Type Description
Tensor

len($\theta$)-shaped log-probability.

Source code in sbi/inference/posteriors/mcmc_posterior.py
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
def log_prob(
    self, theta: Tensor, x: Optional[Tensor] = None, track_gradients: bool = False
) -> Tensor:
    r"""Returns the log-probability of theta under the posterior.

    Args:
        theta: Parameters $\theta$.
        track_gradients: Whether the returned tensor supports tracking gradients.
            This can be helpful for e.g. sensitivity analysis, but increases memory
            consumption.

    Returns:
        `len($\theta$)`-shaped log-probability.
    """
    warn(
        """`.log_prob()` is deprecated for methods that can only evaluate the
        log-probability up to a normalizing constant. Use `.potential()` instead."""
    )
    warn("The log-probability is unnormalized!")

    self.potential_fn.set_x(self._x_else_default_x(x))

    theta = ensure_theta_batched(torch.as_tensor(theta))
    return self.potential_fn(
        theta.to(self._device), track_gradients=track_gradients
    )

map(x=None, num_iter=1000, num_to_optimize=100, learning_rate=0.01, init_method='proposal', num_init_samples=1000, save_best_every=10, show_progress_bars=False, force_update=False)

Returns the maximum-a-posteriori estimate (MAP).

The method can be interrupted (Ctrl-C) when the user sees that the log-probability converges. The best estimate will be saved in self._map and can be accessed with self.map(). The MAP is obtained by running gradient ascent from a given number of starting positions (samples from the posterior with the highest log-probability). After the optimization is done, we select the parameter set that has the highest log-probability after the optimization.

Warning: The default values used by this function are not well-tested. They might require hand-tuning for the problem at hand.

For developers: if the prior is a BoxUniform, we carry out the optimization in unbounded space and transform the result back into bounded space.

Parameters:

Name Type Description Default
x Optional[Tensor]

Deprecated - use .set_default_x() prior to .map().

None
num_iter int

Number of optimization steps that the algorithm takes to find the MAP.

1000
learning_rate float

Learning rate of the optimizer.

0.01
init_method Union[str, Tensor]

How to select the starting parameters for the optimization. If it is a string, it can be either [posterior, prior], which samples the respective distribution num_init_samples times. If it is a tensor, the tensor will be used as init locations.

'proposal'
num_init_samples int

Draw this number of samples from the posterior and evaluate the log-probability of all of them.

1000
num_to_optimize int

From the drawn num_init_samples, use the num_to_optimize with highest log-probability as the initial points for the optimization.

100
save_best_every int

The best log-probability is computed, saved in the map-attribute, and printed every save_best_every-th iteration. Computing the best log-probability creates a significant overhead (thus, the default is 10.)

10
show_progress_bars bool

Whether to show a progressbar during sampling from the posterior.

False
force_update bool

Whether to re-calculate the MAP when x is unchanged and have a cached value.

False
log_prob_kwargs

Will be empty for SNLE and SNRE. Will contain {‘norm_posterior’: True} for SNPE.

required

Returns:

Type Description
Tensor

The MAP estimate.

Source code in sbi/inference/posteriors/mcmc_posterior.py
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
def map(
    self,
    x: Optional[Tensor] = None,
    num_iter: int = 1_000,
    num_to_optimize: int = 100,
    learning_rate: float = 0.01,
    init_method: Union[str, Tensor] = "proposal",
    num_init_samples: int = 1_000,
    save_best_every: int = 10,
    show_progress_bars: bool = False,
    force_update: bool = False,
) -> Tensor:
    r"""Returns the maximum-a-posteriori estimate (MAP).

    The method can be interrupted (Ctrl-C) when the user sees that the
    log-probability converges. The best estimate will be saved in `self._map` and
    can be accessed with `self.map()`. The MAP is obtained by running gradient
    ascent from a given number of starting positions (samples from the posterior
    with the highest log-probability). After the optimization is done, we select the
    parameter set that has the highest log-probability after the optimization.

    Warning: The default values used by this function are not well-tested. They
    might require hand-tuning for the problem at hand.

    For developers: if the prior is a `BoxUniform`, we carry out the optimization
    in unbounded space and transform the result back into bounded space.

    Args:
        x: Deprecated - use `.set_default_x()` prior to `.map()`.
        num_iter: Number of optimization steps that the algorithm takes
            to find the MAP.
        learning_rate: Learning rate of the optimizer.
        init_method: How to select the starting parameters for the optimization. If
            it is a string, it can be either [`posterior`, `prior`], which samples
            the respective distribution `num_init_samples` times. If it is a
            tensor, the tensor will be used as init locations.
        num_init_samples: Draw this number of samples from the posterior and
            evaluate the log-probability of all of them.
        num_to_optimize: From the drawn `num_init_samples`, use the
            `num_to_optimize` with highest log-probability as the initial points
            for the optimization.
        save_best_every: The best log-probability is computed, saved in the
            `map`-attribute, and printed every `save_best_every`-th iteration.
            Computing the best log-probability creates a significant overhead
            (thus, the default is `10`.)
        show_progress_bars: Whether to show a progressbar during sampling from
            the posterior.
        force_update: Whether to re-calculate the MAP when x is unchanged and
            have a cached value.
        log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
            {'norm_posterior': True} for SNPE.

    Returns:
        The MAP estimate.
    """
    return super().map(
        x=x,
        num_iter=num_iter,
        num_to_optimize=num_to_optimize,
        learning_rate=learning_rate,
        init_method=init_method,
        num_init_samples=num_init_samples,
        save_best_every=save_best_every,
        show_progress_bars=show_progress_bars,
        force_update=force_update,
    )

sample(sample_shape=torch.Size(), x=None, method=None, thin=None, warmup_steps=None, num_chains=None, init_strategy=None, init_strategy_parameters=None, init_strategy_num_candidates=None, mcmc_parameters={}, mcmc_method=None, sample_with=None, num_workers=None, show_progress_bars=True)

Return samples from posterior distribution \(p(\theta|x)\) with MCMC.

Check the __init__() method for a description of all arguments as well as their default values.

Parameters:

Name Type Description Default
sample_shape Shape

Desired shape of samples that are drawn from posterior. If sample_shape is multidimensional we simply draw sample_shape.numel() samples and then reshape into the desired shape.

Size()
mcmc_parameters Dict

Dictionary that is passed only to support the API of sbi v0.17.2 or older.

{}
mcmc_method Optional[str]

This argument only exists to keep backward-compatibility with sbi v0.17.2 or older. Please use method instead.

None
sample_with Optional[str]

This argument only exists to keep backward-compatibility with sbi v0.17.2 or older. If it is set, we instantly raise an error.

None
show_progress_bars bool

Whether to show sampling progress monitor.

True

Returns:

Type Description
Tensor

Samples from posterior.

Source code in sbi/inference/posteriors/mcmc_posterior.py
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
def sample(
    self,
    sample_shape: Shape = torch.Size(),
    x: Optional[Tensor] = None,
    method: Optional[str] = None,
    thin: Optional[int] = None,
    warmup_steps: Optional[int] = None,
    num_chains: Optional[int] = None,
    init_strategy: Optional[str] = None,
    init_strategy_parameters: Optional[Dict[str, Any]] = None,
    init_strategy_num_candidates: Optional[int] = None,
    mcmc_parameters: Dict = {},
    mcmc_method: Optional[str] = None,
    sample_with: Optional[str] = None,
    num_workers: Optional[int] = None,
    show_progress_bars: bool = True,
) -> Tensor:
    r"""Return samples from posterior distribution $p(\theta|x)$ with MCMC.

    Check the `__init__()` method for a description of all arguments as well as
    their default values.

    Args:
        sample_shape: Desired shape of samples that are drawn from posterior. If
            sample_shape is multidimensional we simply draw `sample_shape.numel()`
            samples and then reshape into the desired shape.
        mcmc_parameters: Dictionary that is passed only to support the API of
            `sbi` v0.17.2 or older.
        mcmc_method: This argument only exists to keep backward-compatibility with
            `sbi` v0.17.2 or older. Please use `method` instead.
        sample_with: This argument only exists to keep backward-compatibility with
            `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
        show_progress_bars: Whether to show sampling progress monitor.

    Returns:
        Samples from posterior.
    """
    self.potential_fn.set_x(self._x_else_default_x(x))

    # Replace arguments that were not passed with their default.
    method = self.method if method is None else method
    thin = self.thin if thin is None else thin
    warmup_steps = self.warmup_steps if warmup_steps is None else warmup_steps
    num_chains = self.num_chains if num_chains is None else num_chains
    init_strategy = self.init_strategy if init_strategy is None else init_strategy
    num_workers = self.num_workers if num_workers is None else num_workers
    init_strategy_parameters = (
        self.init_strategy_parameters
        if init_strategy_parameters is None
        else init_strategy_parameters
    )
    if init_strategy_num_candidates is not None:
        warn(
            """Passing `init_strategy_num_candidates` is deprecated as of sbi
            v0.19.0. Instead, use e.g.,
            `init_strategy_parameters={"num_candidate_samples": 1000}`"""
        )
        self.init_strategy_parameters[
            "num_candidate_samples"
        ] = init_strategy_num_candidates
    if sample_with is not None:
        raise ValueError(
            f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
            f"`sample_with` is no longer supported. You have to rerun "
            f"`.build_posterior(sample_with={sample_with}).`"
        )
    if mcmc_method is not None:
        warn(
            "You passed `mcmc_method` to `.sample()`. As of sbi v0.18.0, this "
            "is deprecated and will be removed in a future release. Use `method` "
            "instead of `mcmc_method`."
        )
        method = mcmc_method
    if mcmc_parameters:
        warn(
            "You passed `mcmc_parameters` to `.sample()`. As of sbi v0.18.0, this "
            "is deprecated and will be removed in a future release. Instead, pass "
            "the variable to `.sample()` directly, e.g. "
            "`posterior.sample((1,), num_chains=5)`."
        )
    # The following lines are only for backwards compatibility with sbi v0.17.2 or
    # older.
    m_p = mcmc_parameters  # define to shorten the variable name
    method = _maybe_use_dict_entry(method, "mcmc_method", m_p)
    thin = _maybe_use_dict_entry(thin, "thin", m_p)
    warmup_steps = _maybe_use_dict_entry(warmup_steps, "warmup_steps", m_p)
    num_chains = _maybe_use_dict_entry(num_chains, "num_chains", m_p)
    init_strategy = _maybe_use_dict_entry(init_strategy, "init_strategy", m_p)
    self.potential_ = self._prepare_potential(method)  # type: ignore

    initial_params = self._get_initial_params(
        init_strategy,  # type: ignore
        num_chains,  # type: ignore
        num_workers,
        show_progress_bars,
        **init_strategy_parameters,
    )
    num_samples = torch.Size(sample_shape).numel()

    track_gradients = method in ("hmc", "nuts")
    with torch.set_grad_enabled(track_gradients):
        if method in ("slice_np", "slice_np_vectorized"):
            transformed_samples = self._slice_np_mcmc(
                num_samples=num_samples,
                potential_function=self.potential_,
                initial_params=initial_params,
                thin=thin,  # type: ignore
                warmup_steps=warmup_steps,  # type: ignore
                vectorized=(method == "slice_np_vectorized"),
                num_workers=num_workers,
                show_progress_bars=show_progress_bars,
            )
        elif method in ("hmc", "nuts", "slice"):
            transformed_samples = self._pyro_mcmc(
                num_samples=num_samples,
                potential_function=self.potential_,
                initial_params=initial_params,
                mcmc_method=method,  # type: ignore
                thin=thin,  # type: ignore
                warmup_steps=warmup_steps,  # type: ignore
                num_chains=num_chains,
                show_progress_bars=show_progress_bars,
            )
        else:
            raise NameError

    samples = self.theta_transform.inv(transformed_samples)

    return samples.reshape((*sample_shape, -1))  # type: ignore

set_mcmc_method(method)

Sets sampling method to for MCMC and returns NeuralPosterior.

Parameters:

Name Type Description Default
method str

Method to use.

required

Returns:

Type Description
NeuralPosterior

NeuralPosterior for chainable calls.

Source code in sbi/inference/posteriors/mcmc_posterior.py
146
147
148
149
150
151
152
153
154
155
156
def set_mcmc_method(self, method: str) -> "NeuralPosterior":
    """Sets sampling method to for MCMC and returns `NeuralPosterior`.

    Args:
        method: Method to use.

    Returns:
        `NeuralPosterior` for chainable calls.
    """
    self._mcmc_method = method
    return self

Bases: NeuralPosterior

Provides rejection sampling to sample from the posterior.

SNLE or SNRE train neural networks to approximate the likelihood(-ratios). RejectionPosterior allows to sample from the posterior with rejection sampling.

Source code in sbi/inference/posteriors/rejection_posterior.py
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
class RejectionPosterior(NeuralPosterior):
    r"""Provides rejection sampling to sample from the posterior.<br/><br/>
    SNLE or SNRE train neural networks to approximate the likelihood(-ratios).
    `RejectionPosterior` allows to sample from the posterior with rejection sampling.
    """

    def __init__(
        self,
        potential_fn: Callable,
        proposal: Any,
        theta_transform: Optional[TorchTransform] = None,
        max_sampling_batch_size: int = 10_000,
        num_samples_to_find_max: int = 10_000,
        num_iter_to_find_max: int = 100,
        m: float = 1.2,
        device: Optional[str] = None,
        x_shape: Optional[torch.Size] = None,
    ):
        """
        Args:
            potential_fn: The potential function from which to draw samples.
            proposal: The proposal distribution.
            theta_transform: Transformation that is applied to parameters. Is not used
                during but only when calling `.map()`.
            max_sampling_batch_size: The batchsize of samples being drawn from
                the proposal at every iteration.
            num_samples_to_find_max: The number of samples that are used to find the
                maximum of the `potential_fn / proposal` ratio.
            num_iter_to_find_max: The number of gradient ascent iterations to find the
                maximum of the `potential_fn / proposal` ratio.
            m: Multiplier to the `potential_fn / proposal` ratio.
            device: Training device, e.g., "cpu", "cuda" or "cuda:0". If None,
                `potential_fn.device` is used.
            x_shape: Shape of a single simulator output. If passed, it is used to check
                the shape of the observed data and give a descriptive error.
        """
        super().__init__(
            potential_fn,
            theta_transform=theta_transform,
            device=device,
            x_shape=x_shape,
        )

        self.proposal = proposal
        self.max_sampling_batch_size = max_sampling_batch_size
        self.num_samples_to_find_max = num_samples_to_find_max
        self.num_iter_to_find_max = num_iter_to_find_max
        self.m = m

        self._purpose = (
            "It provides rejection sampling to .sample() from the posterior and "
            "can evaluate the _unnormalized_ posterior density with .log_prob()."
        )

    def log_prob(
        self, theta: Tensor, x: Optional[Tensor] = None, track_gradients: bool = False
    ) -> Tensor:
        r"""Returns the log-probability of theta under the posterior.

        Args:
            theta: Parameters $\theta$.
            track_gradients: Whether the returned tensor supports tracking gradients.
                This can be helpful for e.g. sensitivity analysis, but increases memory
                consumption.

        Returns:
            `len($\theta$)`-shaped log-probability.
        """
        warn(
            "`.log_prob()` is deprecated for methods that can only evaluate the log-probability up to a normalizing constant. Use `.potential()` instead."
        )
        warn("The log-probability is unnormalized!")

        self.potential_fn.set_x(self._x_else_default_x(x))

        theta = ensure_theta_batched(torch.as_tensor(theta))
        return self.potential_fn(
            theta.to(self._device), track_gradients=track_gradients
        )

    def sample(
        self,
        sample_shape: Shape = torch.Size(),
        x: Optional[Tensor] = None,
        max_sampling_batch_size: Optional[int] = None,
        num_samples_to_find_max: Optional[int] = None,
        num_iter_to_find_max: Optional[int] = None,
        m: Optional[float] = None,
        sample_with: Optional[str] = None,
        show_progress_bars: bool = True,
    ):
        r"""Return samples from posterior $p(\theta|x)$ via rejection sampling.

        Args:
            sample_shape: Desired shape of samples that are drawn from posterior. If
                sample_shape is multidimensional we simply draw `sample_shape.numel()`
                samples and then reshape into the desired shape.
            sample_with: This argument only exists to keep backward-compatibility with
                `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
            show_progress_bars: Whether to show sampling progress monitor.

        Returns:
            Samples from posterior.
        """
        num_samples = torch.Size(sample_shape).numel()
        self.potential_fn.set_x(self._x_else_default_x(x))

        potential = partial(self.potential_fn, track_gradients=True)

        if sample_with is not None:
            raise ValueError(
                f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
                f"`sample_with` is no longer supported. You have to rerun "
                f"`.build_posterior(sample_with={sample_with}).`"
            )
        # Replace arguments that were not passed with their default.
        max_sampling_batch_size = (
            self.max_sampling_batch_size
            if max_sampling_batch_size is None
            else max_sampling_batch_size
        )
        num_samples_to_find_max = (
            self.num_samples_to_find_max
            if num_samples_to_find_max is None
            else num_samples_to_find_max
        )
        num_iter_to_find_max = (
            self.num_iter_to_find_max
            if num_iter_to_find_max is None
            else num_iter_to_find_max
        )
        m = self.m if m is None else m

        samples, _ = rejection_sample(
            potential,
            proposal=self.proposal,
            num_samples=num_samples,
            show_progress_bars=show_progress_bars,
            warn_acceptance=0.01,
            max_sampling_batch_size=max_sampling_batch_size,
            num_samples_to_find_max=num_samples_to_find_max,
            num_iter_to_find_max=num_iter_to_find_max,
            m=m,
            device=self._device,
        )

        return samples.reshape((*sample_shape, -1))

    def map(
        self,
        x: Optional[Tensor] = None,
        num_iter: int = 1_000,
        num_to_optimize: int = 100,
        learning_rate: float = 0.01,
        init_method: Union[str, Tensor] = "proposal",
        num_init_samples: int = 1_000,
        save_best_every: int = 10,
        show_progress_bars: bool = False,
        force_update: bool = False,
    ) -> Tensor:
        r"""Returns the maximum-a-posteriori estimate (MAP).

        The method can be interrupted (Ctrl-C) when the user sees that the
        log-probability converges. The best estimate will be saved in `self._map` and
        can be accessed with `self.map()`. The MAP is obtained by running gradient
        ascent from a given number of starting positions (samples from the posterior
        with the highest log-probability). After the optimization is done, we select the
        parameter set that has the highest log-probability after the optimization.

        Warning: The default values used by this function are not well-tested. They
        might require hand-tuning for the problem at hand.

        For developers: if the prior is a `BoxUniform`, we carry out the optimization
        in unbounded space and transform the result back into bounded space.

        Args:
            x: Deprecated - use `.set_default_x()` prior to `.map()`.
            num_iter: Number of optimization steps that the algorithm takes
                to find the MAP.
            learning_rate: Learning rate of the optimizer.
            init_method: How to select the starting parameters for the optimization. If
                it is a string, it can be either [`posterior`, `prior`], which samples
                the respective distribution `num_init_samples` times. If it is a
                tensor, the tensor will be used as init locations.
            num_init_samples: Draw this number of samples from the posterior and
                evaluate the log-probability of all of them.
            num_to_optimize: From the drawn `num_init_samples`, use the
                `num_to_optimize` with highest log-probability as the initial points
                for the optimization.
            save_best_every: The best log-probability is computed, saved in the
                `map`-attribute, and printed every `save_best_every`-th iteration.
                Computing the best log-probability creates a significant overhead
                (thus, the default is `10`.)
            show_progress_bars: Whether to show a progressbar during sampling from
                the posterior.
            force_update: Whether to re-calculate the MAP when x is unchanged and
                have a cached value.
            log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
                {'norm_posterior': True} for SNPE.

        Returns:
            The MAP estimate.
        """
        return super().map(
            x=x,
            num_iter=num_iter,
            num_to_optimize=num_to_optimize,
            learning_rate=learning_rate,
            init_method=init_method,
            num_init_samples=num_init_samples,
            save_best_every=save_best_every,
            show_progress_bars=show_progress_bars,
            force_update=force_update,
        )

__init__(potential_fn, proposal, theta_transform=None, max_sampling_batch_size=10000, num_samples_to_find_max=10000, num_iter_to_find_max=100, m=1.2, device=None, x_shape=None)

Parameters:

Name Type Description Default
potential_fn Callable

The potential function from which to draw samples.

required
proposal Any

The proposal distribution.

required
theta_transform Optional[TorchTransform]

Transformation that is applied to parameters. Is not used during but only when calling .map().

None
max_sampling_batch_size int

The batchsize of samples being drawn from the proposal at every iteration.

10000
num_samples_to_find_max int

The number of samples that are used to find the maximum of the potential_fn / proposal ratio.

10000
num_iter_to_find_max int

The number of gradient ascent iterations to find the maximum of the potential_fn / proposal ratio.

100
m float

Multiplier to the potential_fn / proposal ratio.

1.2
device Optional[str]

Training device, e.g., “cpu”, “cuda” or “cuda:0”. If None, potential_fn.device is used.

None
x_shape Optional[Size]

Shape of a single simulator output. If passed, it is used to check the shape of the observed data and give a descriptive error.

None
Source code in sbi/inference/posteriors/rejection_posterior.py
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
def __init__(
    self,
    potential_fn: Callable,
    proposal: Any,
    theta_transform: Optional[TorchTransform] = None,
    max_sampling_batch_size: int = 10_000,
    num_samples_to_find_max: int = 10_000,
    num_iter_to_find_max: int = 100,
    m: float = 1.2,
    device: Optional[str] = None,
    x_shape: Optional[torch.Size] = None,
):
    """
    Args:
        potential_fn: The potential function from which to draw samples.
        proposal: The proposal distribution.
        theta_transform: Transformation that is applied to parameters. Is not used
            during but only when calling `.map()`.
        max_sampling_batch_size: The batchsize of samples being drawn from
            the proposal at every iteration.
        num_samples_to_find_max: The number of samples that are used to find the
            maximum of the `potential_fn / proposal` ratio.
        num_iter_to_find_max: The number of gradient ascent iterations to find the
            maximum of the `potential_fn / proposal` ratio.
        m: Multiplier to the `potential_fn / proposal` ratio.
        device: Training device, e.g., "cpu", "cuda" or "cuda:0". If None,
            `potential_fn.device` is used.
        x_shape: Shape of a single simulator output. If passed, it is used to check
            the shape of the observed data and give a descriptive error.
    """
    super().__init__(
        potential_fn,
        theta_transform=theta_transform,
        device=device,
        x_shape=x_shape,
    )

    self.proposal = proposal
    self.max_sampling_batch_size = max_sampling_batch_size
    self.num_samples_to_find_max = num_samples_to_find_max
    self.num_iter_to_find_max = num_iter_to_find_max
    self.m = m

    self._purpose = (
        "It provides rejection sampling to .sample() from the posterior and "
        "can evaluate the _unnormalized_ posterior density with .log_prob()."
    )

log_prob(theta, x=None, track_gradients=False)

Returns the log-probability of theta under the posterior.

Parameters:

Name Type Description Default
theta Tensor

Parameters \(\theta\).

required
track_gradients bool

Whether the returned tensor supports tracking gradients. This can be helpful for e.g. sensitivity analysis, but increases memory consumption.

False

Returns:

Type Description
Tensor

len($\theta$)-shaped log-probability.

Source code in sbi/inference/posteriors/rejection_posterior.py
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
def log_prob(
    self, theta: Tensor, x: Optional[Tensor] = None, track_gradients: bool = False
) -> Tensor:
    r"""Returns the log-probability of theta under the posterior.

    Args:
        theta: Parameters $\theta$.
        track_gradients: Whether the returned tensor supports tracking gradients.
            This can be helpful for e.g. sensitivity analysis, but increases memory
            consumption.

    Returns:
        `len($\theta$)`-shaped log-probability.
    """
    warn(
        "`.log_prob()` is deprecated for methods that can only evaluate the log-probability up to a normalizing constant. Use `.potential()` instead."
    )
    warn("The log-probability is unnormalized!")

    self.potential_fn.set_x(self._x_else_default_x(x))

    theta = ensure_theta_batched(torch.as_tensor(theta))
    return self.potential_fn(
        theta.to(self._device), track_gradients=track_gradients
    )

map(x=None, num_iter=1000, num_to_optimize=100, learning_rate=0.01, init_method='proposal', num_init_samples=1000, save_best_every=10, show_progress_bars=False, force_update=False)

Returns the maximum-a-posteriori estimate (MAP).

The method can be interrupted (Ctrl-C) when the user sees that the log-probability converges. The best estimate will be saved in self._map and can be accessed with self.map(). The MAP is obtained by running gradient ascent from a given number of starting positions (samples from the posterior with the highest log-probability). After the optimization is done, we select the parameter set that has the highest log-probability after the optimization.

Warning: The default values used by this function are not well-tested. They might require hand-tuning for the problem at hand.

For developers: if the prior is a BoxUniform, we carry out the optimization in unbounded space and transform the result back into bounded space.

Parameters:

Name Type Description Default
x Optional[Tensor]

Deprecated - use .set_default_x() prior to .map().

None
num_iter int

Number of optimization steps that the algorithm takes to find the MAP.

1000
learning_rate float

Learning rate of the optimizer.

0.01
init_method Union[str, Tensor]

How to select the starting parameters for the optimization. If it is a string, it can be either [posterior, prior], which samples the respective distribution num_init_samples times. If it is a tensor, the tensor will be used as init locations.

'proposal'
num_init_samples int

Draw this number of samples from the posterior and evaluate the log-probability of all of them.

1000
num_to_optimize int

From the drawn num_init_samples, use the num_to_optimize with highest log-probability as the initial points for the optimization.

100
save_best_every int

The best log-probability is computed, saved in the map-attribute, and printed every save_best_every-th iteration. Computing the best log-probability creates a significant overhead (thus, the default is 10.)

10
show_progress_bars bool

Whether to show a progressbar during sampling from the posterior.

False
force_update bool

Whether to re-calculate the MAP when x is unchanged and have a cached value.

False
log_prob_kwargs

Will be empty for SNLE and SNRE. Will contain {‘norm_posterior’: True} for SNPE.

required

Returns:

Type Description
Tensor

The MAP estimate.

Source code in sbi/inference/posteriors/rejection_posterior.py
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
def map(
    self,
    x: Optional[Tensor] = None,
    num_iter: int = 1_000,
    num_to_optimize: int = 100,
    learning_rate: float = 0.01,
    init_method: Union[str, Tensor] = "proposal",
    num_init_samples: int = 1_000,
    save_best_every: int = 10,
    show_progress_bars: bool = False,
    force_update: bool = False,
) -> Tensor:
    r"""Returns the maximum-a-posteriori estimate (MAP).

    The method can be interrupted (Ctrl-C) when the user sees that the
    log-probability converges. The best estimate will be saved in `self._map` and
    can be accessed with `self.map()`. The MAP is obtained by running gradient
    ascent from a given number of starting positions (samples from the posterior
    with the highest log-probability). After the optimization is done, we select the
    parameter set that has the highest log-probability after the optimization.

    Warning: The default values used by this function are not well-tested. They
    might require hand-tuning for the problem at hand.

    For developers: if the prior is a `BoxUniform`, we carry out the optimization
    in unbounded space and transform the result back into bounded space.

    Args:
        x: Deprecated - use `.set_default_x()` prior to `.map()`.
        num_iter: Number of optimization steps that the algorithm takes
            to find the MAP.
        learning_rate: Learning rate of the optimizer.
        init_method: How to select the starting parameters for the optimization. If
            it is a string, it can be either [`posterior`, `prior`], which samples
            the respective distribution `num_init_samples` times. If it is a
            tensor, the tensor will be used as init locations.
        num_init_samples: Draw this number of samples from the posterior and
            evaluate the log-probability of all of them.
        num_to_optimize: From the drawn `num_init_samples`, use the
            `num_to_optimize` with highest log-probability as the initial points
            for the optimization.
        save_best_every: The best log-probability is computed, saved in the
            `map`-attribute, and printed every `save_best_every`-th iteration.
            Computing the best log-probability creates a significant overhead
            (thus, the default is `10`.)
        show_progress_bars: Whether to show a progressbar during sampling from
            the posterior.
        force_update: Whether to re-calculate the MAP when x is unchanged and
            have a cached value.
        log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
            {'norm_posterior': True} for SNPE.

    Returns:
        The MAP estimate.
    """
    return super().map(
        x=x,
        num_iter=num_iter,
        num_to_optimize=num_to_optimize,
        learning_rate=learning_rate,
        init_method=init_method,
        num_init_samples=num_init_samples,
        save_best_every=save_best_every,
        show_progress_bars=show_progress_bars,
        force_update=force_update,
    )

sample(sample_shape=torch.Size(), x=None, max_sampling_batch_size=None, num_samples_to_find_max=None, num_iter_to_find_max=None, m=None, sample_with=None, show_progress_bars=True)

Return samples from posterior \(p(\theta|x)\) via rejection sampling.

Parameters:

Name Type Description Default
sample_shape Shape

Desired shape of samples that are drawn from posterior. If sample_shape is multidimensional we simply draw sample_shape.numel() samples and then reshape into the desired shape.

Size()
sample_with Optional[str]

This argument only exists to keep backward-compatibility with sbi v0.17.2 or older. If it is set, we instantly raise an error.

None
show_progress_bars bool

Whether to show sampling progress monitor.

True

Returns:

Type Description

Samples from posterior.

Source code in sbi/inference/posteriors/rejection_posterior.py
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
def sample(
    self,
    sample_shape: Shape = torch.Size(),
    x: Optional[Tensor] = None,
    max_sampling_batch_size: Optional[int] = None,
    num_samples_to_find_max: Optional[int] = None,
    num_iter_to_find_max: Optional[int] = None,
    m: Optional[float] = None,
    sample_with: Optional[str] = None,
    show_progress_bars: bool = True,
):
    r"""Return samples from posterior $p(\theta|x)$ via rejection sampling.

    Args:
        sample_shape: Desired shape of samples that are drawn from posterior. If
            sample_shape is multidimensional we simply draw `sample_shape.numel()`
            samples and then reshape into the desired shape.
        sample_with: This argument only exists to keep backward-compatibility with
            `sbi` v0.17.2 or older. If it is set, we instantly raise an error.
        show_progress_bars: Whether to show sampling progress monitor.

    Returns:
        Samples from posterior.
    """
    num_samples = torch.Size(sample_shape).numel()
    self.potential_fn.set_x(self._x_else_default_x(x))

    potential = partial(self.potential_fn, track_gradients=True)

    if sample_with is not None:
        raise ValueError(
            f"You set `sample_with={sample_with}`. As of sbi v0.18.0, setting "
            f"`sample_with` is no longer supported. You have to rerun "
            f"`.build_posterior(sample_with={sample_with}).`"
        )
    # Replace arguments that were not passed with their default.
    max_sampling_batch_size = (
        self.max_sampling_batch_size
        if max_sampling_batch_size is None
        else max_sampling_batch_size
    )
    num_samples_to_find_max = (
        self.num_samples_to_find_max
        if num_samples_to_find_max is None
        else num_samples_to_find_max
    )
    num_iter_to_find_max = (
        self.num_iter_to_find_max
        if num_iter_to_find_max is None
        else num_iter_to_find_max
    )
    m = self.m if m is None else m

    samples, _ = rejection_sample(
        potential,
        proposal=self.proposal,
        num_samples=num_samples,
        show_progress_bars=show_progress_bars,
        warn_acceptance=0.01,
        max_sampling_batch_size=max_sampling_batch_size,
        num_samples_to_find_max=num_samples_to_find_max,
        num_iter_to_find_max=num_iter_to_find_max,
        m=m,
        device=self._device,
    )

    return samples.reshape((*sample_shape, -1))

Bases: NeuralPosterior

Provides VI (Variational Inference) to sample from the posterior.

SNLE or SNRE train neural networks to approximate the likelihood(-ratios). VIPosterior allows to learn a tractable variational posterior \(q(\theta)\) which approximates the true posterior \(p(\theta|x_o)\). After this second training stage, we can produce approximate posterior samples, by just sampling from q with no additional cost. For additional information see [1] and [2].

References:
[1] Variational methods for simulation-based inference, Manuel Glöckler, Michael Deistler, Jakob Macke, 2022, https://openreview.net/forum?id=kZ0UYdhqkNY
[2] Sequential Neural Posterior and Likelihood Approximation, Samuel Wiqvist, Jes Frellsen, Umberto Picchini, 2021, https://arxiv.org/abs/2102.06522

Source code in sbi/inference/posteriors/vi_posterior.py
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
class VIPosterior(NeuralPosterior):
    r"""Provides VI (Variational Inference) to sample from the posterior.<br/><br/>
    SNLE or SNRE train neural networks to approximate the likelihood(-ratios).
    `VIPosterior` allows to learn a tractable variational posterior $q(\theta)$ which
    approximates the true posterior $p(\theta|x_o)$. After this second training stage,
    we can produce approximate posterior samples, by just sampling from q with no
    additional cost. For additional information see [1] and [2].<br/><br/>
    References:<br/>
    [1] Variational methods for simulation-based inference, Manuel Glöckler, Michael
    Deistler, Jakob Macke, 2022, https://openreview.net/forum?id=kZ0UYdhqkNY<br/>
    [2] Sequential Neural Posterior and Likelihood Approximation, Samuel Wiqvist, Jes
    Frellsen, Umberto Picchini, 2021, https://arxiv.org/abs/2102.06522
    """

    def __init__(
        self,
        potential_fn: Callable,
        prior: Optional[TorchDistribution] = None,
        q: Union[str, PyroTransformedDistribution, "VIPosterior", Callable] = "maf",
        theta_transform: Optional[TorchTransform] = None,
        vi_method: str = "rKL",
        device: str = "cpu",
        x_shape: Optional[torch.Size] = None,
        parameters: Iterable = [],
        modules: Iterable = [],
    ):
        """
        Args:
            potential_fn: The potential function from which to draw samples.
            prior: This is the prior distribution. Note that this is only
                used to check/construct the variational distribution or within some
                quality metrics. Please make sure that this matches with the prior
                within the potential_fn. If `None` is given, we will try to infer it
                from potential_fn or q, if this fails we raise an Error.
            q: Variational distribution, either string, `TransformedDistribution`, or a
                `VIPosterior` object. This specifies a parametric class of distribution
                over which the best possible posterior approximation is searched. For
                string input, we currently support [nsf, scf, maf, mcf, gaussian,
                gaussian_diag]. You can also specify your own variational family by
                passing a pyro `TransformedDistribution`.
                Additionally, we allow a `Callable`, which allows you the pass a
                `builder` function, which if called returns a distribution. This may be
                useful for setting the hyperparameters e.g. `num_transfroms` within the
                `get_flow_builder` method specifying the number of transformations
                within a normalizing flow. If q is already a `VIPosterior`, then the
                arguments will be copied from it (relevant for multi-round training).
            theta_transform: Maps form prior support to unconstrained space. The
                inverse is used here to ensure that the posterior support is equal to
                that of the prior.
            vi_method: This specifies the variational methods which are used to fit q to
                the posterior. We currently support [rKL, fKL, IW, alpha]. Note that
                some of the divergences are `mode seeking` i.e. they underestimate
                variance and collapse on multimodal targets (`rKL`, `alpha` for alpha >
                1) and some are `mass covering` i.e. they overestimate variance but
                typically cover all modes (`fKL`, `IW`, `alpha` for alpha < 1).
            device: Training device, e.g., `cpu`, `cuda` or `cuda:0`. We will ensure
                that all other objects are also on this device.
            x_shape: Shape of a single simulator output. If passed, it is used to check
                the shape of the observed data and give a descriptive error.
            parameters: List of parameters of the variational posterior. This is only
                required for user-defined q i.e. if q does not have a `parameters`
                attribute.
            modules: List of modules of the variational posterior. This is only
                required for user-defined q i.e. if q does not have a `modules`
                attribute.
        """
        super().__init__(potential_fn, theta_transform, device, x_shape=x_shape)

        # Especially the prior may be on another device -> move it...
        self._device = device
        self.potential_fn.device = device
        move_all_tensor_to_device(self.potential_fn, device)

        # Get prior and previous builds
        if prior is not None:
            self._prior = prior
        elif hasattr(self.potential_fn, "prior") and isinstance(
            self.potential_fn.prior, Distribution
        ):
            self._prior = self.potential_fn.prior
        elif isinstance(q, VIPosterior) and isinstance(q._prior, Distribution):
            self._prior = q._prior
        else:
            raise ValueError(
                "We could not find a suitable prior distribution within `potential_fn`"
                "or `q` (if a VIPosterior is given). Please explicitly specify a prior."
            )
        move_all_tensor_to_device(self._prior, device)
        self._optimizer = None

        # In contrast to MCMC we want to project into constrained space.
        if theta_transform is None:
            self.link_transform = mcmc_transform(self._prior).inv
        else:
            self.link_transform = theta_transform.inv

        # This will set the variational distribution and VI method
        self.set_q(q, parameters=parameters, modules=modules)
        self.set_vi_method(vi_method)

        self._purpose = (
            "It provides Variational inference to .sample() from the posterior and "
            "can evaluate the _normalized_ posterior density with .log_prob()."
        )

    @property
    def q(self) -> Distribution:
        """Returns the variational posterior."""
        return self._q

    @q.setter
    def q(
        self,
        q: Union[str, Distribution, "VIPosterior", Callable],
    ) -> None:
        """Sets the variational distribution. If the distribution does not admit access
        through `parameters` and `modules` function, please use `set_q` if you want to
        explicitly specify the parameters and modules.


        Args:
            q: Variational distribution, either string, distribution, or a VIPosterior
                object. This specifies a parametric class of distribution over which
                the best possible posterior approximation is searched. For string input,
                we currently support [nsf, scf, maf, mcf, gaussian, gaussian_diag]. Of
                course, you can also specify your own variational family by passing a
                `parameterized` distribution object i.e. a torch.distributions
                Distribution with methods `parameters` returning an iterable of all
                parameters (you can pass them within the paramters/modules attribute).
                Additionally, we allow a `Callable`, which allows you the pass a
                `builder` function, which if called returns an distribution. This may be
                useful for setting the hyperparameters e.g. `num_transfroms:int` by
                using the `get_flow_builder` method specifying the hyperparameters. If q
                is already a `VIPosterior`, then the arguments will be copied from it
                (relevant for multi-round training).


        """
        self.set_q(q)

    def set_q(
        self,
        q: Union[str, PyroTransformedDistribution, "VIPosterior", Callable],
        parameters: Iterable = [],
        modules: Iterable = [],
    ) -> None:
        """Defines the variational family.

        You can specify over which parameters/modules we optimize. This is required for
        custom distributions which e.g. do not inherit nn.Modules or has the function
        `parameters` or `modules` to give direct access to trainable parameters.
        Further, you can pass a function, which constructs a variational distribution
        if called.

        Args:
            q: Variational distribution, either string, distribution, or a VIPosterior
                object. This specifies a parametric class of distribution over which
                the best possible posterior approximation is searched. For string input,
                we currently support [nsf, scf, maf, mcf, gaussian, gaussian_diag]. Of
                course, you can also specify your own variational family by passing a
                `parameterized` distribution object i.e. a torch.distributions
                Distribution with methods `parameters` returning an iterable of all
                parameters (you can pass them within the paramters/modules attribute).
                Additionally, we allow a `Callable`, which allows you the pass a
                `builder` function, which if called returns an distribution. This may be
                useful for setting the hyperparameters e.g. `num_transfroms:int` by
                using the `get_flow_builder` method specifying the hyperparameters. If q
                is already a `VIPosterior`, then the arguments will be copied from it
                (relevant for multi-round training).
            parameters: List of parameters associated with the distribution object.
            modules: List of modules associated with the distribution object.

        """
        self._q_arg = q
        if isinstance(q, Distribution):
            q = adapt_variational_distribution(
                q,
                self._prior,
                self.link_transform,
                parameters=parameters,
                modules=modules,
            )
            make_object_deepcopy_compatible(q)
            self_custom_q_init_cache = deepcopy(q)
            self._q_build_fn = lambda *args, **kwargs: self_custom_q_init_cache
            self._trained_on = None
        elif isinstance(q, str) or isinstance(q, Callable):
            if isinstance(q, str):
                self._q_build_fn = get_flow_builder(q)
            else:
                self._q_build_fn = q

            q = self._q_build_fn(
                self._prior.event_shape,
                self.link_transform,
                device=self._device,
            )
            make_object_deepcopy_compatible(q)
            self._trained_on = None
        elif isinstance(q, VIPosterior):
            self._q_build_fn = q._q_build_fn
            self._trained_on = q._trained_on
            self.vi_method = q.vi_method  # type: ignore
            self._device = q._device
            self._prior = q._prior
            self._x = q._x
            self._q_arg = q._q_arg
            make_object_deepcopy_compatible(q.q)
            q = deepcopy(q.q)
        move_all_tensor_to_device(q, self._device)
        assert isinstance(
            q, Distribution
        ), """Something went wrong when initializing the variational distribution.
            Please create an issue on github https://github.com/mackelab/sbi/issues"""
        check_variational_distribution(q, self._prior)
        self._q = q

    @property
    def vi_method(self) -> str:
        """Variational inference method e.g. one of [rKL, fKL, IW, alpha]."""
        return self._vi_method

    @vi_method.setter
    def vi_method(self, method: str) -> None:
        """See `set_vi_method`."""
        self.set_vi_method(method)

    def set_vi_method(self, method: str) -> "VIPosterior":
        """Sets variational inference method.

        Args:
            method: One of [rKL, fKL, IW, alpha].

        Returns:
            `VIPosterior` for chainable calls.
        """
        self._vi_method = method
        self._optimizer_builder = get_VI_method(method)
        return self

    def sample(
        self,
        sample_shape: Shape = torch.Size(),
        x: Optional[Tensor] = None,
        **kwargs,
    ) -> Tensor:
        """Samples from the variational posterior distribution.

        Args:
            sample_shape: Shape of samples

        Returns:
            Samples from posterior.
        """
        x = self._x_else_default_x(x)
        if self._trained_on is None or (x != self._trained_on).all():
            raise AttributeError(
                f"The variational posterior was not fit on the specified `default_x` "
                f"{x}. Please train using `posterior.train()`."
            )
        samples = self.q.sample(torch.Size(sample_shape))
        return samples.reshape((*sample_shape, samples.shape[-1]))

    def log_prob(
        self,
        theta: Tensor,
        x: Optional[Tensor] = None,
        track_gradients: bool = False,
    ) -> Tensor:
        r"""Returns the log-probability of theta under the variational posterior.

        Args:
            theta: Parameters
            track_gradients: Whether the returned tensor supports tracking gradients.
                This can be helpful for e.g. sensitivity analysis but increases memory
                consumption.

        Returns:
            `len($\theta$)`-shaped log-probability.
        """
        x = self._x_else_default_x(x)
        if self._trained_on is None or (x != self._trained_on).all():
            raise AttributeError(
                f"The variational posterior was not fit using observation {x}.\
                     Please train."
            )
        with torch.set_grad_enabled(track_gradients):
            theta = ensure_theta_batched(torch.as_tensor(theta))
            return self.q.log_prob(theta)

    def train(
        self,
        x: Optional[TorchTensor] = None,
        n_particles: int = 256,
        learning_rate: float = 1e-3,
        gamma: float = 0.999,
        max_num_iters: int = 2000,
        min_num_iters: int = 10,
        clip_value: float = 10.0,
        warm_up_rounds: int = 100,
        retrain_from_scratch: bool = False,
        reset_optimizer: bool = False,
        show_progress_bar: bool = True,
        check_for_convergence: bool = True,
        quality_control: bool = True,
        quality_control_metric: str = "psis",
        **kwargs,
    ) -> "VIPosterior":
        """This method trains the variational posterior.

        Args:
            x: The observation.
            n_particles: Number of samples to approximate expectations within the
                variational bounds. The larger the more accurate are gradient
                estimates, but the computational cost per iteration increases.
            learning_rate: Learning rate of the optimizer.
            gamma: Learning rate decay per iteration. We use an exponential decay
                scheduler.
            max_num_iters: Maximum number of iterations.
            min_num_iters: Minimum number of iterations.
            clip_value: Gradient clipping value, decreasing may help if you see invalid
                values.
            warm_up_rounds: Initialize the posterior as the prior.
            retrain_from_scratch: Retrain the variational distributions from scratch.
            reset_optimizer: Reset the divergence optimizer
            show_progress_bar: If any progress report should be displayed.
            quality_control: If False quality control is skipped.
            quality_control_metric: Which metric to use for evaluating the quality.
            kwargs: Hyperparameters check corresponding `DivergenceOptimizer` for detail
                eps: Determines sensitivity of convergence check.
                retain_graph: Boolean which decides whether to retain the computation
                    graph. This may be required for some `exotic` user-specified q's.
                optimizer: A PyTorch Optimizer class e.g. Adam or SGD. See
                    `DivergenceOptimizer` for details.
                scheduler: A PyTorch learning rate scheduler. See
                    `DivergenceOptimizer` for details.
                alpha: Only used if vi_method=`alpha`. Determines the alpha divergence.
                K: Only used if vi_method=`IW`. Determines the number of importance
                    weighted particles.
                stick_the_landing: If one should use the STL estimator (only for rKL,
                    IW, alpha).
                dreg: If one should use the DREG estimator (only for rKL, IW, alpha).
                weight_transform: Callable applied to importance weights (only for fKL)
        Returns:
            VIPosterior: `VIPosterior` (can be used to chain calls).
        """
        # Update optimizer with current arguments.
        if self._optimizer is not None:
            self._optimizer.update({**locals(), **kwargs})

        # Init q and the optimizer if necessary
        if retrain_from_scratch:
            self.q = self._q_build_fn()  # type: ignore
            self._optimizer = self._optimizer_builder(
                self.potential_fn,
                self.q,
                lr=learning_rate,
                clip_value=clip_value,
                gamma=gamma,
                n_particles=n_particles,
                prior=self._prior,
                **kwargs,
            )

        if (
            reset_optimizer
            or self._optimizer is None
            or not isinstance(self._optimizer, self._optimizer_builder)
        ):
            self._optimizer = self._optimizer_builder(
                self.potential_fn,
                self.q,
                lr=learning_rate,
                clip_value=clip_value,
                gamma=gamma,
                n_particles=n_particles,
                prior=self._prior,
                **kwargs,
            )

        # Check context
        x = atleast_2d_float32_tensor(self._x_else_default_x(x)).to(  # type: ignore
            self._device
        )

        already_trained = self._trained_on is not None and (x == self._trained_on).all()

        # Optimize
        optimizer = self._optimizer
        optimizer.to(self._device)
        optimizer.reset_loss_stats()

        if show_progress_bar:
            iters = tqdm(range(max_num_iters))
        else:
            iters = range(max_num_iters)

        # Warmup before training
        if reset_optimizer or (not optimizer.warm_up_was_done and not already_trained):
            if show_progress_bar:
                iters.set_description(  # type: ignore
                    "Warmup phase, this may take a few seconds..."
                )
            optimizer.warm_up(warm_up_rounds)

        for i in iters:
            optimizer.step(x)
            mean_loss, std_loss = optimizer.get_loss_stats()
            # Update progress bar
            if show_progress_bar:
                assert isinstance(iters, tqdm)
                iters.set_description(  # type: ignore
                    f"Loss: {np.round(float(mean_loss), 2)}"
                    f"Std: {np.round(float(std_loss), 2)}"
                )
            # Check for convergence
            if check_for_convergence and i > min_num_iters:
                if optimizer.converged():
                    if show_progress_bar:
                        print(f"\nConverged with loss: {np.round(float(mean_loss), 2)}")
                    break
        # Training finished:
        self._trained_on = x

        # Evaluate quality
        if quality_control:
            try:
                self.evaluate(quality_control_metric=quality_control_metric)
            except Exception as e:
                print(
                    f"Quality control did not work, we reset the variational \
                        posterior,please check your setting. \
                        \n Following error occured {e}"
                )
                self.train(
                    learning_rate=learning_rate * 0.1,
                    retrain_from_scratch=True,
                    reset_optimizer=True,
                )

        return self

    def evaluate(self, quality_control_metric: str = "psis", N: int = int(5e4)) -> None:
        """This function will evaluate the quality of the variational posterior
        distribution. We currently support two different metrics of type `psis`, which
        checks the quality based on the tails of importance weights (there should not be
        much with a large one), or `prop` which checks the proportionality between q
        and potential_fn.

        NOTE: In our experience `prop` is sensitive to distinguish ``good`` from ``ok``
        whereas `psis` is more sensitive in distinguishing `very bad` from `ok`.

        Args:
            quality_control_metric: The metric of choice, we currently support [psis,
                prop, prop_prior].
            N: Number of samples which is used to evaluate the metric.
        """
        quality_control_fn, quality_control_msg = get_quality_metric(
            quality_control_metric
        )
        metric = round(float(quality_control_fn(self, N=N)), 3)
        print(f"Quality Score: {metric} " + quality_control_msg)

    def map(
        self,
        x: Optional[TorchTensor] = None,
        num_iter: int = 1_000,
        num_to_optimize: int = 100,
        learning_rate: float = 0.01,
        init_method: Union[str, TorchTensor] = "proposal",
        num_init_samples: int = 10_000,
        save_best_every: int = 10,
        show_progress_bars: bool = False,
        force_update: bool = False,
    ) -> Tensor:
        r"""Returns the maximum-a-posteriori estimate (MAP).

        The method can be interrupted (Ctrl-C) when the user sees that the
        log-probability converges. The best estimate will be saved in `self._map` and
        can be accessed with `self.map()`. The MAP is obtained by running gradient
        ascent from a given number of starting positions (samples from the posterior
        with the highest log-probability). After the optimization is done, we select the
        parameter set that has the highest log-probability after the optimization.

        Warning: The default values used by this function are not well-tested. They
        might require hand-tuning for the problem at hand.

        For developers: if the prior is a `BoxUniform`, we carry out the optimization
        in unbounded space and transform the result back into bounded space.

        Args:
            x: Deprecated - use `.set_default_x()` prior to `.map()`.
            num_iter: Number of optimization steps that the algorithm takes
                to find the MAP.
            learning_rate: Learning rate of the optimizer.
            init_method: How to select the starting parameters for the optimization. If
                it is a string, it can be either [`posterior`, `prior`], which samples
                the respective distribution `num_init_samples` times. If it is a
                tensor, the tensor will be used as init locations.
            num_init_samples: Draw this number of samples from the posterior and
                evaluate the log-probability of all of them.
            num_to_optimize: From the drawn `num_init_samples`, use the
                `num_to_optimize` with highest log-probability as the initial points
                for the optimization.
            save_best_every: The best log-probability is computed, saved in the
                `map`-attribute, and printed every `save_best_every`-th iteration.
                Computing the best log-probability creates a significant overhead
                (thus, the default is `10`.)
            show_progress_bars: Whether to show a progressbar during sampling from
                the posterior.
            force_update: Whether to re-calculate the MAP when x is unchanged and
                have a cached value.
            log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
                {'norm_posterior': True} for SNPE.

        Returns:
            The MAP estimate.
        """
        self.proposal = self.q
        return super().map(
            x=x,
            num_iter=num_iter,
            num_to_optimize=num_to_optimize,
            learning_rate=learning_rate,
            init_method=init_method,
            num_init_samples=num_init_samples,
            save_best_every=save_best_every,
            show_progress_bars=show_progress_bars,
            force_update=force_update,
        )

q: Distribution property writable

Returns the variational posterior.

vi_method: str property writable

Variational inference method e.g. one of [rKL, fKL, IW, alpha].

__init__(potential_fn, prior=None, q='maf', theta_transform=None, vi_method='rKL', device='cpu', x_shape=None, parameters=[], modules=[])

Parameters:

Name Type Description Default
potential_fn Callable

The potential function from which to draw samples.

required
prior Optional[TorchDistribution]

This is the prior distribution. Note that this is only used to check/construct the variational distribution or within some quality metrics. Please make sure that this matches with the prior within the potential_fn. If None is given, we will try to infer it from potential_fn or q, if this fails we raise an Error.

None
q Union[str, PyroTransformedDistribution, VIPosterior, Callable]

Variational distribution, either string, TransformedDistribution, or a VIPosterior object. This specifies a parametric class of distribution over which the best possible posterior approximation is searched. For string input, we currently support [nsf, scf, maf, mcf, gaussian, gaussian_diag]. You can also specify your own variational family by passing a pyro TransformedDistribution. Additionally, we allow a Callable, which allows you the pass a builder function, which if called returns a distribution. This may be useful for setting the hyperparameters e.g. num_transfroms within the get_flow_builder method specifying the number of transformations within a normalizing flow. If q is already a VIPosterior, then the arguments will be copied from it (relevant for multi-round training).

'maf'
theta_transform Optional[TorchTransform]

Maps form prior support to unconstrained space. The inverse is used here to ensure that the posterior support is equal to that of the prior.

None
vi_method str

This specifies the variational methods which are used to fit q to the posterior. We currently support [rKL, fKL, IW, alpha]. Note that some of the divergences are mode seeking i.e. they underestimate variance and collapse on multimodal targets (rKL, alpha for alpha > 1) and some are mass covering i.e. they overestimate variance but typically cover all modes (fKL, IW, alpha for alpha < 1).

'rKL'
device str

Training device, e.g., cpu, cuda or cuda:0. We will ensure that all other objects are also on this device.

'cpu'
x_shape Optional[Size]

Shape of a single simulator output. If passed, it is used to check the shape of the observed data and give a descriptive error.

None
parameters Iterable

List of parameters of the variational posterior. This is only required for user-defined q i.e. if q does not have a parameters attribute.

[]
modules Iterable

List of modules of the variational posterior. This is only required for user-defined q i.e. if q does not have a modules attribute.

[]
Source code in sbi/inference/posteriors/vi_posterior.py
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
def __init__(
    self,
    potential_fn: Callable,
    prior: Optional[TorchDistribution] = None,
    q: Union[str, PyroTransformedDistribution, "VIPosterior", Callable] = "maf",
    theta_transform: Optional[TorchTransform] = None,
    vi_method: str = "rKL",
    device: str = "cpu",
    x_shape: Optional[torch.Size] = None,
    parameters: Iterable = [],
    modules: Iterable = [],
):
    """
    Args:
        potential_fn: The potential function from which to draw samples.
        prior: This is the prior distribution. Note that this is only
            used to check/construct the variational distribution or within some
            quality metrics. Please make sure that this matches with the prior
            within the potential_fn. If `None` is given, we will try to infer it
            from potential_fn or q, if this fails we raise an Error.
        q: Variational distribution, either string, `TransformedDistribution`, or a
            `VIPosterior` object. This specifies a parametric class of distribution
            over which the best possible posterior approximation is searched. For
            string input, we currently support [nsf, scf, maf, mcf, gaussian,
            gaussian_diag]. You can also specify your own variational family by
            passing a pyro `TransformedDistribution`.
            Additionally, we allow a `Callable`, which allows you the pass a
            `builder` function, which if called returns a distribution. This may be
            useful for setting the hyperparameters e.g. `num_transfroms` within the
            `get_flow_builder` method specifying the number of transformations
            within a normalizing flow. If q is already a `VIPosterior`, then the
            arguments will be copied from it (relevant for multi-round training).
        theta_transform: Maps form prior support to unconstrained space. The
            inverse is used here to ensure that the posterior support is equal to
            that of the prior.
        vi_method: This specifies the variational methods which are used to fit q to
            the posterior. We currently support [rKL, fKL, IW, alpha]. Note that
            some of the divergences are `mode seeking` i.e. they underestimate
            variance and collapse on multimodal targets (`rKL`, `alpha` for alpha >
            1) and some are `mass covering` i.e. they overestimate variance but
            typically cover all modes (`fKL`, `IW`, `alpha` for alpha < 1).
        device: Training device, e.g., `cpu`, `cuda` or `cuda:0`. We will ensure
            that all other objects are also on this device.
        x_shape: Shape of a single simulator output. If passed, it is used to check
            the shape of the observed data and give a descriptive error.
        parameters: List of parameters of the variational posterior. This is only
            required for user-defined q i.e. if q does not have a `parameters`
            attribute.
        modules: List of modules of the variational posterior. This is only
            required for user-defined q i.e. if q does not have a `modules`
            attribute.
    """
    super().__init__(potential_fn, theta_transform, device, x_shape=x_shape)

    # Especially the prior may be on another device -> move it...
    self._device = device
    self.potential_fn.device = device
    move_all_tensor_to_device(self.potential_fn, device)

    # Get prior and previous builds
    if prior is not None:
        self._prior = prior
    elif hasattr(self.potential_fn, "prior") and isinstance(
        self.potential_fn.prior, Distribution
    ):
        self._prior = self.potential_fn.prior
    elif isinstance(q, VIPosterior) and isinstance(q._prior, Distribution):
        self._prior = q._prior
    else:
        raise ValueError(
            "We could not find a suitable prior distribution within `potential_fn`"
            "or `q` (if a VIPosterior is given). Please explicitly specify a prior."
        )
    move_all_tensor_to_device(self._prior, device)
    self._optimizer = None

    # In contrast to MCMC we want to project into constrained space.
    if theta_transform is None:
        self.link_transform = mcmc_transform(self._prior).inv
    else:
        self.link_transform = theta_transform.inv

    # This will set the variational distribution and VI method
    self.set_q(q, parameters=parameters, modules=modules)
    self.set_vi_method(vi_method)

    self._purpose = (
        "It provides Variational inference to .sample() from the posterior and "
        "can evaluate the _normalized_ posterior density with .log_prob()."
    )

evaluate(quality_control_metric='psis', N=int(50000.0))

This function will evaluate the quality of the variational posterior distribution. We currently support two different metrics of type psis, which checks the quality based on the tails of importance weights (there should not be much with a large one), or prop which checks the proportionality between q and potential_fn.

NOTE: In our experience prop is sensitive to distinguish good from ok whereas psis is more sensitive in distinguishing very bad from ok.

Parameters:

Name Type Description Default
quality_control_metric str

The metric of choice, we currently support [psis, prop, prop_prior].

'psis'
N int

Number of samples which is used to evaluate the metric.

int(50000.0)
Source code in sbi/inference/posteriors/vi_posterior.py
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
def evaluate(self, quality_control_metric: str = "psis", N: int = int(5e4)) -> None:
    """This function will evaluate the quality of the variational posterior
    distribution. We currently support two different metrics of type `psis`, which
    checks the quality based on the tails of importance weights (there should not be
    much with a large one), or `prop` which checks the proportionality between q
    and potential_fn.

    NOTE: In our experience `prop` is sensitive to distinguish ``good`` from ``ok``
    whereas `psis` is more sensitive in distinguishing `very bad` from `ok`.

    Args:
        quality_control_metric: The metric of choice, we currently support [psis,
            prop, prop_prior].
        N: Number of samples which is used to evaluate the metric.
    """
    quality_control_fn, quality_control_msg = get_quality_metric(
        quality_control_metric
    )
    metric = round(float(quality_control_fn(self, N=N)), 3)
    print(f"Quality Score: {metric} " + quality_control_msg)

log_prob(theta, x=None, track_gradients=False)

Returns the log-probability of theta under the variational posterior.

Parameters:

Name Type Description Default
theta Tensor

Parameters

required
track_gradients bool

Whether the returned tensor supports tracking gradients. This can be helpful for e.g. sensitivity analysis but increases memory consumption.

False

Returns:

Type Description
Tensor

len($\theta$)-shaped log-probability.

Source code in sbi/inference/posteriors/vi_posterior.py
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
def log_prob(
    self,
    theta: Tensor,
    x: Optional[Tensor] = None,
    track_gradients: bool = False,
) -> Tensor:
    r"""Returns the log-probability of theta under the variational posterior.

    Args:
        theta: Parameters
        track_gradients: Whether the returned tensor supports tracking gradients.
            This can be helpful for e.g. sensitivity analysis but increases memory
            consumption.

    Returns:
        `len($\theta$)`-shaped log-probability.
    """
    x = self._x_else_default_x(x)
    if self._trained_on is None or (x != self._trained_on).all():
        raise AttributeError(
            f"The variational posterior was not fit using observation {x}.\
                 Please train."
        )
    with torch.set_grad_enabled(track_gradients):
        theta = ensure_theta_batched(torch.as_tensor(theta))
        return self.q.log_prob(theta)

map(x=None, num_iter=1000, num_to_optimize=100, learning_rate=0.01, init_method='proposal', num_init_samples=10000, save_best_every=10, show_progress_bars=False, force_update=False)

Returns the maximum-a-posteriori estimate (MAP).

The method can be interrupted (Ctrl-C) when the user sees that the log-probability converges. The best estimate will be saved in self._map and can be accessed with self.map(). The MAP is obtained by running gradient ascent from a given number of starting positions (samples from the posterior with the highest log-probability). After the optimization is done, we select the parameter set that has the highest log-probability after the optimization.

Warning: The default values used by this function are not well-tested. They might require hand-tuning for the problem at hand.

For developers: if the prior is a BoxUniform, we carry out the optimization in unbounded space and transform the result back into bounded space.

Parameters:

Name Type Description Default
x Optional[TorchTensor]

Deprecated - use .set_default_x() prior to .map().

None
num_iter int

Number of optimization steps that the algorithm takes to find the MAP.

1000
learning_rate float

Learning rate of the optimizer.

0.01
init_method Union[str, TorchTensor]

How to select the starting parameters for the optimization. If it is a string, it can be either [posterior, prior], which samples the respective distribution num_init_samples times. If it is a tensor, the tensor will be used as init locations.

'proposal'
num_init_samples int

Draw this number of samples from the posterior and evaluate the log-probability of all of them.

10000
num_to_optimize int

From the drawn num_init_samples, use the num_to_optimize with highest log-probability as the initial points for the optimization.

100
save_best_every int

The best log-probability is computed, saved in the map-attribute, and printed every save_best_every-th iteration. Computing the best log-probability creates a significant overhead (thus, the default is 10.)

10
show_progress_bars bool

Whether to show a progressbar during sampling from the posterior.

False
force_update bool

Whether to re-calculate the MAP when x is unchanged and have a cached value.

False
log_prob_kwargs

Will be empty for SNLE and SNRE. Will contain {‘norm_posterior’: True} for SNPE.

required

Returns:

Type Description
Tensor

The MAP estimate.

Source code in sbi/inference/posteriors/vi_posterior.py
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
def map(
    self,
    x: Optional[TorchTensor] = None,
    num_iter: int = 1_000,
    num_to_optimize: int = 100,
    learning_rate: float = 0.01,
    init_method: Union[str, TorchTensor] = "proposal",
    num_init_samples: int = 10_000,
    save_best_every: int = 10,
    show_progress_bars: bool = False,
    force_update: bool = False,
) -> Tensor:
    r"""Returns the maximum-a-posteriori estimate (MAP).

    The method can be interrupted (Ctrl-C) when the user sees that the
    log-probability converges. The best estimate will be saved in `self._map` and
    can be accessed with `self.map()`. The MAP is obtained by running gradient
    ascent from a given number of starting positions (samples from the posterior
    with the highest log-probability). After the optimization is done, we select the
    parameter set that has the highest log-probability after the optimization.

    Warning: The default values used by this function are not well-tested. They
    might require hand-tuning for the problem at hand.

    For developers: if the prior is a `BoxUniform`, we carry out the optimization
    in unbounded space and transform the result back into bounded space.

    Args:
        x: Deprecated - use `.set_default_x()` prior to `.map()`.
        num_iter: Number of optimization steps that the algorithm takes
            to find the MAP.
        learning_rate: Learning rate of the optimizer.
        init_method: How to select the starting parameters for the optimization. If
            it is a string, it can be either [`posterior`, `prior`], which samples
            the respective distribution `num_init_samples` times. If it is a
            tensor, the tensor will be used as init locations.
        num_init_samples: Draw this number of samples from the posterior and
            evaluate the log-probability of all of them.
        num_to_optimize: From the drawn `num_init_samples`, use the
            `num_to_optimize` with highest log-probability as the initial points
            for the optimization.
        save_best_every: The best log-probability is computed, saved in the
            `map`-attribute, and printed every `save_best_every`-th iteration.
            Computing the best log-probability creates a significant overhead
            (thus, the default is `10`.)
        show_progress_bars: Whether to show a progressbar during sampling from
            the posterior.
        force_update: Whether to re-calculate the MAP when x is unchanged and
            have a cached value.
        log_prob_kwargs: Will be empty for SNLE and SNRE. Will contain
            {'norm_posterior': True} for SNPE.

    Returns:
        The MAP estimate.
    """
    self.proposal = self.q
    return super().map(
        x=x,
        num_iter=num_iter,
        num_to_optimize=num_to_optimize,
        learning_rate=learning_rate,
        init_method=init_method,
        num_init_samples=num_init_samples,
        save_best_every=save_best_every,
        show_progress_bars=show_progress_bars,
        force_update=force_update,
    )

sample(sample_shape=torch.Size(), x=None, **kwargs)

Samples from the variational posterior distribution.

Parameters:

Name Type Description Default
sample_shape Shape

Shape of samples

Size()

Returns:

Type Description
Tensor

Samples from posterior.

Source code in sbi/inference/posteriors/vi_posterior.py
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
def sample(
    self,
    sample_shape: Shape = torch.Size(),
    x: Optional[Tensor] = None,
    **kwargs,
) -> Tensor:
    """Samples from the variational posterior distribution.

    Args:
        sample_shape: Shape of samples

    Returns:
        Samples from posterior.
    """
    x = self._x_else_default_x(x)
    if self._trained_on is None or (x != self._trained_on).all():
        raise AttributeError(
            f"The variational posterior was not fit on the specified `default_x` "
            f"{x}. Please train using `posterior.train()`."
        )
    samples = self.q.sample(torch.Size(sample_shape))
    return samples.reshape((*sample_shape, samples.shape[-1]))

set_q(q, parameters=[], modules=[])

Defines the variational family.

You can specify over which parameters/modules we optimize. This is required for custom distributions which e.g. do not inherit nn.Modules or has the function parameters or modules to give direct access to trainable parameters. Further, you can pass a function, which constructs a variational distribution if called.

Parameters:

Name Type Description Default
q Union[str, PyroTransformedDistribution, VIPosterior, Callable]

Variational distribution, either string, distribution, or a VIPosterior object. This specifies a parametric class of distribution over which the best possible posterior approximation is searched. For string input, we currently support [nsf, scf, maf, mcf, gaussian, gaussian_diag]. Of course, you can also specify your own variational family by passing a parameterized distribution object i.e. a torch.distributions Distribution with methods parameters returning an iterable of all parameters (you can pass them within the paramters/modules attribute). Additionally, we allow a Callable, which allows you the pass a builder function, which if called returns an distribution. This may be useful for setting the hyperparameters e.g. num_transfroms:int by using the get_flow_builder method specifying the hyperparameters. If q is already a VIPosterior, then the arguments will be copied from it (relevant for multi-round training).

required
parameters Iterable

List of parameters associated with the distribution object.

[]
modules Iterable

List of modules associated with the distribution object.

[]
Source code in sbi/inference/posteriors/vi_posterior.py
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
def set_q(
    self,
    q: Union[str, PyroTransformedDistribution, "VIPosterior", Callable],
    parameters: Iterable = [],
    modules: Iterable = [],
) -> None:
    """Defines the variational family.

    You can specify over which parameters/modules we optimize. This is required for
    custom distributions which e.g. do not inherit nn.Modules or has the function
    `parameters` or `modules` to give direct access to trainable parameters.
    Further, you can pass a function, which constructs a variational distribution
    if called.

    Args:
        q: Variational distribution, either string, distribution, or a VIPosterior
            object. This specifies a parametric class of distribution over which
            the best possible posterior approximation is searched. For string input,
            we currently support [nsf, scf, maf, mcf, gaussian, gaussian_diag]. Of
            course, you can also specify your own variational family by passing a
            `parameterized` distribution object i.e. a torch.distributions
            Distribution with methods `parameters` returning an iterable of all
            parameters (you can pass them within the paramters/modules attribute).
            Additionally, we allow a `Callable`, which allows you the pass a
            `builder` function, which if called returns an distribution. This may be
            useful for setting the hyperparameters e.g. `num_transfroms:int` by
            using the `get_flow_builder` method specifying the hyperparameters. If q
            is already a `VIPosterior`, then the arguments will be copied from it
            (relevant for multi-round training).
        parameters: List of parameters associated with the distribution object.
        modules: List of modules associated with the distribution object.

    """
    self._q_arg = q
    if isinstance(q, Distribution):
        q = adapt_variational_distribution(
            q,
            self._prior,
            self.link_transform,
            parameters=parameters,
            modules=modules,
        )
        make_object_deepcopy_compatible(q)
        self_custom_q_init_cache = deepcopy(q)
        self._q_build_fn = lambda *args, **kwargs: self_custom_q_init_cache
        self._trained_on = None
    elif isinstance(q, str) or isinstance(q, Callable):
        if isinstance(q, str):
            self._q_build_fn = get_flow_builder(q)
        else:
            self._q_build_fn = q

        q = self._q_build_fn(
            self._prior.event_shape,
            self.link_transform,
            device=self._device,
        )
        make_object_deepcopy_compatible(q)
        self._trained_on = None
    elif isinstance(q, VIPosterior):
        self._q_build_fn = q._q_build_fn
        self._trained_on = q._trained_on
        self.vi_method = q.vi_method  # type: ignore
        self._device = q._device
        self._prior = q._prior
        self._x = q._x
        self._q_arg = q._q_arg
        make_object_deepcopy_compatible(q.q)
        q = deepcopy(q.q)
    move_all_tensor_to_device(q, self._device)
    assert isinstance(
        q, Distribution
    ), """Something went wrong when initializing the variational distribution.
        Please create an issue on github https://github.com/mackelab/sbi/issues"""
    check_variational_distribution(q, self._prior)
    self._q = q

set_vi_method(method)

Sets variational inference method.

Parameters:

Name Type Description Default
method str

One of [rKL, fKL, IW, alpha].

required

Returns:

Type Description
VIPosterior

VIPosterior for chainable calls.

Source code in sbi/inference/posteriors/vi_posterior.py
261
262
263
264
265
266
267
268
269
270
271
272
def set_vi_method(self, method: str) -> "VIPosterior":
    """Sets variational inference method.

    Args:
        method: One of [rKL, fKL, IW, alpha].

    Returns:
        `VIPosterior` for chainable calls.
    """
    self._vi_method = method
    self._optimizer_builder = get_VI_method(method)
    return self

train(x=None, n_particles=256, learning_rate=0.001, gamma=0.999, max_num_iters=2000, min_num_iters=10, clip_value=10.0, warm_up_rounds=100, retrain_from_scratch=False, reset_optimizer=False, show_progress_bar=True, check_for_convergence=True, quality_control=True, quality_control_metric='psis', **kwargs)

This method trains the variational posterior.

Parameters:

Name Type Description Default
x Optional[TorchTensor]

The observation.

None
n_particles int

Number of samples to approximate expectations within the variational bounds. The larger the more accurate are gradient estimates, but the computational cost per iteration increases.

256
learning_rate float

Learning rate of the optimizer.

0.001
gamma float

Learning rate decay per iteration. We use an exponential decay scheduler.

0.999
max_num_iters int

Maximum number of iterations.

2000
min_num_iters int

Minimum number of iterations.

10
clip_value float

Gradient clipping value, decreasing may help if you see invalid values.

10.0
warm_up_rounds int

Initialize the posterior as the prior.

100
retrain_from_scratch bool

Retrain the variational distributions from scratch.

False
reset_optimizer bool

Reset the divergence optimizer

False
show_progress_bar bool

If any progress report should be displayed.

True
quality_control bool

If False quality control is skipped.

True
quality_control_metric str

Which metric to use for evaluating the quality.

'psis'
kwargs

Hyperparameters check corresponding DivergenceOptimizer for detail eps: Determines sensitivity of convergence check. retain_graph: Boolean which decides whether to retain the computation graph. This may be required for some exotic user-specified q’s. optimizer: A PyTorch Optimizer class e.g. Adam or SGD. See DivergenceOptimizer for details. scheduler: A PyTorch learning rate scheduler. See DivergenceOptimizer for details. alpha: Only used if vi_method=alpha. Determines the alpha divergence. K: Only used if vi_method=IW. Determines the number of importance weighted particles. stick_the_landing: If one should use the STL estimator (only for rKL, IW, alpha). dreg: If one should use the DREG estimator (only for rKL, IW, alpha). weight_transform: Callable applied to importance weights (only for fKL)

{}

Returns: VIPosterior: VIPosterior (can be used to chain calls).

Source code in sbi/inference/posteriors/vi_posterior.py
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
def train(
    self,
    x: Optional[TorchTensor] = None,
    n_particles: int = 256,
    learning_rate: float = 1e-3,
    gamma: float = 0.999,
    max_num_iters: int = 2000,
    min_num_iters: int = 10,
    clip_value: float = 10.0,
    warm_up_rounds: int = 100,
    retrain_from_scratch: bool = False,
    reset_optimizer: bool = False,
    show_progress_bar: bool = True,
    check_for_convergence: bool = True,
    quality_control: bool = True,
    quality_control_metric: str = "psis",
    **kwargs,
) -> "VIPosterior":
    """This method trains the variational posterior.

    Args:
        x: The observation.
        n_particles: Number of samples to approximate expectations within the
            variational bounds. The larger the more accurate are gradient
            estimates, but the computational cost per iteration increases.
        learning_rate: Learning rate of the optimizer.
        gamma: Learning rate decay per iteration. We use an exponential decay
            scheduler.
        max_num_iters: Maximum number of iterations.
        min_num_iters: Minimum number of iterations.
        clip_value: Gradient clipping value, decreasing may help if you see invalid
            values.
        warm_up_rounds: Initialize the posterior as the prior.
        retrain_from_scratch: Retrain the variational distributions from scratch.
        reset_optimizer: Reset the divergence optimizer
        show_progress_bar: If any progress report should be displayed.
        quality_control: If False quality control is skipped.
        quality_control_metric: Which metric to use for evaluating the quality.
        kwargs: Hyperparameters check corresponding `DivergenceOptimizer` for detail
            eps: Determines sensitivity of convergence check.
            retain_graph: Boolean which decides whether to retain the computation
                graph. This may be required for some `exotic` user-specified q's.
            optimizer: A PyTorch Optimizer class e.g. Adam or SGD. See
                `DivergenceOptimizer` for details.
            scheduler: A PyTorch learning rate scheduler. See
                `DivergenceOptimizer` for details.
            alpha: Only used if vi_method=`alpha`. Determines the alpha divergence.
            K: Only used if vi_method=`IW`. Determines the number of importance
                weighted particles.
            stick_the_landing: If one should use the STL estimator (only for rKL,
                IW, alpha).
            dreg: If one should use the DREG estimator (only for rKL, IW, alpha).
            weight_transform: Callable applied to importance weights (only for fKL)
    Returns:
        VIPosterior: `VIPosterior` (can be used to chain calls).
    """
    # Update optimizer with current arguments.
    if self._optimizer is not None:
        self._optimizer.update({**locals(), **kwargs})

    # Init q and the optimizer if necessary
    if retrain_from_scratch:
        self.q = self._q_build_fn()  # type: ignore
        self._optimizer = self._optimizer_builder(
            self.potential_fn,
            self.q,
            lr=learning_rate,
            clip_value=clip_value,
            gamma=gamma,
            n_particles=n_particles,
            prior=self._prior,
            **kwargs,
        )

    if (
        reset_optimizer
        or self._optimizer is None
        or not isinstance(self._optimizer, self._optimizer_builder)
    ):
        self._optimizer = self._optimizer_builder(
            self.potential_fn,
            self.q,
            lr=learning_rate,
            clip_value=clip_value,
            gamma=gamma,
            n_particles=n_particles,
            prior=self._prior,
            **kwargs,
        )

    # Check context
    x = atleast_2d_float32_tensor(self._x_else_default_x(x)).to(  # type: ignore
        self._device
    )

    already_trained = self._trained_on is not None and (x == self._trained_on).all()

    # Optimize
    optimizer = self._optimizer
    optimizer.to(self._device)
    optimizer.reset_loss_stats()

    if show_progress_bar:
        iters = tqdm(range(max_num_iters))
    else:
        iters = range(max_num_iters)

    # Warmup before training
    if reset_optimizer or (not optimizer.warm_up_was_done and not already_trained):
        if show_progress_bar:
            iters.set_description(  # type: ignore
                "Warmup phase, this may take a few seconds..."
            )
        optimizer.warm_up(warm_up_rounds)

    for i in iters:
        optimizer.step(x)
        mean_loss, std_loss = optimizer.get_loss_stats()
        # Update progress bar
        if show_progress_bar:
            assert isinstance(iters, tqdm)
            iters.set_description(  # type: ignore
                f"Loss: {np.round(float(mean_loss), 2)}"
                f"Std: {np.round(float(std_loss), 2)}"
            )
        # Check for convergence
        if check_for_convergence and i > min_num_iters:
            if optimizer.converged():
                if show_progress_bar:
                    print(f"\nConverged with loss: {np.round(float(mean_loss), 2)}")
                break
    # Training finished:
    self._trained_on = x

    # Evaluate quality
    if quality_control:
        try:
            self.evaluate(quality_control_metric=quality_control_metric)
        except Exception as e:
            print(
                f"Quality control did not work, we reset the variational \
                    posterior,please check your setting. \
                    \n Following error occured {e}"
            )
            self.train(
                learning_rate=learning_rate * 0.1,
                retrain_from_scratch=True,
                reset_optimizer=True,
            )

    return self

Models

Returns a function that builds a density estimator for learning the posterior.

This function will usually be used for SNPE. The returned function is to be passed to the inference class when using the flexible interface.

Parameters:

Name Type Description Default
model str

The type of density estimator that will be created. One of [mdn, made, maf, maf_rqs, nsf].

required
z_score_theta Optional[str]

Whether to z-score parameters \(\theta\) before passing them into the network, can take one of the following: - none, or None: do not z-score. - independent: z-score each dimension independently. - structured: treat dimensions as related, therefore compute mean and std over the entire batch, instead of per-dimension. Should be used when each sample is, for example, a time series or an image.

'independent'
z_score_x Optional[str]

Whether to z-score simulation outputs \(x\) before passing them into the network, same options as z_score_theta.

'independent'
hidden_features int

Number of hidden features.

50
num_transforms int

Number of transforms when a flow is used. Only relevant if density estimator is a normalizing flow (i.e. currently either a maf or a nsf). Ignored if density estimator is a mdn or made.

5
num_bins int

Number of bins used for the splines in nsf. Ignored if density estimator not nsf.

10
embedding_net Module

Optional embedding network for simulation outputs \(x\). This embedding net allows to learn features from potentially high-dimensional simulation outputs.

Identity()
num_components int

Number of mixture components for a mixture of Gaussians. Ignored if density estimator is not an mdn.

10
kwargs

additional custom arguments passed to downstream build functions.

{}
Source code in sbi/utils/get_nn_models.py
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
def posterior_nn(
    model: str,
    z_score_theta: Optional[str] = "independent",
    z_score_x: Optional[str] = "independent",
    hidden_features: int = 50,
    num_transforms: int = 5,
    num_bins: int = 10,
    embedding_net: nn.Module = nn.Identity(),
    num_components: int = 10,
    **kwargs,
) -> Callable:
    r"""
    Returns a function that builds a density estimator for learning the posterior.

    This function will usually be used for SNPE. The returned function is to be passed
    to the inference class when using the flexible interface.

    Args:
        model: The type of density estimator that will be created. One of [`mdn`,
            `made`, `maf`, `maf_rqs`, `nsf`].
        z_score_theta: Whether to z-score parameters $\theta$ before passing them into
            the network, can take one of the following:
            - `none`, or None: do not z-score.
            - `independent`: z-score each dimension independently.
            - `structured`: treat dimensions as related, therefore compute mean and std
            over the entire batch, instead of per-dimension. Should be used when each
            sample is, for example, a time series or an image.
        z_score_x: Whether to z-score simulation outputs $x$ before passing them into
            the network, same options as z_score_theta.
        hidden_features: Number of hidden features.
        num_transforms: Number of transforms when a flow is used. Only relevant if
            density estimator is a normalizing flow (i.e. currently either a `maf` or a
            `nsf`). Ignored if density estimator is a `mdn` or `made`.
        num_bins: Number of bins used for the splines in `nsf`. Ignored if density
            estimator not `nsf`.
        embedding_net: Optional embedding network for simulation outputs $x$. This
            embedding net allows to learn features from potentially high-dimensional
            simulation outputs.
        num_components: Number of mixture components for a mixture of Gaussians.
            Ignored if density estimator is not an mdn.
        kwargs: additional custom arguments passed to downstream build functions.
    """

    kwargs = dict(
        zip(
            (
                "z_score_x",
                "z_score_y",
                "hidden_features",
                "num_transforms",
                "num_bins",
                "embedding_net",
                "num_components",
            ),
            (
                z_score_theta,
                z_score_x,
                hidden_features,
                num_transforms,
                num_bins,
                embedding_net,
                num_components,
            ),
        ),
        **kwargs,
    )

    def build_fn_snpe_a(batch_theta, batch_x, num_components):
        """Build function for SNPE-A

        Extract the number of components from the kwargs, such that they are exposed as
        a kwargs, offering the possibility to later override this kwarg with
        `functools.partial`. This is necessary in order to make sure that the MDN in
        SNPE-A only has one component when running the Algorithm 1 part.
        """
        return build_mdn(
            batch_x=batch_theta,
            batch_y=batch_x,
            num_components=num_components,
            **kwargs,
        )

    def build_fn(batch_theta, batch_x):
        if model == "mdn":
            return build_mdn(batch_x=batch_theta, batch_y=batch_x, **kwargs)
        elif model == "made":
            return build_made(batch_x=batch_theta, batch_y=batch_x, **kwargs)
        elif model == "maf":
            return build_maf(batch_x=batch_theta, batch_y=batch_x, **kwargs)
        elif model == "maf_rqs":
            return build_maf_rqs(batch_x=batch_theta, batch_y=batch_x, **kwargs)
        elif model == "nsf":
            return build_nsf(batch_x=batch_theta, batch_y=batch_x, **kwargs)
        else:
            raise NotImplementedError

    if model == "mdn_snpe_a":
        if num_components != 10:
            raise ValueError(
                "You set `num_components`. For SNPE-A, this has to be done at "
                "instantiation of the inference object, i.e. "
                "`inference = SNPE_A(..., num_components=20)`"
            )
        kwargs.pop("num_components")

    return build_fn_snpe_a if model == "mdn_snpe_a" else build_fn

Returns a function that builds a density estimator for learning the likelihood.

This function will usually be used for SNLE. The returned function is to be passed to the inference class when using the flexible interface.

Parameters:

Name Type Description Default
model str

The type of density estimator that will be created. One of [mdn, made, maf, maf_rqs, nsf].

required
z_score_theta Optional[str]

Whether to z-score parameters \(\theta\) before passing them into the network, can take one of the following: - none, or None: do not z-score. - independent: z-score each dimension independently. - structured: treat dimensions as related, therefore compute mean and std over the entire batch, instead of per-dimension. Should be used when each sample is, for example, a time series or an image.

'independent'
z_score_x Optional[str]

Whether to z-score simulation outputs \(x\) before passing them into the network, same options as z_score_theta.

'independent'
hidden_features int

Number of hidden features.

50
num_transforms int

Number of transforms when a flow is used. Only relevant if density estimator is a normalizing flow (i.e. currently either a maf or a nsf). Ignored if density estimator is a mdn or made.

5
num_bins int

Number of bins used for the splines in nsf. Ignored if density estimator not nsf.

10
embedding_net Module

Optional embedding network for parameters \(\theta\).

Identity()
num_components int

Number of mixture components for a mixture of Gaussians. Ignored if density estimator is not an mdn.

10
kwargs

additional custom arguments passed to downstream build functions.

{}
Source code in sbi/utils/get_nn_models.py
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
def likelihood_nn(
    model: str,
    z_score_theta: Optional[str] = "independent",
    z_score_x: Optional[str] = "independent",
    hidden_features: int = 50,
    num_transforms: int = 5,
    num_bins: int = 10,
    embedding_net: nn.Module = nn.Identity(),
    num_components: int = 10,
    **kwargs,
) -> Callable:
    r"""
    Returns a function that builds a density estimator for learning the likelihood.

    This function will usually be used for SNLE. The returned function is to be passed
    to the inference class when using the flexible interface.

    Args:
        model: The type of density estimator that will be created. One of [`mdn`,
            `made`, `maf`, `maf_rqs`, `nsf`].
        z_score_theta: Whether to z-score parameters $\theta$ before passing them into
            the network, can take one of the following:
            - `none`, or None: do not z-score.
            - `independent`: z-score each dimension independently.
            - `structured`: treat dimensions as related, therefore compute mean and std
            over the entire batch, instead of per-dimension. Should be used when each
            sample is, for example, a time series or an image.
        z_score_x: Whether to z-score simulation outputs $x$ before passing them into
            the network, same options as z_score_theta.
        hidden_features: Number of hidden features.
        num_transforms: Number of transforms when a flow is used. Only relevant if
            density estimator is a normalizing flow (i.e. currently either a `maf` or a
            `nsf`). Ignored if density estimator is a `mdn` or `made`.
        num_bins: Number of bins used for the splines in `nsf`. Ignored if density
            estimator not `nsf`.
        embedding_net: Optional embedding network for parameters $\theta$.
        num_components: Number of mixture components for a mixture of Gaussians.
            Ignored if density estimator is not an mdn.
        kwargs: additional custom arguments passed to downstream build functions.
    """

    kwargs = dict(
        zip(
            (
                "z_score_x",
                "z_score_y",
                "hidden_features",
                "num_transforms",
                "num_bins",
                "embedding_net",
                "num_components",
            ),
            (
                z_score_x,
                z_score_theta,
                hidden_features,
                num_transforms,
                num_bins,
                embedding_net,
                num_components,
            ),
        ),
        **kwargs,
    )

    def build_fn(batch_theta, batch_x):
        if model == "mdn":
            return build_mdn(batch_x=batch_x, batch_y=batch_theta, **kwargs)
        elif model == "made":
            return build_made(batch_x=batch_x, batch_y=batch_theta, **kwargs)
        elif model == "maf":
            return build_maf(batch_x=batch_x, batch_y=batch_theta, **kwargs)
        elif model == "maf_rqs":
            return build_maf_rqs(batch_x=batch_x, batch_y=batch_theta, **kwargs)
        elif model == "nsf":
            return build_nsf(batch_x=batch_x, batch_y=batch_theta, **kwargs)
        elif model == "mnle":
            return build_mnle(batch_x=batch_x, batch_y=batch_theta, **kwargs)
        else:
            raise NotImplementedError

    return build_fn

Returns a function that builds a classifier for learning density ratios.

This function will usually be used for SNRE. The returned function is to be passed to the inference class when using the flexible interface.

Note that in the view of the SNRE classifier we build below, x=theta and y=x.

Parameters:

Name Type Description Default
model str

The type of classifier that will be created. One of [linear, mlp, resnet].

required
z_score_theta Optional[str]

Whether to z-score parameters \(\theta\) before passing them into the network, can take one of the following: - none, or None: do not z-score. - independent: z-score each dimension independently. - structured: treat dimensions as related, therefore compute mean and std over the entire batch, instead of per-dimension. Should be used when each sample is, for example, a time series or an image.

'independent'
z_score_x Optional[str]

Whether to z-score simulation outputs \(x\) before passing them into the network, same options as z_score_theta.

'independent'
hidden_features int

Number of hidden features.

50
embedding_net_theta Module

Optional embedding network for parameters \(\theta\).

Identity()
embedding_net_x Module

Optional embedding network for simulation outputs \(x\). This embedding net allows to learn features from potentially high-dimensional simulation outputs.

Identity()
kwargs

additional custom arguments passed to downstream build functions.

{}
Source code in sbi/utils/get_nn_models.py
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
def classifier_nn(
    model: str,
    z_score_theta: Optional[str] = "independent",
    z_score_x: Optional[str] = "independent",
    hidden_features: int = 50,
    embedding_net_theta: nn.Module = nn.Identity(),
    embedding_net_x: nn.Module = nn.Identity(),
    **kwargs,
) -> Callable:
    r"""
    Returns a function that builds a classifier for learning density ratios.

    This function will usually be used for SNRE. The returned function is to be passed
    to the inference class when using the flexible interface.

    Note that in the view of the SNRE classifier we build below, x=theta and y=x.

    Args:
        model: The type of classifier that will be created. One of [`linear`, `mlp`,
            `resnet`].
        z_score_theta: Whether to z-score parameters $\theta$ before passing them into
            the network, can take one of the following:
            - `none`, or None: do not z-score.
            - `independent`: z-score each dimension independently.
            - `structured`: treat dimensions as related, therefore compute mean and std
            over the entire batch, instead of per-dimension. Should be used when each
            sample is, for example, a time series or an image.
        z_score_x: Whether to z-score simulation outputs $x$ before passing them into
            the network, same options as z_score_theta.
        hidden_features: Number of hidden features.
        embedding_net_theta:  Optional embedding network for parameters $\theta$.
        embedding_net_x:  Optional embedding network for simulation outputs $x$. This
            embedding net allows to learn features from potentially high-dimensional
            simulation outputs.
        kwargs: additional custom arguments passed to downstream build functions.
    """

    kwargs = dict(
        zip(
            (
                "z_score_x",
                "z_score_y",
                "hidden_features",
                "embedding_net_x",
                "embedding_net_y",
            ),
            (
                z_score_theta,
                z_score_x,
                hidden_features,
                embedding_net_theta,
                embedding_net_x,
            ),
        ),
        **kwargs,
    )

    def build_fn(batch_theta, batch_x):
        if model == "linear":
            return build_linear_classifier(
                batch_x=batch_theta, batch_y=batch_x, **kwargs
            )
        if model == "mlp":
            return build_mlp_classifier(batch_x=batch_theta, batch_y=batch_x, **kwargs)
        if model == "resnet":
            return build_resnet_classifier(
                batch_x=batch_theta, batch_y=batch_x, **kwargs
            )
        else:
            raise NotImplementedError

    return build_fn

Potentials

Returns the potential for posterior-based methods.

It also returns a transformation that can be used to transform the potential into unconstrained space.

The potential is the same as the log-probability of the posterior_estimator, but it is set to \(-\inf\) outside of the prior bounds.

Parameters:

Name Type Description Default
posterior_estimator Module

The neural network modelling the posterior.

required
prior Distribution

The prior distribution.

required
x_o Optional[Tensor]

The observed data at which to evaluate the posterior.

required
enable_transform bool

Whether to transform parameters to unconstrained space. When False, an identity transform will be returned for theta_transform.

True

Returns:

Type Description
Callable

The potential function and a transformation that maps

TorchTransform

to unconstrained space.

Source code in sbi/inference/potentials/posterior_based_potential.py
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def posterior_estimator_based_potential(
    posterior_estimator: nn.Module,
    prior: Distribution,
    x_o: Optional[Tensor],
    enable_transform: bool = True,
) -> Tuple[Callable, TorchTransform]:
    r"""Returns the potential for posterior-based methods.

    It also returns a transformation that can be used to transform the potential into
    unconstrained space.

    The potential is the same as the log-probability of the `posterior_estimator`, but
    it is set to $-\inf$ outside of the prior bounds.

    Args:
        posterior_estimator: The neural network modelling the posterior.
        prior: The prior distribution.
        x_o: The observed data at which to evaluate the posterior.
        enable_transform: Whether to transform parameters to unconstrained space.
            When False, an identity transform will be returned for `theta_transform`.

    Returns:
        The potential function and a transformation that maps
        to unconstrained space.
    """

    device = str(next(posterior_estimator.parameters()).device)

    potential_fn = PosteriorBasedPotential(
        posterior_estimator, prior, x_o, device=device
    )

    theta_transform = mcmc_transform(
        prior, device=device, enable_transform=enable_transform
    )

    return potential_fn, theta_transform

Returns potential \(\log(p(x_o|\theta)p(\theta))\) for likelihood-based methods.

It also returns a transformation that can be used to transform the potential into unconstrained space.

Parameters:

Name Type Description Default
likelihood_estimator Module

The neural network modelling the likelihood.

required
prior Distribution

The prior distribution.

required
x_o Optional[Tensor]

The observed data at which to evaluate the likelihood.

required
enable_transform bool

Whether to transform parameters to unconstrained space. When False, an identity transform will be returned for theta_transform.

True

Returns:

Type Description
Callable

The potential function \(p(x_o|\theta)p(\theta)\) and a transformation that maps

TorchTransform

to unconstrained space.

Source code in sbi/inference/potentials/likelihood_based_potential.py
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
def likelihood_estimator_based_potential(
    likelihood_estimator: nn.Module,
    prior: Distribution,
    x_o: Optional[Tensor],
    enable_transform: bool = True,
) -> Tuple[Callable, TorchTransform]:
    r"""Returns potential $\log(p(x_o|\theta)p(\theta))$ for likelihood-based methods.

    It also returns a transformation that can be used to transform the potential into
    unconstrained space.

    Args:
        likelihood_estimator: The neural network modelling the likelihood.
        prior: The prior distribution.
        x_o: The observed data at which to evaluate the likelihood.
        enable_transform: Whether to transform parameters to unconstrained space.
             When False, an identity transform will be returned for `theta_transform`.

    Returns:
        The potential function $p(x_o|\theta)p(\theta)$ and a transformation that maps
        to unconstrained space.
    """

    device = str(next(likelihood_estimator.parameters()).device)

    potential_fn = LikelihoodBasedPotential(
        likelihood_estimator, prior, x_o, device=device
    )
    theta_transform = mcmc_transform(
        prior, device=device, enable_transform=enable_transform
    )

    return potential_fn, theta_transform

Returns the potential for ratio-based methods.

It also returns a transformation that can be used to transform the potential into unconstrained space.

Parameters:

Name Type Description Default
ratio_estimator Module

The neural network modelling likelihood-to-evidence ratio.

required
prior Distribution

The prior distribution.

required
x_o Optional[Tensor]

The observed data at which to evaluate the likelihood-to-evidence ratio.

required
enable_transform bool

Whether to transform parameters to unconstrained space. When False, an identity transform will be returned for theta_transform.

True

Returns:

Type Description
Callable

The potential function and a transformation that maps

TorchTransform

to unconstrained space.

Source code in sbi/inference/potentials/ratio_based_potential.py
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
def ratio_estimator_based_potential(
    ratio_estimator: nn.Module,
    prior: Distribution,
    x_o: Optional[Tensor],
    enable_transform: bool = True,
) -> Tuple[Callable, TorchTransform]:
    r"""Returns the potential for ratio-based methods.

    It also returns a transformation that can be used to transform the potential into
    unconstrained space.

    Args:
        ratio_estimator: The neural network modelling likelihood-to-evidence ratio.
        prior: The prior distribution.
        x_o: The observed data at which to evaluate the likelihood-to-evidence ratio.
        enable_transform: Whether to transform parameters to unconstrained space.
            When False, an identity transform will be returned for `theta_transform`.

    Returns:
        The potential function and a transformation that maps
        to unconstrained space.
    """

    device = str(next(ratio_estimator.parameters()).device)

    potential_fn = RatioBasedPotential(ratio_estimator, prior, x_o, device=device)
    theta_transform = mcmc_transform(
        prior, device=device, enable_transform=enable_transform
    )

    return potential_fn, theta_transform

Analysis

Plot samples in a 2D grid showing marginals and pairwise marginals.

Each of the diagonal plots can be interpreted as a 1D-marginal of the distribution that the samples were drawn from. Each upper-diagonal plot can be interpreted as a 2D-marginal of the distribution.

Parameters:

Name Type Description Default
samples Union[List[ndarray], List[Tensor], ndarray, Tensor]

Samples used to build the histogram.

required
points Optional[Union[List[ndarray], List[Tensor], ndarray, Tensor]]

List of additional points to scatter.

None
limits Optional[Union[List, Tensor]]

Array containing the plot xlim for each parameter dimension. If None, just use the min and max of the passed samples

None
subset Optional[List[int]]

List containing the dimensions to plot. E.g. subset=[1,3] will plot plot only the 1st and 3rd dimension but will discard the 0th and 2nd (and, if they exist, the 4th, 5th and so on).

None
offdiag Optional[Union[List[str], str]]

Plotting style for upper diagonal, {hist, scatter, contour, cond, None}.

'hist'
upper Optional[str]

deprecated, use offdiag instead.

None
diag Optional[Union[List[str], str]]

Plotting style for diagonal, {hist, cond, None}.

'hist'
figsize Tuple

Size of the entire figure.

(10, 10)
labels Optional[List[str]]

List of strings specifying the names of the parameters.

None
ticks Union[List, Tensor]

Position of the ticks.

[]
fig

matplotlib figure to plot on.

None
axes

matplotlib axes corresponding to fig.

None
**kwargs

Additional arguments to adjust the plot, e.g., samples_colors, points_colors and many more, see the source code in _get_default_opts() in sbi.analysis.plot for details.

{}

Returns: figure and axis of posterior distribution plot

Source code in sbi/analysis/plot.py
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
def pairplot(
    samples: Union[List[np.ndarray], List[torch.Tensor], np.ndarray, torch.Tensor],
    points: Optional[
        Union[List[np.ndarray], List[torch.Tensor], np.ndarray, torch.Tensor]
    ] = None,
    limits: Optional[Union[List, torch.Tensor]] = None,
    subset: Optional[List[int]] = None,
    offdiag: Optional[Union[List[str], str]] = "hist",
    diag: Optional[Union[List[str], str]] = "hist",
    figsize: Tuple = (10, 10),
    labels: Optional[List[str]] = None,
    ticks: Union[List, torch.Tensor] = [],
    upper: Optional[str] = None,
    fig=None,
    axes=None,
    **kwargs,
):
    """
    Plot samples in a 2D grid showing marginals and pairwise marginals.

    Each of the diagonal plots can be interpreted as a 1D-marginal of the distribution
    that the samples were drawn from. Each upper-diagonal plot can be interpreted as a
    2D-marginal of the distribution.

    Args:
        samples: Samples used to build the histogram.
        points: List of additional points to scatter.
        limits: Array containing the plot xlim for each parameter dimension. If None,
            just use the min and max of the passed samples
        subset: List containing the dimensions to plot. E.g. subset=[1,3] will plot
            plot only the 1st and 3rd dimension but will discard the 0th and 2nd (and,
            if they exist, the 4th, 5th and so on).
        offdiag: Plotting style for upper diagonal, {hist, scatter, contour, cond,
            None}.
        upper: deprecated, use offdiag instead.
        diag: Plotting style for diagonal, {hist, cond, None}.
        figsize: Size of the entire figure.
        labels: List of strings specifying the names of the parameters.
        ticks: Position of the ticks.
        fig: matplotlib figure to plot on.
        axes: matplotlib axes corresponding to fig.
        **kwargs: Additional arguments to adjust the plot, e.g., `samples_colors`,
            `points_colors` and many more, see the source code in `_get_default_opts()`
            in `sbi.analysis.plot` for details.

    Returns: figure and axis of posterior distribution plot
    """

    # TODO: add color map support
    # TODO: automatically determine good bin sizes for histograms
    # TODO: add legend (if legend is True)

    opts = _get_default_opts()
    # update the defaults dictionary by the current values of the variables (passed by
    # the user)

    opts = _update(opts, locals())
    opts = _update(opts, kwargs)

    samples, dim, limits = prepare_for_plot(samples, limits)

    # checks.
    if opts["legend"]:
        assert len(opts["samples_labels"]) >= len(
            samples
        ), "Provide at least as many labels as samples."
    if opts["upper"] is not None:
        warn("upper is deprecated, use offdiag instead.")
        opts["offdiag"] = opts["upper"]

    # Prepare diag/upper/lower
    if type(opts["diag"]) is not list:
        opts["diag"] = [opts["diag"] for _ in range(len(samples))]
    if type(opts["offdiag"]) is not list:
        opts["offdiag"] = [opts["offdiag"] for _ in range(len(samples))]
    # if type(opts['lower']) is not list:
    #    opts['lower'] = [opts['lower'] for _ in range(len(samples))]
    opts["lower"] = None

    diag_func = get_diag_func(samples, limits, opts, **kwargs)

    def offdiag_func(row, col, limits, **kwargs):
        if len(samples) > 0:
            for n, v in enumerate(samples):
                if opts["offdiag"][n] == "hist" or opts["offdiag"][n] == "hist2d":
                    hist, xedges, yedges = np.histogram2d(
                        v[:, col],
                        v[:, row],
                        range=[
                            [limits[col][0], limits[col][1]],
                            [limits[row][0], limits[row][1]],
                        ],
                        **opts["hist_offdiag"],
                    )
                    plt.imshow(
                        hist.T,
                        origin="lower",
                        extent=(
                            xedges[0],
                            xedges[-1],
                            yedges[0],
                            yedges[-1],
                        ),
                        aspect="auto",
                    )

                elif opts["offdiag"][n] in [
                    "kde",
                    "kde2d",
                    "contour",
                    "contourf",
                ]:
                    density = gaussian_kde(
                        v[:, [col, row]].T,
                        bw_method=opts["kde_offdiag"]["bw_method"],
                    )
                    X, Y = np.meshgrid(
                        np.linspace(
                            limits[col][0],
                            limits[col][1],
                            opts["kde_offdiag"]["bins"],
                        ),
                        np.linspace(
                            limits[row][0],
                            limits[row][1],
                            opts["kde_offdiag"]["bins"],
                        ),
                    )
                    positions = np.vstack([X.ravel(), Y.ravel()])
                    Z = np.reshape(density(positions).T, X.shape)

                    if opts["offdiag"][n] == "kde" or opts["offdiag"][n] == "kde2d":
                        plt.imshow(
                            Z,
                            extent=(
                                limits[col][0],
                                limits[col][1],
                                limits[row][0],
                                limits[row][1],
                            ),
                            origin="lower",
                            aspect="auto",
                        )
                    elif opts["offdiag"][n] == "contour":
                        if opts["contour_offdiag"]["percentile"]:
                            Z = probs2contours(Z, opts["contour_offdiag"]["levels"])
                        else:
                            Z = (Z - Z.min()) / (Z.max() - Z.min())
                        plt.contour(
                            X,
                            Y,
                            Z,
                            origin="lower",
                            extent=[
                                limits[col][0],
                                limits[col][1],
                                limits[row][0],
                                limits[row][1],
                            ],
                            colors=opts["samples_colors"][n],
                            levels=opts["contour_offdiag"]["levels"],
                        )
                    else:
                        pass
                elif opts["offdiag"][n] == "scatter":
                    plt.scatter(
                        v[:, col],
                        v[:, row],
                        color=opts["samples_colors"][n],
                        **opts["scatter_offdiag"],
                    )
                elif opts["offdiag"][n] == "plot":
                    plt.plot(
                        v[:, col],
                        v[:, row],
                        color=opts["samples_colors"][n],
                        **opts["plot_offdiag"],
                    )
                else:
                    pass

    return _arrange_plots(
        diag_func, offdiag_func, dim, limits, points, opts, fig=fig, axes=axes
    )

Plot samples in a row showing 1D marginals of selected dimensions.

Each of the plots can be interpreted as a 1D-marginal of the distribution that the samples were drawn from.

Parameters:

Name Type Description Default
samples Union[List[ndarray], List[Tensor], ndarray, Tensor]

Samples used to build the histogram.

required
points Optional[Union[List[ndarray], List[Tensor], ndarray, Tensor]]

List of additional points to scatter.

None
limits Optional[Union[List, Tensor]]

Array containing the plot xlim for each parameter dimension. If None, just use the min and max of the passed samples

None
subset Optional[List[int]]

List containing the dimensions to plot. E.g. subset=[1,3] will plot plot only the 1st and 3rd dimension but will discard the 0th and 2nd (and, if they exist, the 4th, 5th and so on).

None
diag Optional[str]

Plotting style for 1D marginals, {hist, kde cond, None}.

'hist'
figsize Tuple

Size of the entire figure.

(10, 10)
labels Optional[List[str]]

List of strings specifying the names of the parameters.

None
ticks Union[List, Tensor]

Position of the ticks.

[]
points_colors

Colors of the points.

required
fig

matplotlib figure to plot on.

None
axes

matplotlib axes corresponding to fig.

None
**kwargs

Additional arguments to adjust the plot, e.g., samples_colors, points_colors and many more, see the source code in _get_default_opts() in sbi.analysis.plot for details.

{}

Returns: figure and axis of posterior distribution plot

Source code in sbi/analysis/plot.py
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
def marginal_plot(
    samples: Union[List[np.ndarray], List[torch.Tensor], np.ndarray, torch.Tensor],
    points: Optional[
        Union[List[np.ndarray], List[torch.Tensor], np.ndarray, torch.Tensor]
    ] = None,
    limits: Optional[Union[List, torch.Tensor]] = None,
    subset: Optional[List[int]] = None,
    diag: Optional[str] = "hist",
    figsize: Tuple = (10, 10),
    labels: Optional[List[str]] = None,
    ticks: Union[List, torch.Tensor] = [],
    fig=None,
    axes=None,
    **kwargs,
):
    """
    Plot samples in a row showing 1D marginals of selected dimensions.

    Each of the plots can be interpreted as a 1D-marginal of the distribution
    that the samples were drawn from.

    Args:
        samples: Samples used to build the histogram.
        points: List of additional points to scatter.
        limits: Array containing the plot xlim for each parameter dimension. If None,
            just use the min and max of the passed samples
        subset: List containing the dimensions to plot. E.g. subset=[1,3] will plot
            plot only the 1st and 3rd dimension but will discard the 0th and 2nd (and,
            if they exist, the 4th, 5th and so on).
        diag: Plotting style for 1D marginals, {hist, kde cond, None}.
        figsize: Size of the entire figure.
        labels: List of strings specifying the names of the parameters.
        ticks: Position of the ticks.
        points_colors: Colors of the `points`.
        fig: matplotlib figure to plot on.
        axes: matplotlib axes corresponding to fig.
        **kwargs: Additional arguments to adjust the plot, e.g., `samples_colors`,
            `points_colors` and many more, see the source code in `_get_default_opts()`
            in `sbi.analysis.plot` for details.

    Returns: figure and axis of posterior distribution plot
    """

    opts = _get_default_opts()
    # update the defaults dictionary by the current values of the variables (passed by
    # the user)

    opts = _update(opts, locals())
    opts = _update(opts, kwargs)

    samples, dim, limits = prepare_for_plot(samples, limits)

    # Prepare diag/upper/lower
    if type(opts["diag"]) is not list:
        opts["diag"] = [opts["diag"] for _ in range(len(samples))]

    diag_func = get_diag_func(samples, limits, opts, **kwargs)

    return _arrange_plots(
        diag_func, None, dim, limits, points, opts, fig=fig, axes=axes
    )

Plot conditional distribution given all other parameters.

The conditionals can be interpreted as slices through the density at a location given by condition.

For example: Say we have a 3D density with parameters \(\theta_0\), \(\theta_1\), \(\theta_2\) and a condition \(c\) passed by the user in the condition argument. For the plot of \(\theta_0\) on the diagonal, this will plot the conditional \(p(\theta_0 | \theta_1=c[1], \theta_2=c[2])\). For the upper diagonal of \(\theta_1\) and \(\theta_2\), it will plot \(p(\theta_1, \theta_2 | \theta_0=c[0])\). All other diagonals and upper-diagonals are built in the corresponding way.

Parameters:

Name Type Description Default
density Any

Probability density with a log_prob() method.

required
condition Tensor

Condition that all but the one/two regarded parameters are fixed to. The condition should be of shape (1, dim_theta), i.e. it could e.g. be a sample from the posterior distribution.

required
limits Union[List, Tensor]

Limits in between which each parameter will be evaluated.

required
points Optional[Union[List[ndarray], List[Tensor], ndarray, Tensor]]

Additional points to scatter.

None
subset Optional[List[int]]

List containing the dimensions to plot. E.g. subset=[1,3] will plot plot only the 1st and 3rd dimension but will discard the 0th and 2nd (and, if they exist, the 4th, 5th and so on)

None
resolution int

Resolution of the grid at which we evaluate the pdf.

50
figsize Tuple

Size of the entire figure.

(10, 10)
labels Optional[List[str]]

List of strings specifying the names of the parameters.

None
ticks Union[List, Tensor]

Position of the ticks.

[]
points_colors

Colors of the points.

required
fig

matplotlib figure to plot on.

None
axes

matplotlib axes corresponding to fig.

None
**kwargs

Additional arguments to adjust the plot, e.g., samples_colors, points_colors and many more, see the source code in _get_default_opts() in sbi.analysis.plot for details.

{}

Returns: figure and axis of posterior distribution plot

Source code in sbi/analysis/plot.py
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
def conditional_pairplot(
    density: Any,
    condition: torch.Tensor,
    limits: Union[List, torch.Tensor],
    points: Optional[
        Union[List[np.ndarray], List[torch.Tensor], np.ndarray, torch.Tensor]
    ] = None,
    subset: Optional[List[int]] = None,
    resolution: int = 50,
    figsize: Tuple = (10, 10),
    labels: Optional[List[str]] = None,
    ticks: Union[List, torch.Tensor] = [],
    fig=None,
    axes=None,
    **kwargs,
):
    r"""
    Plot conditional distribution given all other parameters.

    The conditionals can be interpreted as slices through the `density` at a location
    given by `condition`.

    For example:
    Say we have a 3D density with parameters $\theta_0$, $\theta_1$, $\theta_2$ and
    a condition $c$ passed by the user in the `condition` argument.
    For the plot of $\theta_0$ on the diagonal, this will plot the conditional
    $p(\theta_0 | \theta_1=c[1], \theta_2=c[2])$. For the upper
    diagonal of $\theta_1$ and $\theta_2$, it will plot
    $p(\theta_1, \theta_2 | \theta_0=c[0])$. All other diagonals and upper-diagonals
    are built in the corresponding way.

    Args:
        density: Probability density with a `log_prob()` method.
        condition: Condition that all but the one/two regarded parameters are fixed to.
            The condition should be of shape (1, dim_theta), i.e. it could e.g. be
            a sample from the posterior distribution.
        limits: Limits in between which each parameter will be evaluated.
        points: Additional points to scatter.
        subset: List containing the dimensions to plot. E.g. subset=[1,3] will plot
            plot only the 1st and 3rd dimension but will discard the 0th and 2nd (and,
            if they exist, the 4th, 5th and so on)
        resolution: Resolution of the grid at which we evaluate the `pdf`.
        figsize: Size of the entire figure.
        labels: List of strings specifying the names of the parameters.
        ticks: Position of the ticks.
        points_colors: Colors of the `points`.

        fig: matplotlib figure to plot on.
        axes: matplotlib axes corresponding to fig.
        **kwargs: Additional arguments to adjust the plot, e.g., `samples_colors`,
            `points_colors` and many more, see the source code in `_get_default_opts()`
            in `sbi.analysis.plot` for details.

    Returns: figure and axis of posterior distribution plot
    """
    device = density._device if hasattr(density, "_device") else "cpu"

    # Setting these is required because _pairplot_scaffold will check if opts['diag'] is
    # `None`. This would break if opts has no key 'diag'. Same for 'upper'.
    diag = "cond"
    offdiag = "cond"

    opts = _get_default_opts()
    # update the defaults dictionary by the current values of the variables (passed by
    # the user)
    opts = _update(opts, locals())
    opts = _update(opts, kwargs)
    opts["lower"] = None

    dim, limits, eps_margins = prepare_for_conditional_plot(condition, opts)
    diag_func = get_conditional_diag_func(opts, limits, eps_margins, resolution)

    def offdiag_func(row, col, **kwargs):
        p_image = (
            eval_conditional_density(
                opts["density"],
                opts["condition"].to(device),
                limits.to(device),
                row,
                col,
                resolution=resolution,
                eps_margins1=eps_margins[row],
                eps_margins2=eps_margins[col],
            )
            .to("cpu")
            .numpy()
        )
        plt.imshow(
            p_image.T,
            origin="lower",
            extent=(
                limits[col, 0].item(),
                limits[col, 1].item(),
                limits[row, 0].item(),
                limits[row, 1].item(),
            ),
            aspect="auto",
        )

    return _arrange_plots(
        diag_func, offdiag_func, dim, limits, points, opts, fig=fig, axes=axes
    )

Returns the conditional correlation matrix of a distribution.

To compute the conditional distribution, we condition all but two parameters to values from condition, and then compute the Pearson correlation coefficient \(\rho\) between the remaining two parameters under the distribution density. We do so for any pair of parameters specified in subset, thus creating a matrix containing conditional correlations between any pair of parameters.

If condition is a batch of conditions, this function computes the conditional correlation matrix for each one of them and returns the mean.

Parameters:

Name Type Description Default
density Any

Probability density function with .log_prob() function.

required
limits Tensor

Limits within which to evaluate the density.

required
condition Tensor

Values to condition the density on. If a batch of conditions is passed, we compute the conditional correlation matrix for each of them and return the average conditional correlation matrix.

required
subset Optional[List[int]]

Evaluate the conditional distribution only on a subset of dimensions. If None this function uses all dimensions.

None
resolution int

Number of grid points on which the conditional distribution is evaluated. A higher value increases the accuracy of the estimated correlation but also increases the computational cost.

50

Returns: Average conditional correlation matrix of shape either (num_dim, num_dim) or (len(subset), len(subset)) if subset was specified.

Source code in sbi/analysis/conditional_density.py
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
def conditional_corrcoeff(
    density: Any,
    limits: Tensor,
    condition: Tensor,
    subset: Optional[List[int]] = None,
    resolution: int = 50,
) -> Tensor:
    r"""Returns the conditional correlation matrix of a distribution.

    To compute the conditional distribution, we condition all but two parameters to
    values from `condition`, and then compute the Pearson correlation
    coefficient $\rho$ between the remaining two parameters under the distribution
    `density`. We do so for any pair of parameters specified in `subset`, thus
    creating a matrix containing conditional correlations between any pair of
    parameters.

    If `condition` is a batch of conditions, this function computes the conditional
    correlation matrix for each one of them and returns the mean.

    Args:
        density: Probability density function with `.log_prob()` function.
        limits: Limits within which to evaluate the `density`.
        condition: Values to condition the `density` on. If a batch of conditions is
            passed, we compute the conditional correlation matrix for each of them and
            return the average conditional correlation matrix.
        subset: Evaluate the conditional distribution only on a subset of dimensions.
            If `None` this function uses all dimensions.
        resolution: Number of grid points on which the conditional distribution is
            evaluated. A higher value increases the accuracy of the estimated
            correlation but also increases the computational cost.

    Returns: Average conditional correlation matrix of shape either `(num_dim, num_dim)`
    or `(len(subset), len(subset))` if `subset` was specified.
    """

    device = density._device if hasattr(density, "_device") else "cpu"

    subset_ = subset if subset is not None else range(condition.shape[1])

    correlation_matrices = []
    for cond in condition:
        correlation_matrices.append(
            torch.stack(
                [
                    compute_corrcoeff(
                        eval_conditional_density(
                            density,
                            cond.to(device),
                            limits.to(device),
                            dim1=dim1,
                            dim2=dim2,
                            resolution=resolution,
                        ),
                        limits[[dim1, dim2]].to(device),
                    )
                    for dim1 in subset_
                    for dim2 in subset_
                    if dim1 < dim2
                ]
            )
        )

    average_correlations = torch.mean(torch.stack(correlation_matrices), dim=0)

    # `average_correlations` is still a vector containing the upper triangular entries.
    # Below, assemble them into a matrix:
    av_correlation_matrix = torch.zeros((len(subset_), len(subset_)), device=device)
    triu_indices = torch.triu_indices(
        row=len(subset_), col=len(subset_), offset=1, device=device
    )
    av_correlation_matrix[triu_indices[0], triu_indices[1]] = average_correlations

    # Make the matrix symmetric by copying upper diagonal to lower diagonal.
    av_correlation_matrix = torch.triu(av_correlation_matrix) + torch.tril(
        av_correlation_matrix.T
    )

    av_correlation_matrix.fill_diagonal_(1.0)
    return av_correlation_matrix