Can I stop neural network training and resume it later?¶
Many clusters have a time limit and sbi
might exceed this limit. You can circumvent this problem by using the flexible interface. After simulations are finished, sbi
trains a neural network. If this process takes too long, you can stop training and resume it later. The syntax is:
inference = SNPE(prior=prior)
inference = inference.append_simulations(theta, x)
inference.train(max_num_epochs=300) # Pick `max_num_epochs` such that it does not exceed the runtime.
with open("path/to/my/inference.pkl", "wb") as handle:
pickle.dump(inference, handle)
# To resume training:
with open("path/to/my/inference.pkl", "rb") as handle:
inference_from_disk = pickle.load(handle)
inference_from_disk.train(resume_training=True, max_num_epochs=600) # Run epochs 301 until 600 (or stop early).
posterior = inference_from_disk.build_posterior()