Tutorial 2: Running an MCMC sampler¶
In the first tutorial we generated a single likelihood. In this tutorial we will run an MCMC analysis to explore a parameter space and put constraints on some parameters. There are lots of different MCMC algorithms available through CosmoSIS; in this example we will use one called emcee, which is popular in astronomy.
This example uses a likelihood for the Pantheon supernova sample, which measures the cosmic redshift-distance relation.
Running an MCMC¶
Have a look at examples/pantheon.ini and its values file examples/pantheon_values.ini.
Let’s try using MPI parallelism to speed up this analysis. Run this command:
mpirun -n 4 cosmosis --mpi examples/pantheon.ini
If that fails straight away then you may not have MPI installed (MPI should work automatically with the conda installation method; let us know if not). In that case you can fall back to serial mode:
cosmosis examples/pantheon.ini
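If you prefer to launch the run from a script, the choice between the two commands above can be sketched in Python. This is only an illustration: the helper names `build_command` and `run_cosmosis` are hypothetical, and checking for `mpirun` on the PATH is an assumed stand-in for "MPI is installed", not something CosmoSIS does for you.

```python
import shutil
import subprocess


def build_command(ini_file, n_procs=4, use_mpi=None):
    """Return the command list for an MPI run, or a serial run if MPI is unavailable."""
    if use_mpi is None:
        # Assumption: finding mpirun on the PATH means MPI is usable.
        use_mpi = shutil.which("mpirun") is not None
    if use_mpi:
        return ["mpirun", "-n", str(n_procs), "cosmosis", "--mpi", ini_file]
    return ["cosmosis", ini_file]


def run_cosmosis(ini_file, n_procs=4):
    """Try the MPI command first and fall back to serial mode if it fails."""
    command = build_command(ini_file, n_procs)
    result = subprocess.run(command)
    if result.returncode != 0 and command[0] == "mpirun":
        # Fall back to the serial command from the tutorial.
        result = subprocess.run(build_command(ini_file, use_mpi=False))
    return result.returncode
```

Calling `run_cosmosis("examples/pantheon.ini")` would then start the analysis. Note that this sketch retries in serial mode after any MPI failure, including one that happens late in a run, which may not be what you want for long analyses.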
The code will take a few minutes to run, and will generate a file called output/pantheon.txt as output. This file will contain a Markov Chain Monte Carlo (MCMC) chain, which you can use as samples from the posterior probability distribution of the model parameters given the data.
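As a rough illustration of using the chain as posterior samples, the sketch below reads one column from a text chain and computes its mean and standard deviation. It uses only the Python standard library. The function names, the column index and the `burn_rows` argument (rows dropped from the start as burn-in) are assumptions made for this example; how much burn-in to discard is a judgment you would need to make for your own chain.

```python
import statistics


def load_column(path, index):
    """Read one column from a whitespace-separated chain, skipping '#' comment lines."""
    values = []
    with open(path) as chain_file:
        for line in chain_file:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            values.append(float(line.split()[index]))
    return values


def summarize(values, burn_rows=0):
    """Return simple summary statistics after dropping the first burn_rows rows."""
    kept = values[burn_rows:]
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "std": statistics.stdev(kept),
    }
```

For example, `summarize(load_column("output/pantheon.txt", 0), burn_rows=3200)` would summarize the first column after dropping the first 3200 rows; both numbers here are placeholders rather than recommendations.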
The Parameter File¶
This time the parameter file, examples/pantheon.ini, contains these lines (plus some comments):
[runtime]
sampler = emcee
[emcee]
walkers = 32
samples = 300
nsteps = 10
This tells CosmoSIS to use the emcee sampler, and configures it to use 32 walkers (points exploring the parameter space). It tells it to generate 300 samples per walker, and to save results to disc every 10 steps.
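To get a feel for what these numbers imply, the sketch below parses the excerpt above with Python's standard `configparser` and works out the expected chain size. It assumes that each sample writes one row per walker and that a "step" advances every walker once; the function name `chain_size` is hypothetical, and the excerpt is embedded as a string rather than read from the real file.

```python
import configparser

# The excerpt from examples/pantheon.ini shown above.
INI_TEXT = """
[runtime]
sampler = emcee

[emcee]
walkers = 32
samples = 300
nsteps = 10
"""


def chain_size(ini_text):
    """Estimate the chain size implied by the sampler options (assumed semantics)."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    # The sampler's options live in a section named after the sampler.
    options = config[config["runtime"]["sampler"]]
    walkers = options.getint("walkers")
    samples = options.getint("samples")
    nsteps = options.getint("nsteps")
    return {
        "total_rows": walkers * samples,     # one row per walker per sample
        "rows_per_write": walkers * nsteps,  # rows added each time results are saved
        "writes": samples // nsteps,         # number of saves over the whole run
    }


print(chain_size(INI_TEXT))
```

Under those assumptions the run produces 32 × 300 = 9600 rows in total, written in 30 batches of 320 rows.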
The output file¶
The parameter file also contains these lines:
[output]
filename = output/pantheon.txt
format = text
verbosity = debug
This tells the code to generate an output file called output/pantheon.txt, in text format (the default). Our first demo in tutorial 1 didn’t produce an output chain, so it didn’t need this section.
Whichever sampler you use, CosmoSIS output files always have the same format. Comment lines are all preceded with a #, so that chains can be read easily with most tools. The first line is a header which tells you what the different columns mean:
#cosmological_parameters--omega_m supernova_params--deltam supernova_params--alpha supernova_params--beta supernova_params--m prior post
The first entries are the varied parameters from the values file. They are shown in the form section_name--parameter_name. After this, any columns generated by the sampler are shown. In this case those are prior and post, the log-prior and log-posterior of each row. Other samplers might also generate other outputs such as weights.
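The header convention described above can be split apart with a few lines of Python. This is a minimal sketch, not part of CosmoSIS: the function name `parse_header` is hypothetical, and it assumes that column names are separated by whitespace and that sampler-generated columns contain no `--`.

```python
def parse_header(header_line):
    """Split a chain header line into (section, name) pairs.

    Varied parameters look like 'section--name'; columns added by the
    sampler have no section and are returned with section None.
    """
    columns = []
    for column in header_line.lstrip("#").split():
        section, separator, name = column.partition("--")
        if separator:
            columns.append((section, name))
        else:
            columns.append((None, column))
    return columns


header = (
    "#cosmological_parameters--omega_m supernova_params--deltam "
    "supernova_params--alpha supernova_params--beta "
    "supernova_params--m prior post"
)
for section, name in parse_header(header):
    print(section, name)
```

The position of each pair in the returned list is the index of the corresponding column in the chain, which is handy when picking out a single parameter.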
The next lines are metadata and show the name of the sampler, the number of varied parameters, the pipeline that was run, papers you should cite for the given pipeline, and options passed to the sampler. Finally, the three parameter files are all copied into the output file so you can check later exactly what you ran.
NB: The verbosity settings are a little bit confused in the code right now, so the verbosity option in the [output] section above might not do much.
All the samplers except the test sampler produce this chain file. Some produce other files too.