Tutorial 2: Running an MCMC sampler

In the first tutorial we generated a single likelihood. In this tutorial we will run an MCMC analysis to explore a parameter space and put constraints on some parameters. There are lots of different MCMC algorithms available through CosmoSIS; in this example we will use one called emcee, which is popular in astronomy.

This example uses a likelihood for the Pantheon supernova sample, which measures the cosmic redshift-distance relation.

Running an MCMC

Have a look at examples/pantheon.ini and its values file examples/pantheon_values.ini.

Let’s try using MPI parallelism to speed up this analysis. Run this command:

mpirun -n 4 cosmosis --mpi examples/pantheon.ini

If that fails straight away then you may not have MPI installed (MPI should work automatically with the conda installation method - let us know if not). In that case you can fall back to serial mode:

cosmosis examples/pantheon.ini

The code will take a few minutes to run, and will generate a file called output/pantheon.txt as output. This file will contain a Markov Chain Monte Carlo (MCMC) chain, whose rows you can use as samples from the posterior probability distribution of the model parameters given the data.

The Parameter File

This time the parameter file, examples/pantheon.ini, contains these lines (plus some comments):

[runtime]
sampler = emcee

[emcee]
walkers = 32
samples = 300
nsteps = 10

This tells CosmoSIS to use the emcee sampler, and configures it to use 32 walkers (points exploring the parameter space). It tells it to generate 300 samples per walker, and to save results to disc every 10 steps.
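As a rough check on what these settings imply, the sketch below reads the same fragment with Python's standard configparser module and works out how many rows to expect. Two things here are assumptions rather than CosmoSIS behaviour taken from this tutorial: configparser is only a stand-in for CosmoSIS's own parameter-file reader, and the chain is assumed to hold one row per walker per sample. The function name summarise_emcee is made up for illustration.

```python
import configparser

# Fragment of examples/pantheon.ini, copied from the lines quoted above.
INI_TEXT = """
[runtime]
sampler = emcee

[emcee]
walkers = 32
samples = 300
nsteps = 10
"""


def summarise_emcee(ini_text):
    """Return (total_rows, rows_per_save) implied by the [emcee] options.

    Assumes one output row per walker per sample (hypothetical helper,
    not part of CosmoSIS).
    """
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    walkers = config.getint("emcee", "walkers")
    samples = config.getint("emcee", "samples")
    nsteps = config.getint("emcee", "nsteps")
    total_rows = walkers * samples      # rows in the finished chain
    rows_per_save = walkers * nsteps    # rows added between saves to disc
    return total_rows, rows_per_save


total_rows, rows_per_save = summarise_emcee(INI_TEXT)
print("total rows:", total_rows)
print("rows written per save:", rows_per_save)
```

Under those assumptions the finished chain has 32 x 300 rows, written out in batches of 32 x 10.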

The output file

The parameter file also contains these lines:

[output]
filename = output/pantheon.txt
format = text
verbosity = debug

This tells the code to generate an output file called output/pantheon.txt, in text format (the default). Our first demo in tutorial 1 didn’t produce an output chain, so it didn’t need this section.

Whichever sampler you use, CosmoSIS output files always have the same format. Comment lines are all preceded with a #, so that chains can be read easily with most tools. The first line is a header which tells you what the different columns mean:

#cosmological_parameters--omega_m   supernova_params--deltam    supernova_params--alpha supernova_params--beta  supernova_params--m   prior  post

The first entries are the varied parameters from the values file. They are shown in the form: section_name--parameter_name. After this any parameters generated by the sampler are shown. In this case that means prior and post - the log-prior and log-posterior of this row. Other samplers might also generate other outputs such as weights.
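The naming convention can be unpacked with a few lines of Python. This is a minimal sketch, not a CosmoSIS utility: parse_header is a hypothetical helper, and it assumes that any column without a "--" separator is a sampler output rather than a varied parameter.

```python
# Header line copied from the example output above.
HEADER = (
    "#cosmological_parameters--omega_m   supernova_params--deltam    "
    "supernova_params--alpha supernova_params--beta  "
    "supernova_params--m   prior  post"
)


def parse_header(header_line):
    """Split a chain header into (section, name) pairs.

    Columns with no '--' are treated as sampler outputs and get
    section None (an assumption made for this sketch).
    """
    parsed = []
    for column in header_line.lstrip("#").split():
        if "--" in column:
            section, name = column.split("--", 1)
        else:
            section, name = None, column
        parsed.append((section, name))
    return parsed


columns = parse_header(HEADER)
for index, (section, name) in enumerate(columns):
    print(index, section, name)
```

The printed index is the column position of each quantity in the rows of the chain.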

The next lines are metadata and show the name of the sampler, the number of varied parameters, the pipeline that was run, papers you should cite for the given pipeline, and options passed to the sampler. Finally, the three parameter files are all copied into the output file so you can check later exactly what you ran.

NB: The handling of verbosity settings is a little inconsistent in the code right now, so the verbosity option in the [output] section might not do that much.

All the samplers except the test sampler produce this chain file. Some produce other files too.
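Because comment lines start with a # and the first line names the columns, a chain file can be read with very little code. The sketch below uses only the Python standard library; load_chain is a hypothetical helper, and the demo text is made-up placeholder data in the same layout, not real output from the Pantheon run.

```python
import io
import statistics


def load_chain(handle):
    """Read a text chain from an open file handle.

    Returns (names, rows): column names taken from the first header
    line, and a list of rows, each a list of floats.
    """
    names = handle.readline().lstrip("#").split()
    rows = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and the remaining '#' metadata lines.
        if not line or line.startswith("#"):
            continue
        rows.append([float(value) for value in line.split()])
    return names, rows


# Placeholder data for illustration only; these numbers are invented.
DEMO = io.StringIO(
    "#cosmological_parameters--omega_m\tsupernova_params--m\tprior\tpost\n"
    "#placeholder metadata line\n"
    "0.30\t-19.2\t-1.0\t-520.0\n"
    "0.32\t-19.3\t-1.0\t-518.0\n"
    "0.28\t-19.1\t-1.0\t-519.0\n"
)

names, rows = load_chain(DEMO)
column = names.index("cosmological_parameters--omega_m")
omega_m = [row[column] for row in rows]
print("omega_m mean:", statistics.mean(omega_m))
```

To read the real chain, pass an open file instead of the demo text, for example with open("output/pantheon.txt") as handle: names, rows = load_chain(handle).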