Modifying MCMC Initial Positions

by Henry Ngo (2019)

When you set up the MCMC Sampler, the initial position of your walkers are randomly determined. Specifically, they are uniformly distributed in your Prior phase space. This tutorial will show you how to change this default behaviour so that the walkers can begin at locations you specify. For instance, if you have an initial guess for the best fitting orbit and want to use MCMC to explore posterior space around this peak, you may want to start your walkers at positions centered around this peak and distributed according to an N-dimensional Gaussian distribution.

Note: This tutorial is meant to be read after reading the MCMC Introduction tutorial. If you are wondering what walkers are, you should start there!

The Driver class is the main way you might interact with orbitize! as it automatically reads your input, creates all the orbitize! objects needed to do your calculation, and defaults to some commonly used parameters or settings. However, sometimes you want to work directly with the underlying API to do more advanced tasks such as changing the MCMC walkers’ initial positions, or modifying the priors.

This tutorial walks you through how to do that.

Goals of this tutorial: - Learn to modify the MCMC Sampler object - Learn about the structure of the orbitize code base

Import modules

[1]:
import numpy as np

import orbitize
from orbitize import driver
import multiprocessing as mp

1) Create Driver object

First, let’s begin as usual and create our Driver object, as in the MCMC Introduction tutorial.

[2]:
filename = "{}/GJ504.csv".format(orbitize.DATADIR)

# system parameters
num_secondary_bodies = 1
system_mass = 1.75 # [Msol]
plx = 51.44 # [mas]
mass_err = 0.05 # [Msol]
plx_err = 0.12 # [mas]

# MCMC parameters
num_temps = 5
num_walkers = 30
num_threads = mp.cpu_count() # or a different number if you prefer


my_driver = driver.Driver(
    filename, 'MCMC', num_secondary_bodies, system_mass, plx, mass_err=mass_err, plx_err=plx_err,
    mcmc_kwargs={'num_temps': num_temps, 'num_walkers': num_walkers, 'num_threads': num_threads}
)

2) Access the Sampler object to view the walker positions

As mentioned in the introduction, the Driver class creates the objects needed for the orbit fit. At the time of this writing, it creates a Sampler object which you can access with the .sampler attribute and a System object which you can access with the .system attribute.

The Sampler object contains all of the information used by the orbit sampling algorithm (OFTI or MCMC) to fit the orbit and determine the posteriors. The System object contains information about the astrophysical system itself (stellar and companion parameters, the input data, etc.).

To see all of the attributes of the driver object, you can use dir().

[3]:
print(dir(my_driver))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'sampler', 'system']

This returns many other functions too, but you see sampler and system at the bottom. Don’t forget that in Jupyter notebooks, you can use my_driver? to get the docstring for its class (i.e. the Driver class) and my_driver?? to get the full source code of that class. You can also get this information in the API documentation.

Now, let’s list the attributes of the my_driver.sampler attribute.

[4]:
print(dir(my_driver.sampler))
['__abstractmethods__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '_abc_registry', '_fill_in_fixed_params', '_logl', 'curr_pos', 'custom_lnlike', 'fixed_params', 'lnlike', 'num_params', 'num_temps', 'num_threads', 'num_walkers', 'priors', 'results', 'run_sampler', 'system', 'use_pt']

Again, you can use the ? and ?? features as well as the API documentation to find out more. Here we see an attribute curr_pos which contains the current position of all the walkers for the MCMC sampler. These positions were generated upon initialization of the Sampler object, which happened as part of the initialization of the Driver object.

Examine my_driver.sampler.curr_pos

curr_pos is an array and has shape (n_temps, n_walkers, n_params) for the parallel-tempered MCMC sampler and shape (n_walkers, n_params) for the affine-invariant ensemble sampler.

[5]:
my_driver.sampler.curr_pos.shape # Here we are using the parallel-tempered MCMC sampler
[5]:
(5, 30, 8)

Basically, this is the same shape as the output of the Sampler. Let’s look at the start position of the first five walkers at the lowest temperature, to get a better sense of what the strucutre is like.

[6]:
print(my_driver.sampler.curr_pos[0,0:5,:])
[[2.78236056e-02 3.99587565e-01 1.27833033e+00 4.20135188e+00
  9.51196816e-01 2.04454263e-01 5.16373937e+01 1.70928701e+00]
 [3.24959814e-01 6.10149107e-01 2.33276143e+00 2.56745568e+00
  3.42534933e+00 6.63473771e-01 5.16156550e+01 1.77022496e+00]
 [5.31391104e+04 1.48788378e-01 1.45104241e+00 3.81780927e+00
  6.17769540e+00 5.58103875e-01 5.14355526e+01 1.80134045e+00]
 [8.31900263e-03 2.34844571e-01 1.24152911e+00 2.09272711e+00
  5.26905059e+00 2.15279622e-01 5.14948181e+01 1.82392081e+00]
 [3.57536276e-01 7.92536687e-01 1.09983426e+00 2.56628690e-01
  3.06233626e+00 7.80981430e-02 5.14872569e+01 1.69155342e+00]]

3) Replace curr_pos with your own initial positions for walkers

When the sampler is run with the sampler.run_sampler() method, it will start the walkers at the curr_pos values, run the MCMC forward for the given number of steps, and then update curr_pos to reflect where the walkers ended up. The next time run_sampler() is called, it does the same thing again.

Here, you have just created the sampler but have not run it yet. So, if you update curr_pos with our own custom start locations, when you run the sampler, it will begin at your custom start locations instead.

3.1) Generate your own initial positions

There are many ways to create your own walker start distribution and what you want to do will depend on your science question and prior knowledge.

If you have already generated and validated your own initial walker positions, you can skip down to the “Update sampler position”. Some users use the output of OFTI or a previous MCMC run as the initial position.

If you need to generate your own positions, read on. Here, let’s assume you know a possible best fit value and your uncertainty in that fit. Perhaps you got this through a least squares minimization. So, let’s create a distribution of walkers that are centered on the best fit value and distributed normallly with the 1-sigma in each dimension equal to the uncertainty on that best fit value.

First, let’s define the best fit value and the spread. As a reminder, the order of the parameters in the array is: semimajor axis, eccentricity, inclination, argument of periastron, position angle of nodes, epoch of periastron passage, parallax and total mass. If there are more than one body, the first six elements are repeated for the second and subsequent body right after the first body so that parallax and total mass is always the last two elements. You can check the indices with this dict in the system object.

[7]:
print(my_driver.system.param_idx)
{'sma1': 0, 'plx': 6, 'inc1': 2, 'mtot': 7, 'ecc1': 1, 'epp1': 5, 'pan1': 4, 'aop1': 3}
[8]:
# Set centre and spread of the walker distribution
# Values from Table 1 in Blunt et al. 2017, AJ, 153, 229
sma_cen = 44.48
sma_sig = 15.0
ecc_cen = 0.0151
ecc_sig = 0.175
inc_cen = 131.7
inc_sig = 16.0
aop_cen = 91.7
aop_sig = 60.0
pan_cen = 133.7
pan_sig = 50.0
epp_cen = 2228.11
epp_sig = 121.0

# Note : parallax and system mass already defined above (plx, plx_err, system_mass, mass_err)
walker_centres = np.array([sma_cen,ecc_cen,inc_cen,aop_cen,pan_cen,epp_cen,plx,system_mass])
walker_1sigmas = np.array([sma_sig,ecc_sig,inc_sig,aop_sig,pan_sig,epp_sig,plx_err,mass_err])

You can use numpy.random.standard_normal to generate normally distributed random numbers in the same shape as your walker initial positions (my_driver.sampler.curr_pos.shape). Then, multiply by walker_1sigmas to get the spread to match your desired distribution and add walker_centres to get the distribution centered on your desired values.

[9]:
curr_pos_shape = my_driver.sampler.curr_pos.shape # Get shape of walker positions

# Draw from multi-variate normal distribution to generate new walker positions
new_pos = np.random.standard_normal(curr_pos_shape)*walker_1sigmas + walker_centres

3.2) Validate your new positions

Drawing from a normal distribution can cause your walkers to start outside of your prior space. See the Modifying Priors tutorial for information on how to interact with the prior objects, which would allow you to find the limits on each parameter set by the priors etc.

Here, let’s do something more simple and just check that all values are physically valid. In this tutorial, eccentricity is the most likely problem because the distribution from Blunt et al. 2017 was very non-Gaussian but we are applying a Gaussian distribution with centre at 0.0151 but a spread of 0.175, so almost half of the generated numbers will be negative!

So, let’s keep the default eccentricity values of the walkers (originally uniformly distributed from 0 to 1). You can make a copy of the current position into the new_pos array. Another option (not shown here) would be to generate a different distribution (e.g. Poisson) for this parameter instead.

[10]:
ecc_ind = my_driver.system.param_idx['ecc1']
new_pos[:,:,ecc_ind] = np.copy(my_driver.sampler.curr_pos[:,:,ecc_ind])

Randomizing some angles

You could also just change some values in the new_pos arrays. For instance, the aop and pan angles are degenerate by 180 degrees (i.e. 30 degrees is the same as 210 degrees). If you are getting values from a previous fit, they might have already been clustered around one of the values. So, we can randomly take about half of the values and add or subtract 180 degrees.

[11]:
# Find indices of the two angles
aop_ind = my_driver.system.param_idx['aop1']
pan_ind = my_driver.system.param_idx['pan1']

# Get the shape of curr_pos without the last dimension (the list of parameters)
select_arr_shape = curr_pos_shape[:-1] # (n_temps,n_walkers)

# Draw a random number from 0 to 1, and mark index for replacement if > 0.5
replace_index = np.random.uniform(size=select_arr_shape)>0.5

# Replace aop values selected with current values plus 180 degrees
new_pos[replace_index,aop_ind] = new_pos[replace_index,aop_ind] + 180.0

# This may cause values to be larger than 360; find these and subtract 360
wrap_ind = new_pos[:,:,aop_ind] > 360
new_pos[wrap_ind] = new_pos[wrap_ind] - 360

# Repeat all of the above for the pan angle
replace_index = np.random.uniform(size=select_arr_shape)>0.5
new_pos[replace_index,pan_ind] = new_pos[replace_index,pan_ind] + 180.0
wrap_ind = new_pos[:,:,pan_ind] > 360
new_pos[wrap_ind] = new_pos[wrap_ind] - 360

Additional checks

This tutorial does not do all of the steps required to ensure all of the walkers start within the prior space. For example, there is no check for angles not to be negative nor the (unlikely in this example) case where the semimajor axis might be negative. However, the two examples above can be used to identify and replace values outside of prior space, as needed.

3.3) Update sampler position

After generating and validating your new walker positions, through whatever methods you choose, it’s now time to update the sampler object to have its curr_pos be your new positions.

[12]:
my_driver.sampler.curr_pos = np.copy(new_pos)

And you’re done! You can continue at “Running the MCMC Sampler” in the MCMC Introduction Tutorial