Radial Velocity Tutorial for MCMC

By Roberto Tejada (2019)

This tutorial will assume the user is familiar with the Driver class and is acquainted with MCMC terminology. For more information about MCMC, see the MCMC Introduction Tutorial.

We explain how to jointly fit radial velocity data and relative astrometry using the MCMC technique. First we need a set of data containing radial velocity measurements. We check the data using read_input and observe the quant_type column for radial velocity data. For more information on orbitize.read_input.read_file(), see the Formatting Input Tutorial.

NOTE: Astrometry+RV fitting currently only works with MCMC and not OFTI. It also does not account for potential gamma/jitter differences between instruments.

Read and Format Data

[1]:
import numpy as np
import matplotlib.pyplot as plt
import orbitize
from orbitize import read_input, system, priors, sampler, driver, DATADIR
import multiprocessing as mp
import pprint

data_table = read_input.read_file("{}/HD4747.csv".format(DATADIR)) # print all columns
data_table.pprint_all()
 epoch   object  quant1  quant1_err quant2 quant2_err quant_type
-------- ------ -------- ---------- ------ ---------- ----------
 366.975      0 -0.54103    0.00123    nan        nan         rv
 418.829      0 -0.40053    0.00206    nan        nan         rv
 462.752      0 -0.24094     0.0011    nan        nan         rv
 690.104      0  0.37292    0.00117    nan        nan         rv
 784.771      0  0.46223    0.00133    nan        nan         rv
 806.727      0  0.48519    0.00103    nan        nan         rv
 837.717      0  0.49395    0.00117    nan        nan         rv
 838.701      0  0.49751    0.00112    nan        nan         rv
 839.711      0  0.50187    0.00112    nan        nan         rv
 1010.12      0  0.53355    0.00135    nan        nan         rv
1012.048      0  0.53164    0.00128    nan        nan         rv
1014.102      0  0.53629     0.0016    nan        nan         rv
1050.991      0  0.52154    0.00468    nan        nan         rv
1170.748      0  0.50757     0.0014    nan        nan         rv
1368.085      0  0.47678    0.00131    nan        nan         rv
1410.027      0  0.46147    0.00523    nan        nan         rv
1543.744      0  0.44311    0.00152    nan        nan         rv
1550.729      0  0.43286    0.00147    nan        nan         rv
1756.051      0  0.39329    0.00213    nan        nan         rv
1899.784      0  0.36457    0.00163    nan        nan         rv
2098.126      0  0.32986    0.00157    nan        nan         rv
2489.042      0  0.26687     0.0015    nan        nan         rv
2572.812      0  0.25035    0.00195    nan        nan         rv
2987.728      0  0.19466    0.00238    nan        nan         rv
2988.686      0  0.18469    0.00194    nan        nan         rv
3238.956      0  0.16892    0.00134    nan        nan         rv
3303.903      0  0.16769    0.00112    nan        nan         rv
3339.765      0  0.16069    0.00119    nan        nan         rv
3724.774      0  0.11302    0.00103    nan        nan         rv
3724.776      0  0.11605    0.00112    nan        nan         rv
4717.955      0  0.00984    0.00123    nan        nan         rv
4719.008      0  0.01242    0.00115    nan        nan         rv
 4720.01      0  0.01572    0.00123    nan        nan         rv
 4720.97      0  0.01534    0.00113    nan        nan         rv
4722.901      0  0.01479    0.00127    nan        nan         rv
 4723.97      0  0.01423    0.00122    nan        nan         rv
4724.974      0  0.01169     0.0012    nan        nan         rv
4725.883      0  0.01383    0.00113    nan        nan         rv
4727.005      0   0.0195    0.00123    nan        nan         rv
4727.952      0   0.0175    0.00113    nan        nan         rv
 5015.12      0 -0.00636    0.00141    nan        nan         rv
5016.124      0 -0.00409    0.00138    nan        nan         rv
5017.124      0 -0.00566    0.00121    nan        nan         rv
5049.025      0 -0.01975    0.00124    nan        nan         rv
5077.084      0 -0.01614    0.00128    nan        nan         rv
5078.094      0 -0.01303    0.00126    nan        nan         rv
5134.964      0 -0.01689    0.00136    nan        nan         rv
5198.754      0 -0.02885     0.0012    nan        nan         rv
5426.084      0 -0.04359    0.00125    nan        nan         rv
5522.879      0  -0.0512     0.0013    nan        nan         rv
5807.047      0 -0.07697     0.0013    nan        nan         rv
 6149.05      0 -0.10429    0.00128    nan        nan         rv
6319.701      0  -0.1102    0.00128    nan        nan         rv
6327.708      0 -0.11332    0.00149    nan        nan         rv
6508.145      0 -0.12324    0.00133    nan        nan         rv
6913.034      0 -0.17085    0.00113    nan        nan         rv
  6942.8      1    606.5        7.0 180.04       0.62      seppa
  7031.7      1    606.6        6.4 180.52       0.58      seppa
  7289.9      1    604.0        7.0  184.9        0.9      seppa

The quant_type column displays the type of data each row contains: astrometry (radec or seppa), or radial velocity (rv). For astrometry, quant1 column contains right ascension or separation, and the quant2 column contains declination or position angle. For rv data, quant1 contains radial velocity data in \(\mathrm{km/s}\), while quant2 is filled with nan to preserve the data structure. The table contains each respective error column.

We can now initialize the Driver class. MCMC samplers take time to converge to absolute maxima in parameter space, and the more parameters we introduce, the longer we expect it to take.

Create Driver Object

For joint orbital fits with RV data, we need to fit the stellar and companion masses (m0 and m1 respectively as separate free parameters). This differs from the astrometry-only case where fitting the total mass mtot suffices. We set the system keyword fit_secondary_mass to True when initializing the Driver object.

[2]:
filename = "{}/HD4747.csv".format(DATADIR)

# system parameters
num_secondary_bodies = 1
system_mass = 0.84 # [Msol]
plx = 53.18 # [mas]
mass_err = 0.04 # [Msol]
plx_err = 0.12 # [mas]

# MCMC parameters
num_temps = 5
num_walkers = 30
num_threads = mp.cpu_count() # or a different number if you prefer

my_driver = driver.Driver(
    filename, 'MCMC', num_secondary_bodies, system_mass, plx, mass_err=mass_err, plx_err=plx_err,
    system_kwargs = {'fit_secondary_mass':True, 'tau_ref_epoch':0},
    mcmc_kwargs={'num_temps': num_temps, 'num_walkers': num_walkers, 'num_threads': num_threads}
)

Since MCMC is an object in orbitize!, we can assign a variable to the sampler and work with this variable:

[3]:
m = my_driver.sampler

RV Priors

The priors for the two RV parameters, the radial velocity offset (gamma), and jitter (sigma), have default uniform prior and log uniform prior respectively. The gamma uniform prior is set between \((-5,5)\) \(\mathrm{km/s}\), and the jitter log uniform prior is set for (\(10^{-4},0.05\)) \(\mathrm{km/s}\). The prior for m1 is a log uniform prior and is set for (\(10^{-3},2.0\))\(M_\odot\). The current version of orbitize addressed in this tutorial returns the stellar radial velocity only.

NOTE: We may change the priors as instructed in the Modifying Priors tutorial:

[4]:
# getting the system object:
sys = my_driver.system

lab = sys.param_idx

print(sys.labels)
print(sys.sys_priors)

print(vars(sys.sys_priors[lab['m1']]))
['sma1', 'ecc1', 'inc1', 'aop1', 'pan1', 'tau1', 'plx', 'gamma', 'sigma', 'm1', 'm0']
[Log Uniform, Uniform, Sine, Uniform, Uniform, Uniform, Gaussian, Uniform, Log Uniform, Log Uniform, Gaussian]
{'minval': 0.001, 'maxval': 2, 'logmin': -6.907755278982137, 'logmax': 0.6931471805599453}
[5]:
# change the m1 prior:
sys.sys_priors[lab['m1']] = priors.LogUniformPrior(1e-4, 0.5)

print(sys.labels)
print(sys.sys_priors)
print(vars(sys.sys_priors[lab['m1']]))
['sma1', 'ecc1', 'inc1', 'aop1', 'pan1', 'tau1', 'plx', 'gamma', 'sigma', 'm1', 'm0']
[Log Uniform, Uniform, Sine, Uniform, Uniform, Uniform, Gaussian, Uniform, Log Uniform, Log Uniform, Gaussian]
{'minval': 0.0001, 'maxval': 0.5, 'logmin': -9.210340371976182, 'logmax': -0.6931471805599453}

Running the MCMC Sampler

As noted in the MCMC Introduction Tutorial, we must choose the sampler step for MCMC and can save every \(nth\) sample to avoid using too much disk space using thin.

[6]:
total_orbits = 1000 # number of steps x number of walkers (at lowest temperature)
burn_steps = 10 # steps to burn in per walker
thin = 2 # only save every 2 steps
[7]:
m.run_sampler(total_orbits, burn_steps=burn_steps, thin=thin)
/Users/Helios/orbitize/orbitize/priors.py:168: RuntimeWarning: invalid value encountered in log
  lnprob = -np.log((element_array*normalizer))
/Users/Helios/orbitize/orbitize/priors.py:277: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array)/normalization)
/Users/Helios/orbitize/orbitize/priors.py:168: RuntimeWarning: invalid value encountered in log
  lnprob = -np.log((element_array*normalizer))
/Users/Helios/orbitize/orbitize/priors.py:277: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array)/normalization)
Burn in complete
30/34 steps completed
Run complete
[7]:
<ptemcee.sampler.Sampler at 0x108a19f50>

Now we can plot the distribution of MCMC parameter of interest:

[8]:
accepted_m1 = m.results.post[:, lab['m1']]
plt.hist(accepted_m1,histtype='step')
plt.xlabel('m1'); plt.ylabel('number of orbits')
plt.show()
../_images/tutorials_RV_MCMC_Tutorial_15_0.png

Saving Results over Extended MCMC Run

Sometimes our MCMC run will need to run for an extended period of time to let the walkers converge. To observe the convergence, we often need to see the walkers’ progress along parameter space. We can save the sampler results periodically and keep running the sampler until convergence. To run for a greater number of steps and periodically save the results, we can create a for-loop and run for as many iterations as we’d like.

[9]:
filename = "{}/HD4747.csv".format(DATADIR)

total_orbits = 1000 # number of steps x number of walkers (at lowest temperature)
burn_steps = 10 # steps to burn in per walker
thin = 2 # only save every 10th step

# system parameters
num_secondary_bodies = 1
system_mass = 0.84 # [Msol]
plx = 53.18 # [mas]
mass_err = 0.04 # [Msol]
plx_err = 0.12 # [mas]

# MCMC parameters
num_temps = 5
num_walkers = 30
num_threads = mp.cpu_count() # or a different number if you prefer


my_driver = driver.Driver(
    filename, 'MCMC', num_secondary_bodies, system_mass, plx, mass_err=mass_err, plx_err=plx_err,
    system_kwargs = {'fit_secondary_mass':True, 'tau_ref_epoch':0},
    mcmc_kwargs={'num_temps': num_temps, 'num_walkers': num_walkers, 'num_threads': num_threads}
)

m = my_driver.sampler

We’re now ready for the loop! The results object contains a save_results function which lets us save the results for our directory, and we will use the load_results object from results to access the data later. We also define the n_iter below to mark how many MCMC runs to save our within results.

NOTE: To avoid long convergence periods, we may initialize the walkers in a sphere around the global minima of the parameter space as outlined in our Modifying MCMC Initial Positions Tutorial.

[10]:
# file name to save as:
hdf5_filename = 'my_rv_posterior_%1d.hdf5'
[11]:
n_iter = 2 # number of iterations
for i in range(n_iter):
    # running the sampler:
    orbits = m.run_sampler(total_orbits, burn_steps=burn_steps, thin=thin)
    myResults = m.results
    hdf5_filename = 'my_rv_posterior_%1d.hdf5' % i
    myResults.save_results(hdf5_filename) # saves results object as an hdf5 file
/Users/Helios/orbitize/orbitize/priors.py:168: RuntimeWarning: invalid value encountered in log
  lnprob = -np.log((element_array*normalizer))
/Users/Helios/orbitize/orbitize/priors.py:168: RuntimeWarning: invalid value encountered in log
  lnprob = -np.log((element_array*normalizer))
/Users/Helios/orbitize/orbitize/priors.py:277: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array)/normalization)
/Users/Helios/orbitize/orbitize/priors.py:277: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array)/normalization)
Burn in complete
30/34 steps completed
Run complete
/Users/Helios/orbitize/orbitize/priors.py:168: RuntimeWarning: invalid value encountered in log
  lnprob = -np.log((element_array*normalizer))
/Users/Helios/orbitize/orbitize/priors.py:168: RuntimeWarning: invalid value encountered in log
  lnprob = -np.log((element_array*normalizer))
/Users/Helios/orbitize/orbitize/priors.py:277: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array)/normalization)
/Users/Helios/orbitize/orbitize/priors.py:277: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array)/normalization)
Burn in complete
30/34 steps completed
Run complete

Plotting and Accesing Saved Results

We can plot the corner plot saved in the results object by following the steps in the Advanced Plotting Tutorial:

[12]:
median_values = np.median(myResults.post,axis=0) # Compute median of each parameter
range_values = np.ones_like(median_values)*0.95 # Plot only 95% range for each parameter
corner_figure_median_95 = myResults.plot_corner(
    range=range_values,
    truths=median_values
)
WARNING:root:Too few points to create valid contours
WARNING:root:Too few points to create valid contours
../_images/tutorials_RV_MCMC_Tutorial_23_1.png

As illustrated in the plot above, MCMC needs more time to run. We only performed two iterations in the loop to demonstrate its useage, but with increased n_iter, the trendplots saved in the loop and the corner plot will show how the walkers converge to absolute extrema in parameter space.

To access the saved data, we can read it into a results object as shown in the MCMC Introduction Tutorial:

[13]:
from orbitize import results

loaded_results = results.Results() # Create blank results object for loading
loaded_results.load_results('my_rv_posterior_%1d.hdf5' % (n_iter-1))

To demonstrate use of the loaded results file above, we can use the saved results to plot our orbital plots:

[14]:
epochs = my_driver.system.data_table['epoch']
orbit_plot_fig = loaded_results.plot_orbits(
    object_to_plot = 1, # Plot orbits for the first (and only, in this case) companion
    num_orbits_to_plot= 50, # Will plot 50 randomly selected orbits of this companion
    start_mjd=epochs[0], # Minimum MJD for colorbar (here we choose first data epoch)
)
<Figure size 1008x432 with 0 Axes>
../_images/tutorials_RV_MCMC_Tutorial_28_1.png

We can pass the rv_time_series = True argument in plot_orbits to display the RV time series plot as a third panel of plot_orbits:

[15]:
epochs = my_driver.system.data_table['epoch']
orbit_plot_fig = loaded_results.plot_orbits(
    object_to_plot = 1, # Plot orbits for the first (and only, in this case) companion
    num_orbits_to_plot= 50, # Will plot 50 randomly selected orbits of this companion
    start_mjd=epochs[0], # Minimum MJD for colorbar (here we choose first data epoch)
    show_colorbar = True,
    rv_time_series = True
)
orbit_plot_fig.savefig('HD4747_rvtimeseries_panelplot.pdf')
<Figure size 1008x432 with 0 Axes>
../_images/tutorials_RV_MCMC_Tutorial_30_1.png
[ ]: