Bayesian sampling

Bayesian sampling of reflectometry models is a common tool in the analysis of specular reflectometry data. The Python programming language has a powerful infrastructure for this kind of modelling, including packages such as PyMC3 and PyStan. Here, we show how the emcee Python package may be used to enable Bayesian sampling with BornAgain, together with the differential evolution optimisation algorithm from scipy.

Example script

To generate these images, showing the probability distributions of the parameters and the maximum-likelihood reflectometry profile, run the following script:

"""
An example of using the Bayesian sampling library emcee with BornAgain.

author: Andrew McCluskey (andrew.mccluskey@ess.eu)
"""

# Import necessary modules
from os import path, getenv
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import differential_evolution
import emcee
import corner
import bornagain as ba

np.random.seed(1)

datadir = getenv('BA_DATA_DIR', '')


# The sample
def get_sample(ni_thickness, ti_thickness):
    """
    Creates a sample and returns it
    :float ni_thickness: a value of the Ni thickness in nanometres
    :float ti_thickness: a value of the Ti thickness in nanometres
    :return: the sample defined
    """

    # pure real scattering-length densities (in angstrom^-2)
    si_sld_real = 2.0704e-06  # Si (substrate)
    ni_sld_real = 9.4245e-06  # Ni
    ti_sld_real = -1.9493e-06  # Ti

    n_repetitions = 10

    # defining materials
    vacuum = ba.MaterialBySLD()
    material_ni = ba.MaterialBySLD("Ni", ni_sld_real, 0)
    material_ti = ba.MaterialBySLD("Ti", ti_sld_real, 0)
    material_substrate = ba.MaterialBySLD("SiSubstrate", si_sld_real, 0)

    # vacuum layer and substrate form multilayer
    vacuum_layer = ba.Layer(vacuum)
    ni_layer = ba.Layer(material_ni, ni_thickness)
    ti_layer = ba.Layer(material_ti, ti_thickness)
    substrate_layer = ba.Layer(material_substrate)
    sample = ba.MultiLayer()
    sample.addLayer(vacuum_layer)
    for _ in range(n_repetitions):
        sample.addLayer(ti_layer)
        sample.addLayer(ni_layer)
    sample.addLayer(substrate_layer)
    return sample


# Source the real data and add an uncertainty to the ordinate
def get_real_data():
    """
    Loading data from genx_alternating_layers.dat
    A Nx3 array (N - the number of experimental data entries)
    with first column being coordinates,
    second one being values, and the third the uncertainties.
    """
    if not hasattr(get_real_data, "data"):
        filepath = path.join(datadir, 'genx_alternating_layers.dat.gz')
        real_data = ba.readData2D(filepath).npArray()
        # translating axis values from double incident angle (degs)
        # to incident angle (radians)
        real_data[:, 0] *= np.pi/360
        # setting artificial uncertainties (uncertainty sigma equals a ten
        # percent of experimental data value)
        real_data[:, 2] = real_data[:, 1]*0.1
        get_real_data.data = real_data
    return get_real_data.data


# The simulation
def get_simulation(sample, alpha):
    """
    Creates and returns a specular simulation
    for the given sample and incident angles (in radians).
    """
    wavelength = 0.154  # nm
    scan = ba.AlphaScan(wavelength, alpha)
    return ba.SpecularSimulation(scan, sample)


# Run the simulation
def run_simulation(alpha, ni_thickness, ti_thickness):
    """
    Runs simulation and returns its result.
    :array alpha: the incident angles (in radians) to be simulated
    :float ni_thickness: a value of the Ni thickness
    :float ti_thickness: a value of the Ti thickness
    :return: simulated reflected intensity
    """
    sample = get_sample(ni_thickness, ti_thickness)
    simulation = get_simulation(sample, alpha)

    result = simulation.simulate()
    return result.array()


# A log-likelihood function
def log_likelihood(theta, x, y, yerr):
    """
    Calculate the log-likelihood for the normal uncertainties

    :tuple theta: the variable parameters
    :array x: the abscissa data (q-values)
    :array y: the ordinate data (R-values)
    :array yerr: the ordinate uncertainty (dR-values)
    :return: log-likelihood
    """
    model = run_simulation(x, *theta)
    sigma2 = yerr**2 + model**2
    return -0.5*np.sum((y - model)**2/sigma2 + np.log(sigma2))


if __name__ == '__main__':
    # Using scipy.optimize.differential_evolution find the
    # maximum likelihood estimate
    nll = lambda *args: -log_likelihood(*args)
    initial = np.array([9.0, 1.0]) + 0.1*np.random.randn(2)
    soln = differential_evolution(nll, ((5.0, 9.0), (1.0, 10.0)),
                                  args=(get_real_data()[:, 0],
                                        get_real_data()[:, 1],
                                        get_real_data()[:, 2]))
    ni_thickness_ml, ti_thickness_ml = soln.x
    print('MLE Ni Thickness', ni_thickness_ml, 'nm')
    print('MLE Ti Thickness', ti_thickness_ml, 'nm')

    # Perform the likelihood sampling
    pos = soln.x + 1e-4*np.random.randn(32, 2)
    nwalkers, ndim = pos.shape

    sampler = emcee.EnsembleSampler(nwalkers,
                                    ndim,
                                    log_likelihood,
                                    args=(get_real_data()[:, 0],
                                          get_real_data()[:, 1],
                                          get_real_data()[:, 2]))
    sampler.run_mcmc(pos, 1000, progress=True)

    # Plot and show corner plot of samples
    flat_samples = sampler.get_chain(flat=True)
    corner.corner(flat_samples,
                  labels=['Ni-thickness/nm', 'Ti-thickness/nm'])
    plt.show()

    # Plot and show MLE and data of reflectivity
    plt.errorbar(get_real_data()[:, 0],
                 get_real_data()[:, 1],
                 get_real_data()[:, 2],
                 marker='.',
                 ls='')
    plt.plot(
        get_real_data()[:, 0],
        run_simulation(get_real_data()[:, 0], *flat_samples.mean(axis=0)),
        '-')
    plt.xlabel('$\\alpha$/rad')
    plt.ylabel('$R$')
    plt.yscale('log')
    plt.show()
auto/Examples/bayesian/likelihood_sampling.py

Explanation

The system under investigation in the above example is a Ni-Ti multilayer material at the interface between an Si substrate and a vacuum. There are two parameters of interest, the thicknesses of the Ni and Ti layers. We know the scattering length densities for each, and that in total there are 10 repetitions of the Ni-Ti sandwich. This sample is created in the get_sample function.

Having built the sample, it is necessary to obtain the real experimental data. For the above code to work locally, the data file genx_alternating_layers.dat.gz is required, and the Python script (in particular the get_real_data function) should be adapted appropriately. This function also assigns an artificial uncertainty to the reflectivity, equal to 10% of each measured value.
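As a minimal sketch of such an adaptation, assuming the measurements lived in a plain two-column text file (incident angle in degrees, reflectivity), get_real_data could be rewritten along the following lines; the file name local_data.dat is purely illustrative:

def get_real_data():
    """
    Loads a local two-column data file and attaches a 10% uncertainty.
    """
    if not hasattr(get_real_data, "data"):
        # columns: incident angle (degrees), reflectivity
        raw = np.loadtxt("local_data.dat")  # hypothetical local file
        real_data = np.empty((raw.shape[0], 3))
        real_data[:, 0] = np.radians(raw[:, 0])  # degrees -> radians
        real_data[:, 1] = raw[:, 1]
        real_data[:, 2] = raw[:, 1]*0.1  # artificial 10% uncertainty
        get_real_data.data = real_data
    return get_real_data.data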

The simulation itself is defined in the get_simulation function, which is passed a series of incident angles; this may be modified to perform a Q-scan as necessary, as sketched below. The final function needed for the simulation of specular reflectometry is run_simulation. This takes the angles to be simulated and the thicknesses of the Ni and Ti layers, and returns the result of the simulation as a numpy array.
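As a minimal sketch of such a modification, assuming a BornAgain version that provides ba.QzScan, get_simulation could instead take q-values directly (the rest of the script would then also have to supply q-values rather than angles):

def get_simulation(sample, q):
    """
    Specular simulation over a Qz-scan (q in inverse nanometres).
    """
    scan = ba.QzScan(q)
    return ba.SpecularSimulation(scan, sample)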

We use the emcee package to sample the likelihood of the data (we follow their data-fitting example). Therefore, it is necessary to define a log-likelihood objective (the log_likelihood function). Then, within the main body of the script, we first find the maximum likelihood estimate using the differential evolution algorithm from the scipy.optimize library. This should print that the thicknesses are around 7 nm and 3 nm for the Ni and Ti layers respectively.
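The script above samples the likelihood directly. The emcee data-fitting example that it follows additionally includes a prior; a minimal sketch of that variant, assuming uniform (flat) priors over the same bounds that were passed to the differential evolution, could look like this:

def log_prior(theta):
    """
    Uniform prior over the optimisation bounds (an illustrative choice).
    """
    ni_thickness, ti_thickness = theta
    if 5.0 < ni_thickness < 9.0 and 1.0 < ti_thickness < 10.0:
        return 0.0
    return -np.inf


def log_probability(theta, x, y, yerr):
    """
    Posterior log-probability to pass to emcee instead of log_likelihood.
    """
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, x, y, yerr)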

We can use the emcee.EnsembleSampler to probe the uncertainties in the parameters and their correlation. The sampling takes some time (on my machine it took about 2.5 minutes to sample 1000 steps with 32 walkers). Having collected the samples, we can unpack them and visualise them using the corner package. This gives the first image shown above.
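In practice it is also common to discard the first steps of the chain as burn-in and to thin the remaining samples before plotting; emcee supports this directly through get_chain. A short sketch (the discard and thin values are illustrative):

# discard the first 100 steps as burn-in and keep every 10th sample
flat_samples = sampler.get_chain(discard=100, thin=10, flat=True)
corner.corner(flat_samples,
              labels=['Ni-thickness/nm', 'Ti-thickness/nm'],
              quantiles=[0.16, 0.5, 0.84],
              show_titles=True)
plt.show()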

Finally, we can plot the model evaluated at the mean of the sampled parameters, which lies close to the maximum likelihood estimate, along with the experimental data. This is the second image above.

Note that the flat_samples object describes the distributions shown in the corner plot. Therefore we can compute quantities of interest from it, such as standard deviations or confidence intervals.
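For example, a median and a 68% (one-sigma) confidence interval for each thickness can be read off with numpy percentiles:

# report median and 16th/84th percentiles for each sampled parameter
for i, label in enumerate(['Ni thickness', 'Ti thickness']):
    low, median, high = np.percentile(flat_samples[:, i], [16, 50, 84])
    print(f'{label}: {median:.3f} +{high - median:.3f} -{median - low:.3f} nm')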