Bayesian Analysis with Python, Chapter 2: Programming Probabilistically
Probabilistic Programming
Probabilistic programming allows a clear separation between models and inference. PP hides the details of how the probability computations are performed, allowing you to focus on model specification and analysis of results.

Inference Engines
Non-Markovian Methods
- Grid computing
- Quadratic approximation
- Variational methods

Markovian Methods
- Monte Carlo
- Metropolis-Hastings
- Markov Chain
- Hamiltonian Monte Carlo/NUTS
Non-Markovian Methods
Can be faster than Markovian methods and can provide a good starting point for them.

Grid Computing
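A minimal sketch of the grid method: evaluate prior times likelihood on a grid of parameter values and normalize. The uniform prior and the 3-heads-in-10-tosses data are assumptions chosen for illustration, not from the book's code.

```python
# Grid approximation of the coin-flip posterior (sketch, assuming
# a uniform prior and 3 heads observed in 10 tosses).
def grid_posterior(heads, tosses, n_points=100):
    grid = [i / (n_points - 1) for i in range(n_points)]
    prior = [1.0] * n_points                              # uniform prior
    like = [t ** heads * (1 - t) ** (tosses - heads) for t in grid]
    unnorm = [p * l for p, l in zip(prior, like)]         # prior x likelihood
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]              # normalize

grid, post = grid_posterior(heads=3, tosses=10)
```

With 100 grid points the posterior mean already lands close to the exact Beta(4, 8) answer of 1/3; more points sharpen it, which is the "infinite points gives the exact posterior" idea below.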
Even if we cannot compute the whole posterior, we may still be able to evaluate the prior and likelihood point-wise. An infinite number of grid points would give us the exact posterior. The method does not work well for many parameters: we end up spending most of the time computing values with essentially null contribution to the posterior.

Quadratic Method
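A sketch of the quadratic (Laplace) idea for the same hypothetical coin-flip posterior: find the mode, measure the curvature of the log-posterior there, and use those as the Gaussian's mean and width. The grid search and finite differences here are illustrative shortcuts, not the book's implementation.

```python
import math

# Laplace approximation sketch: Gaussian centered at the posterior
# mode, width set by the curvature of the log-posterior there.
def log_post(t):
    # log of theta^3 * (1-theta)^7, up to an additive constant
    return 3 * math.log(t) + 7 * math.log(1 - t)

# locate the mode with a simple grid search
grid = [i / 1000 for i in range(1, 1000)]
mode = max(grid, key=log_post)

# numerical second derivative of the log-posterior at the mode
h = 1e-4
curv = (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h ** 2
sd = 1 / math.sqrt(-curv)   # Gaussian standard deviation
```

For this posterior the mode is 0.3 and the Gaussian width comes out near 0.145, which matches the analytic curvature.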
Also known as the Laplace method. This method approximates the posterior with a Gaussian (normal) distribution centered at the posterior's mode.

Variational Methods
The general idea is to approximate the posterior with a simpler distribution. The main drawback is that each model traditionally required its own algorithm; ADVI (automatic differentiation variational inference) automates this.

Markovian Methods
Monte Carlo
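The classic illustration of Monte Carlo (not from the book's code) is estimating pi by random sampling: throw points at the unit square and count how many fall inside the quarter circle.

```python
import random

# Monte Carlo sketch: estimate pi by sampling random points in the
# unit square and counting how many land in the quarter circle.
random.seed(0)
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1)
pi_estimate = 4 * inside / n
```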
This method uses random sampling to compute or simulate a given process.

Markov Chain
A Markov chain is a mathematical object consisting of a sequence of states and the probabilities of transitioning between those states. Detailed balance condition: if we move in a reversible way, the probability of being in state i and moving to state j must equal the probability of being in state j and moving to i.
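Detailed balance is easy to check numerically. A sketch with a hypothetical 2-state chain: the probability flow i → j (stationary probability of i times the transition probability to j) must equal the flow j → i.

```python
# Detailed balance check for a 2-state Markov chain.
# P[i][j] is the probability of moving from state i to state j.
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2 / 3, 1 / 3]   # stationary distribution: pi P = pi (hypothetical numbers)

flow_01 = pi[0] * P[0][1]   # probability flow 0 -> 1
flow_10 = pi[1] * P[1][0]   # probability flow 1 -> 0
# detailed balance holds when flow_01 == flow_10
```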
Metropolis-Hastings
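A minimal Metropolis-Hastings loop (a sketch, not PyMC3's implementation): propose a nearby value with a symmetric Gaussian step and accept it with probability min(1, p(new)/p(old)). The coin-flip target with 3 heads in 10 tosses is an assumed example.

```python
import math
import random

# Metropolis-Hastings sketch targeting theta^3 * (1-theta)^7.
random.seed(42)

def log_post(t):
    if t <= 0 or t >= 1:
        return float('-inf')   # zero posterior outside (0, 1)
    return 3 * math.log(t) + 7 * math.log(1 - t)

theta, samples = 0.5, []
for _ in range(20_000):
    prop = theta + random.gauss(0, 0.1)            # symmetric proposal
    # accept with probability min(1, p(prop) / p(theta))
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

burned = samples[2000:]   # discard burn-in
mean = sum(burned) / len(burned)
```

The chain's mean settles near the exact posterior mean of 1/3, even though individual moves are random.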
There is a ton in this section. Take your time and look up all the terms included. The concept is easy.

Hamiltonian Monte Carlo/NUTS
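A toy HMC sketch (assumed example, not the book's code) for a standard normal target: treat minus the log-posterior as potential energy, give the particle a random momentum, simulate its motion with leapfrog steps, and accept or reject based on the change in total energy.

```python
import math
import random

# Minimal 1-D Hamiltonian Monte Carlo for a standard normal target.
random.seed(7)

def logp(q):
    return -0.5 * q * q        # log N(0, 1), up to a constant

def grad_logp(q):
    return -q                  # gradient guides the moves

q, samples = 0.0, []
step, n_leap = 0.2, 20
for _ in range(5000):
    p = random.gauss(0, 1)                       # resample momentum
    q_new, p_new = q, p
    p_new += 0.5 * step * grad_logp(q_new)       # leapfrog: half step
    for i in range(n_leap):
        q_new += step * p_new                    # full position step
        if i < n_leap - 1:
            p_new += step * grad_logp(q_new)
    p_new += 0.5 * step * grad_logp(q_new)       # final half step
    h_old = -logp(q) + 0.5 * p * p               # total energy before
    h_new = -logp(q_new) + 0.5 * p_new * p_new   # total energy after
    if math.log(random.random()) < h_old - h_new:
        q = q_new
    samples.append(q)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because proposals follow the gradient instead of wandering at random, successive samples are far less correlated than with Metropolis-Hastings.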
Hamiltonian Monte Carlo is named for the Hamiltonian, a description of the total energy of a system. This method is faster since we are no longer making purely random moves. Think again of the lake: when we take a sample, we take a number of steps guided by the slope of the bottom rather than wandering at random.

Other MCMC methods
There are a lot of other methods we could explore. The book mentions two but leaves them as an exercise for the coder, including the Replica Exchange method (aka parallel tempering, or Metropolis-coupled MCMC).
PyMC3 Introduction
We are starting to get into the guts of the new module we will be using. PyMC3 is written using Theano and NumPy. We don't need to know Theano, but it might not be a bad idea to work through a few quick tutorials.

Links:
Coin-flipping, the computational approach
Model Specification
We are told how closely PyMC3 follows the mathematical notation.

Pushing the inference button
The author explains quite a bit of the science behind the code. Luckily, we only need the code to make it work.

Diagnosing the sampling process
Once we have a posterior, we need to determine if it makes sense. We have a couple of options:
- Increase the number of samples
- Remove samples from the beginning (burn-in)
- Reparametrize the model
- Transform the data
Convergence
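The quantitative check discussed in this section, the Gelman-Rubin statistic, can be sketched directly: compare the variance between chains with the variance within chains. The toy "good" and "bad" chains below are assumed examples.

```python
import random

# Gelman-Rubin sketch: B is the between-chain variance, W the
# within-chain variance; values near 1 mean the chains agree.
def gelman_rubin(chains):
    m = len(chains)             # number of chains
    n = len(chains[0])          # samples per chain
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n   # pooled variance estimate
    return (var_hat / w) ** 0.5

random.seed(3)
# four chains sampling the same distribution -> R-hat near 1
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# four chains stuck in different places -> R-hat well above 1.1
bad = [[random.gauss(mu, 1) for _ in range(1000)] for mu in (0, 3, -3, 6)]
```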
One test we can run is a visual one. We build a KDE (kernel density estimation) plot, which looks like a smoothed histogram, alongside the trace. A good trace will look like a bunch of noise. If the beginning looks different from the rest, it could indicate the need for burn-in. If there is a lack of similarity between chains, or we see a pattern, we may need more steps, a different sampler, or a different parametrization. We can also use the Gelman-Rubin test as a quantitative way to test our model: it should come out close to 1, and values > 1.1 signal a lack of convergence.

Autocorrelation
Ideally our samples will lack autocorrelation. In practice, samples generated with MCMC methods are often autocorrelated, but we expect the autocorrelation to drop to a low value as the lag increases.

Effective Size
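Both quantities can be estimated directly from a chain. A sketch (the truncation rule and the AR(1) test chain are assumptions for illustration): lag-k autocorrelation, and a crude effective size from the summed autocorrelations.

```python
import random

def autocorr(x, lag):
    # correlation between the chain and itself shifted by `lag`
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

def effective_size(x, max_lag=100):
    # n / (1 + 2 * sum of autocorrelations), truncated once the
    # correlation has died out
    rho_sum = 0.0
    for lag in range(1, max_lag):
        rho = autocorr(x, lag)
        if rho < 0.05:
            break
        rho_sum += rho
    return len(x) / (1 + 2 * rho_sum)

# An AR(1) chain x[t] = 0.9 x[t-1] + noise is highly autocorrelated,
# so its effective size is far below its actual length.
random.seed(5)
x = [0.0]
for _ in range(5000):
    x.append(0.9 * x[-1] + random.gauss(0, 1))
ess = effective_size(x)
```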
The idea to stress here is that, given two samples of the same size, the one without autocorrelation carries more information. If we have a sample with high autocorrelation, we can estimate the size of an equivalent sample without autocorrelation; that is the effective size of our sample. One suggested remedy is to thin the sample, but since thinning discards information, we only want to do it if we need to reduce storage.

Summarizing the posterior
In Python, we can use the plot_posterior function. It accepts a PyMC3 trace or a NumPy array and returns a histogram with the mean and the 95% HPD.

Posterior-based decisions
Even once we have this data, our decision on what to do with it is subjective. We can use it to make the most informed decision we can.

ROPE
The ROPE (Region of Practical Equivalence) is a range we set based on our knowledge of the subject at hand. There are three scenarios, illustrated in the book with overlapping-interval diagrams: the ROPE and the HPD do not overlap, the ROPE contains the whole HPD, or they partially overlap. plot_posterior takes many options; two interesting ones are rope, which draws the ROPE on top of the HPD, and ref_val, which draws a green vertical reference line and reports the proportion of the posterior above and below it.
Link:
https://docs.pymc.io/api/plots.html