Bayesian Analysis: the good parts

One of the questions I’m often asked is: what’s so powerful about Bayesian analysis? I speak regularly to analysts who’ve heard of some of its powerful aspects, but haven’t heard enough to invest time in learning it. I’ve thought about this on and off for a few years now, and I came across an excellent collection of tweets by Sean J. Taylor. Sean is a Manager of the Core Statistics team at Facebook and works a lot on Bayesian methods.

I’ve basically taken his tweets and put them into this blog post with some extra commentary. I consider this an opinionated collection of ideas on why learning Bayesian methods is so powerful. I’ve not seen all of these written down in a blog post before, and wanted to put them together for posterity.

Good Part 1: Explicitly modelling your data generating process

Bayesian analysis means writing down complete models that could have plausibly generated your data. This forces you to think carefully about your assumptions, which are often implicit in other methods.

I think this is an underappreciated part of Bayesian analysis. In classical machine learning methods we often assume, say, that the data is normally distributed, without making that assumption explicit in our model. Some people say things like ‘Bayesian analysis is hard because you need to think about the data generating process’, but this thinking is more honest, and produces a better model than just ignoring the problem. Better models allow you to make better-informed decisions, and that marginal gain could be worth a lot of money for your company.
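To make this concrete, here is a minimal numpy sketch (with hypothetical numbers, not data from the post) of writing down a generative story before doing any inference. Every distributional assumption is visible in the code rather than implicit:

```python
import numpy as np

rng = np.random.default_rng(42)

# A hypothetical generative story for daily signup counts:
# 1. a latent daily rate lambda drawn from a Gamma prior,
# 2. observed counts drawn from a Poisson with that rate.
# Writing this down forces every assumption into the open.
lam = rng.gamma(shape=10.0, scale=2.0)   # prior belief: mean of ~20 signups/day
counts = rng.poisson(lam=lam, size=90)   # 90 days of simulated observations

print(lam, counts.mean())
```

If simulated data from this story looks nothing like your real data, you have learned something about your assumptions before fitting anything.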

Good Part 2: No need to derive estimators

There are increasingly full-featured and high-quality tools that allow you to fit almost any model you can write down. Being able to treat model fitting as an abstraction is great for analytical productivity.

Recall that an “estimator” or “point estimate” is a statistic (that is, a function of the data) that is used to infer the value of an unknown parameter in a statistical model. In Bayesian analysis, thanks to tools such as MCMC, you can fit almost any model you can write down without deriving such an estimator by hand. For example, you can fit out-of-the-box models such as Exponential models, mixture models, Poisson-type models and the Student-t distribution in toolkits such as PyMC3 and Stan.
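To illustrate the “no estimators” point, here is a toy random-walk Metropolis sketch in plain numpy (a stand-in for what PyMC3 or Stan do far better); the data, the Student-t likelihood with 4 degrees of freedom, and all tuning settings are hypothetical. Note that we only write down a log-posterior; the sampler is a reusable abstraction:

```python
import numpy as np

def metropolis(logpost, n_draws=5000, start=0.0, step=0.5, seed=0):
    """Minimal random-walk Metropolis: fits any 1-D model you can write down."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    x, lp = start, logpost(start)
    for i in range(n_draws):
        prop = x + step * rng.normal()
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
        draws[i] = x
    return draws

# Student-t likelihood (nu=4) for a location parameter mu, flat prior.
# No estimator is derived: we only wrote down the log-posterior.
data = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 5.0])  # note the outlier
nu = 4.0
def logpost(mu):
    return -0.5 * (nu + 1) * np.log1p((data - mu) ** 2 / nu).sum()

draws = metropolis(logpost)
# The posterior mean is pulled less by the outlier than the sample mean (1.75).
print(draws[1000:].mean())
```

Swapping in a different likelihood means changing `logpost`, not re-deriving any estimator.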

Good Part 3: Estimating a distribution

Bayesian analyses produce distributions as estimates rather than specific statistics about distributions. That means you deeply understand uncertainty and get a full-featured input into any downstream decision/calculation you need to make.

Take something like Figure 1: here we have the difference of means (which feeds into a loss function of sorts), and we can see the full distribution of that difference. We can see more information, such as how the distribution is shaped (almost normal, but not quite) and how spread out it is. We don’t get this from traditional frequentist point estimates or machine learning methods.
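A minimal numpy sketch of this idea, using hypothetical A/B conversion data and a Beta-Binomial model (not the model behind Figure 1): the output is the whole posterior of the difference, which you can summarise however a downstream decision requires.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conversion data for two variants (1 = converted).
a = rng.binomial(1, 0.11, size=2000)
b = rng.binomial(1, 0.13, size=2000)

# Beta(1, 1) prior + binomial likelihood gives a Beta posterior for each rate.
post_a = rng.beta(1 + a.sum(), 1 + len(a) - a.sum(), size=20000)
post_b = rng.beta(1 + b.sum(), 1 + len(b) - b.sum(), size=20000)
diff = post_b - post_a  # the full posterior of the difference, not one number

# The whole distribution is available, not just a point estimate:
print(np.percentile(diff, [2.5, 50, 97.5]))  # 95% credible interval and median
print((diff > 0).mean())                     # P(B beats A)
```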

Good Part 4: Borrowing strength / sharing information

A common feature of Bayesian analysis is leveraging multiple sources of data (from different groups, times, or geographies) to share related parameters through a prior. This can help enormously with precision.

For example, I share information across teams in the Rugby Analytics example. This is a super powerful technique in industry: for example, you may have geographies such as country, or you may have different kinds of risk categories for loans. There are many examples of this in e-commerce, fintech and insurance. To me this is the secret sauce of Bayesian analysis.
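The borrowing-strength idea can be sketched in closed form for a simple hierarchical normal model. All numbers below are hypothetical (they are not the Rugby Analytics figures), and `sigma2`/`tau2` are assumed within-team and between-team variances; each team’s raw mean is shrunk toward the grand mean in proportion to how little data it has:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical points-scored data for teams with very different sample sizes.
team_scores = {
    "A": rng.normal(24, 6, size=40),  # lots of matches
    "B": rng.normal(18, 6, size=5),   # few matches, so a noisy raw mean
    "C": rng.normal(21, 6, size=3),
}

raw_means = {t: s.mean() for t, s in team_scores.items()}
grand_mean = np.concatenate(list(team_scores.values())).mean()

# Partial pooling: weight each raw mean against the shared grand mean.
# sigma2 is the within-team variance, tau2 the between-team variance
# (both assumed known here to keep the sketch in closed form).
sigma2, tau2 = 36.0, 9.0
pooled = {}
for t, s in team_scores.items():
    w = (len(s) / sigma2) / (len(s) / sigma2 + 1 / tau2)
    pooled[t] = w * raw_means[t] + (1 - w) * grand_mean

print(raw_means)
print(pooled)  # small-sample teams move furthest toward the grand mean
```

In a full Bayesian model (as in PyMC3 or Stan) the between-team variance is learned from the data rather than fixed, but the shrinkage behaviour is the same.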

Good Part 5: Model checking as a core activity

Good Bayesian analyses consider a wide range of models that vary in assumptions and flexibility in order to see how they affect substantive results. There are principled, practical procedures for doing this.

For example, there are many methods included in ArviZ and other toolkits to evaluate your model, so you can critique your model and improve it.

You can either compare your model against the observed data, or look at various model and sampler statistics such as BFMI (the Bayesian fraction of missing information).
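Comparing the model against the observed data is a posterior predictive check. Here is a minimal numpy sketch with hypothetical count data: replicate the dataset under each posterior draw and ask how often a test statistic (here, the maximum) is at least as extreme as the observed one.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed counts, and the conjugate Gamma posterior for a
# Poisson rate (Gamma(1, 1) prior; numpy's gamma takes scale = 1/rate).
observed = rng.poisson(4.0, size=50)
post_lambda = rng.gamma(1 + observed.sum(), 1.0 / (1 + len(observed)), size=2000)

# Replicate the dataset once per posterior draw, then compare a test
# statistic of the replicates against the observed value.
reps = rng.poisson(post_lambda[:, None], size=(2000, 50))
p_value = (reps.max(axis=1) >= observed.max()).mean()

print(p_value)  # posterior predictive p-values near 0 or 1 flag misfit
```

Toolkits automate this: ArviZ’s `plot_ppc`, for instance, draws posterior predictive distributions over the observed data for you.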

Good Part 6: Interpretability of posteriors and adding loss functions to the posteriors

What a posterior means makes more intuitive sense to people than most statistical tests. The validity of a posterior rests on the underlying assumption that the model is correct, which is not hard to reason about. What’s more, we can apply loss functions to our posteriors, and get an even more interpretable result.

As we saw with the earlier example, we can see what our posterior means, and even apply loss functions to it. A great example of loss functions is in the excellent Bayesian Methods for Hackers material.
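Here is a minimal sketch of applying a loss function to a posterior, in the spirit of the Bayesian Methods for Hackers examples (the demand numbers and costs are hypothetical): average the loss over posterior samples and pick the decision that minimises the expected loss.

```python
import numpy as np

rng = np.random.default_rng(11)

# Stand-in posterior samples for tomorrow's demand (hypothetical numbers;
# in practice these would come from an MCMC trace).
demand = rng.normal(100, 15, size=10000)

# Asymmetric loss: running out costs 5 per unit, over-stocking costs 1.
def expected_loss(order):
    under = np.maximum(demand - order, 0)
    over = np.maximum(order - demand, 0)
    return (5 * under + 1 * over).mean()

orders = np.arange(60, 160)
best = orders[np.argmin([expected_loss(q) for q in orders])]
# Because running out is five times as costly, the optimal order sits
# above the posterior mean of demand.
print(best)
```

The output is a single, directly actionable number, yet it still reflects the full uncertainty in the posterior: that is what makes loss functions over posteriors so interpretable.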