You have a problem that you think might need some Bayesian modelling
A common question I'm asked is: how do you start?
In this tutorial I start from a fresh data set, an educational data set that I know nothing about and for which I have no specific domain knowledge. I adapt a model from the PyMC3 documentation.
I then evaluate the model using tools such as ArviZ, to explain and assess the modelling decisions. That evaluation informs a second model, which turns out to fit much better. You can view the Binder link here on Github, in the census_data notebook.
Our first step is to build a model. We describe it in the screenshot above.
Next comes the visualisation stage.
The first model is poor: in the plot_ppc plot, the model's posterior predictive samples do not fit the observed data at all.
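To make the idea behind a posterior predictive check concrete, here is a minimal sketch in plain Python. It is not the model from the notebook. The data are simulated, and the Exponential likelihood with a Gamma prior is an assumption chosen only because its posterior has a closed form, so no sampler is needed. The recipe is the same one that a posterior predictive plot visualises: draw parameters from the posterior, simulate replicated data sets, and compare them with what was observed.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical "observed" data: 200 positive values (NOT the notebook's data).
observed = [rng.gammavariate(5.0, 2.0) for _ in range(200)]
n, total = len(observed), sum(observed)

# Deliberately crude model (an assumption for illustration):
#   y ~ Exponential(rate),  rate ~ Gamma(shape=a0, rate=b0)
# This pair is conjugate, so the posterior is Gamma(a0 + n, rate=b0 + total).
a0, b0 = 1.0, 1.0
post_shape, post_rate = a0 + n, b0 + total

def replicate():
    """Draw one rate from the posterior, then simulate a data set of size n."""
    # random.gammavariate takes (shape, scale), so pass 1 / rate as the scale.
    rate = rng.gammavariate(post_shape, 1.0 / post_rate)
    return [rng.expovariate(rate) for _ in range(n)]

# Test statistic: the sample standard deviation.
obs_sd = statistics.stdev(observed)
rep_sds = [statistics.stdev(replicate()) for _ in range(500)]

# Posterior predictive p-value: share of replicated data sets that are
# at least as spread out as the observed data.
p_value = sum(sd >= obs_sd for sd in rep_sds) / len(rep_sds)

print(f"observed sd = {obs_sd:.2f}, mean replicated sd = {statistics.mean(rep_sds):.2f}")
print(f"posterior predictive p-value = {p_value:.3f}")
```

A p-value close to 0 or 1 means the replicated data sets almost never look like the observed one for that statistic, which is the numerical counterpart of a predictive plot that misses the observed curve.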
How can we improve the model?
We can use what we learned during the modelling process: the Beta distribution is a poor choice here because it is too tight, and the various model metrics are poor as well (this is all in the notebooks). So let's change from the Beta distribution to the Gamma distribution and see what happens.
The second model is much better: the red posterior predictive plot now follows the black observed data far more closely.
What are the key takeaways?
It's a good idea to start with a model from an example, and then see how it performs in your own use case. Refining and criticising your model is the key part. You can see above how I use tools like Sample Posterior Predictive to criticise and improve the model.
It's good to internalise this image of the Box loop: that is the workflow you need when building Bayesian models.
The key thing is that you incorporate into the model what you learn about the domain you’re trying to model. You see this above when I move from a Beta distribution to a Gamma distribution. This is different to some Machine Learning workflows you may be used to.
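The build, infer, criticise, revise cycle of the Box loop can also be written down as a small program. The sketch below is a toy, assuming simulated data and two hypothetical candidate likelihoods (Exponential and Gamma) fitted by a quick moment match rather than by full Bayesian inference. It only shows the shape of the loop, not the models from the notebook.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical positive-valued data; a stand-in, not the notebook's data set.
data = [rng.gammavariate(4.0, 3.0) for _ in range(300)]

# Build: each candidate is a (fit, simulate) pair. Fitting is a moment match,
# used here as a placeholder for proper posterior inference.
def fit_exponential(ys):
    return {"rate": 1.0 / statistics.mean(ys)}

def simulate_exponential(params, size):
    return [rng.expovariate(params["rate"]) for _ in range(size)]

def fit_gamma(ys):
    mean, var = statistics.mean(ys), statistics.variance(ys)
    return {"shape": mean**2 / var, "scale": var / mean}

def simulate_gamma(params, size):
    return [rng.gammavariate(params["shape"], params["scale"]) for _ in range(size)]

candidates = [
    ("exponential", fit_exponential, simulate_exponential),
    ("gamma", fit_gamma, simulate_gamma),
]

def criticise(ys, params, simulate, draws=300):
    """Share of simulated data sets whose minimum is at least the observed minimum."""
    observed_min = min(ys)
    hits = sum(min(simulate(params, len(ys))) >= observed_min for _ in range(draws))
    return hits / draws

results = {}
for name, fit, simulate in candidates:
    params = fit(data)                     # infer
    p = criticise(data, params, simulate)  # criticise
    results[name] = p
    verdict = "keep" if 0.05 < p < 0.95 else "revise"
    print(f"{name}: predictive p-value = {p:.3f} -> {verdict}")
    if verdict == "keep":
        break  # stop revising once a model survives criticism
```

The Exponential candidate should fail this check because it puts far too much probability near zero for data like these. Whether the Gamma candidate passes depends on the simulated data, and in a real project you would criticise it with more than one statistic before accepting it.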
Want to learn more?
If that whetted your appetite, you may want to learn more. You can sign up for other free content and register your interest in my course here, or if you're convinced already you can purchase it here on Probabilistic Programming Primer. In my course I give nearly 4 hours of screencasts to explain the concepts of Bayesian modelling. I cover examples such as 'Are Self Driving cars safe', I give intros to a range of new probabilistic programming tools, and I also give exclusive screencasts on ArviZ. If you can follow this blog post and understand it, then this course is for you.
- Data Science for Decision Support: Or why Bayesian Analysis matters
- Introduction to Probabilistic Programming Primer
- State of PPL: How are Bayesian methods used in industry?
- Think you need to learn Bayesian Analysis? Read this first
- 3 reasons to learn Bayesian Statistics in the new year
- Applications of Bayesian Statistics: Supply Chain
- New Screencast: How do I build a Logistic Regression model the Bayesian way?
- What is BFMI (Bayesian Fraction of Missing Information)?
- Why would I ever NEED Bayesian Statistics?
- How do I visualise the results of a Bayesian Model: Rugby models in Arviz