Why Probabilistic Programming is the next big thing in Data Science

TL;DR: This is an opinionated post, but one based on recent trends.

What is Probabilistic Programming?

I recently wrote a course teaching this. Probabilistic Programming is a newish paradigm used in Quantitative Finance, Biology, Insurance and Sports Analytics. It lets you build generative models to infer latent parameters and the uncertainty of those parameters. Recent improvements in MCMC algorithms and Variational Inference have allowed these techniques to tackle previously intractable computational problems. In short, it was the invention of NUTS (the No-U-Turn Sampler) and the democratisation of automatic differentiation (for computing gradients) that made these techniques widely available.
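To make the idea of inferring a latent parameter and its uncertainty concrete, here's a minimal sketch of MCMC in plain Python: a random-walk Metropolis sampler, the simpler ancestor of NUTS. The data, prior and tuning constants are made up for illustration; a real probabilistic programming library would handle all of this for you.

```python
import math
import random

random.seed(42)

# Made-up observations from a Normal(mu, 1) with unknown mean mu.
data = [2.1, 1.9, 2.4, 2.2, 1.8]

def log_posterior(mu):
    # Weak Normal(0, 10) prior on mu, plus the Normal(mu, 1) likelihood
    # (constant terms dropped, since Metropolis only needs ratios).
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = sum(-0.5 * (x - mu) ** 2 for x in data)
    return log_prior + log_lik

mu, samples = 0.0, []
for step in range(20_000):
    proposal = mu + random.gauss(0.0, 0.5)   # random-walk proposal
    # Accept with probability min(1, posterior ratio).
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    if step >= 5_000:                        # discard burn-in
        samples.append(mu)

mean = sum(samples) / len(samples)
sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"posterior mean ~ {mean:.2f}, posterior sd ~ {sd:.2f}")
```

The output is not just a point estimate but a whole posterior distribution, whose spread quantifies how uncertain we are about the latent mean. NUTS improves on this random walk by using gradients (via automatic differentiation) to propose much more efficient moves.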

What are the applications?

Bayesian Statistics and Probabilistic Programming really come into their own in situations where you have small data (or, more importantly, smallish data within various categories), domain knowledge and/or a need for interpretability.

As practitioners we live in an era of growing concerns about trust and fairness in AI, and of increasing regulation of models, both on the consumer side and through legislation like GDPR. "It's a black box" won't be a sufficient answer.

Interestingly, one of the more promising applications of Bayesian Statistics is in self-driving-car technology, or more generally in adding uncertainty quantification to neural networks.

Other applications include insurance (risk modelling), e-commerce (price modelling wherever you have categories and subcategories), finance, real-estate modelling and many more.

What are the limitations?

I’d say our tools are still too difficult to use. It’s still very hard to do large-scale Bayesian analysis, especially because of the problems with Variational Inference, and there’s still quite a barrier to entry for newcomers. (I say that as an OSS contributor.)

Why is this the next big thing?

I think that stakeholders will increasingly want better interpretation of models, and as Data Science and AI move into more safety-critical industries such as finance and insurance, quantification of uncertainty will become more and more important.

Wanna learn more?

You can learn more in my course. I take you from beginner level to building models in less than 4 hours; you’ll learn about Bayesian AB testing, apply Bayesian linear regression and other techniques, and leave with real-world examples you can apply in your day-to-day work – www.probabilisticprogrammingprimer.com
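As a small taste of the Bayesian AB testing mentioned above, here's a minimal sketch using a conjugate Beta-Binomial model in plain Python. The conversion counts are invented for illustration.

```python
import random

random.seed(7)

# Made-up conversion data for two variants.
a_conversions, a_visitors = 120, 1000   # variant A
b_conversions, b_visitors = 150, 1000   # variant B

def posterior_samples(conversions, visitors, n=100_000):
    # Beta(1, 1) prior + Binomial likelihood -> Beta posterior (conjugacy),
    # so we can sample the posterior conversion rate directly.
    return [random.betavariate(1 + conversions, 1 + visitors - conversions)
            for _ in range(n)]

a_rate = posterior_samples(a_conversions, a_visitors)
b_rate = posterior_samples(b_conversions, b_visitors)

# Probability that B's true conversion rate beats A's -- a direct
# answer to the business question, with uncertainty built in.
p_b_beats_a = sum(b > a for a, b in zip(a_rate, b_rate)) / len(a_rate)
print(f"P(B > A) ~ {p_b_beats_a:.2f}")
```

Instead of a p-value, you get the probability that B actually beats A, and you could just as easily compute the full posterior distribution of the lift.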
