Why Probabilistic Programming is the next big thing in Data Science

TL;DR: This is an opinionated post, but one based on recent trends.

What is Probabilistic Programming?

I recently wrote a course teaching this. Probabilistic Programming is a newish paradigm used in Quantitative Finance, Biology, Insurance and Sports Analytics. It allows you to build generative models to infer latent parameters and the uncertainty around those parameters. Recent improvements in MCMC algorithms and Variational Inference are what have allowed these techniques to tackle computational problems that were previously intractable. In short, it was the invention of NUTS (the No-U-Turn Sampler) and the democratisation of automatic differentiation (for computing gradients) that made these techniques widely available.
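To make “infer a latent parameter and its uncertainty” concrete, here is a minimal sketch in plain Python. It uses a random-walk Metropolis sampler, which is a much simpler MCMC method than NUTS and needs no gradients, so treat it as an illustration of the idea rather than what a probabilistic programming library actually runs. The coin-flip data (7 heads, 3 tails), the flat prior, the step size and the function names are all assumptions made up for the example.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalised log posterior of a coin's bias under a flat prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero probability outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis: returns posterior draws of the bias."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    draws = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws[n_samples // 4:]  # discard the first quarter as warm-up


# Hypothetical data: 7 heads and 3 tails.
samples = sorted(metropolis(heads=7, tails=3))
mean = sum(samples) / len(samples)
low = samples[int(0.03 * len(samples))]
high = samples[int(0.97 * len(samples))]
print(f"posterior mean ~ {mean:.2f}, 94% interval ~ ({low:.2f}, {high:.2f})")
```

The output is a whole distribution over the latent bias rather than a single point estimate, and the width of the interval is the uncertainty. In a real probabilistic programming tool you would only write down the model, and the sampler would be provided for you.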

What are the applications?

Bayesian Statistics or Probabilistic Programming really come into their own in situations where you have small data (or more importantly smallish data for various categories), domain knowledge and/or a need for interpretability.
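The “smallish data for various categories” case can be sketched with a conjugate Beta-Binomial update. The category names, counts and the prior strength of 20 below are invented for illustration, and fixing the prior from the pooled data is a simplification: a full hierarchical model would learn the prior, and its uncertainty, from the data itself.

```python
def shrunk_rate(successes, trials, prior_mean, prior_strength):
    """Posterior mean of a rate under a Beta prior (conjugate update)."""
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    return (successes + alpha) / (trials + alpha + beta)


# Hypothetical (purchases, visits) per product category.
categories = {"shoes": (120, 1000), "hats": (2, 5), "scarves": (0, 3)}

# Simplifying assumption: centre the prior on the pooled rate.
total_successes = sum(s for s, _ in categories.values())
total_trials = sum(n for _, n in categories.values())
prior_mean = total_successes / total_trials

# prior_strength=20 is an assumed value: the prior counts as 20 pseudo-visits.
estimates = {
    name: shrunk_rate(s, n, prior_mean, prior_strength=20)
    for name, (s, n) in categories.items()
}

for name, (s, n) in categories.items():
    print(f"{name}: raw = {s / n:.3f}, shrunk = {estimates[name]:.3f}")
```

Categories with plenty of data barely move, while categories with a handful of observations are pulled towards the overall rate instead of being reported as an implausible 40% or 0%. This is the kind of behaviour that makes Bayesian models attractive when some categories are sparse.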

As practitioners we live in an era of growing concerns about trust in AI and fairness in AI, and in an era where there will be more regulation of models, both from the consumer side and through legislation like GDPR. Saying “it’s a black box” won’t be sufficient.

One of the more interesting applications of Bayesian Statistics is in self-driving car technology or, more generally, in adding uncertainty quantification to Neural Networks.

There are other applications too, including insurance (risk modelling), e-commerce (pricing models for anywhere you have categories and subcategories), finance, real-estate modelling and many more.

What are the limitations?

I’d say our tools are still too difficult to use. It’s still very difficult to do large-scale Bayesian analysis, especially due to the problems with Variational Inference (VI). There’s still quite a barrier to entry for newcomers. (I say that as an OSS contributor.)

Why is this the next big thing?

I think stakeholders will increasingly want better interpretation of models, and as Data Science and AI move into more safety-centric industries such as finance and insurance, quantification of uncertainty will become more and more important.

Wanna learn more?

You can learn more at my course – www.probabilisticprogrammingprimer.com
