PyData Amsterdam

I recently attended and keynoted at PyData Amsterdam 2016.

(Clockwise from top right: ‘The sunset as the event was closing’, ‘Peadar Coyle giving a keynote at PyData Amsterdam’, ‘Video interviews with Holden Karau, a Spark expert from IBM’, ‘The organizing committee’, ‘Maciej Kula of Lyst talking about recommendation engines’.)

Firstly, this was a wonderful conference: the location (a boat), the food, and the quality of the speakers and discussion were all excellent. The energy of the organizers, most of them from GoDataDriven (a boutique data science and engineering consultancy in Amsterdam), was great, and there was a good mix of advanced, intermediate and introductory tutorials and talks.

Some highlights: Andreas Mueller, one of the core contributors to scikit-learn, gave a great advanced tutorial covering neural networks, out-of-core functionality, grid search and Bayesian hyperparameter optimization. As with any advanced tutorial, it’s hard to know your audience, but I know I’ll be returning to his notebooks again and again.


(Sean Owen of Cloudera giving the opening keynote on Data Engineering and Genomics)

(The BBQ on Saturday was awesome; we had a beer-and-burger competition, which Giovanni won 🙂)


(Andreas Mueller of NYU, a core contributor to scikit-learn, gave a great advanced tutorial; the room was so packed it moved the boat!)


(Sergii Khomenko of Stylight gave a talk on taking data science into production)

Friso van Vollenhoven, the CTO of GoDataDriven, gave a nice comparison of meetup communities. This was largely an introductory talk, but there were some nice ideas in it, such as how to use Neo4j, some variants of matplotlib, and using word2vec via the excellent Gensim library.

James Powell, one of the NumFOCUS core members, gave an entertaining series of hacks involving Python 3 and Python 2.7; it’s worth watching just for his remarkable hackery and subversion. This was slightly different from the more data-focused talks.

The first keynote was by Sean Owen from Cloudera. It focused largely on genomics and the data challenges in that field, and on the challenge of growing data engineering toolkits to keep up with such data.

There were talks on Julia, NLP, Spark Streaming, PySpark, search relevance, Bayesian methods, out-of-core computation, Pandas, the use of Python in modelling oil and gas, Pokemon recommendation engines, deploying machine learning models, financial mathematics (network theory applied to finance), search quality analysis and more, and sadly I feel I didn’t digest everything during the conference. Thankfully, the videos, notebooks and slides will go up soon.

I liked Lucas Bernardi’s discussion of little tips and tricks for accelerating certain machine learning libraries.

My keynote: I felt very nervous beforehand, but the feedback was positive, and over 100 people attended my 9.00 am Sunday morning talk, ‘Map of the PyData stack’. I talked about some of the projects I’m most excited about and gave case studies and/or code. I mentioned Blaze, Dask, xarray, bcolz and Ibis. The notebooks are available online, and the conversation afterwards was very interesting. One of the most exciting things about using Python for your own professional work is that the ecosystem keeps improving. I reminded the audience of a theme that came up over beers with various open source contributors: open source needs support, bug fixes and documentation, and these rarely happen for free.

A highlight for me was Maciej Kula of Lyst, a UK-based fashion startup, who gave a thorough introduction to his work on hybrid recommendation engines. Much of the audience was very excited about this, since recommendation engines are a common aim for data science teams. He covered the mathematics, learning-to-rank, the speed improvements, why to use it, advantages such as topic extraction, and the comparison to word2vec. He’s a very engaging speaker, and some of the insights he shared were fascinating, such as how they use Postgres to deploy their models. I’ll soon be working on recommendation engines in my next job, so I’ll be carefully reading and rereading his notes and code.

The videos will go up soon.

The discussion was excellent, as were the food and the beers (there were lots of beers), and it was great to chat with some of the luminaries and core SciPy contributors. I was especially happy to speak to some of the non-technical specialists who attended parts of the conference; it is a reminder that data science teams need marketing, sales, recruitment and other functions to help them succeed. It was also great to see 300 Python and data enthusiasts discuss their real-world challenges and how they conquer them. The sponsors, who included Optiver, GoDataDriven, ING, Continuum and Dato, also give a good picture of the Amsterdam data scene. As far as I am aware, they are all hiring data scientists and engineers, so I hope some attendees found interesting job opportunities there.


(Lucas Bernardi sharing some pragmatic advice for data scientists)


(Lunch was excellent)

It is great that we have a community that shares such case studies and best practices, and a community that allows young people like myself to give keynotes in front of such demanding audiences. It is a very exciting time to be doing data science, and I don’t think any other career is more exciting. We hear a lot of hype about ‘big data’ and ‘machine learning’, so conferences like this, where people share their success stories, are great. I’m glad there is so much innovation going on in European data science!

I look forward to my next PyData; check to see where the next one in your own area is.
