Links and retrospective from PyData Berlin

I attended PyData Berlin last weekend. It was a blast – well done to the organizers.

Some remarks and links from the PyData conference in Berlin:

  • Miguel Cabrera presented Luigi as a solution to the problem of data pipelines. I found it an interesting example of using this technology to handle large amounts of data and a variety of data-scraping jobs.
  • Matthew Rocklin of Continuum Analytics gave a keynote. Matthew is an exceptionally smart computational engineer, and he explained the architecture of the out-of-core data structures he is developing with Dask. Even if you’re not likely to use the technology, it is a very interesting one.
    One key idea was to distinguish between data at the gigabyte, terabyte and petabyte level.
    He pointed out that Hadoop and Spark were probably only needed at the petabyte level, and that otherwise you just need a good workstation. I agree with this. We spoke about it afterwards, and he said, ‘We should still be using PostgreSQL for a lot of things, with good indexing.’ I think the rise of SSDs is very important too: so often you don’t have a big data problem, you just need a bigger computer or workstation.
    I checked the price of an AWS instance online: one with 240 GB of RAM is $2.80 per hour.
  • Ascribe – protecting IP by using the blockchain. Trent was a very interesting and engaging speaker.
  • What is data science – panel discussion – I chaired the panel at this event. This was fun but quite nerve-wracking 🙂
  • Overview from Felix Wick
  • FinTech discussions and risk analysis – these happened over coffee and beer, especially after the talk about CostCla by Alejandro Correa Bahnsen.
  • Agriculture and Mittelstand – the opportunity for data science to be applied in industries outside of e-commerce and social network analysis.
  • The need for educated selling to management – ‘some of management are still not sold’. This was mentioned in the panel.
  • The credibility challenges of ‘just analyzing data’ – also raised in the panel.
  • The need for good project management – some spoke of their failures with algorithm teams that lacked good business direction, and of the need to manage expectations by sharing results. This reminds me of Ian Ozsvald’s talk in Stockholm, where he drew on his years of experience and reminded young data scientists to share results.
  • Bokeh – an interesting technology that I can’t wait to check out. I found the tutorial a bit long, but it is really hard to give a tutorial to a massive room of attendees. So well done Christine 🙂
  • Python for growth hacking – Ignacio Elola
  • Alejandro Correa Bahnsen – Cost-sensitive machine learning – I found this a very interesting talk. It covers one of the real challenges of machine learning: converting results into actionable financial numbers. Alejandro is a good friend of mine and he has done a great job running the meetup in Luxembourg. I will miss him when he returns to Colombia.
  • Robert Obst of Pivotal gave an intriguing demo of the ‘connected car’, and I have no doubt that the ‘Internet of things’ will become a bigger and bigger thing for data analysts and data scientists. It was interesting that he mentioned a lack of interoperability between the different standards in this area.
  • I gave a talk on using Python as a framework for rugby analysis. This got a lot of interesting questions afterwards about probabilistic programming and rugby analytics. Thanks to Matthew Rocklin for an interesting discussion of computational problems and how cool Theano is 🙂
  • For those attending the London PyData event: in a few weeks I’ll be giving a tutorial there on PyMC, some PyMC3, and applications to financial data and rugby analysis. I’ll also discuss the differences between the two and why you should use PyMC3.
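
To make the out-of-core idea from Matthew’s keynote a bit more concrete, here is a minimal sketch in plain Python. It is not Dask’s API, nor how Dask is implemented; it only illustrates the underlying idea of streaming through data in chunks so that the full dataset never has to fit in RAM. The function name `chunked_mean`, the `points` column and the sample data are all made up for illustration.

```python
import csv
import io
from itertools import islice


def chunked_mean(rows, column, chunk_size=1000):
    """Mean of one CSV column, holding at most chunk_size rows in memory."""
    total = 0.0
    count = 0
    reader = csv.DictReader(rows)
    while True:
        # Pull the next chunk of rows; earlier chunks are already discarded.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Hypothetical in-memory data standing in for a file too big for RAM.
sample = io.StringIO("points\n3\n7\n10\n20\n")
result = chunked_mean(sample, "points", chunk_size=2)
print(result)
```

With a real file you would pass an open file handle instead of the `StringIO` object. Only running totals survive between chunks, which is why this kind of approach can get a long way on a single good workstation.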

People often wonder why I go to conferences. The collection of ideas and techniques discussed above is something I’d never have come across on my own, not to mention the fascinating conversations with other members of the Data Engineering, Data Analysis and Software Engineering communities.
