It gives me great pleasure to interview Vicki Boykis – we’ve chatted a lot on Twitter over the past few years, and her blog and side projects have been inspiring for my own.
Vicki is a Data Scientist and Engineer who tweets awesome stuff. She’s well worth following.
Her Twitter bio – Born: Jewish in Russia. Raised: guilty in America. My days: Building data products using Python, ML, and magic. Privacy. Open internet. #devart | @sovietartbot
Here are the questions.
1. What project that you have worked on do you wish you could go back to and do better?
This is every project for me! If I come to a point where I do similar work two times in a row, and I look back, and I haven’t learned anything or done the work in a different way, it means I’m not learning and progressing as a data scientist. So far, every project I’ve done, I go back and find at least one thing that I could have done differently, either because I know a new methodology, or a new technology. For example, my first project didn’t have continuous integration. So then the next one has CI. Or I did a project using Markov chains, but now we can model similar behavior with neural nets, etc. It’s always an evolution.
2. What advice do you have for younger analytics professionals and in particular PhD students in the Sciences?
I didn’t take a path through a PhD or even Master’s program in statistics, so I don’t have any advice for those transitioning out of an academic path, although there are lots of great posts out there. My main advice for people who are junior in their careers is to find someone to mentor them. This is usually not a formal relationship. Like, you don’t go up to someone and say, “will you mentor me,” but it’s very much just a byproduct of working with more senior people who will give you their time and help you. If you can’t find that, read a lot about the field you’re interested in. Being intellectually curious but more junior is almost always more advantageous than having a ton of degrees but not asking questions and collaborating with people. Oh, and don’t use Excel 🙂
3. What do you wish you knew earlier about being a data scientist?
That it’s a lot of gruntwork. You’re going to spend maybe 10-20% of your time doing the stuff everyone talks about: picking models, doing machine learning, etc. Most of the time it’s, how do I get this matrix of data from one place to another. You have to decide if figuring out why stuff doesn’t work, all the time, in different ways, is something you’re interested in.
4. How do you respond when you hear the phrase ‘big data’?
With terror, because it means hashtags about deep learning and AI winter will soon follow.
5. What is the most exciting thing about your field?
If you play your cards right, you can impact the way a company operates. That’s always super interesting to me.
6. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations etc. How do you know what is good enough?
My 9th grade English teacher made us write stories using vocabulary words we used that week, and he said that constraints make us better writers because we’d have to be more creative. I like that agile enforces order in that you have 2 weeks to get x or y done. If it’s not done by the end of two weeks, we either refine or move on. I don’t like to spend too long once we figure out the general thrust of the problem because otherwise you just get mired in trying to model something that’s often too complex to model. I like to generate an idea or model and continue to iterate on it. There was a good podcast on this recently on Linear Digressions.
7. How do you feel we’re doing in terms of diversity and inclusion in data science?
There was a really good quote about diversity, and I’m sorry to say I don’t remember who said it, but the quote was something along the lines of, if you have to talk about how a scientist is a woman, we don’t have diversity yet. Diversity means the ability to do your job as a woman or minority and not have it seem extraordinary or have articles talk about the fact that you’re a woman and not a scientist. We’re still a ways away from that, but I think a lot of factors, particularly the Python community, which is my home base, and which I love, do a great job at pushing us forward, closer to that goal.