I have had some pretty epic failures in my career, both technical and non-technical. When I look back at those moments and cringe at my egregious mistakes (e.g. bringing down petabyte-scale databases, neglecting to check for multicollinearity, saying the wrong thing to executive stakeholders), I realize that these are the moments that taught me the most about how to improve and do better the next time.
However, if I could, I would go back to my Master’s thesis and clean up the pages and pages of spaghetti code.
Two pieces of advice:
Include interesting projects on your resume
Have an open mind about roles
An ‘interesting project’ can be anything: your PhD dissertation, a work project, a personal pet project, or something in between. It gives you something to talk about during interviews and demonstrates that you can apply your technical skills to solve actual problems with data.
‘Data Scientist’ seems to be the job title with the most buzz, but there’s actually a wide range of roles where people combine skills in programming, Math/Stats, and data. You’ll see teams with names that include Analytics, Data Science, Data Engineering, Machine Learning, and Artificial Intelligence. The actual title may include Scientist, Engineer, Architect, or Analyst. Roles will vary between industries, companies, and even lines of business. A Data Scientist in InfoSec will be doing different things than a Data Scientist in HR, for example.
You don’t have to stick to any one role, either. If you try something and discover that it’s a bad fit, build upon your current skills and look for something that fits better. I have seen people from many different industry and academic backgrounds successfully become Data Scientists, and I’ve also seen many transition out of Data Science into roles in Engineering and Product.
I learned firsthand very quickly that even the most accurate models are worthless if no one uses them or no one understands them. I learned that a project can be a failure even if the models are good, much to my dismay. I wish I had known earlier that it was more important for me to hone the skills of communication and project management than it was to learn any technology or algorithm.
Things I learned but did not know I needed to learn when I started:
How to properly scope a project
How to get alignment with key stakeholders
When and how to effectively communicate needs, expectations, and progress across multiple teams
How to facilitate productive meetings
Many people (read: me) do not naturally have these skills. You know who is specifically trained to have these skills? Product Managers and Designers.
I spent many years trying to learn these skills by just doing better than I had done the last time, like a really slowly converging optimization. A few years ago, I started working more closely with Product teams and I learned how much better I could be at communicating and collaborating. Instead of plodding through learning on my own, I realized I could reduce it to a problem that has already been solved.
We can all agree that ‘Big Data’ is overused, right?
One of the most common situations in which I’ve encountered this phrase has been in client meetings during my time as a consultant. The customer will say sheepishly, ‘We don’t have Big Data yet, but…’ and will proceed to rattle off concerns about scalability or ideas for downstream analytics or smart features they want to include in their product.
These are all valid discussion points that should be addressed, regardless of the current size of their data. If a company is growing, their data will eventually (hopefully) be “Big”. It is always better to be proactive!
On the other hand, I’ve also seen companies do the opposite; they assume they do not need to have any substantial data strategy because their current data is small and they do not see an immediate need for analytics. I think this is foolish. Eventually that data technical debt is going to catch up with them.
I love seeing how much of our day-to-day lives are touched and improved by Data Science. It’s not just advertising; healthcare, transportation, food, education and more are all seeing data-driven innovations happen every single day. It’s so mainstream that I see TV commercials talking about Artificial Intelligence on a regular basis.
I also enjoy that this has made it ever so slightly cool to be a Math nerd. If not actually cool, then at least it has made Mathematicians more hirable than they were when I graduated from school. There are now many job descriptions that have “Mathematics” listed as a desired degree, when that was definitely not the case 10-15 years ago. The other day, I drove by a billboard advertising a Master’s in Data Analytics at the local university. Times have definitely changed!
Every project I’ve ever worked on has had an estimated timeline and a goal set from the beginning. The timelines have ranged anywhere from a few days to several months, some with reasonable goals and others with downright impossible ones. Getting everyone to agree upfront on both what is feasible in the time frame and what counts as success is absolutely critical.
Some projects follow a waterfall paradigm with fixed milestones based on timeline, and other projects follow the agile methodology. In both cases, communicating progress between the stakeholders and the project team with a regular cadence is key to managing expectations. Stakeholders should not show up at the beginning of a project and then disappear until the polished, final results are revealed. Instead, they should have a general idea of how the team is progressing at any given check-in, and what incremental change they expect to see by the next one.
Time management can be tricky for Data Scientists. Exploring data and experimenting is a lot of fun, but you can easily get stuck in that step of a project. You have to ask yourself and/or your team, ‘Do you know enough that you can meaningfully work on the next step?’ If you do, then it’s likely time to move on. Staying in the current step may lead to diminishing returns. Data Science is an iterative process, so you can always revisit additional ideas you had for a previous step if you have time.
What’s ‘good enough’ for a model depends on how you defined success for the project. Does it meet the accuracy expectations for whatever decisions or tools it has to help? Does it fit into the technical workflow?
We still have a long way to go. In my personal experience, the field of Data Science has felt a bit more diverse than the tech industry at large. This may be partially because industry Data Scientists come from such a wide range of academic and professional backgrounds. I’ve seen more diversity in both the speakers and audiences at conferences over time, which is encouraging.
When I attended my first big tech conference nearly 10 years ago, there were over 10,000 attendees. I remember keeping count of all the other women I had seen that week and it was barely in the 100s. The same could have been said for any underrepresented minority group, had I been keeping count. A few years later, I was standing in line at the bathroom during a break at a different conference when all of us in line realized, “There are finally enough women at this conference that there’s actually a line!”
It’s tough being the odd one out. It’s tough when a job cannot accommodate life happening. As a woman of color and a parent, I am reminded of this truth on a regular basis. However, I’m not any less valuable an employee simply because I may be different from my peers.
I am so pleased to see that diversity and inclusion has become a more common theme across the industry. It is not just grassroots efforts, but also full-time positions being created specifically to lead these efforts. Two years ago, Pivotal hired its first Diversity and Inclusion leader. They now have a growing number of Employee Resource Groups, various internal training programs, and a commitment to hiring more diverse candidates, all of which are having a positive impact there. I recently joined Salesforce and learned that they have a Chief Equality Officer, and a commitment to equality in both the workplace and the wider community.
My hope is that this trend continues. In the next 10-20 years when the kids of today eventually replace us all, maybe the tech surrounding our lives will all be built by teams that are a representative sample of the world.