8 Lessons I Learned as a Freshman Data Scientist

June 5, 2018

Miguel Sanda


Starting my journey as a Data Scientist has been one of the best learning experiences of my life. Coming from the Engineering and Project Management world, I was really looking forward to it.

The following are 8 lessons I learned during my freshman year as a Data Scientist.

1. FOCUS ON THE SOLUTION, NOT THE TOOL

Yes, development is fun, but Data Science is not development. Our job is not to build frameworks or to make our code look nice. The real goal is to connect the dots. Of course, there are skills you must have to get there: knowledge of NumPy, Pandas and Scikit-learn, plus a full installation of Anaconda, are must-haves in the Data Scientist's toolbox.

What businesses want from you is to generate new insights and business value from their data. Once you are able to connect the dots, you will most likely deploy your code. Only then does it become a development and architectural problem. And guess what? Chances are that it will be somebody else's problem. So, don't fall into the development trap.

2. GET THE LOW-HANGING FRUIT FIRST

It's hard not to get distracted by the ever-increasing applications of Data Science. The reality is that a lot of work and research is needed before implementing predictive algorithms. In the world of Artificial Intelligence, Deep Learning and Neural Nets, it is easy to lose perspective. These approaches are powerful, but they are also time-consuming for most people. In cases where a simple linear regression could do the trick, go for it and gain some traction. Take advantage of the data that is readily available first, then move on to bigger challenges. Walk before you run.
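To make "go for it" concrete, here is a minimal sketch of a one-feature linear regression baseline built only on NumPy's least-squares solver. The helper names `fit_linear_baseline` and `r_squared` are hypothetical, and the numbers in the usage example are made up purely for illustration; in practice you might reach for Scikit-learn's `LinearRegression`, which does the same job.

```python
import numpy as np


def fit_linear_baseline(x, y):
    """Fit y ~ slope * x + intercept by ordinary least squares (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: one column for the feature, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns (solution, residuals, rank, singular_values); keep only the solution.
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(slope), float(intercept)


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on (x, y) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    predictions = slope * x + intercept
    ss_res = np.sum((y - predictions) ** 2)  # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Made-up data, for illustration only.
    x = [0, 1, 2, 3]
    y = [1, 3, 2, 4]
    slope, intercept = fit_linear_baseline(x, y)
    r2 = r_squared(x, y, slope, intercept)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
    # prints "slope=0.80, intercept=1.30, R^2=0.64"
```

A baseline like this takes minutes to build, and its R² gives you a number that any fancier model has to beat before its extra cost is justified.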

3. REDEFINE SUCCESS

You've spent countless hours doing research, reading papers, writing a lot of code and cleaning the data. Now you push the button, and you or your team determine that it is not a good approach.

Well done! You were successful in finding out what doesn't work. Move on!

4. BE AWARE OF WHERE YOU ARE COMING FROM (AND WHO YOU TALK TO)

In its simplest form, you either come from the technical side or from the people-skills management side. If you have command of both mindsets, congratulations: you have a huge advantage over your competition.

Regardless of your background, you must do what's necessary to round out your skills. Being able to close the communication gap between both worlds is the most critical skill for Data Scientists. Sometimes you will find yourself having to speak in parables to get the point across.

If your message goes from the technical to the non-technical side, make sure you don't use fancy jargon. Try to convey the intuition behind what you are doing.

If your message goes from the non-technical to the technical side, be very specific and don't be shy. Technical terms will be very important to developers and solution architects.

Be the broker between both groups and you will increase your chance of success.

5. BE CONFIDENT, BUT SKEPTICAL

When experimenting with new algorithms and strategies, your "gut feeling" sometimes sits in the driver's seat. At the end of the day, if you don't experiment and go into discovery mode, it is doubtful that you will make significant breakthroughs.

If you find yourself spending too much time and effort without progress, start questioning yourself. Transform your inner critic into your inner coach, and always have a questioning attitude.

6. THINK BUSINESS FIRST

Every algorithm and strategy must either reduce costs or increase revenue. Thus, it is crucial that you understand the business model under which you or your employer operates.

Crunching numbers is a good place to start to understand your data. Yet you need to know what you are trying to do and, most importantly, why you are investing time in that approach.

7. MAKE IT SHINY

If you use Python as your main programming language, chances are that you are familiar with the popular matplotlib library. It is very practical and works well for most exploratory work, but unfortunately its plots are unlikely to impress your audience.

Tools like Power BI and Tableau are gaining momentum, but they also have limitations. There are other options, like Bokeh, HoloViews and Dash, that close the gap between data science and visualization. They also have the advantage of letting you build visualizations in pure Python.

If you can master D3.js, it will give you the flexibility you need to impress your audience in your own way. The trade-off is that you will be dealing with a different programming language, and it will slow you down if you are not familiar with JavaScript. If you want to shine, integrate frameworks like Angular with D3.js and the sky is the limit.

8. BE RESPONSIBLE

Recently I went to the 2018 Open Data Science Conference in Boston, where I had the opportunity to listen to Cathy O'Neil's keynote. I didn't know who she was and wasn't familiar with her work, but her point stands: you need to use data responsibly.

Recent scandals such as Facebook's Cambridge Analytica affair are fueled by ethical issues and data misuse. As businesses embrace the era of predictive modeling, it becomes necessary to expose some confidential customer information to controlled groups. Data must be secured and used to drive innovation in an ethical manner. Say no to social injustice and exploitation.