Becoming a Data Scientist: a guide

Erick Medeiros Anastácio
Jul 8, 2019

Every now and then some friend of mine asks for tips on how to become a data scientist, so I decided to write this guide hoping that it will be helpful somehow.

So, without further ado, let's go over this step-by-step guide:

Well, the first step is: you have to study, a lot.

Second step: study more.

Third step: now that you are starting to grasp what data science is all about, you probably already know that you have to study a lot more. As a matter of fact, you will never stop studying. It is important to understand that and to learn to enjoy it.

You will have to learn technologies like programming languages (Python or R), SQL, the cloud, dashboards, git, and so on, so that you can get and manipulate data, apply statistical techniques, and build data science products from the results.
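To make "get and manipulate data, apply statistics" a bit more concrete, here is a minimal sketch using only Python's standard library. The `sales` table, its columns, and the numbers are all made up for illustration; in real work the data would come from an actual database, and you would probably reach for a library like pandas.

```python
import sqlite3
import statistics

# Hypothetical sample data: the table name, columns, and values are invented.
rows = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 100.0),
    ("south", 150.0),
]

# An in-memory SQLite database stands in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Step 1: get the data with SQL.
query = "SELECT amount FROM sales WHERE region = ? ORDER BY amount"
south = [amount for (amount,) in conn.execute(query, ("south",))]
conn.close()

# Step 2: apply some basic statistics.
south_mean = statistics.mean(south)
south_median = statistics.median(south)
south_stdev = statistics.stdev(south)  # sample standard deviation

print(f"mean={south_mean}, median={south_median}, stdev={south_stdev}")
```

It is tiny, but the shape is the same one you will repeat over and over: query, clean, summarize.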

Also, whether you choose R or Python as your programming language, you will probably end up learning both. Python is awesome, but only R has R Markdown. This technology lets you build reproducible reports on top of your analysis code, which means that once your analysis is finished, your beautiful report is done too. R Markdown can output to PDF, Word, PowerPoint, websites, and more. It is a wonderful tool. My own online portfolio was produced with R Markdown. Of course Python has Jupyter notebooks, but they are not the same thing. Still, despite R Markdown, Python is my absolute passion.

Update on 2020-06-03: Python has Sphinx, and it is even better than R Markdown. Ha! https://www.sphinx-doc.org/en/master/
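If the idea of a reproducible report sounds abstract, the sketch below shows the core of it in plain Python: the report text is generated by the same code that computes the numbers, so rerunning the analysis also refreshes the report. This is not how R Markdown or Sphinx work internally; `build_report` and the sample values are hypothetical and only illustrate the principle.

```python
import statistics


def build_report(title, values):
    """Return a Markdown report whose numbers are computed from `values`."""
    lines = [
        f"# {title}",
        "",
        f"- Observations: {len(values)}",
        f"- Mean: {statistics.mean(values):.2f}",
        f"- Median: {statistics.median(values):.2f}",
    ]
    return "\n".join(lines)


# Hypothetical measurements; change them and the report changes with them.
report = build_report("Weekly sales", [120.0, 80.0, 200.0, 100.0, 150.0])
print(report)
```

Tools like R Markdown and Sphinx take this much further, mixing prose, code, and figures and rendering to several output formats.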

This is my online portfolio, in case you are wondering what it looks like:

OK, enough talking (writing). Let's get to the point: where and what to study in order to become a data scientist?

Please keep in mind that this reflects only my personal experience.

First point: you can become a data scientist through online classes. You just need to pick the good ones. Choosing classes from an accredited institution is a good start.

Also, having a background in statistics is important. That means having a degree in statistics, math, or a related area. That would be the hardest part. That said, some positions will require a master's degree or even a PhD.

Below you will find some Coursera classes that helped me get started on my career as a data scientist. Through Coursera I took classes from Johns Hopkins University, Duke University, and the University of Michigan. Keep in mind that Coursera is not free, but you can almost always ask for a scholarship. Also, most classes allow you to participate for free as a listener.

Finally, here is the list of classes that I believe would be a good start, in the order I would take them:

Phase A: R and statistics

1 — R programming

2 — Getting and cleaning Data

3 — Exploratory data analysis

4 — Reproducible Research

5 — Statistical Inference

6 — Regression models

7 — Developing data products

Phase B: databases and visualization

8 — Managing Big Data with MySQL

9 — Data visualization with Tableau

Phase C: Python

10 — Data Science with Python

11 — Visualization with Python

12 — Machine Learning

13 — NLP

Well, that would be a good start.

Next on the list would be learning about:

  • Time series analysis
  • Deep Learning, neural networks
  • Python Flask
  • Cloud computing technologies, like Amazon Web Services, Microsoft Azure, and Google Cloud
  • Big Data: the Hadoop ecosystem, Spark

The key here is to take your time and enjoy studying. You shouldn’t rush to get to the end.

Good luck and have fun!

Update on 2020-06-03: Udemy path

Udemy has a lot of awesome classes on almost anything. Just name it and there is a class on Udemy for it. I’ve purchased a lot of classes there myself. Among the ones I have found, I’d like to mention the classes from Jose Portilla: awesome content.

Just today I was recommending the following learning path to another friend:

01 — Python bootcamp

02 — Pandas & Visualization

03 — General Data Science

04 — SQL
