Michael's Guide to Becoming a Data Scientist by Michael A. Alcorn is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Three different UChicago grad students once asked me about transitioning to a career in data science within a short period of time, so I put together this outline in case anyone else was curious.
- My CV
- General Information
- 8 Skills You Need to be a Data Scientist
- What's the difference between a data architect, data analyst, data engineer, and data scientist?
- "Data analyst" will probably be less exciting than "data scientist" for those with a scientific background.
- Advice from a Data Scientist at Quora
- /r/MachineLearning
- Get Experience!
- Intern - this is the best possible thing you can do.
- Try out Kaggle competitions.
- Create a LinkedIn account and keep it updated.
- Curriculum
- Free Courses - use them
- Coursera, edX, Udacity, Saylor, Khan Academy
- You can use my course history as a guide.
- Math
- Calculus (at least up to partial derivatives, which is typically Calculus III)
- Linear Algebra
- Analysis (advanced)
- Statistics - know Bayesian and frequentist theory
- Algorithms
- Machine Learning - know the big algorithms; natural language processing is probably the most useful subfield to learn
- Other Topics - graphs, game theory, information theory, etc.
- Programming
- Must know Python. Almost all data scientist positions require cleansing and transforming data on a large scale, and Python is typically the language of choice for this task.
- Important Python packages/libraries → scikit-learn, NumPy, Keras, TensorFlow, Theano, SciPy, pandas, statsmodels
- Must know R.
- Should know your way around a *nix terminal.
- Version control - should know basics of Git.
- Put personal projects on GitHub.
- Contribute to open source projects.
- Databases - definitely know SQL; you should probably look into NoSQL databases as well (e.g., MongoDB).
- The best way to learn databases is by working with them. Find a database and practice writing queries for it.
- Big Data Tools
- Be familiar with the following: Apache Hadoop, MapReduce, Apache Spark, Apache Pig, Apache Hive, Apache Mahout, Apache Solr, Apache Lucene
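
As one way to act on the Python and database advice above (cleansing data in Python, then practicing queries against a database), here is a minimal sketch using only Python's built-in `sqlite3` module. The records, the `people` table, and the `clean` helper are all made up for illustration; a real project would more likely pull from an existing database or use pandas for the cleaning step.

```python
import sqlite3

# Hypothetical raw records with stray whitespace, inconsistent casing,
# and one missing salary (empty string).
raw_rows = [
    ("  Alice ", "chicago", "72000"),
    ("BOB", "Chicago ", ""),
    ("carol", " new york", "85000"),
]


def clean(row):
    """Strip whitespace, normalize casing, and convert salary to int or None."""
    name, city, salary = (field.strip() for field in row)
    return (name.title(), city.title(), int(salary) if salary else None)


cleaned = [clean(row) for row in raw_rows]

# Load the cleaned rows into an in-memory SQLite database (nothing touches disk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT, salary INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", cleaned)

# Practice query: head count and average salary per city.
# AVG skips NULL salaries, so the missing value does not distort the mean.
query = """
    SELECT city, COUNT(*) AS n, AVG(salary) AS avg_salary
    FROM people
    GROUP BY city
    ORDER BY city
"""
results = conn.execute(query).fetchall()
for city, n, avg_salary in results:
    print(city, n, avg_salary)

conn.close()
```

Swapping in a larger public dataset and writing progressively harder queries (joins, subqueries, window functions) is a reasonable next step from here.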