Akash190104 / ML_Roadmap

The only ML Roadmap that you will ever need.

The Only ML Roadmap That You Will Ever Need!

So, you want to dive deep into Machine Learning and do something in the field of Artificial Intelligence, but you don't know where to begin. You have come to the right place! I will lay out everything that I know, along with the things I feel you need to know, to get into any kind of data role. This document will also help you explore every broad data role and might help you decide what you should be doing.

Is ML and Data Science for you?

Before you dive in any deeper, it makes sense to understand whether you should even pursue this field. I will jot down a few points, and if they match your personality, then you should definitely give it a shot!

  • You love mathematics (or at least you don't hate it).
  • You like looking at percentages and statistical data, and would love to create such insights if given a chance.
  • You have an eye for patterns and you figure out interesting details that most people might miss.
  • You don't get scared of large Excel sheets and would be okay with cleaning them so you can work on cool projects with the cleaned data.
  • You are a good communicator, or you are comfortable with communicating your ideas and findings to people around you (SUPER IMPORTANT).
  • You are okay with not finding the right answer and have accepted that you will never find one. Your job would be to get as close to the correct answer as possible, but it's okay if you don't reach it.
  • You like brainstorming about things, can tolerate working with vague problem statements, and can figure out what to do.

It is absolutely fine if not all of these points match the person you are, but if most of them make sense then you should continue reading.

The paths to choose from

If you are still reading, I believe you want to give data science and Machine Learning a chance. In this section, I will talk about the paths you can choose from. The table below gives a basic idea.

Role | What do they do?
Data Analyst | Analyzes the data provided to them to generate insights. Presents the findings using visualization tools and PPTs.
Data Engineer | The data which data analysts work with doesn't come easily. The ones who work hard to create data pipelines that bring the data to the table are data engineers.
Machine Learning Engineer | Data analysts provide insights into the data based on what happened in the past. The ones who build predictive models to forecast the future using present data are machine learning engineers.
MLOps Engineer | Building a model doesn't mean that the model will perform well all the time; every model degrades with time. The ones who look after the deployment of the model and maintain its performance are MLOps engineers.
Data Scientist | Someone who can handle most of the above-mentioned roles has the potential to be a data scientist. They are also more inclined towards mathematical and statistical research to improve model performance.
AI Researcher | A data scientist who focuses more on maths and statistics. Focuses less on data engineering but puts a lot of effort into reading about recent advancements in the field of AI and tries to build better and more sophisticated algorithms. More often than not they work in specialized fields like Computer Vision (image-related tasks), Natural Language Processing (text-related tasks), speech recognition, etc.

What to learn?

Now that you are aware of the types of roles that are present in the data realm, you might want to know how to get started and what you need to learn for each role.

Data Analyst

A skilled data analyst has:

  • Strong communication skills
  • Good presentation skills and the ability to explain visualizations
  • Proficiency in SQL
  • Proficiency in data visualization tools
  • Proficiency in Python (optional but recommended)

So to be a data analyst, the primary skill you need is SQL. There's no substitute for SQL, so prepare this well. There are multiple resources to learn SQL. There is an amazing playlist from CampusX, and you should also check out Joey Blue, who has done exceptional work as well. Practice questions from Data Lemur after you learn SQL and you are good to go.

You also need to learn a data visualization tool like Power BI or Tableau. I personally prefer Power BI. There's an amazing course by Codebasics to learn Power BI. The link is attached here.

In case you want to learn Python, you need to learn the fundamentals and a few libraries like NumPy, Pandas and Matplotlib/Seaborn (I prefer Matplotlib). freeCodeCamp is a YouTube channel that provides an enormous amount of knowledge for free! There's a 12-hour-long video that shows you how to install Python even if you have never done any programming, and it also teaches you the libraries mentioned above.
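To give a feel for the kind of SQL a data analyst writes day to day, here is a minimal sketch that runs a GROUP BY query from Python using the built-in sqlite3 module. The `sales` table, its columns, the sample rows and the `revenue_by_region` helper are all made up for illustration; they do not come from any of the courses mentioned above.

```python
import sqlite3

# Hypothetical sales records: (region, product, amount).
rows = [
    ("North", "Laptop", 1200.0),
    ("North", "Mouse", 25.0),
    ("South", "Laptop", 1100.0),
    ("South", "Monitor", 300.0),
    ("South", "Mouse", 20.0),
]


def revenue_by_region(records):
    """Load records into an in-memory SQLite table and total revenue per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
        cursor = conn.execute(
            """
            SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
            FROM sales
            GROUP BY region
            ORDER BY revenue DESC
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, orders, revenue in revenue_by_region(rows):
        print(f"{region}: {orders} orders, revenue {revenue:.2f}")
```

The same aggregation could be done with Pandas once you are comfortable with it; the point here is only that a single query already turns raw rows into an insight you can present.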

Finally, I would also ask you to do something that most people ignore. LEARN ADVANCED EXCEL. You might feel you know how to use Excel; I promise you that you don't. There are so many ways that Excel can simplify your life, it's just amazing. Check out this playlist from freeCodeCamp, where they teach you Excel from the basics and progressively move on to the advanced concepts.

If YouTube is not for you and you require structured online courses, I have a few suggestions as well.

Try to take them in the order mentioned. Coursera offers financial aid, so you can take all three of the above-mentioned courses for FREE.

Data Engineer

I will be honest here: the purpose of this roadmap is to make you aware of all the roles present in the data industry. I am not pursuing Data Engineering, so I am not the right person to advise you about it. Still, there are things that I can tell you about data engineering. It is mostly for people who are very efficient programmers. In fact, I would say data engineering is one of the most programming-heavy roles out of all the above-mentioned roles. So, if you are someone who would prefer coding over talking to people, then data engineering is something that you might like.

A few tools that data engineers use are:

  • SQL
  • Python
  • Scala
  • Hadoop/Spark/Kafka for handling big data
  • Apache NiFi and Apache Airflow for building ETL pipelines
  • Snowflake/Google BigQuery, etc. for data warehousing
  • Familiarity with cloud platforms like GCP/AWS
  • Familiarity with containerization technologies like Docker and Kubernetes
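To make "ETL pipeline" a little less abstract, here is a toy extract-transform-load sketch using only the Python standard library. It is nowhere near what tools like Airflow or Spark do at scale; the CSV content, the `users` table and all function names are hypothetical and exist only to show the three stages.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, an API or a queue.
RAW_CSV = """user_id,signup_date,country
1,2023-01-05,in
2,2023-01-06,US
,2023-01-07,de
3,2023-01-08, In
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop rows without a user_id and normalise country codes."""
    cleaned = []
    for row in records:
        if not row["user_id"]:
            continue  # skip incomplete rows instead of loading bad data
        country = row["country"].strip().upper()
        cleaned.append((int(row["user_id"]), row["signup_date"], country))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(user_id INTEGER PRIMARY KEY, signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages and return the number of users per country."""
    load(transform(extract(text)), conn)
    query = "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
    connection.close()
```

Real pipelines add scheduling, retries, monitoring and much larger data volumes, which is where the tools listed above come in.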

I am sorry, but since I don't have much experience with data engineering technologies, I can't help much here. You can still search online for resources in this field that you find useful. I do have a better option, though, in the opportunities section below.

Machine Learning Engineer

If you made it here, I hope you like maths. Data Analytics was communication heavy and Data Engineering was programming heavy. The roles that you will see now are all mathematics and statistics heavy. This shouldn't scare you, but you need to learn a bit of Linear Algebra, Calculus and Probability to do well in these fields and to properly understand how things work. Let's dive deep into it.

If you are a Machine Learning Engineer, you will probably have your data prepared by a team of data engineers. Depending on the size of the company, the data will either be analyzed by data analysts or, in a small company, you will have to analyze it yourself before working on your machine learning models. So, to be well prepared, you need to already have the necessary data analyst skills (you can leave out the dashboarding aspects, but they are good to know to keep your options open). The extra skills that a machine learning engineer has, on top of data analysis skills, are:

  • Solid foundations in linear algebra, probability, statistics and calculus.
  • Knowledge of some other Python libraries like scikit-learn, Beautiful Soup/spaCy, etc.
  • Knowledge of frameworks like TensorFlow/PyTorch.
  • Understanding of standard Machine Learning algorithms like Linear Regression, Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, Gaussian Naive Bayes, etc.
  • Understanding of concepts like gradient descent, multi-layer perceptrons, feed-forward neural networks, backpropagation, etc. to build the base for understanding deep learning.
  • Knowing how to deploy your projects using Streamlit (if possible, learn Flask too).
  • Learning about specialized fields like Computer Vision and Natural Language Processing (optional).
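Two items from the list above, Linear Regression and gradient descent, fit together in a few lines. The following is a minimal sketch in plain Python, assuming a mean squared error loss and a hand-picked learning rate; the function name `fit_linear_regression` and the toy data are illustrative only. In practice you would reach for scikit-learn, but writing it once by hand helps the maths click.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1, so the fit should recover w and b.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_linear_regression(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}")
```

Try a learning rate that is much larger and watch the parameters blow up; that experiment explains more about gradient descent than most definitions do.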

There are a lot of free courses available on the internet related to Machine Learning. I would suggest looking at the ones that I feel are the most important and that have actually helped me learn.

If you like reading books, you can try reading:

  • Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 by Sebastian Raschka and Vahid Mirjalili
  • Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

For Mathematics and Statistics you can read:

  • Practical Statistics for Data Scientists (O'Reilly)
  • Mathematics of Machine Learning by Martin Lotz
  • CS229 Lecture Notes by Andrew Ng and Tengyu Ma (this is much more than notes, trust me)

For Advanced Readers out there:

  • Approaching (Almost) Any Machine Learning Problem by Abhishek Thakur
  • The Black Swan: The Impact of the Highly Improbable (you can easily go ahead without reading this but this is one of the greatest books that I have ever read).

For the ones who want to dive deep into Computer Vision and NLP:

  • For NLP, you should follow Pradip Nichite. He is one of my personal favourites.
  • For Computer Vision, you should follow Nicholas Renotte. He has done work in NLP as well, but his tutorials on Motion Sensing are amazing.
  • For an all-rounder, you can follow Rob Mulla.

MLOps Engineer

This is another field that I know very little about, and hence I won't be talking much about it. In layman's terms, MLOps Engineer = DevOps Engineer who understands ML.

You can follow Ayush Singh (I know he's amazing; just don't feel that you haven't done anything in your life. You weren't aware. Now you are). Also, be aware that this role is very programming heavy too, but it demands less communication than the other roles. You can look at the MLOps roadmap by Marvelous MLOps, which I feel is a good place to start. That's all that I can help you with here. You will find something interesting in the opportunities section below.
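Since the roles table says that every model degrades with time, here is a rough sketch of what "maintaining performance" can look like at its very simplest: tracking accuracy over a rolling window of recent predictions and raising a flag when it falls below a baseline. The `AccuracyMonitor` class, its parameters and the thresholds are hypothetical and not taken from any MLOps tool; real setups also watch input data drift, latency and much more.

```python
from collections import deque


class AccuracyMonitor:
    """Track recent prediction outcomes and flag a drop in rolling accuracy."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        # Only the most recent window_size outcomes are kept.
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, actual):
        """Store whether a single prediction matched the true label."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True once the window is full and accuracy is below baseline - tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance


if __name__ == "__main__":
    monitor = AccuracyMonitor(baseline_accuracy=0.9, window_size=10)
    for _ in range(10):
        monitor.record("spam", "spam")      # model is still doing well
    print(monitor.needs_attention())
    for _ in range(5):
        monitor.record("spam", "not spam")  # the data has shifted
    print(monitor.needs_attention())
```

When the flag goes up, the usual next steps are investigating what changed in the data and retraining or rolling back the model.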

Data Scientist

If you're reading this, then know that this role prefers candidates with a master's degree. There is a chance that you can get hired as a data scientist right after your undergraduate degree, but it is extremely slim. Read further only if you wish to go for a master's (if not a PhD). To become a data scientist, you need to understand the image below.

[Image: Data Science Venn Diagram]

To be a data scientist you need to:

  • Know Python and machine learning.
  • Understand statistics and mathematics.
  • Have domain-specific knowledge.

Data Scientists use data to help companies make decisions that have a direct impact. So it is important to be comfortable with hypothesis testing and with running statistical tests to check whether your decisions are supported by the data.
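As one concrete example of hypothesis testing, here is a minimal sketch of a two-sided permutation test for a difference in means, written with only the Python standard library. The scenario (comparing a metric between two versions of a page) and the numbers are invented for illustration, and a permutation test is just one of many tests you might choose; libraries such as SciPy offer ready-made alternatives.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Null hypothesis: group labels do not matter, so shuffling them should
    produce differences at least as large as the observed one fairly often.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical minutes spent on the site under the old and new page design.
    old_design = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0]
    new_design = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
    p_value = permutation_test(old_design, new_design)
    print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; whether that difference matters for the business is where the domain knowledge from the list above comes in.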

As mentioned above, this isn't an entry level role and requires expertise. So the best you could do is try to get expertise in mathematical and statistical concepts that are important to data science. The books presented above in Machine Learning would help you with that.

AI Researcher

I consider this to be the top role in the field of AI and machine learning. The ones who are in this role definitely have a master's degree and, in most cases, a PhD. You need to have publications in top conferences and a lot of interest in research to actually get to this position. Data scientists use algorithms to solve problems. AI Researchers are the ones who study the underlying mathematics of these algorithms and try to improve them. This is very technical and is only for those who want to study and dissect the field of Artificial Intelligence. This is where I want to reach. I hope some of you do as well. It is very academically heavy and is easily the most mathematical. So, if you are very interested in solving mathematical problems and proving theorems, this could be for you.

What Next?

Thank you for reading this far! I hope you have got an idea of all the major roles that exist and have a vague idea of what you want to pursue. Now, to get a job in the data industry, you need to stand out from the rest. For that you need to have a portfolio website.

A portfolio website consists of all the projects that you have built and helps the recruiter figure out what you are good at in a few seconds. The resources mentioned above already have a lot of guided projects that you can mention in your resume but other than that you can also do a few projects on the side to help you out.

Opportunities

I talked about a portfolio website, but I did not mention how you can do projects of your own. For that, there's a platform that comes to your rescue. It's called Kaggle. Here you will find thousands of datasets to practice your data analysis and machine learning skills on, and there are also experienced people sharing their code, which could help you learn the best practices (what more do you need?).

Hackathons

Showing that you can build an end-to-end solution in a short time is very attractive to potential employers. There are multiple websites where you can find Data Science/Machine Learning related hackathons. I would name a few:

Keep checking these websites regularly, as new opportunities can come up at any time. Stay aware. Knowing about opportunities is a gift that people take for granted.

Special mentions

There are a few opportunities that I would like to talk about which do not fit the above-mentioned criteria but could be helpful.

  • AWS AI & ML Scholarship: This is a 20-hour-long course that you need to take, after which you have to take an exam and score 80% to be eligible to apply. The top 2000 students who apply get a 4-month-long nanodegree course about Python programming and AI on Udacity by AWS (worth $4000) for FREE. The top 500 who do well in that course get another advanced nanodegree program for free and receive guidance from AWS Mentors for a year. The last date to complete your application, after completing all the tests and coursework, is 30th September, 2023. BE QUICK!
  • Hamoye: This one is for the Data Engineering and MLOps enthusiasts; it is what I asked you to scroll to the opportunities section for. Hamoye is an excellent platform that offers students the chance to study in a competitive environment. They offer you four tracks to choose from:
      • Data Science
      • Data Engineering
      • Cloud Engineering and DevOps
      • Data Storytelling

    You would be in a 4-month-long internship here, where you have to submit weekly quizzes and assignments to get a rank among all the other students who applied for the internship. The best performers tend to get internship opportunities (and even if you don't, you will have learnt a lot, made projects to showcase and earned a leaderboard position to flex about). The last date to apply to the fall cohort 2023 is 8th September, 2023. Be QUICKER!!!

  • Omdena: Once you have built some projects and have some experience, you may want to contribute to projects that actually make an impact. Omdena could help you with that. They offer you a chance to work with people from all around the world for 6-8 weeks, where you try to solve a real-world problem. The project actually gets implemented in the real world and is a very solid addition to your resume. Projects on Omdena are open all year round and you can apply whenever you feel ready.

Conclusion

Whoa, this was long. Thanks for reading this far. I admit that I am not an experienced professional and I am still learning, but this is everything that I have used (or am still using) to improve myself. I believe all these resources could help you as well. If you feel there's something that you don't agree with, or you want to contribute something that I missed, feel free to raise a PR. I hope this helps some of you!

~Akash
