Data Science For EveryOne Mentorship and BootCamp Program
The Data Science For Everyone Program is a LuxDevHQ program that aims to make the field of data science accessible and understandable to a wide range of people, regardless of their background or expertise.
In the program, we recognize that data science has the potential to bring valuable insights and solutions to various domains and industries, and therefore, it is important to demystify and democratize this field.
Data science involves extracting knowledge and insights from large and complex datasets using various techniques, such as data mining, statistical analysis, machine learning, and visualization. Traditionally, data science has been associated with specialized skills and technical expertise, requiring a strong background in mathematics, statistics, programming, and domain knowledge.
This free bootcamp emphasizes clear communication, intuitive visualizations, and user-friendly tools that enable individuals to explore and analyze data without requiring an in-depth understanding of complex algorithms or programming languages.
Overall, we are aiming to bridge the gap between technical experts and non-technical professionals, enabling a broader audience to leverage the power of data and make informed decisions based on evidence and insights derived from data analysis.
Possible Data Science Career Paths
1). Data Scientist.
- Data scientists are responsible for collecting, cleaning, and analyzing large datasets to extract valuable insights and make data-driven decisions. They use various machine learning and statistical techniques to build predictive models and solve complex problems.
- Data scientists often work closely with business stakeholders to identify opportunities for leveraging data to drive business growth.
2). Data Analyst.
- Data analysts focus on examining data to provide actionable insights to their organizations. They perform data cleaning, data visualization, and basic statistical analysis to help businesses understand trends and patterns and make informed decisions.
- Data analysts may work in various industries such as finance, marketing, or healthcare.
3). Data Engineer.
- Data engineers are responsible for the design, construction, and maintenance of data pipelines and infrastructure. They ensure that data is collected, stored, and made accessible for analysis by data scientists and analysts.
- Data engineers work with tools like Hadoop, Spark, and databases to manage and process large volumes of data efficiently.
4). Data Architect.
- Data architects design the overall structure and organization of data within an organization. They create data models, define data standards, and ensure data is stored, integrated, and accessed effectively.
- Data architects play a critical role in establishing data governance and ensuring data quality.
Note:
- These are just a few of the many career paths within the data science and analytics field. Depending on your interests and skills, you may also consider roles such as Machine Learning Engineer, Business Intelligence Analyst, Statistician, or even specialized roles like Natural Language Processing (NLP) Engineer or Computer Vision Engineer. The field of data science is continually evolving, so there are always new opportunities and roles emerging as technology advances and businesses become more data-driven. It's important to choose a path that aligns with your interests and career goals.
- Also note that we will not be able to cover all the topics and concepts, but we will set a solid foundation for your data career.
Course Overview - Program information:
- Duration: 5 Weeks.
- Learning Mode: Online, with a weekly project and technical article.
Week 1: Learn the fundamentals of data science.
- Understand the key concepts of data science and the possible data science career paths.
- Familiarize yourself with the fundamental statistical, mathematical, and programming concepts.
- Learn the basics of Python and SQL for data manipulation, wrangling, and analysis.
- Understand the fundamentals of data science, statistics, probability, linear algebra, calculus, and the Python and SQL programming languages.
- Get introduced to data visualization tools such as Matplotlib and Seaborn, with hands-on exercises or tutorials for creating visualizations with them.
Tools you will learn in week 1: Python, SQL, pandas, NumPy, Matplotlib, Seaborn, and the statistics module (introduced in Python 3.4).
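To give a feel for the statistics module mentioned above, here is a minimal sketch of the basic summary measures it provides. The `monthly_bills` values are made up for illustration and are not part of any course dataset.

```python
import statistics

# Hypothetical sample: monthly bills (in USD) for six customers.
monthly_bills = [42.0, 55.5, 38.0, 61.0, 55.5, 47.0]

# Basic descriptive statistics using only the standard library.
summary = {
    "mean": statistics.mean(monthly_bills),
    "median": statistics.median(monthly_bills),
    "mode": statistics.mode(monthly_bills),
    "stdev": statistics.stdev(monthly_bills),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same measures are available in pandas and NumPy, which are usually the better choice once the data no longer fits comfortably in a plain list.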
Week 1 Classes:
(i). Wednesday, September 27th, 2023 8:00 PM EAT.
- Wangui Ngina, MSc, Lecturer and Data Scientist.
- Class Guide: https://docs.google.com/document/d/1xJSktD8EXBbrN2ETE8rjZq5rgcUtO4Q7a431sTWa21s/edit?usp=sharing
- Recording: Not Available Yet
(ii). Saturday, September 30th, 2023 10:00 AM EAT.
- Lains Wanjiku, Machine Learning Engineer and Data Analyst.
- Class Guide: https://docs.google.com/document/d/1BznXhlxiAH2lKbAcoUhTFynl90etGczmyfbVwZPpQMg/edit?usp=sharing
- Recording: Not Available Yet
Week 1 Article: Data Science for Beginners: 2023 - 2024 Complete Roadmap.
Week 1 Projects:
Question 1). Imagine you're working with Sprint, one of the biggest telecom companies in the USA. They're really keen on figuring out how many customers might decide to leave them in the coming months. Luckily, they've got a bunch of past data about when customers have left before, as well as info about who these customers are, what they've bought, and other things like that.
So, if you were in charge of predicting customer churn how would you go about using machine learning to make a good guess about which customers might leave? Like, what steps would you take to create a machine learning model that can predict if someone's going to leave or not?
Question 2). Let’s say you’re a Product Data Scientist at Instagram. How would you measure the success of the Instagram TV product?
Week 2: Learn Basic Data Science Concepts.
- Learn about data visualization, exploratory data analysis (EDA), and basic statistical measures.
- Learn about exploratory data analysis, feature engineering, and modelling using real-world data.
- Learn how to write problem statements, develop KPIs, work and collaborate with a remote team, build communication and problem-solving skills, and write a modern data resume.
Tools you will learn in week 2: Python, SQL, PySpark, Problem Solving, and Non-Technical Concepts
Week 2 Classes:
(i). Wednesday, October 4th, 2023 8:00 PM EAT.
- Wycliffe Bosire, Data Scientist at BAT.
- Recording: Not Available Yet
(ii). Saturday, October 7th, 2023 10:00 AM EAT.
- Lucille Wanjiku, Data Scientist.
- Recording: Not Available Yet
Week 2 Article: Exploratory Data Analysis using Data Visualization Techniques.
Week 2 Project:
Question 1). Read through this case study and solve it https://statso.io/rfm-analysis-case-study/
Question 2). Let’s say we want to build a model to predict booking prices on Airbnb. Between linear regression and random forest regression, which model would perform better and why?
Week 3: Teach Someone Data Science #TeachSomeoneDataScience.
In week 3, you will have some time to explore the different data science career paths and decide which one you want to specialize in. As an assignment, you will find someone new to data science and teach them about it for at least 30 minutes. You will also pick a specific topic in data science and write about it, such as creating a data analysis roadmap on Twitter or LinkedIn.
Week 4: Learn Intermediate Data Science Concepts and Time Series Modeling.
- Learn dimensionality reduction techniques, a way to reduce the number of features in a dataset without losing too much information. This will be helpful in improving the performance of machine learning models.
- Learn feature engineering, the process of transforming raw data into features that are more informative and useful for machine learning models.
- Learn ensemble learning, a technique that combines multiple machine learning models to improve the overall performance.
- Learn neural networks, a type of machine learning model that can learn complex relationships between features and labels.
- Learn time series analysis and modelling, the process of analyzing data that is collected over time. This can be used to forecast future trends or identify patterns in the data.
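As one illustration of the dimensionality reduction topic listed above, here is a minimal sketch of principal component analysis (PCA) written with NumPy. PCA is only one of several such techniques, and the function name `pca` and the synthetic data are assumptions made for this example, not course material.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Project X onto its top principal components.

    Returns the projected data and the share of total variance
    that each kept component explains.
    """
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, ratio

# Synthetic data: the second feature is almost a copy of the first,
# so most of the variance lives along a single direction.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

Z, ratio = pca(X, n_components=2)
print("Reduced shape:", Z.shape)
print("Explained variance ratio:", ratio.round(3))
```

In practice you would normally reach for a library implementation such as scikit-learn's `PCA`; the hand-written version is here only to show what the technique does.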
Week 4 Classes:
(i). Wednesday, October 18th, 2023 8:00 PM EAT.
- ********************, Data Scientist.
- Recording: Not Available Yet
(ii). Saturday, October 21st, 2023 10:00 AM EAT.
- ********************, Analytics and Data Engineer.
- Recording: Not Available Yet
Week 4 Article: The Complete Guide to Time Series Models
Week 4 Project:
Using the Craigslist Vehicles Dataset available on Kaggle (https://www.kaggle.com/datasets/mbaabuharun/craigslist-vehicles), we'd like you to create a Time-Series Model following the approach outlined below.
Here are the key steps:
- Start by addressing missing values in the dataset. You can handle this by filling in missing values with the median for numerical columns and the mode for categorical columns.
- Ensure that the data types of the columns are appropriate. Specifically, make sure to convert the 'posting_date' column to a datetime data type.
- Utilize the 'posting_date' column to create a datetime index for the dataset. This will facilitate the analysis of temporal patterns.
- With clean data, explore it using various visualizations and statistical analysis techniques. This step is crucial for understanding temporal patterns, identifying seasonal trends, and analyzing demand-supply dynamics by region and vehicle type.
- Build the time-series chart.
- Finally, create a GitHub repository and push your work there. Document your process through each of the steps and demonstrate your understanding by implementing them on the dataset.
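The cleaning and indexing steps above can be sketched with pandas roughly as follows. This is a starting point under stated assumptions, not a reference solution: the helper names `prepare_listings` and `daily_listing_counts` are hypothetical, the tiny sample frame stands in for the real Kaggle file, and only the 'posting_date' column name comes from the project brief. The other column names are made up.

```python
import pandas as pd

def prepare_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, parse 'posting_date', and use it as the index."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numerical columns: fill gaps with the median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Categorical columns: fill gaps with the most frequent value.
            mode = df[col].mode(dropna=True)
            if not mode.empty:
                df[col] = df[col].fillna(mode.iloc[0])
    # utc=True normalizes timestamps that carry different UTC offsets.
    df["posting_date"] = pd.to_datetime(df["posting_date"], utc=True, errors="coerce")
    # Assumption: rows whose date cannot be parsed are dropped.
    df = df.dropna(subset=["posting_date"])
    return df.set_index("posting_date").sort_index()

def daily_listing_counts(df: pd.DataFrame) -> pd.Series:
    """Number of listings per calendar day; the series a time-series chart would plot."""
    return df.resample("D").size()

# Hypothetical sample standing in for the Kaggle CSV.
raw = pd.DataFrame({
    "posting_date": [
        "2021-05-01T10:00:00-0500",
        "2021-05-01T12:00:00-0500",
        "2021-05-03T09:30:00-0500",
        "2021-05-04T11:15:00-0500",
    ],
    "price": [5000.0, None, 7000.0, 9000.0],
    "type": ["sedan", "sedan", None, "truck"],
})

clean = prepare_listings(raw)
daily = daily_listing_counts(clean)
print(clean)
print(daily)
# With Matplotlib installed, daily.plot() draws the time-series chart.
```

For the real dataset you would replace the sample frame with `pd.read_csv(...)` on the downloaded file and check which columns actually need imputing before applying a blanket rule.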
Week 5: Exploring Data Engineering and Analytics Engineering with Harun Mbaabu.
- Clearly differentiate between Data Engineering and Analytics Engineering.
- Learn about ETL and ELT and when to use each method.
- Master the modern data stack, how to optimize Python and SQL code, how to track metrics, and how to make an impact as a team player in a data team.
- Learn how to write modern data profession CVs/resumes, where to apply for data jobs, how to get started freelancing, and how to optimize your chances of being hired.
Week 5 Article: Data Engineering for Beginners: A Step-by-Step Guide
Week 5 Project:
1). Project 1:
As a lead data engineer at Data Science East Africa, you are responsible for building a data engineering pipeline to move weather data from a public API to Azure Synapse Analytics. You will then use Power BI to access the data from Azure Synapse Analytics and create a modern dashboard.
(i). What are the best practices for moving data from a public API to a data lake?
(ii). How can we ensure the security and reliability of the data pipeline?
(iii). Implement this project, optimize your process, and create a GitHub repository where you document all the processes, including screenshots and a short video explaining the whole process.
2). Project 2.
In week 4, we performed time series modelling on the Craigslist vehicles dataset, which is available on Kaggle at https://www.kaggle.com/datasets/mbaabuharun/craigslist-vehicles. This project builds on that work. You will need to download the dataset, copy the data using SQL to a local PostgreSQL database, move the data from your local database to Snowflake, perform data transformation with DBT (data build tool), and use your preferred data visualization tool to create a report and dashboard.
Note.
To be eligible for a certificate of completion, you must write four articles of at least 400 words each and complete all four projects. I know you can do this! You are talented nerds and I believe in you. Writing these articles will help you improve your writing skills and learn a lot about the subject matter. They will also be a valuable addition to your portfolio.
Important Road Maps.
1). Preparing for a Data Analyst career? Here's a roadmap:
2). Complete Guide to Becoming a Data Scientist
3). Analytics Engineer Road-map With Free Resources : Modern Data Stack
4). Ultimate Data Engineering Road Map: Become a Data Engineer In 2023.
Bonus: Different Ways to Make Money in Data Science.
This is a very intense program that requires a commitment of up to 20 hours per week. Only those who are willing to put in the time and effort will be able to complete it. By the end of the bootcamp, you will be in a position to build basic end-to-end data projects.