Nadimsaad / Group-Project-3

The goal of this project is to practice data wrangling, cleaning, and manipulation with Pandas.

Group-Project-3

Data Cleaning - Salaries Data

- Project Description
- Project Goals
- Technical Requirements
- Necessary Deliverables
- Presentation
- Suggested Ways to Get Started
- Useful Resources

Project Description

The goal of this project is to combine everything you have learned about data wrangling, cleaning, and manipulation with Pandas so you can see how it all works together.

Project Goals and Steps

The goals of this project are to apply your data preparation knowledge and to get more practice with Python and SQL.

- Plan your work.
- Manage your git repository.
- Build your code from scratch.
- Put into practice the basic data processing concepts learned during the week.
- Get used to public presentations.

Technical Requirements

The technical requirements for this project are as follows:

- We provide a dataset of real data, so it is likely to contain issues.
- Import the data using Pandas.
- Examine the data for potential issues (missing data, data inconsistency, outliers, duplicates, etc.).
- Apply the different cleaning and manipulation techniques you have learned.
- Explain all the approaches you are going to use (why and how).
- Produce Python code that shows the steps you took and the code you used to clean and transform your data set.
- Export a clean version of your data to a CSV file using Pandas.
- Check the sqlalchemy library.
- Export the clean data into MySQL using Pandas.
- Prepare at least 3 tables analyzing the dataset in MySQL. Consider using GROUP BY statements for this.
- Extra: try to build helpful functions for data preparation.

Necessary Deliverables

The following deliverables should be pushed to your Github repo for this chapter.

- A CSV file with clean data containing the results of your data wrangling work.
- A Python file containing all Python code and commands used in the importing, cleaning, manipulation, and exporting of your data set.
- A MySQL queries file containing the code to obtain the tables of your analysis.
- A README.md file containing a detailed explanation of the process followed in the importing, cleaning, manipulation, and exporting of your data as well as your results, obstacles encountered, and lessons learned. Look here for tips on how to structure a README.md file.

Presentation

The presentation time limit is 5 minutes (I begin to think that this is impossible, but ...)! You will have 3 minutes to present your project to the class and then 2 minutes for Q&A.

The slides of your presentation must include the content listed below:

- Title of the project + Student name
- Description of your dataset
- Challenges
- Process (don't forget to explain your decisions)
- Learnings
- Improvements
- Comparison of the initial and final datasets
- Highlights

Suggested Ways to Get Started

- Examine the data and come up with a deliverable before diving in and applying any manipulation methods.
- Define the specific task beforehand. You will probably spend less time on the project if you specify the task and expected outcomes up front.
- Raise questions and come up with some hypotheses to be tested.
- Break the project down into different steps - use the topics covered in the lessons to form a check list, add anything else you can think of that may be wrong with your data set, and then work through the check list.
- Use the tools in your tool kit - your knowledge of intermediate Python as well as some of the things you've learned in previous chapters. This is a great way to start tying everything you've learned together!
- Work through the lessons in class & ask questions when you need to!
- Think about adding relevant code to your project each night, instead of, you know... procrastinating.
- Commit early, commit often, don’t be afraid of doing something incorrectly because you can always roll back to a previous version.
- Consult documentation and resources provided to better understand the tools you are using and how to accomplish what you want.

Useful Resources

- Pandas Documentation
- Pandas Tutorials
- StackOverflow Pandas Questions
- Export pandas to MySQL
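
As a rough illustration of the workflow named in the technical requirements (examine, clean, export to CSV, load into SQL, aggregate with GROUP BY), here is a minimal sketch. Everything specific in it is an assumption: the sample rows, the `job_title` and `salary` column names, the `salaries` table name, and the helper functions `summarize_issues` and `clean` are hypothetical and not taken from the provided dataset. SQLite (via the standard-library `sqlite3` module) stands in for MySQL so the example runs without a database server.

```python
import sqlite3

import pandas as pd

# Hypothetical sample standing in for the provided salaries dataset;
# the column names are assumptions, not the real schema.
raw = pd.DataFrame(
    {
        "job_title": ["Analyst", "analyst ", "Engineer", "Engineer", None],
        "salary": [50000.0, 50000.0, 72000.0, None, 61000.0],
    }
)


def summarize_issues(df: pd.DataFrame) -> dict:
    """Count missing values per column and fully duplicated rows."""
    return {
        "missing": df.isna().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),
    }


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize text, drop rows with missing values, then drop duplicates."""
    out = df.copy()
    # Trim whitespace and unify casing so "analyst " matches "Analyst".
    out["job_title"] = out["job_title"].str.strip().str.title()
    out = out.dropna(subset=["job_title", "salary"])
    return out.drop_duplicates().reset_index(drop=True)


issues = summarize_issues(raw)
tidy = clean(raw)

# Export to CSV text; pass a file path as the first argument to write a file.
csv_text = tidy.to_csv(index=False)

# Load into SQL. In-memory SQLite is a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
tidy.to_sql("salaries", conn, index=False, if_exists="replace")

# One GROUP BY analysis table, read back into pandas.
avg_by_title = pd.read_sql_query(
    "SELECT job_title, AVG(salary) AS avg_salary, COUNT(*) AS n "
    "FROM salaries GROUP BY job_title ORDER BY job_title",
    conn,
)
conn.close()
```

For MySQL, one option is to create a SQLAlchemy engine, for example `create_engine("mysql+pymysql://user:password@host/dbname")` (placeholder credentials, PyMySQL driver assumed), and pass it to `to_sql` in place of `conn`. Dropping rows with missing values is only one possible policy; imputing or flagging them may suit the real data better, and the choice should be explained in the README.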
