ProgAhmedFathi / Investigate-a-Dataset

2nd project in the egFWD scholarship for Data Analysis Nanodegree Program from Udacity.

Investigate a Dataset

The 2nd project in the egFWD scholarship for Data Analysis Nanodegree Program from Udacity.

These projects are reviewed by Udacity reviewers (real people) according to the Project Rubric, and each project must "meet specifications" or "exceed specifications" in every category for my submission to pass.

About the project

In this project, I chose a dataset from the offered datasets and investigated it using the NumPy and pandas libraries. I went through the entire data analysis process, starting by posing questions and finishing by sharing my findings, to see how everything fits together. Later Nanodegree projects will focus on individual pieces of the data analysis process. I used the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier!

The Dataset

You can find the dataset I used at this link: Datasets.
I worked on "Database_No_show_appointments". To run the notebook correctly, choose "Database_No_show_appointments" from the datasets and rename the CSV file to "Medical Appointment No Shows".
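
As a minimal sketch of that loading step, the renamed file can be read with pandas. The `.csv` extension and the file sitting next to the notebook are assumptions, since the text above gives only the name. The helper `load_appointments` and the two-row sample with `Age` and `No-show` columns are hypothetical, used only so the snippet runs without the real file.

```python
import io

import pandas as pd

# Assumed file name: the text gives only the name, the ".csv" extension is a guess.
CSV_PATH = "Medical Appointment No Shows.csv"


def load_appointments(source=CSV_PATH):
    """Read the appointments data from a path or a file-like object."""
    return pd.read_csv(source)


# Tiny stand-in for the real file (hypothetical columns and values).
sample = io.StringIO("Age,No-show\n62,No\n8,Yes\n")
df = load_appointments(sample)

print(df.shape)   # number of rows and columns
print(df.dtypes)  # inferred type of each column
```

With the real file in place, calling `load_appointments()` without arguments would read it from the assumed path instead of the in-memory sample.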

I started by taking a look at the dataset and brainstorming what questions I could answer using it. Then I used pandas and NumPy to answer the questions I was most interested in, and created a report sharing the answers. This project is open-ended in that the reviewers are not looking for one right answer.

What did I learn?

By completing the project, I met the following learning goals:

  • Know all the steps involved in a typical data analysis process.
  • Be comfortable posing questions that can be answered with a given dataset and then answering those questions.
  • Know how to investigate problems in a dataset and wrangle the data into a format you can use.
  • Have experience communicating the results of your analysis.
  • Be able to use vectorized operations in NumPy and pandas to speed up your data analysis code.
  • Be familiar with pandas' Series and DataFrame objects, which let you access your data more conveniently.
  • Know how to use Matplotlib to produce plots showing your findings.
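
To illustrate the point about vectorized operations, here is a small sketch comparing an element-by-element loop with the equivalent pandas and NumPy expressions. The ages are made-up values, not taken from the project data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, made up for illustration only.
ages = pd.Series([5, 17, 34, 62, 80], name="age")

# Loop version: one Python-level comparison per element.
is_adult_loop = [age >= 18 for age in ages]

# Vectorized version: a single comparison applied to the whole Series.
is_adult = ages >= 18

# NumPy arithmetic on the underlying array, again without an explicit loop.
age_in_decades = ages.to_numpy() / 10

print(is_adult.tolist())
print(np.round(age_in_decades, 1))
print(ages[is_adult].mean())  # mean age of the rows selected by the boolean mask
```

Both versions give the same answer. The vectorized form is shorter and, on large datasets, usually much faster because the loop runs inside the library rather than in Python.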

How did I complete this project?

1. Choose Your Data Set

Click this link (available in a Google doc here) to open a document with links and information about data sets that you can investigate for this project. You must choose one of these datasets to complete the project.

2. Get Organized

Eventually you’ll want to submit your project (and share it with friends, family, and employers). Get organized before you begin. We recommend creating a single folder that will eventually contain:

  • The report communicating your findings.
  • Any Python code you wrote as part of your analysis.
  • The data set you used (which you will not need to submit).

You may wish to use a Jupyter notebook, in which case you can submit both the code you wrote and the report of your findings in the same document. Otherwise, you will need to submit your report and code separately. If you would like a notebook template to help organize your investigation, you can click here. Or there may be a page in the project here called Project Workspace: Complete and Submit Project, where you can do all your work and submit the project.

3. Analyze Your Data

Brainstorm some questions you could answer using the data set you chose, then start answering those questions. You can find some questions in the data set options to help you get started.

Try to suggest questions that promote looking at relationships between multiple variables. You should aim to analyze at least one dependent variable and three independent variables in your investigation. Make sure you use NumPy and pandas where they are appropriate!
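
One way to look at a dependent variable against three independent variables is to group by each independent variable in turn, as sketched below. The column names (`no_show`, `gender`, `sms_received`, `age`) and all values are hypothetical stand-ins, not the actual columns or results of the project notebook.

```python
import pandas as pd

# Hypothetical records; the column names are illustrative assumptions.
df = pd.DataFrame({
    "no_show":      [1, 0, 0, 1, 0, 1, 0, 0],   # dependent variable (1 = missed)
    "gender":       ["F", "M", "F", "F", "M", "M", "F", "M"],
    "sms_received": [1, 0, 0, 1, 1, 0, 0, 1],
    "age":          [23, 45, 67, 12, 34, 56, 78, 8],
})

# Bin age so that it can be used as a grouping variable.
df["age_group"] = pd.cut(
    df["age"], bins=[0, 18, 60, 120], labels=["child", "adult", "senior"]
)

# The mean of a 0/1 column is the share of missed appointments in each group.
rates = {
    column: df.groupby(column, observed=True)["no_show"].mean()
    for column in ["gender", "sms_received", "age_group"]
}

for column, rate in rates.items():
    print(rate, end="\n\n")
```

A difference between groups in a table like this shows an association only. It does not by itself show that the independent variable causes the outcome.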

4. Share Your Findings

Once you have finished analyzing the data, create a report that shares the findings you found most interesting. If you use a Jupyter notebook, share your findings alongside the code you used to perform the analysis. Make sure that your report text is contained in Markdown cells to clearly distinguish your comments and findings from your code work. You should also feel free to use other tools and software to craft your final report, but make sure that you can submit your report as an HTML or PDF file so that it can be opened easily.
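
A finding is often easier to share as a labeled chart. The following is a minimal Matplotlib sketch under stated assumptions: the rates are made-up numbers, and the output file name `no_show_rate.png` is hypothetical. In a Jupyter notebook the figure would appear below the code cell, with the interpretation written in a Markdown cell next to it.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch also runs outside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical no-show rates, made up for illustration only.
rates = pd.Series({"no SMS": 0.25, "SMS": 0.50}, name="no_show_rate")

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(rates.index, rates.values)

# Label the axes and add a title so the chart can be read on its own.
ax.set_xlabel("Reminder status")
ax.set_ylabel("Share of missed appointments")
ax.set_title("No-show rate by reminder (illustrative data)")

fig.tight_layout()
fig.savefig("no_show_rate.png")  # hypothetical output file name
```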

5. Review

Use the Project Rubric to review your project. If you are happy with your submission, then you're ready to submit your project. If you see room for improvement, keep working to improve your project!


Languages

Language:Jupyter Notebook 100.0%