anewt225 / Mod1Challenge

Submitting code for FlatIron Mod1 Challenge

Module 1 Code Challenge

This code challenge is designed to test your understanding of the Module 1 material. It covers:

  • Pandas
  • Data Visualization
  • Exploring Statistical Data
  • Python Data Structures

Read the instructions carefully. You will be asked both to write code and to respond to a few short answer questions.

Note on the short answer questions

For the short answer questions, please use your own words. You may consult other sources to help craft your response, but you are expected not to copy and paste from them. The short answer questions are not necessarily assessed on grammatical correctness or sentence structure, but you should do your best to express yourself clearly.


Part 1: Pandas [Suggested Time: 15 minutes]


In this section you will preprocess a dataset for the video game FIFA 19. The dataset contains both in-game data and information about the players' real-life careers.

# Run this cell without changes

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

1.1) Read the CSV file into a pandas DataFrame

The data you'll be working with is in a file called './data/fifa.csv'. Use your knowledge of pandas to create a new DataFrame, called df, using the data from this CSV file.

Check the contents of the first few rows of your DataFrame, then show the number of rows and columns in the DataFrame.
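As a rough sketch of the three steps above (load, preview, check dimensions), the example below reads a tiny in-memory CSV instead of the real file so that it runs on its own. The name sample_df and the rows and columns shown are made up for illustration; with the real data, the file path would be passed to pd.read_csv directly.

```python
import io
import pandas as pd

# Hypothetical stand-in for './data/fifa.csv' so the sketch runs on its own;
# the columns and values here are illustrative, not the real file.
csv_text = io.StringIO(
    "Name,Age,Nationality\n"
    "Player A,31,Argentina\n"
    "Player B,33,Portugal\n"
    "Player C,26,Brazil\n"
)

# With the real file this would be: pd.read_csv('./data/fifa.csv')
sample_df = pd.read_csv(csv_text)

print(sample_df.head())   # first rows (up to five by default)
print(sample_df.shape)    # tuple of (number of rows, number of columns)
```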

# Replace None with appropriate code
df = None
# Code here to check the first few rows of the DataFrame
# Code here to see the number of rows and columns in the DataFrame

1.2) Drop rows from the DataFrame with missing values for 'Release Clause'

Drop rows from the DataFrame for which "Release Clause" is missing. A release clause is the part of a soccer player's contract that deals with being bought out by another team. After you have dropped the rows, check how many rows remain in the DataFrame.
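One possible way to do this is dropna with the subset argument, which checks for missing values only in the listed columns. The sketch below uses a small made-up DataFrame (sample_df and its rows are hypothetical; only the 'Release Clause' column name comes from the challenge).

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the 'Release Clause' column name comes from the challenge.
sample_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Release Clause": [1000.0, np.nan, 2500.0, np.nan],
})

# Keep only rows where 'Release Clause' is present; other columns are not checked.
cleaned = sample_df.dropna(subset=["Release Clause"])

print(len(cleaned))  # number of rows remaining
```

Note that dropna returns a new DataFrame by default, so the result has to be assigned to a variable for the change to persist.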

# Code here to drop rows from the DataFrame with missing values for 'Release Clause'
# Code here to check how many rows are left in the DataFrame

1.3) Convert the 'Release Clause' Price from Euros to Dollars

Now that there are no missing values, we can convert the values in the 'Release Clause' column from euro amounts to dollar amounts.

Assume the current exchange rate is 1 Euro = 1.2 Dollars
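A minimal sketch of the conversion, assuming the column already holds plain numbers. If the values were stored as text, they would need to be parsed into numbers first. The constant name EUR_TO_USD and the sample amounts are made up for illustration.

```python
import pandas as pd

EUR_TO_USD = 1.2  # exchange rate given in the challenge text

# Hypothetical euro amounts, assumed to be stored as numbers already.
sample_df = pd.DataFrame({"Release Clause": [1000.0, 2500.0, 400.0]})

# Multiplying a column by a scalar applies the operation to every row at once.
sample_df["Release Clause"] = sample_df["Release Clause"] * EUR_TO_USD

print(sample_df["Release Clause"].tolist())
```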

# Code here to convert the column of euros to dollars

Part 2: Data Visualization [Suggested Time: 20 minutes]


Continuing to use the same FIFA dataset, plot data using whichever plotting library you are most comfortable with.

# Run this cell without changes

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

2.1) Find the top 10 countries with the most players (using the 'Nationality' column). Create a bar chart showing the number of players from those 10 countries.

Don't forget to add a title and an x-axis label to your chart.

If you are unable to find the top 10 countries but want the chance to demonstrate your plotting skills use the following dummy data to create a bar chart:

Country Name  | Num Players
============  | ===========
Country A     | 100
Country B     | 60
Country C     | 125
Country D     | 89

# Code here to get the top 10 countries with the most players
# Code here to plot a bar chart.  A recommended figsize is (10, 6)
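The sketch below shows one way to build such a bar chart with matplotlib, using the dummy data from the table above. The variable names and the title and label text are illustrative choices, not requirements of the challenge.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Dummy data from the table in the instructions
counts = pd.Series(
    {"Country A": 100, "Country B": 60, "Country C": 125, "Country D": 89}
)

# With the real data, df['Nationality'].value_counts().head(10) would give
# per-country counts already sorted from most to fewest players.
top = counts.sort_values(ascending=False).head(10)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(top.index, top.values)
ax.set_title("Players per Country")
ax.set_xlabel("Country")
ax.set_ylabel("Number of Players")
```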

2.2) Describe the relationship between StandingTackle and SlidingTackle, as shown in the scatter plot produced below.

# Run this cell without changes

fig, ax = plt.subplots()

ax.set_title('Standing Tackle vs. Sliding Tackle')
ax.set_xlabel('Standing Tackle')
ax.set_ylabel('Sliding Tackle')

x = df['StandingTackle']
y = df['SlidingTackle']

ax.scatter(x, y)

Please describe in words the relationship between these two features.
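When describing a scatter plot, a correlation coefficient can help back up what you see. The sketch below computes the Pearson correlation with Series.corr on a handful of invented ratings. The numbers are hypothetical and are not taken from the FIFA dataset.

```python
import pandas as pd

# Hypothetical ratings, invented for illustration only
sample = pd.DataFrame({
    "StandingTackle": [20, 35, 50, 65, 80],
    "SlidingTackle": [18, 30, 52, 60, 78],
})

# Pearson correlation: near +1 means the points rise together along a line,
# near -1 means one falls as the other rises, near 0 means no linear trend.
r = sample["StandingTackle"].corr(sample["SlidingTackle"])
print(round(r, 3))
```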

# Your written answer here

Part 3: Exploring Statistical Data [Suggested Time: 20 minutes]


3.1) What are the mean age and the median age for the players in this dataset?

# Code here to find the mean age and median age

In your own words, how are the mean and median related to each other and what do these values tell us about the distribution of the column 'Age'?
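As a small illustration of how the two statistics can differ, the sketch below uses a made-up list of ages with a few older outliers. The ages are hypothetical and not drawn from the dataset.

```python
import pandas as pd

# Hypothetical ages with a few older outliers
ages = pd.Series([19, 21, 22, 23, 24, 25, 26, 38, 41])

mean_age = ages.mean()      # sensitive to extreme values
median_age = ages.median()  # middle value of the sorted data

print(mean_age, median_age)
```

Here the outliers pull the mean above the median, which typically points to a right-skewed distribution. When the two values are close, the distribution is usually closer to symmetric.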

# Your written answer here

3.2) Who is the oldest player from Argentina and how old is he?

Use the Nationality column.
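One possible approach is to filter the rows with a boolean mask and then locate the row with the largest age. The sketch below uses invented rows, and it assumes a 'Name' column alongside 'Age' and 'Nationality'.

```python
import pandas as pd

# Hypothetical rows; the 'Name' column is an assumption for this sketch.
sample_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [31, 40, 36, 29],
    "Nationality": ["Argentina", "Portugal", "Argentina", "Argentina"],
})

# Boolean mask keeps only the rows for one nationality.
argentina = sample_df[sample_df["Nationality"] == "Argentina"]

# idxmax returns the index label of the largest 'Age' (the first one on ties).
oldest = argentina.loc[argentina["Age"].idxmax()]

print(oldest["Name"], oldest["Age"])
```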

# Code here to find the oldest player from Argentina

# Your written answer here

Part 4: Python Data Structures [Suggested Time: 20 minutes]


In this final section, we will use fundamental Python data structures, rather than pandas DataFrames, to accomplish a few tasks. Below, we've defined a dictionary whose keys are soccer player names and whose values are nested dictionaries containing each player's age, nationality, and a list of teams they have played for.

# Run this cell without changes

players = {
    'L. Messi': {
        'age': 31,
        'nationality': 'Argentina',
        'teams': ['Barcelona']
    },
    'Cristiano Ronaldo': {
        'age': 33,
        'nationality': 'Portugal',
        'teams': ['Juventus', 'Real Madrid', 'Manchester United']
    },
    'Neymar Jr': {
        'age': 26,
        'nationality': 'Brazil',
        'teams': ['Santos', 'Barcelona', 'Paris Saint-Germain']
    },
    'De Gea': {
        'age': 27,
        'nationality': 'Spain',
        'teams': ['Atletico Madrid', 'Manchester United']
    },
    'K. De Bruyne': {
        'age': 27,
        'nationality': 'Belgium',
        'teams': ['Chelsea', 'Manchester City']
    }
}

4.1) Create a list of all the keys in the players dictionary. Store the list of player names in a variable called player_names to use in the next question.

Use Python's documentation on dictionaries for help if needed.
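As a quick illustration, the sketch below pulls the keys out of a trimmed two-entry copy of the dictionary. The names sample_players and sample_names are made up for this example.

```python
# Trimmed copy of two entries from the players dictionary above
sample_players = {
    'L. Messi': {'age': 31, 'nationality': 'Argentina', 'teams': ['Barcelona']},
    'De Gea': {'age': 27, 'nationality': 'Spain',
               'teams': ['Atletico Madrid', 'Manchester United']},
}

# .keys() returns a view object; list() turns it into a regular list.
sample_names = list(sample_players.keys())

print(sample_names)
```

Dictionaries preserve insertion order in Python 3.7 and later, so the names come back in the order the entries were defined.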

# Replace None with appropriate code to get the list of all player names

player_names = None

# Run this cell without changes to check your answer

print(player_names)

4.2) Great! Now that we have the names of all players, let's use that information to create a list of tuples containing each player's name along with their nationality. Store the list in a variable called player_nationalities.
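A list comprehension is one possible way to build such a list. The sketch below works on a trimmed two-entry copy of the dictionary, and the sample_ names are made up for this example.

```python
# Trimmed copy of two entries from the players dictionary above
sample_players = {
    'L. Messi': {'age': 31, 'nationality': 'Argentina', 'teams': ['Barcelona']},
    'De Gea': {'age': 27, 'nationality': 'Spain',
               'teams': ['Atletico Madrid', 'Manchester United']},
}
sample_names = list(sample_players)

# For each name, look up the nested dictionary and pair the name
# with the value stored under 'nationality'.
sample_nationalities = [
    (name, sample_players[name]['nationality']) for name in sample_names
]

print(sample_nationalities)
```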

# Replace None with appropriate code to generate a list of tuples such that
# the first element is a player's name and the second is their nationality
# Ex: [('L. Messi', 'Argentina'), ('Cristiano Ronaldo', 'Portugal'), ...]

player_nationalities = None

# Run this cell without changes to check your answer

print(player_nationalities)

4.3) Define a function called get_players_on_team() that returns a list of the names of all the players who have played on a given team.

Your function should take two arguments:

  • a dictionary of player information
  • the team name (as a string) you are trying to find the players for

Be sure that your function has a return statement.
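The sketch below is one possible shape for such a function, not the only valid one. It is tried out on a trimmed three-entry copy of the dictionary. The parameter names and the sample_players variable are illustrative.

```python
def get_players_on_team(player_info, team_name):
    """Return the names of players whose 'teams' list contains team_name."""
    return [
        name
        for name, info in player_info.items()
        if team_name in info['teams']
    ]


# Trimmed copy of three entries from the players dictionary above
sample_players = {
    'L. Messi': {'age': 31, 'nationality': 'Argentina', 'teams': ['Barcelona']},
    'Cristiano Ronaldo': {'age': 33, 'nationality': 'Portugal',
                          'teams': ['Juventus', 'Real Madrid', 'Manchester United']},
    'De Gea': {'age': 27, 'nationality': 'Spain',
               'teams': ['Atletico Madrid', 'Manchester United']},
}

print(get_players_on_team(sample_players, 'Manchester United'))
```

A team that nobody has played for yields an empty list rather than an error, since the membership test simply fails for every player.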

# Code here to define your get_players_on_team() function

# Run this cell without changes to check your answer

players_on_manchester_united = get_players_on_team(players, 'Manchester United')
print(players_on_manchester_united)

License:Other