Jason-R-Turner/my_notes

About the Project
Goals | Background | The Data | Deliverables | Outline
Data Dictionary
Original Features | Engineered Features
Initial Thoughts & Hypotheses
Thoughts | Hypotheses
Project Steps
Acquire | Prepare | Explore | Model | Conclusions
How to Reproduce & More
Steps | Tools & Requirements | License | Creators

About the Project

What makes a magic card valuable?. You can check out our presentation here.

Goals

Build a dataset of cards using Scryfall's API
Identify the drivers of card prices
Create a regression model to predict the price of a card that has an RMSE lower than the baseline

Background

What makes a card valuable? According to MTGGoldfish News here,

"Determining a collectible cards value is not necessarily a straight forward affair. Often a cards popularity is affected by the design of the card or the desirability of the art itself from esteemed artists."

Here's a link to some Magic FAQs that helped us understand the fundamentals involved to make informed decisions in our project

By analyzing Scryfall's API data, we will determine what influences a card's value.

The Data

Our dataset came from https://scryfall.com/docs/api/bulk-data. It includes over 20,000 functionally unique cards from a collectible card game starting from the 90s till present day.

Deliverables

7-10 minute live presentation
Presentation slides via Canva here
GitHub repository with analysis

Project Outline

The files within the repository are organized as follows. The /images and /sandbox contents are not necessary for reproduction.
![Outline](https://github.com/Jemison-DavidJasonStephenVasiliy/Magic-Trick/(tbd)

Timeline

Project Planning: July 28th
Aquisition and Prep: July 29th
Exploration: Aug 3rd
Modeling: Aug 4th
Finalize Minimum Viable Product (MVP): EOD Aug 5th
Improve/Iterate MVP: Aug 11th
Finalize Presentation: Aug 19th

Acknowledgments

Back to Table of Contents

Data Dictionary

Below are the features that we used after preparing our dataset acquired from the Scryfall API which are defined on their Card Objects site.

Column	Data type	Description
artist	String	The name of the illustrator of this card. Newly spoiled cards may not have this field yet.
cmc	Decimal	The card’s converted mana cost. Note that some funny cards have fractional mana costs.
games	Array	A list of games that this card print is available in, paper, arena, and/or mtgo.
id	UUID	An unique ID for this card in Scryfall’s database.
lang	String	A language code for this printing.
legalities	Object	An object describing the legality of this card across play formats. Possible legalities are legal, not_legal, restricted, and banned.
name	String	The name of this card. If this card has multiple faces, this field will contain both names separated by ␣//␣.
rarity	String	This card’s rarity. One of common, uncommon, rare, special, mythic, or bonus.
released_at	Date	The date this card was first released.
reprint	Boolean	True if this card is a reprint.
set_name	String	This card’s full set name.
type_line	String	The type line of this particular face, if the card is reversible.

Engineered Features

We engineered the features seen below from the original data using domain expertise and insights gleaned from our dataset.

Feature Name	Description
card_type	Categories for the different card type lines.
first_prints_usd	Price in USD for the original printings of cards.
foil	Boolean that is True for foil cards.
foil_and_nonfoil	Boolean that is True for cards that have both foil & non-foil versions.
foil_and_nonfoil_usd	Card prices in USD for which 'foil_and_nonfoil' is True.
nonfoil	Boolean that is True for non-foil cards.
nonfoil_only	Boolean that is True for cards with only non-foil versions.
nonfoil_only_usd	Card prices in USD for which 'nonfoil_only' is True.
reprints_usd	Card prices in USD for which 'reprint' is True.
usd	Prices in USD for which 'price' is has a 'usd' value.
year_released	Year value derived from 'released_at' Date.

Back to Table of Contents

Initial Thoughts & Hypotheses

Thoughts

What affect does game style have on price?
Does the artist affect the price?
Is there a difference in price between sets?
What affect does reprint have on the price?
How have card prices been affected by lockdowns due to the Covid-19 pandemic?
Does legality affect the price?
Does rarity affect the price?
What affect does release date have on the price?
What affect does foil have on the price?
Does the set type and rarity of the cards affect the price in USD?
What is the overlap between card types and rarity?
What effect does basic card type have on price?

Hypotheses

$H_0$: There is not a signficant difference between foil & non-foil cards.
$H_a$: There is a significant difference between foil & non-foil cards.

$H_0$ = There is not a signficant difference in USD prices between reprints & first printings.
$H_a$ = There is a significant difference in USD prices between reprints & first printings.

$H_0$ Year $x$ has a average price per card equal to or less than the general average price.
$H_a$ Year $x$ has a average price per card greater than the general average price.

$H_0$ : Mean price for card type $x$ is less than or equal to the overall mean price.
$H_a$ : Mean price for card type $x$ is greater than the overall mean price.

$H_0$ : The mean usd price for cards which are available in game type $x$ is less than or equal to the mean usd price of those not available in game type $x$.
$H_a$ : The mean usd price for cards which are available in game type $x$ is greater than the mean usd price of those not available in game type $x$.

$H_0$ : Cards of $raritytype$ and $cardtype$ have a mean value less than or equal to the overall card mean value.
$H_a$ : Cards of $raritytype$ and $cardtype$ have a mean value greater than the overall card mean value.

$H_0$: The total amount of cards created by artist x is = to the value of cards there cards.
$H_a$: The total amount of cards created by artist x is not = to the value of cards there cards.

Back to Table of Contents

Project Steps

Acquire

We acquired our data from the Scryfall API using their "Default Card" json file at this site: https://scryfall.com/docs/api/bulk-data. ![Acquire-Visual](https://github.com/Jemison-DavidJasonStephenVasiliy/Magic-Trick/(tbd)

The dataframe is saved as a json file and has around 71,000 observations. The acquire.py file has a function for grabbing the latest dataset. There are 56 columns in the original data frame. There are many NaNs which have been left until the later sections to be dealt with.

Prepare

Functions to prepare the dataframe are stored in two seperate files depending on their purpose, prepare.py and model.py:

prepare.py: Functions for cleaning and ordering data

nulls are dropped
change dtypes to correct type
extracted USD from 'prices' column into its own column
split dataframe into test, validate, and train

Explore

During exploration we looked at these features:

Set Type
Card Types
Rarity
Foil/Non-Foil
Language and Locality

![Card Frame Styles](https://github.com/Jemison-DavidJasonStephenVasiliy/Magic-Trick/(tbd)

Model

The final model is a RandomForestRegressor with 200 features and predictions are made on unseen test data.
The model metrics improve with the test data in both MAE and within-a-dime.
From plotting the residuals, it is clear that more expensive cards are likely to be underestimated; sometimes by a significant amount.

Feature Groups We used these sets of feauture groups.

Price point clusters

Models Evaluated

RandomForestRegressor
KNeighborsRegressor
DecisionTreeRegressor
LinearSVR

Evaluation Metric

Mean Absolute Error (MAE). This is chosen because models tended to poorly predict expensive cards, leading to the RMSE being a poor metric since it over emphasizes large errors
Card predictions within a dime of the actual value. The models tended to be better at predicting lower priced cards. Since the use case is to determine if new cards hit many price points in the market, if a larger percentage of cards are accurately predicted, more price points can be easily marketed to with the model.

![Model-Error](https://github.com/Jemison-DavidJasonStephenVasiliy/Magic-Trick/(tbd)

Final Model:
RandomForestRegressor was our final model we performed on test, predicting ##% better than the baseline.

Model	Train	Validate	Test
Baseline Mean	5.827456	5.724399e+00
Baseline Median	3.721033	3.632730e+00
Random Forest Regressor	1.442782	2.765467e+00	2.532968
K-Neighbors Regressor-weighted	0.926974	2.927149e+00
K-Neighbors Regressor	2.482755	2.994726e+00
Decision Tree Regressor	0.913396	3.065815e+00
Linear SVR	3.379344	3.292252e+00
XG Boost Regressor	3.407522	3.599480e+00
Radius Neighbors Regressor-weighted	0.913396	3.639562e+00
Radius Neighbors Regressor	5.428092	5.330757e+00
Linear Regression	5.509399	1.057490e+07

![Model_Evaluation](https://github.com/Jemison-DavidJasonStephenVasiliy/Magic-Trick(tbd)

Conclusions

Using Scryfall data we are predicting the USD price of Magic The Gathering cards. This allows the publisher to determine the price of newly published cards by their similarity to other cards to make sure that they are hitting all the likely price points within the current market. It will also allow buyers and sellers to determine the price of cards with no or little current price data.

Back to Table of Contents

How to Reproduce

Steps

~~Read through the README.md file~~ ✅
Download the acquire.py, prepare.py, explore.py, and modeling.py modules.
Visit https://scryfall.com/docs/api/bulk-data for the most up-to-date dataset.
Use the acquire function to import your dataset.
Pip install XGboost.
Use the prepare function to clean up your dataframe.
Explore the data as you like.

Tools & Requirements

License

Creators

David Mitchell, Jason Turner, Stephen FitzSimon, Vasiliy Melkozerov
Back to Table of Contents