peterdobbs77 / statistical_simulation

investigating methods to simulate data

statistical_simulation

investigating various data sources and advanced statistical methods

InvestigateCheXpert

Let's look at radiology imaging data

data/chexpert

  • valid.csv - metadata for the validation set of chest radiographs, from the CheXpert competition
  • train.csv - metadata for the training set of chest radiographs, from the CheXpert competition

These files were processed with extractingPatientId.py to add columns for the patient identifier (PID) and the study identifier (StudyID), which makes it easier to compute descriptive statistics per patient and per study.
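The contents of extractingPatientId.py are not shown here, so the following is only a minimal sketch of how such an extraction could work. It assumes the CSV has a `Path` column whose values look like `CheXpert-v1.0-small/train/patient00001/study1/view1_frontal.jpg`; the function names `extract_ids` and `add_id_columns` are hypothetical.

```python
import re

# Assumed layout of the "Path" column (not confirmed by this README), e.g.
# "CheXpert-v1.0-small/train/patient00001/study1/view1_frontal.jpg"
PATH_PATTERN = re.compile(r"patient(?P<pid>\d+)/study(?P<study>\d+)/")


def extract_ids(path):
    """Return (PID, StudyID) parsed from an image path, or (None, None)."""
    match = PATH_PATTERN.search(path)
    if match is None:
        return None, None
    # Keep the identifiers as strings so leading zeros are preserved.
    return match.group("pid"), match.group("study")


def add_id_columns(rows):
    """Return copies of the row dicts with PID and StudyID keys added."""
    out = []
    for row in rows:
        pid, study = extract_ids(row["Path"])
        out.append({**row, "PID": pid, "StudyID": study})
    return out
```

The rows could come from `csv.DictReader` over valid.csv or train.csv; keeping the identifiers as strings avoids losing the zero padding in the patient number.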

NbaComparisonsFy20

Let's compare the top teams in the league

data/basketball

  • BucksRosterFY20.csv - data from BasketballReference on the Milwaukee Bucks 2019-2020 roster. Downloaded 3 March 2020.
  • LakersRosterFY20.csv - data from BasketballReference on the Los Angeles Lakers 2019-2020 roster. Downloaded 3 March 2020.

USAU_HistData

Let's work on acquiring Ultimate Frisbee game data

data/ultimate

The data files in this directory were manually collected on 8 April 2020:

  • _tournaments.xlsx - combined, manually collected data for a set of ultimate frisbee tournaments
  • tournament_results.csv - game results for a collection of ultimate frisbee tournaments
  • tournament_teams.csv - teams and seedings for a collection of ultimate frisbee tournaments

event subdirectories

These are the output locations for data scraped from the USAU tournament websites with USAU_HistData.ipynb. For example, data from the Florida Warm-Up 2019 tournament is stored in the following structure:

Florida-Warm-Up-2019/schedule/Men/CollegeMen/
  • bracketplay.csv - data scraped from brackets on USAU score reporter
  • poolplay.csv - data scraped from HTML data tables on USAU score reporter
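The layout above can be expressed as a small path helper. This is a sketch only: the function names, the `root` argument, and the assumption that every event follows the same `<event>/schedule/<gender>/<division>/` pattern are illustrative and not taken from USAU_HistData.ipynb.

```python
from pathlib import Path


def event_output_dir(root, event, gender, division):
    """Build the directory for one event and division (assumed layout)."""
    return Path(root) / event / "schedule" / gender / division


def output_files(root, event, gender, division):
    """Return the two CSV paths expected inside an event directory."""
    base = event_output_dir(root, event, gender, division)
    return {
        "bracketplay": base / "bracketplay.csv",  # scraped bracket data
        "poolplay": base / "poolplay.csv",        # scraped pool-play tables
    }


# Hypothetical usage; "data/ultimate" as the root is an assumption.
files = output_files("data/ultimate", "Florida-Warm-Up-2019", "Men", "CollegeMen")
```

Building paths with `pathlib` keeps the structure in one place, so a notebook that writes or reads these files does not need to repeat the string concatenation.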
