tmvien / ada-2020-ncses

Applied Data Analytics Program at National Center for Science and Engineering Statistics (NCSES), National Science Foundation (NSF)

The National Center for Science and Engineering Statistics (NCSES) hosted the Fall 2020 Coleridge Initiative Applied Data Analytics training program.

Participants work in teams to define and complete a project aimed at gaining insight into how doctoral recipients are funded during their academic careers and how that funding differs by research field, race/ethnicity, and sex. The program provides up-to-date perspectives on the use of administrative and survey data for policy analysis, and instruction on how to manage and analyze microdata according to best practices. Instructors facilitate hands-on coding with microdata in SQL and R for the following tasks: data management, text analysis, data visualization, and machine learning.

Datasets Used in the Class:

  • Survey of Earned Doctorates, Survey of Doctorate Recipients, Higher Education Research and Development Survey (provided by NCSES)
  • UMETRICS (provided by the Institute for Research on Innovation and Science)
  • United States Patent data (PatentsView: https://www.patentsview.org, open source)
  • Federal RePORTER Grant data (https://federalreporter.nih.gov, open source)

Class Program:

Module 1 - Introduction to R and SQL

Module 2
October 14 - Introduction, Database Management and Project Scoping
October 15 - Dataset Exploration
October 16 - Applications of Dataset Exploration
October 19 - Basics of Data Visualization
October 20 - Applications of Data Visualization
October 22 - Text Analysis
October 23 - Interim Presentations

Module 3
November 16 - Privacy, Confidentiality, and Ethics
November 17 - Machine Learning
November 19 - Inference
November 20 - Pulling it all Together
November 30 - Interim Presentations
