Mufasa98 / SAT

SAT

A Comprehensive Analysis of Student SAT Correlation with Retention and Graduation Rates

Team 10 Members: Francis Crawford, Mufasa Naeem, Isabelle Roetcisoender, Isabel Sy

The purpose of this project is to showcase knowledge of the course materials, including data and delivery, back-end knowledge, and visualizations. The project covers the correlation between student test scores (SAT and ACT), student retention rates, and student graduation rates. The data is sourced from the United States Department of Education's College Scorecard website, an online tool with data on college costs, graduation, and post-college earnings. The following components are covered in this project:

Topic:

We wanted to look at the correlation between college admission test scores (SAT and ACT) and graduation rates.

Data Processing:

The College Scorecard website has a free API with information on over 6,000 schools. We called the API and retrieved only the fields we needed. We looked at several categories of data, including the following:

  • Demographics
  • Gender
  • Income
  • Academics
  • FAFSA Information
  • Dependency
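A minimal sketch of how such a call might look, assuming the public College Scorecard endpoint and its `api_key`, `fields`, `page`, and `per_page` query parameters. The helper names `fetch_page` and `fetch_all` and the `YOUR_API_KEY` placeholder are illustrative assumptions, not the project's actual code. Because the API returns results one page at a time, the helper loops over the pages until every school has been retrieved.

```python
import json
import math
import urllib.parse
import urllib.request

# Assumed endpoint of the College Scorecard API.
BASE_URL = "https://api.data.gov/ed/collegescorecard/v1/schools"


def fetch_page(page, fields, api_key, per_page=100):
    """Request one page of results, limited to the listed fields."""
    params = urllib.parse.urlencode({
        "api_key": api_key,
        "fields": ",".join(fields),  # ask only for the columns we need
        "page": page,
        "per_page": per_page,
    })
    with urllib.request.urlopen(f"{BASE_URL}?{params}") as resp:
        return json.load(resp)


def fetch_all(fields, api_key, per_page=100, get_page=fetch_page):
    """Loop over every page and collect all result rows in one list."""
    first = get_page(0, fields, api_key, per_page)
    rows = list(first["results"])
    # The response metadata is assumed to report the total row count.
    n_pages = math.ceil(first["metadata"]["total"] / per_page)
    for page in range(1, n_pages):
        rows.extend(get_page(page, fields, api_key, per_page)["results"])
    return rows
```

A call such as `fetch_all(["school.name", "latest.admissions.sat_scores.average.overall"], "YOUR_API_KEY")` would then return a list of dictionaries, one per school, ready to load into a DataFrame. The `get_page` argument exists only so the loop can be exercised without network access.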

For each of the groups above, we made multiple API calls to collect as much information as we could for graphing and statistical analysis. We separated the calls by group and stored each group's results in a DataFrame and a CSV. A for loop was used to retrieve the information from the different pages of the API. A major issue with the API data was that not all fields had valid values: multiple columns contained NaN values or were privacy suppressed. To narrow down the 6,000 schools, we kept only the schools that had SAT information, which also fit our topic, since we were concentrating only on college test scores and graduation rates.

Once we had pulled the 6 sets of API calls, we put them into 6 individual DataFrames so we could start the cleaning process. Some processing was dataset specific, but most of the DataFrames needed the following changes:

  • Changing values to percents (%)

  • Renaming columns with shorter, more readable names

  • Rounding all double (floating-point) values to 2 decimal places

  • Setting the index to the School Name

  • Changing all NaN values to 0

  • Dropping all schools that had a 0 for the overall SAT average

  • Saving the DataFrames as CSV and JSON files for the SQL and JavaScript implementations

The next part of the data processing was plotting graphs using Matplotlib. We pulled columns from the different DataFrames we had created to draw scatter plots that help visualize trends and correlations. We also displayed r values to gauge the strength of each correlation.
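The cleaning steps and the r value calculation described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the project's actual notebook code: the raw column names in `RENAME`, the shortened names, and the helpers `clean`, `save`, and `pearson_r` are all hypothetical.

```python
import pandas as pd

# Hypothetical raw field names mapped to shorter, more readable ones.
RENAME = {
    "school.name": "School Name",
    "latest.admissions.sat_scores.average.overall": "SAT Average",
    "latest.student.retention_rate.four_year.full_time": "Retention Rate %",
    "latest.completion.completion_rate_4yr_150nt": "Graduation Rate %",
}
# Columns assumed to arrive as fractions between 0 and 1.
PERCENT_COLUMNS = ["Retention Rate %", "Graduation Rate %"]


def clean(raw):
    """Apply the shared cleaning steps to one raw DataFrame."""
    df = raw.rename(columns=RENAME).set_index("School Name")
    df[PERCENT_COLUMNS] = df[PERCENT_COLUMNS] * 100  # fractions to percents
    df = df.round(2)                                 # 2 decimal places
    df = df.fillna(0)                                # NaN becomes 0
    return df[df["SAT Average"] != 0]                # drop schools without SAT


def save(df, stem):
    """Write the cleaned data for the SQL and JavaScript steps."""
    df.to_csv(f"{stem}.csv")
    df.to_json(f"{stem}.json", orient="index")


def pearson_r(df, x, y):
    """Pearson correlation coefficient (r value) between two columns."""
    return float(df[x].corr(df[y]))
```

For example, `pearson_r(cleaned, "SAT Average", "Graduation Rate %")` returns a value between -1 and 1 that could be shown in the title of the matching scatter plot. Note that filling NaN with 0 before correlating would treat missing rates as real zeros, so a stricter analysis might drop those rows instead.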

Data Storage

After processing and cleaning the data, we used PostgreSQL to store the 6 individual DataFrames in SQL schemas. We created the tables in SQL and then imported our previously created CSV files into them. We set constraints on all the tables using our main linking value, 'School Name'. Once the tables were created and the values imported, we ran a few queries to confirm that the import was successful.

The actual queries for creating the tables and setting constraints are in the Schema.txt file.
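The sketch below illustrates the general pattern of creating linked tables, importing CSV data, and running a check query. It is not the contents of Schema.txt: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server, and the table names, column names, and helper functions are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical two-table schema linked on the school name.
SCHEMA = """
CREATE TABLE academics (
    school_name TEXT PRIMARY KEY,
    sat_average REAL NOT NULL
);
CREATE TABLE income (
    school_name TEXT PRIMARY KEY REFERENCES academics (school_name),
    median_family_income REAL
);
"""


def load_csv(conn, table, csv_text):
    """Insert every data row of a CSV (header skipped) into a table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    placeholders = ", ".join("?" for _ in header)
    # The table name comes from our own code, never from user input.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)
    return len(data)


def build_database(academics_csv, income_csv):
    """Create the tables, enforce the link, and import both CSVs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for FKs
    conn.executescript(SCHEMA)
    load_csv(conn, "academics", academics_csv)
    load_csv(conn, "income", income_csv)
    return conn
```

After the import, a query such as `SELECT COUNT(*) FROM academics` or a join on `school_name` confirms that the rows arrived and that the tables link as intended. In PostgreSQL the same idea would typically use `CREATE TABLE` with a `FOREIGN KEY` constraint and a CSV import.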

Visualizations

A GitHub Pages site (https://mufasa98.github.io/SAT/) was set up to showcase an overall summary of the study. Using HTML, (minor) CSS, and JavaScript's D3 library, a JSON file containing the compiled dataset was parsed to create a dynamic webpage that changes as the user selects a university from the dropdown menu.

In the JavaScript code, the D3 library is used to select the HTML sample-metadata element and update the data on the webpage when the user chooses a new university. The same code also builds the Plotly gauge.

Because the data was cleaned to fit the study's needs, not all universities are included in the dropdown.

Issues: One of the biggest issues faced during this part of the project was figuring out how to resize a Leaflet map. This was resolved with the help of colleagues and extensive web research.

Another issue came up while initializing the GitHub page. The HTML and CSS files would load as source files, but the JavaScript code would not. This was caused by the live server still running while the web page was loading. Once the live server was disconnected, the webpage initialized properly.
