FrederickGeek8 / nyuclasses-resource-archive

A script for archiving the contents of the `Resources` folder for each class on newclasses.nyu.edu

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

NYU Classes Resources Scraper

newclasses.nyu.edu is going away August 15th and with it all the course material there will die. This script allows you to download some of the course materials from the website.

This repo contains a script to download all of the Resources for classes listed on NYU Classes (newclasses.nyu.edu).

A screenshot of the Resources tab on a class

New: Download the a web archive of NYU Classes using this tool: https://github.com/FrederickGeek8/nyuclasses-web-archive

Grabbing the code

You can get a copy of this code by

git clone https://github.com/FrederickGeek8/nyuclasses-resource-scraper.git;
cd nyuclasses-resource-scraper

Install Dependencies:

pip install -r requirements.txt

Get Class List

You can get the list of classes to scrape by going to "My Memberships" on newclasses.nyu.edu and running the following Javascript in your web browser's developer console.

The Membership tab on the home page of newclasses.nyu.edu

You can get to the console by Right-Clicking on the webpage, then clicking Inspect, then clicking the Console tab, paste the following code:

JSON.stringify(
  Object.fromEntries(
    Array.from(document.querySelectorAll("[headers=worksite]")).map((x) => [
      x.children[0]?.innerText,
      x.children[0]?.href,
    ])
  )
);

Copy the result and save it to a text file. We recommend classes.txt in this folder. It should look like

{
  "Class Name 1": "https://newclasses.nyu.edu/portal/site/numbers",
  "Class Name 2": "https://newclasses.nyu.edu/portal/site/differentnumbers"
}

Javascript command result that we copy to the clipboard to save to a file.

spacing doesn't matter. You can remove the "key": "value", pair from that file for classes you don't want to scrape. By default we scrape all of the classes listed on that webpage.

Usage:

You can run the following:

python GetClassResourceData.py --user user --pass mypassword123 --json classes.txt --outdir /Backup/NYUClasses

with

  • user: Your NYU username
  • pass: Your NYU password
  • json: The path to the JSON containing file with the classes you want to save, default: classes.txt
  • outdir: The path to where you want to save the class folders

It will produce at outdir a structure like the following: the contents of the Resources tab for each class in the JSON file: A screenshot of the folder structure produced by the script. Class name and underneath the contents.

This script will skip downloading files that already exist and (1) are newer on the local machine and (2) have the same file size as the remote. In order to redownload files by force, you must delete the files from your filesystem first.

Notes

  1. This was tested on Python 3.9.16. It should hopefully work for Python >=3.9

About

A script for archiving the contents of the `Resources` folder for each class on newclasses.nyu.edu

License:MIT License


Languages

Language:Python 100.0%