srobert4 / 290t-project

290t-project

Set up

Python environment

  • Tested on Python 3.7

For macOS

For Windows

  • Create a virtual environment: python -m venv myenv
  • Activate the virtual environment: myenv\Scripts\activate
  • Install the required packages: pip install -r "<folder path>/requirements.txt"
  • Add the virtual environment to Jupyter: python -m ipykernel install --user --name=myenv

Neo4j

  • Download and install Neo4j Desktop (setup instructions still to be added here).

Reddit API

Credentials

  • Copy your Neo4j and Reddit credentials into config.txt, then move the file to /etc/. If you keep it somewhere else, update the path in graph.py and scraper.py to match.
  • The Neo4j graph server must be running before you can add or query data.
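
The layout of config.txt is not specified here. As a minimal sketch, assuming it is an INI-style file with hypothetical [neo4j] and [reddit] sections, the credentials could be read with Python's standard configparser module. The section names and the load_credentials helper are illustrative assumptions, not the project's actual format:

```python
import configparser
from pathlib import Path


def load_credentials(path="/etc/config.txt"):
    """Read an INI-style credentials file into a nested dict.

    The [neo4j] and [reddit] section names are assumptions made for
    illustration; adjust them to match your own config.txt.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {config_path}")

    # Disable interpolation so a '%' inside a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)

    # Fail early with a clear message if an expected section is missing.
    for section in ("neo4j", "reddit"):
        if not parser.has_section(section):
            raise KeyError(f"Missing [{section}] section in {config_path}")

    return {section: dict(parser[section]) for section in ("neo4j", "reddit")}
```

Checking for the file and its sections up front gives a clearer error than a failed connection later, which helps when the file has been moved away from /etc/.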

Getting Started

  • Example code for loading data into your graph, adding annotations, and analyzing those annotations is in Example notebook.ipynb

Class Structure

  • Data_Loader

The Data_Loader class provides methods to scrape data from Reddit and load it into your Neo4j graph using py2neo.

    • `Scraper` - the Scraper class interfaces with the Reddit API through `praw`
    • Object-Graph Mapping - `nodes.py` defines a class for each node type in the Reddit graph
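
To make the shape of the Reddit graph concrete, here is a plain-Python sketch of how a nested comment thread might be flattened into node records and relationships before loading. It deliberately uses neither praw nor py2neo, and it is not the project's nodes.py: the CommentRecord class, the field names, and the REPLY_TO relationship type are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommentRecord:
    """Hypothetical stand-in for a scraped comment (not a praw object)."""
    id: str
    body: str
    replies: List["CommentRecord"] = field(default_factory=list)


def flatten_thread(submission_id, comments):
    """Turn a nested comment tree into flat node and relationship lists.

    Returns (nodes, edges), where each edge is a
    (child_id, relationship_type, parent_id) tuple.
    """
    nodes = []
    edges = []
    # Each stack entry pairs a comment with the id of the node it replies to.
    # Reversing keeps the depth-first output in the original reading order.
    stack = [(comment, submission_id) for comment in reversed(comments)]
    while stack:
        comment, parent_id = stack.pop()
        nodes.append({"label": "Comment", "id": comment.id, "body": comment.body})
        edges.append((comment.id, "REPLY_TO", parent_id))
        stack.extend((reply, comment.id) for reply in reversed(comment.replies))
    return nodes, edges
```

An explicit stack is used instead of recursion so that very deep threads do not hit Python's recursion limit.
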
  • Data_Viewer

The Data_Viewer class provides methods to query the graph and view the data, including methods to view content and methods to view nodes.

  • Annotator

The Annotator class provides methods to add annotations (Code nodes) to content (Submission and Comment nodes) in the graph.
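
As a rough illustration of what attaching a Code node to a content node could look like at the Cypher level, the sketch below builds a parameterized query. It is not the Annotator's actual implementation: the build_annotation_query helper, the id and name properties, and the ANNOTATES relationship type are assumptions made for this example.

```python
ALLOWED_CONTENT_LABELS = {"Submission", "Comment"}


def build_annotation_query(content_label, content_id, code_name):
    """Return a (cypher, parameters) pair linking a Code node to content."""
    # Cypher cannot parameterize node labels, so the label is checked
    # against a fixed allow-list before being placed in the query text.
    if content_label not in ALLOWED_CONTENT_LABELS:
        raise ValueError(f"Unsupported content label: {content_label!r}")

    cypher = (
        f"MATCH (content:{content_label} {{id: $content_id}}) "
        "MERGE (code:Code {name: $code_name}) "
        "MERGE (code)-[:ANNOTATES]->(content)"
    )
    return cypher, {"content_id": content_id, "code_name": code_name}
```

Using MERGE rather than CREATE means that applying the same code to the same content twice should not produce duplicate Code nodes or relationships. With py2neo, a pair like this could be passed to Graph.run(cypher, parameters).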
