michael153 / autociter

Autociter

Authors: Michael Wan, Balaji Veeramani

Overview

Uses NLP to extract citation information from web pages.

Dependencies

  • dateparser
  • html2text
  • keras
  • numpy
  • PyPDF2
  • scikit-learn
  • termcolor
  • timeout-decorator
  • tensorflow
  • fake-useragent

Install: pip install dateparser html2text keras PyPDF2 termcolor
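As a quick sanity check after installing, the sketch below reports which of the listed dependencies cannot be imported in the current environment. It is not part of the repository. The `IMPORT_NAMES` mapping and the `missing_packages` helper are hypothetical names introduced here, and the package-to-module mapping is an assumption based on how these packages are commonly imported (for example, scikit-learn as `sklearn`).

```python
from importlib.util import find_spec

# Assumed mapping from package name to importable top-level module name.
# This mapping is an illustration, not taken from the project itself.
IMPORT_NAMES = {
    "dateparser": "dateparser",
    "html2text": "html2text",
    "keras": "keras",
    "numpy": "numpy",
    "PyPDF2": "PyPDF2",
    "scikit-learn": "sklearn",
    "termcolor": "termcolor",
    "timeout-decorator": "timeout_decorator",
    "tensorflow": "tensorflow",
    "fake-useragent": "fake_useragent",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as tensorflow.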

Open-Ended Questions Regarding Implementation / ML Model

  • Would preserving capitalization help the model? (For example, names are usually capitalized or in all caps, and titles are usually capitalized.)
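One way to explore the capitalization question without giving up a lowercased vocabulary is to keep the lowercased token and attach casing flags as separate features. The sketch below only illustrates that idea. It does not describe how Autociter preprocesses text, and `capitalization_features` and `featurize` are hypothetical names.

```python
def capitalization_features(token: str) -> dict:
    """Return simple casing flags for one token (hypothetical feature set)."""
    letters = [ch for ch in token if ch.isalpha()]
    return {
        # Two or more letters, all uppercase (e.g. "NASA").
        "is_all_caps": len(letters) > 1 and all(ch.isupper() for ch in letters),
        # First character is an uppercase letter (e.g. "Michael").
        "is_capitalized": bool(letters) and token[0].isupper(),
        # Every letter is lowercase (e.g. "the").
        "is_lowercase": bool(letters) and all(ch.islower() for ch in letters),
    }


def featurize(text: str) -> list:
    """Pair each whitespace-separated token's lowercased form with its casing flags."""
    return [(tok.lower(), capitalization_features(tok)) for tok in text.split()]


if __name__ == "__main__":
    for lowered, flags in featurize("By Michael WAN and the team"):
        print(lowered, flags)
```

Training the model with and without these flags on the same data would give a direct answer to the question above.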

Google Cloud Compute Engine (credentials needed)

SSH onto Instance: gcloud compute --project "autocitertraining" ssh --zone "us-west1-a" "overpowered-autociter"

SCP Files to Instance: gcloud compute scp --recurse * overpowered-autociter:~/[$PWD]

SCP Remote Instance Files to Local: gcloud compute scp --recurse overpowered-autociter:~/[$PWD]/assets/files/ml assets/files/ml

To-do

Project Guideline Doc

Tasklist Spreadsheet

Languages

Python 99.9%, HTML 0.1%