Authors: Michael Wan, Balaji Veeramani
Uses NLP to accurately extract citation information from any website
- dateparser
- html2text
- keras
- numpy
- PyPDF2
- scikit-learn
- termcolor
- timeout-decorator
- tensorflow
- fake-useragent
Install all dependencies: pip install dateparser html2text keras numpy PyPDF2 scikit-learn termcolor timeout-decorator tensorflow fake-useragent
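Some of the packages above are imported under a different name than the one passed to pip (for example, scikit-learn is imported as `sklearn`), so a quick check that the environment is complete can save a confusing ImportError later. The following is a minimal sketch, not part of the project. It assumes the standard import name for each package, and `REQUIRED_MODULES` and `missing_packages` are hypothetical names chosen for illustration.

```python
import importlib.util

# Hypothetical mapping from each PyPI package name to the module it is
# imported as. Assumption: these are the standard import names.
REQUIRED_MODULES = {
    "dateparser": "dateparser",
    "html2text": "html2text",
    "keras": "keras",
    "numpy": "numpy",
    "PyPDF2": "PyPDF2",
    "scikit-learn": "sklearn",
    "termcolor": "termcolor",
    "timeout-decorator": "timeout_decorator",
    "tensorflow": "tensorflow",
    "fake-useragent": "fake_useragent",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the PyPI names whose top-level module cannot be found.

    find_spec only locates the module; it does not import it, so heavy
    packages such as tensorflow are not loaded by this check.
    """
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```

Any names it prints can be passed straight back to `pip install`, since the check reports PyPI names rather than import names.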
- Would preserving capitalization help the model? (e.g., names are usually capitalized or in all caps, and titles are usually capitalized)
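One way to test this question is to keep the vocabulary lowercase but give the model the case information as separate per-token features, then compare results with and without them. The sketch below is a hypothetical illustration, not the project's current preprocessing; the helper names `case_features` and `encode` and the choice of four features are assumptions.

```python
def case_features(token):
    """Return hypothetical case features for a token as 0.0/1.0 floats.

    Order: [is_title_case, is_all_caps, is_all_lower, has_any_upper].
    Non-letter characters are ignored; a single capital letter (such as
    an initial) counts as all caps.
    """
    letters = [c for c in token if c.isalpha()]
    if not letters:
        # Tokens such as "2019" or "," carry no case information.
        return [0.0, 0.0, 0.0, 0.0]
    is_all_caps = all(c.isupper() for c in letters)
    is_all_lower = all(c.islower() for c in letters)
    is_title = (
        letters[0].isupper()
        and not is_all_caps
        and all(c.islower() for c in letters[1:])
    )
    has_upper = any(c.isupper() for c in letters)
    return [float(is_title), float(is_all_caps), float(is_all_lower), float(has_upper)]


def encode(tokens):
    """Pair each lowercased token with the case features of its original form."""
    return [(token.lower(), case_features(token)) for token in tokens]
```

For example, `encode(["Jane", "DOE"])` yields `("jane", [1.0, 0.0, 0.0, 1.0])` and `("doe", [0.0, 1.0, 0.0, 1.0])`. These vectors could be concatenated with each token's embedding, so the vocabulary size does not grow while the model can still see that a token was capitalized.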
SSH onto Instance: gcloud compute --project "autocitertraining" ssh --zone "us-west1-a" "overpowered-autociter"
SCP Files to Instance: gcloud compute --project "autocitertraining" scp --zone "us-west1-a" --recurse * overpowered-autociter:~/[$PWD]
SCP Remote Instance Files to Local: gcloud compute --project "autocitertraining" scp --zone "us-west1-a" --recurse overpowered-autociter:~/[$PWD]/assets/files/ml assets/files/ml