An ETL pipeline that scrapes information from news articles, transforms the article text to XML with spaCy, and loads the result into SQLite for modeling and analysis.
crawler.py
- Web crawler class built with BeautifulSoup

database.py
- Database class for the SQLite database connection

xml.py
- Extract entities and dependencies from text using spaCy

main.py
- Main script for crawling a news site, extracting text using Newspaper, and uploading to a SQLite database
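
To illustrate the transform and load steps described above, here is a minimal sketch using only the Python standard library. It is not the code in this repository: the function names `entities_to_xml` and `load_article`, the `articles` table, its columns, and the XML layout are all assumptions made for the example. The entity pairs are hard-coded stand-ins for what spaCy would produce (for instance from `doc.ents`), so the sketch runs without a spaCy model installed.

```python
import sqlite3
import xml.etree.ElementTree as ET


def entities_to_xml(url, entities):
    """Build an XML string from (text, label) entity pairs.

    Hypothetical layout: <article url="..."><entity label="...">text</entity></article>
    """
    root = ET.Element("article", attrib={"url": url})
    for text, label in entities:
        ent = ET.SubElement(root, "entity", attrib={"label": label})
        ent.text = text
    return ET.tostring(root, encoding="unicode")


def load_article(conn, url, xml_text):
    """Insert one article row, creating the (assumed) table on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (url TEXT PRIMARY KEY, xml TEXT)"
    )
    # Parameterized query: values are bound, never formatted into the SQL string.
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, xml) VALUES (?, ?)",
        (url, xml_text),
    )
    conn.commit()


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Stand-in for entities extracted by spaCy; the values are illustrative only.
sample_url = "https://example.com/story"
sample_entities = [("Ada Lovelace", "PERSON"), ("London", "GPE")]

xml_text = entities_to_xml(sample_url, sample_entities)
load_article(conn, sample_url, xml_text)
```

Using the URL as the primary key with `INSERT OR REPLACE` means re-crawling the same article overwrites its row instead of duplicating it; whether the real schema does this is not stated here.
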
Contact Method | |
---|---|
Email | adamr@hey.com |
LinkedIn | https://www.linkedin.com/in/adamrauckhorst/ |