News Article Text -> SQLite Database

An ETL pipeline that scrapes news articles, transforms the article text to XML with spaCy, and loads the result into SQLite for modeling and analysis.

Files

crawler.py - Web crawler class built with BeautifulSoup
database.py - Database class for the SQLite database connection
xml.py - Extracts entities and dependencies from text using spaCy
main.py - Main script for crawling a news site, extracting text using Newspaper, and uploading to a SQLite database
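
For orientation, here is a minimal sketch of the transform and load steps. It is not the code in this repository. The function names, the XML layout, and the `articles` table schema are all hypothetical, and the entity list is written by hand where the real pipeline would get it from spaCy, so the sketch runs with the standard library only.

```python
import sqlite3
import xml.etree.ElementTree as ET


def entities_to_xml(text, entities):
    """Wrap article text and (span, label) pairs in a small XML document.

    The element names used here are assumptions made for illustration.
    """
    root = ET.Element("article")
    ET.SubElement(root, "text").text = text
    ents = ET.SubElement(root, "entities")
    for span, label in entities:
        ET.SubElement(ents, "entity", label=label).text = span
    return ET.tostring(root, encoding="unicode")


def load_article(conn, url, xml_doc):
    """Store one article's XML in a hypothetical `articles` table keyed by URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, xml TEXT NOT NULL)"
    )
    # Parameterized query: article text never gets spliced into the SQL string.
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, xml) VALUES (?, ?)",
        (url, xml_doc),
    )
    conn.commit()


# Hand-written stand-in for the entities spaCy would extract.
sample_text = "Apple opened a store in Paris."
sample_entities = [("Apple", "ORG"), ("Paris", "GPE")]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
xml_doc = entities_to_xml(sample_text, sample_entities)
load_article(conn, "https://example.com/story", xml_doc)

row = conn.execute(
    "SELECT xml FROM articles WHERE url = ?",
    ("https://example.com/story",),
).fetchone()
```

Keying the table on the article URL with `INSERT OR REPLACE` means re-crawling the same page overwrites the earlier row instead of duplicating it; whether the actual pipeline does this is an assumption.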

Contact Me

Contact Method
Email	adamr@hey.com
LinkedIn	https://www.linkedin.com/in/adamrauckhorst/
