nova-land / NewsFetch

News API: fetch news from CommonCrawl, parse it with NewsPlease, and enrich it with pre-trained machine-learning models to produce a structured, searchable format

NewsFetch

Subprojects

This repository contains the following subprojects:

Projects that NewsFetch depends on

For enriching the news articles, NewsFetch uses the following projects:

  • spaCy: a Python library for natural language processing
  • Hugging Face: hosts pre-trained ML models that NewsFetch uses for natural language processing

Setup

First install the following:

Recommended: use pyenv to manage your Python versions.

Virtual environment

Using a virtual environment is highly recommended, because it keeps this project's dependencies from conflicting with those of other projects.

To create and activate a virtual environment, run the following commands in each subproject:

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
source venv/bin/activate
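
To confirm that the activation worked, a quick check from Python can help. The following is a minimal sketch, not part of NewsFetch; the function name `in_virtualenv` is a hypothetical helper introduced for illustration. It relies on the standard-library behavior that `sys.prefix` differs from `sys.base_prefix` inside an environment created by `python -m venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Running this with the `python` on your PATH after `source venv/bin/activate` should report that the environment is active; if it does not, the shell is probably still using the system interpreter.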

Install dependencies

Poetry is used to install and manage dependencies. It is also used to package the modules/libraries.

Note: The subprojects use relative paths to import the other subprojects/libraries. This is done to make it easier to develop the subprojects.

To install the dependencies, run the following command:

poetry install

About

License: MIT License

Languages

Python 97.3%, JavaScript 1.7%, HTML 0.5%, CSS 0.3%, Shell 0.1%