weaviate / DEMO-NewsPublications

Weaviate demo with news publications

Newspapers and Magazines // Weaviate

This demo shows how Weaviate can be used to explore magazines and newspapers.

Notes:

  • We use this dataset for the demos in the Weaviate documentation
  • You can use this dataset, but note that it is not intended for any production use case
  • Run the demo directly from the docs

Installation (for collecting data) (ENGLISH and DUTCH)

Make sure to have Python3 and pip3 installed.

This is only needed if you want to start with a fresh dataset (the cache folders are already populated):

  1. $ rm ./cache-en/*.json for the English dataset and $ rm ./cache-nl/*.json for the Dutch dataset
  2. $ python3 download.py <nl | en> <'newspaper' | -a>. The -a option downloads all implemented newspapers. The available newspapers are listed below.
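
Step 1 can also be done from Python, which avoids differences in shell globbing between platforms. This is a minimal sketch, assuming the cache folders hold flat .json files as the rm commands suggest. clear_cache is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path


def clear_cache(cache_dir: str) -> int:
    """Delete every *.json file directly inside cache_dir and return the count.

    Hypothetical helper: mirrors `rm ./cache-en/*.json`, nothing more.
    Files with other extensions and subdirectories are left untouched.
    """
    removed = 0
    for json_file in Path(cache_dir).glob("*.json"):
        json_file.unlink()
        removed += 1
    return removed
```

Calling clear_cache("cache-en") or clear_cache("cache-nl") from the repository root would then correspond to the two rm commands.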

List of available newspapers.

  1. English:
  2. Dutch:

Execute the import WITHOUT Docker

  1. $ pip3 install -r requirements.txt
  2. $ python3 import.py <WEAVIATE_URL> <CACHE_DIR> [BATCH_SIZE]
    • e.g.: $ python3 import.py http://localhost:8080 cache-en 10
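
The argument order above can be illustrated with a small sketch of how a script with this interface might parse its arguments and split cached items into batches. This is not the repository's import.py. The names parse_args and batches are assumptions for illustration, and the fallback batch size of 10 is borrowed from the example command, not a confirmed default.

```python
import argparse
from typing import Iterator, List, Sequence


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse <WEAVIATE_URL> <CACHE_DIR> [BATCH_SIZE] (hypothetical parser)."""
    parser = argparse.ArgumentParser(prog="import.py")
    parser.add_argument("weaviate_url")
    parser.add_argument("cache_dir")
    # Assumption: 10 is taken from the example command, not a documented default.
    parser.add_argument("batch_size", nargs="?", type=int, default=10)
    return parser.parse_args(argv)


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A larger batch size means fewer requests to Weaviate per import, at the cost of larger request bodies.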

Execute the import WITH Docker (Weaviate with Docker Compose)

  1. Make sure you have Weaviate running (See Install Weaviate with Docker Compose).
  2. $ export WEAVIATE_ID=$(echo ${PWD##*/}_weaviate_1 | tr "[:upper:]" "[:lower:]")
  3. $ export WEAVIATE_ORIGIN="http://$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $WEAVIATE_ID):8080"
  4. $ export WEAVIATE_NETWORK=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}' $WEAVIATE_ID)
  5. $ docker run -i --network=$WEAVIATE_NETWORK -e weaviate_host=$WEAVIATE_ORIGIN -e cache_dir=<YOUR_CACHE_DIR> -e batch_size=<YOUR_BATCH_SIZE> semitechnologies/weaviate-demo-newspublications:latest
# in the same directory as the docker-compose.yaml
$ export WEAVIATE_ID=$(echo ${PWD##*/}_weaviate_1 | tr "[:upper:]" "[:lower:]")
$ export WEAVIATE_ORIGIN="http://$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $WEAVIATE_ID):8080"
$ export WEAVIATE_NETWORK=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}' $WEAVIATE_ID)
$ docker run -i --network=$WEAVIATE_NETWORK -e weaviate_host=$WEAVIATE_ORIGIN -e cache_dir=cache-nl -e batch_size=12 semitechnologies/weaviate-demo-newspublications
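
The first two exports above derive the container name from the current directory and turn the container IP into a URL. The same string handling can be sketched in Python, assuming the <directory>_weaviate_1 naming that the shell commands rely on. container_name and origin_url are hypothetical helpers, and the IP address itself must still come from docker inspect.

```python
from pathlib import PurePosixPath


def container_name(working_dir: str) -> str:
    """Mirror: echo ${PWD##*/}_weaviate_1 | tr "[:upper:]" "[:lower:]"."""
    # ${PWD##*/} keeps only the last path component of the working directory.
    return f"{PurePosixPath(working_dir).name}_weaviate_1".lower()


def origin_url(ip_address: str, port: int = 8080) -> str:
    """Build the value exported as WEAVIATE_ORIGIN from a container IP."""
    return f"http://{ip_address}:{port}"
```

If docker inspect cannot find the derived name, your Compose version may name containers differently. Running docker ps shows the actual name.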

Execute the import WITH Docker (Weaviate on external host or local WITHOUT Docker Compose)

  1. Make sure you have Weaviate running.
  2. $ export WEAVIATE_ORIGIN=<WEAVIATE_ORIGIN> (the URL of your Weaviate instance)
  3. $ docker run -i -e weaviate_host=$WEAVIATE_ORIGIN -e cache_dir=<YOUR_CACHE_DIR> -e batch_size=<YOUR_BATCH_SIZE> semitechnologies/weaviate-demo-newspublications:latest
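
Step 1 can be checked programmatically before starting the import. This is a minimal sketch using only the standard library, assuming the instance exposes Weaviate's readiness endpoint at /v1/.well-known/ready. is_ready is a hypothetical helper and is not part of the repository.

```python
import urllib.error
import urllib.request


def is_ready(origin: str, timeout: float = 5.0) -> bool:
    """Return True if the Weaviate instance at `origin` reports ready."""
    url = origin.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False
```

For example, is_ready("http://localhost:8080") should return True once a local instance has finished starting up.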

Create Docker

  1. $ docker build . --tag="semitechnologies/weaviate-demo-newspublications:latest" --no-cache
  2. $ docker push semitechnologies/weaviate-demo-newspublications:latest

Default values for the environment variables:

Running with GPU and compose (Debian)

https://gist.github.com/bobvanluijt/af6fe0fa392ca8f93e1fdc96fc1c86d8

You can validate if the GPUs are supported by running:

$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

License: MIT License