This demo shows how Weaviate can be used to explore magazines and newspapers.
Notes:
- We use this dataset for the demos in the Weaviate documentation
- You can use this dataset but note that it's not made for any production use case
- Run the demo directly from the docs
Make sure to have Python3 and pip3 installed.
Only do this if you want to start with a fresh dataset (the cache folder is already populated):
$ rm ./cache-en/*.json
for the English dataset and
$ rm ./cache-nl/*.json
for the Dutch dataset. Then download the articles again:
$ python3 download.py <nl | en> <'newspaper' | -a>
The -a option downloads all implemented newspapers. The available newspapers are listed below.
List of available newspapers.
- English:
- 'ft': 'https://www.ft.com'
- 'nyt': 'https://www.nytimes.com'
- 'guardian': 'https://www.theguardian.com/international'
- 'wsj': 'https://www.wsj.com'
- 'cnn': 'https://edition.cnn.com/'
- 'fn': 'https://www.foxnews.com/'
- 'econ': 'https://www.economist.com/'
- 'newyorker': 'https://www.newyorker.com/'
- 'wired': 'https://www.wired.com/'
- 'vogue': 'https://www.vogue.com/'
- 'gi': 'https://www.gameinformer.com/'
- Dutch:
- 'dvhn': 'https://www.dvhn.nl/'
- 'nos': 'https://nos.nl/'
- 'nu': 'https://nu.nl/'
- 'ad': 'https://ad.nl/'
- 'nrc': 'https://nrc.nl/'
- 'telegraaf': 'https://www.telegraaf.nl/'
- 'fd': 'https://fd.nl/'
- 'volkskrant': 'https://volkskrant.nl/'
- 'trouw': 'https://trouw.nl/'
- 'rtl': 'https://www.rtlnieuws.nl/'
- 'elsevier': 'https://www.ewmagazine.nl/'
- 'parool': 'https://www.parool.nl/'
- 'gelderlander': 'https://www.gelderlander.nl/'
- 'tweakers': 'https://tweakers.net/nieuws/'
- 'metro': 'https://www.metronieuws.nl/'
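The download command takes a language code followed by either one newspaper key or -a. A minimal sketch of assembling that command from the keys listed above is shown here. The helper name `build_download_command` is hypothetical and is not part of this repository, and the check that a key belongs to the chosen language is an assumption about how download.py behaves.

```python
# Newspaper keys copied from the list above, grouped by language code.
NEWSPAPERS = {
    "en": ["ft", "nyt", "guardian", "wsj", "cnn", "fn", "econ",
           "newyorker", "wired", "vogue", "gi"],
    "nl": ["dvhn", "nos", "nu", "ad", "nrc", "telegraaf", "fd", "volkskrant",
           "trouw", "rtl", "elsevier", "parool", "gelderlander", "tweakers",
           "metro"],
}


def build_download_command(language, newspaper=None):
    """Return the argv list for download.py (hypothetical helper).

    newspaper=None selects the -a option (all implemented newspapers).
    """
    if language not in NEWSPAPERS:
        raise ValueError(f"unknown language: {language!r}")
    if newspaper is None:
        target = "-a"
    elif newspaper in NEWSPAPERS[language]:
        target = newspaper
    else:
        raise ValueError(f"{newspaper!r} is not listed for {language!r}")
    return ["python3", "download.py", language, target]
```

The returned list can be passed to `subprocess.run` from the directory that contains download.py.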
$ pip3 install -r requirements.txt
$ python3 import.py <WEAVIATE_URL> <CACHE_DIR> [BATCH_SIZE]
- e.g.:
$ python3 import.py http://localhost:8080 cache-en 10
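The optional BATCH_SIZE argument controls how many articles are sent to Weaviate at a time. The sketch below shows one way to group cached JSON files into batches of that size. It is not the actual code of import.py: the function name `iter_batches` is hypothetical, and it assumes each file in the cache folder holds one JSON document.

```python
import json
from pathlib import Path


def iter_batches(cache_dir, batch_size=200):
    """Yield lists of parsed JSON documents, at most batch_size per list.

    Hypothetical helper; assumes one JSON document per *.json file.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for path in sorted(Path(cache_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            batch.append(json.load(handle))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

A smaller batch size means more requests with less data in each, which can help when the Weaviate instance has limited memory.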
- Make sure you have Weaviate running (see Install Weaviate with Docker Compose), then run the import container on the same Docker network:
$ export WEAVIATE_ID=$(echo ${PWD##*/}_weaviate_1 | tr "[:upper:]" "[:lower:]")
$ export WEAVIATE_ORIGIN="http://$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $WEAVIATE_ID):8080"
$ export WEAVIATE_NETWORK=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}' $WEAVIATE_ID)
$ docker run -i --network=$WEAVIATE_NETWORK -e weaviate_host=$WEAVIATE_ORIGIN -e cache_dir=<YOUR_CACHE_DIR> -e batch_size=<YOUR_BATCH_SIZE> semitechnologies/weaviate-demo-newspublications:latest
# Example: run this in the same directory as the docker-compose.yaml
$ export WEAVIATE_ID=$(echo ${PWD##*/}_weaviate_1 | tr "[:upper:]" "[:lower:]")
$ export WEAVIATE_ORIGIN="http://$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $WEAVIATE_ID):8080"
$ export WEAVIATE_NETWORK=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.NetworkID}}{{end}}' $WEAVIATE_ID)
$ docker run -i --network=$WEAVIATE_NETWORK -e weaviate_host=$WEAVIATE_ORIGIN -e cache_dir=cache-nl -e batch_size=12 semitechnologies/weaviate-demo-newspublications
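The first export line derives the Weaviate container name from the current directory: `${PWD##*/}` keeps only the last path component, `_weaviate_1` is appended, and `tr` lowercases the result. A minimal Python sketch of the same derivation follows. The function name `compose_container_name` is hypothetical, and the `<directory>_<service>_<index>` pattern is taken from the command above; other Docker Compose versions may name containers differently, so compare the result with the output of `docker ps`.

```python
from pathlib import Path


def compose_container_name(project_dir, service="weaviate", index=1):
    """Mirror `echo ${PWD##*/}_weaviate_1 | tr "[:upper:]" "[:lower:]"`.

    Hypothetical helper; the naming pattern is assumed from the shell command.
    """
    directory_name = Path(project_dir).name  # same as ${PWD##*/}
    return f"{directory_name}_{service}_{index}".lower()
```

For a compose file in /home/me/MyDemo this gives `mydemo_weaviate_1`, which is the value the shell command stores in WEAVIATE_ID.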
- Make sure you have Weaviate running.
$ export WEAVIATE_ORIGIN=<WEAVIATE_ORIGIN>
$ docker run -i -e weaviate_host=$WEAVIATE_ORIGIN -e cache_dir=<YOUR_CACHE_DIR> -e batch_size=<YOUR_BATCH_SIZE> semitechnologies/weaviate-demo-newspublications:latest
$ docker build . --tag="semitechnologies/weaviate-demo-newspublications:latest" --no-cache
$ docker push semitechnologies/weaviate-demo-newspublications:latest
Default values for the environment variables:
- weaviate_host=http://localhost:8080
- cache_dir=cache-en
- batch_size=200
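These defaults apply when a variable is not passed to the container with -e. A minimal sketch of reading the three variables with those fallbacks is shown below. The function name `load_settings` is hypothetical and this is not the code the image actually runs; only the variable names and default values come from the list above.

```python
import os

# Default values copied from the list above.
DEFAULTS = {
    "weaviate_host": "http://localhost:8080",
    "cache_dir": "cache-en",
    "batch_size": "200",
}


def load_settings(environ=None):
    """Return the settings, falling back to DEFAULTS (hypothetical helper)."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Environment variables are strings, so convert the batch size explicitly.
    settings["batch_size"] = int(settings["batch_size"])
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the fallback behaviour easy to check without changing the real environment.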
https://gist.github.com/bobvanluijt/af6fe0fa392ca8f93e1fdc96fc1c86d8
You can check whether Docker can access the GPUs by running:
$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi