This repository contains a Django application with a microservice that accepts a URL from the user, together with parameters that select whether to extract text, images, or both, and saves the task to the database. A Celery worker then periodically processes the given URL according to those parameters and saves the results in the database. All tasks, saved images, and saved texts are visible in the REST API.
First, copy the environment variables file .env_template to a new file named .env. You can do this with the command:
cp .env_template .env
Because the whole environment is containerized with Docker, make sure that docker and docker-compose are installed. To pull all images, create the database storage directory, and start all services, run:
docker-compose up -d
To add a new task that extracts texts and/or images, run the curl command:
curl -d "url=http://www.example_url.com/&get_image=true" -X POST http://localhost:8000/api/tasks/
The command above gets all images from the URL 'http://www.example_url.com'. If you also want the whole text, add get_text=true to the data.
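For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the repository: the helper names build_task_request and submit_task are hypothetical, and the field names (url, get_image, get_text) and the endpoint are taken from the curl example above.

```python
from urllib import parse, request

# Endpoint taken from the curl example in this README.
API_TASKS_URL = "http://localhost:8000/api/tasks/"


def build_task_request(url, get_image=False, get_text=False):
    """Build a form-encoded POST request equivalent to the curl command.

    Hypothetical helper; only the flags set to True are sent.
    """
    fields = {"url": url}
    if get_image:
        fields["get_image"] = "true"
    if get_text:
        fields["get_text"] = "true"
    data = parse.urlencode(fields).encode("ascii")
    return request.Request(API_TASKS_URL, data=data, method="POST")


def submit_task(url, get_image=False, get_text=False):
    """Send the request and return (HTTP status, response body as text)."""
    req = build_task_request(url, get_image=get_image, get_text=get_text)
    with request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")
```

With the services running, calling submit_task("http://www.example_url.com/", get_image=True, get_text=True) would request both images and text for that URL.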
All tasks are visible at localhost:8000/api/tasks/, which supports filtering and ordering. For example, to see only completed tasks, add the parameter ?state=success.
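The filtered listing can also be read from a script. The following is a minimal sketch under stated assumptions: the helper names tasks_url and fetch_tasks are hypothetical, only the state filter shown above is confirmed by this README, and the endpoint is assumed to return JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from this README.
TASKS_URL = "http://localhost:8000/api/tasks/"


def tasks_url(**filters):
    """Return the tasks URL with the given filters as query parameters."""
    query = urlencode(filters)
    return f"{TASKS_URL}?{query}" if query else TASKS_URL


def fetch_tasks(**filters):
    """Fetch the task list; assumes the API responds with a JSON body."""
    with urlopen(tasks_url(**filters)) as resp:
        return json.load(resp)
```

For example, tasks_url(state="success") produces http://localhost:8000/api/tasks/?state=success, the same URL as adding the parameter by hand.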
All completed tasks that extracted images are visible at localhost:8000/api/images/, where you can download an image by clicking the path value of its image key.
All completed tasks that extracted texts are visible at localhost:8000/api/texts/. Texts are saved in the database as a JSON list, because this structure should be more useful to ML developers than one huge joined string.
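To illustrate why a list can be easier to work with, here is a small sketch. The stored value below is made-up sample data, not real output from the service, and the exact shape of the saved text is an assumption.

```python
import json

# Hypothetical stored value: a JSON list of text fragments.
stored = '["First paragraph.", "Second paragraph.", "Third paragraph."]'

# Decoding gives a Python list, so each fragment can be processed separately.
texts = json.loads(stored)
word_counts = [len(fragment.split()) for fragment in texts]

# A single string is still available when needed, by joining the fragments.
joined = " ".join(texts)
```

Going from the list to one string is a single join, while recovering the original fragments from a joined string would require guessing where the boundaries were.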