algolia / docsearch-scraper

DocSearch - Scraper

Home Page: https://docsearch.algolia.com/

run and docker:run commands don't exist

CodeSandwich opened this issue

The linked documentation is out of sync with the code in the repository. Following its steps to run the crawler without Docker is impossible, because the commands no longer exist.

To be honest I don't think that running a non-dockerized crawler is even possible anymore 🤔

Could you clarify, please? It is possible to run the crawler within Docker thanks to ./docsearch docker:run

It seems that neither ./docsearch run /path/to/your/config.json nor ./docsearch docker:run exists anymore. Both of them yield a "command not found" error. I'm not surprised, because running ./docsearch --help doesn't list these commands:

Docsearch CLI

Usage:
  ./docsearch command [options] [arguments]

Options:
  --help    Display help message

Available commands:
  bootstrap     Bootstrap a DocSearch config
  test          Run tests
  playground    Launch the playground
 docker
  docker:build  Build scraper images (dev, prod)

If I understand correctly, the only way to run a crawler now is to build a Docker image and use it. This is fine, but the docs need to reflect that.

OK, so I will fix the docs, but docker:run and run do work.

The CLI does display the full list of available commands once you have set up your environment variables:

./docsearch --help
Docsearch CLI

Usage:
  ./docsearch command [options] [arguments]

Options:
  --help    Display help message

Available commands:
  bootstrap     Bootstrap a DocSearch config
  test          Run tests
  playground    Launch the playground
  run           Run a config
 docker
  docker:build  Build scraper images (dev, prod)
  docker:run    Run a config using docker

Wow, I've never seen CLI behavior like this! 🤯

I'm still having this problem though. I've set up the API_KEY and APPLICATION_ID variables, but I'm still getting no run or docker:run. What variables did you set? It turns out they need to be set directly in the call to ./docsearch, like API_KEY=<KEY> APPLICATION_ID=<ID> ./docsearch --help, and then it works.

Thank you for your help and sorry about the noise!

All good then. Happy to help! Do not hesitate to raise an issue :)
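
For readers puzzled by a help screen that changes with the environment, here is a minimal sketch of how a CLI could hide commands until credentials are present. It is not the actual docsearch source: the command lists mirror the help output quoted in this thread, while the function name available_commands and the constants are hypothetical and used only for illustration.

```python
import os

# Hypothetical command tables; the names mirror the help output quoted in
# this thread, but this is NOT the real docsearch implementation.
ALWAYS_AVAILABLE = ["bootstrap", "test", "playground", "docker:build"]
NEEDS_CREDENTIALS = ["run", "docker:run"]
REQUIRED_VARS = ("APPLICATION_ID", "API_KEY")


def available_commands(env=None):
    """Return the commands to list in --help for a given environment mapping."""
    env = os.environ if env is None else env
    # A variable counts as set only if it exists and is non-empty.
    has_credentials = all(env.get(name) for name in REQUIRED_VARS)
    commands = list(ALWAYS_AVAILABLE)
    if has_credentials:
        commands += NEEDS_CREDENTIALS
    return commands


if __name__ == "__main__":
    # The listing depends on the environment of this very process.
    for name in available_commands():
        print(name)
```

Under this assumption, a variable that is set in the shell but not exported never reaches the child process, which would explain why prefixing the call with API_KEY=<KEY> APPLICATION_ID=<ID> makes the extra commands appear.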

Yeah, I was caught by this too. I am not a Python dev, just blindly following the instructions here:
https://docsearch.algolia.com/docs/run-your-own/

I guess I did them a bit out of order.

Make sure your .env file is populated with your keys before you run pipenv shell. If you modify your .env file, you have to restart pipenv shell to load your changes. I mean, it makes sense; pipenv is just all new to me.
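
The restart requirement follows from the environment being copied once, when the shell starts. A rough sketch of that idea, assuming a simple KEY=VALUE file format: parse_dotenv and snapshot_env are hypothetical helpers written for illustration, not pipenv's own code, and letting file values override existing variables is a choice made for this sketch only.

```python
import os
import tempfile


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes; real .env parsers handle more cases.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def snapshot_env(dotenv_path, base_env=None):
    """Build the environment a freshly started shell would receive."""
    env = dict(os.environ if base_env is None else base_env)
    with open(dotenv_path) as handle:
        env.update(parse_dotenv(handle.read()))
    return env


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), ".env")
    with open(path, "w") as handle:
        handle.write("API_KEY=old-key\n")
    running_shell = snapshot_env(path, base_env={})  # "pipenv shell" starts here

    with open(path, "w") as handle:
        handle.write("API_KEY=new-key\n")  # .env edited afterwards

    print(running_shell["API_KEY"])                     # still the old value
    print(snapshot_env(path, base_env={})["API_KEY"])   # a restart sees the edit
```

The already-running snapshot keeps the old value because nothing re-reads the file; only a new snapshot, the equivalent of restarting the shell, picks up the change.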