algolia / docsearch-scraper

DocSearch - Scraper

Home Page: https://docsearch.algolia.com/


The error message for host unreachable in older versions was significantly better...

elucidsoft opened this issue · comments

commented

I have been banging my head trying to get this to work and kept getting a host unreachable error. After three hours, I tried the 1.13.0 Docker image instead. The error message it gave was SIGNIFICANTLY better, and I immediately recognized the issue. I really suggest you put the error reporting back to how it was; I just wasted an immense amount of time.

Hey, could you please provide more context? What are the errors/differences?

The scraper uses the Algolia Python client, so I don't think the issue is related to this repo.

commented

The error message I was getting on the latest version was host unreachable. On v1.13.0, the message told me exactly what was wrong: it showed the entire network error stack, and I could clearly see that I had malformed credentials.

I had the same behavior. My auto-update script incorrectly added trailing whitespace to APPLICATION_ID, and because of that docsearch-scraper built an incorrect hostname. On the latest version, I only got AlgoliaUnreachableHostException: Unreachable hosts without any useful information. After downgrading to v1.13.0, I got details that allowed me to solve the issue.
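To illustrate the failure mode described above: the Algolia clients derive their API hostnames from the application ID, so stray whitespace in the environment variable ends up embedded in the host and DNS resolution fails. The snippet below is a minimal sketch, not the scraper's actual code; the `-dsn.algolia.net` host pattern, the sample application ID, and the variable names are illustrative assumptions. Stripping the value before use avoids the problem.

```python
import os

# Simulate an auto-update script that wrote a trailing space
# (hypothetical application ID, for illustration only).
os.environ["APPLICATION_ID"] = "ABC123XYZ "

app_id = os.environ["APPLICATION_ID"]

# A hostname built from the raw value embeds the whitespace, so the
# host cannot be resolved and the client reports "Unreachable hosts".
bad_host = f"{app_id}-dsn.algolia.net"

# Stripping the environment variable first yields a valid hostname.
good_host = f"{app_id.strip()}-dsn.algolia.net"

print(repr(bad_host))   # 'ABC123XYZ -dsn.algolia.net'
print(repr(good_host))  # 'ABC123XYZ-dsn.algolia.net'
```

A defensive `.strip()` (or a validation check that rejects IDs containing whitespace with a clear message) at config-load time would surface this class of mistake immediately instead of as an unreachable-host error.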

Hey @elucidsoft, @Markeli, looking at past commits, I can only see one change that could cause this: we upgraded the scraper to the latest major version of our Python client, which might handle errors differently.

After downgrading to v1.13.0 I got some details that allowed me to solve the issue.

I believe using an older version won't change the indexing; most updates were made to make the scraper more stable and to better detect the website structure when bootstrapping a config.

Note that until our new infra lands, we will only accept community contributions unless there's an urgent fix to make.