crate / crate-jdbc

A JDBC driver for CrateDB.

Home Page: https://crate.io/docs/jdbc/


Outdated documentation variants/versions still in Google index

amotl opened this issue

Hi there,

originally, @msbt found #342 on https://crate.io/docs/jdbc/en/2.0/.

It looks like some of the outdated versions are still indexed by Google, see https://eu.startpage.com/do/dsearch?query=cratedb+jdbc. Maybe we should add appropriate redirects?
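
For illustration, a redirect rule could look roughly like the following, assuming the docs are fronted by an nginx reverse proxy (the proxy software and exact paths are my assumptions, not the actual crate.io configuration):

```nginx
# Hypothetical sketch: permanently redirect anything under the outdated
# 2.0 docs tree to the corresponding page in the latest docs.
location /docs/jdbc/en/2.0/ {
    rewrite ^/docs/jdbc/en/2.0/(.*)$ /docs/jdbc/en/latest/$1 permanent;
}
```

A permanent (301) redirect would also signal to Google that the old URLs have been superseded.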

With kind regards,
Andreas.

commented

adding redirects for these old versions would work, but it doesn't solve the underlying issue (why are these versions still in google?)

these versions should have fallen out of google's index after we added the custom robots.txt files

https://crate.io/docs/jdbc/en/robots.txt
https://crate.io/docs/jdbc/en/latest/robots.txt
https://crate.io/docs/jdbc/en/2.0/robots.txt
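
for reference, files like these usually just carry a blanket disallow along the lines of the snippet below; i'd have to double-check the exact contents we shipped in the PR, so treat this as a sketch:

```
User-agent: *
Disallow: /
```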

PR and discussion here: #330

specifically, see my comment here:

https://github.com/crate/twd-archive/issues/11#issuecomment-675532185

take a look at the discussion and lemme know what you think. you might also want to investigate this on your own to see if you can find anything I missed the first time around

one potential lead: do you have access to the google webmaster console? it has tools that let you inspect the robots directives and sitemaps it is aware of for your site. seems like a good place to start
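
as a quick local check (just a sketch, nothing crate-specific assumed beyond the URLs from this thread), python's stdlib robot parser can show what the site-root robots.txt currently says about those pages:

```python
# sketch: ask the robots.txt at the site root what Googlebot may fetch
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://crate.io/robots.txt")
parser.read()

for url in (
    "https://crate.io/docs/jdbc/en/latest/",
    "https://crate.io/docs/jdbc/en/2.0/",
):
    print(url, "->", parser.can_fetch("Googlebot", url))
```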

commented

bounced it over to you. feel free to bounce it back

one potential lead: do you have access to the google webmaster console? it has tools that let you inspect the robots directives and sitemaps it is aware of for your site. seems like a good place to start. [@norosa]

Can you have a look at this, @msbt or @proddata?

@norosa @amotl "Crawlers don't check for robots.txt files in subdirectories." Our goal back then was to remove the RTD links from the Google index, which worked fine because on Read the Docs the robots.txt resides at the site root, but it doesn't work when the docs are routed through the reverse proxy.
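
So for the directives to have any effect on crawlers, they would need to live in the single robots.txt at the site root, roughly along these lines (illustrative only, not the current crate.io file):

```
User-agent: *
Disallow: /docs/jdbc/en/2.0/
```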

The search console says "Indexed, not submitted in sitemap", so as long as the page is available publicly, it will stay there I reckon.

Adding redirects for these old versions would work, but it doesn't solve the underlying issue (why are these versions still in google?) [@norosa]

Crawlers don't check for robots.txt files in subdirectories. The search console says "Indexed, not submitted in sitemap", so as long as the page is available publicly, it will stay there I reckon. [@msbt]

@norosa: Wouldn't adding appropriate redirects be a valid option then?
@msbt: Is it possible to delete specific pages from the index within google webmaster console?

@msbt: Is it possible to delete specific pages from the index within google webmaster console?

Sure, we can request URL removal for outdated content, but don't we want to keep multiple versions available for older releases?

I believe it is fine to keep them available, right? Still, I would vote for removing them from the Google index, since it should be sufficient for users searching on Google to always be guided to the most recent version of the documentation.

Please check https://eu.startpage.com/do/dsearch?query=cratedb+jdbc vs. https://www.bing.com/search?q=cratedb+jdbc. Doesn't Bing have the "better" search results there?

@norosa: Do you have any objections to deleting outdated versions of the documentation from the Google index?

commented

no objections. in fact, anything not currently included in the active versions dropdown should be purged, from the RTD servers and from google if possible. that stuff should be gone gone. we have rewrites in place to send visitors to the correct and more up-to-date versions

no objections

Fine. Let's proceed then and delete the outdated versions from everything we have access to?

I could not find any more of the outdated documentation variants listed in the original post. Thanks for all your attention, closing this now. Please let me know if you think something is still wrong.