Outdated documentation variants/versions still in Google index
amotl opened this issue
Hi there,
Originally, @msbt found #342 on https://crate.io/docs/jdbc/en/2.0/.
It looks like some of the outdated documentation versions are still indexed by Google, see https://eu.startpage.com/do/dsearch?query=cratedb+jdbc. Maybe we should add appropriate redirects?
With kind regards,
Andreas.
adding redirects for these old versions would work, but it doesn't solve the underlying issue (why are these versions still in google?)
these versions should have fallen out of google's index after we added the custom robots.txt files
https://crate.io/docs/jdbc/en/robots.txt
https://crate.io/docs/jdbc/en/latest/robots.txt
https://crate.io/docs/jdbc/en/2.0/robots.txt
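for what it's worth, robots.txt rules can be sanity-checked offline with python's stdlib. this sketch uses an assumed Disallow rule (not copied from the live crate.io files) to show how a crawler would interpret it:

```python
# sketch: parse a robots.txt and check which paths a crawler may fetch.
# the Disallow rule below is an assumption, not the live crate.io file.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /docs/jdbc/en/2.0/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# the outdated version is blocked, the latest one is not
print(parser.can_fetch("*", "https://crate.io/docs/jdbc/en/2.0/index.html"))
print(parser.can_fetch("*", "https://crate.io/docs/jdbc/en/latest/index.html"))
```

note that crawlers only ever fetch this file from the site root, so rules placed in a subdirectory file like /docs/jdbc/en/2.0/robots.txt are never read.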
PR and discussion here: #330
specifically, see my comment here:
https://github.com/crate/twd-archive/issues/11#issuecomment-675532185
take a look at the discussion and lemme know what you think. you might also want to investigate this on your own to see if you can find anything I missed the first time around
one potential lead: do you have access to the google webmaster console? it has tools that let you inspect the robots directives and sitemaps it is aware of for your site. seems like a good place to start
bounced it over to you. feel free to bounce it back
@norosa @amotl "Crawlers don't check for robots.txt files in subdirectories." Our goal back then was to remove the RTD links from the google index. That worked fine on Read the Docs, where the robots.txt resides at the site root, but it doesn't work when the docs are routed through the reverse proxy under crate.io.
The search console says "Indexed, not submitted in sitemap", so as long as the page is available publicly, it will stay there I reckon.
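Blocking crawling alone doesn't remove already-indexed pages; a `noindex` directive is what search engines document for de-indexing. As a sketch, assuming the reverse proxy is nginx (the location path is illustrative):

```nginx
# hypothetical: mark an outdated docs version as noindex so search
# engines drop it from their index on the next re-crawl.
location /docs/jdbc/en/2.0/ {
    add_header X-Robots-Tag "noindex" always;
}
```

Note that the pages must stay crawlable (i.e. not disallowed in robots.txt) for the crawler to ever see this header.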
Adding redirects for these old versions would work, but it doesn't solve the underlying issue (why are these versions still in google?) [@norosa]
Crawlers don't check for robots.txt files in subdirectories. The search console says "Indexed, not submitted in sitemap", so as long as the page is available publicly, it will stay there I reckon. [@msbt]
@norosa: Wouldn't adding appropriate redirects be a valid option then?
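For reference, such a redirect could be a small rule in the reverse proxy. A sketch assuming nginx (the proxy software and exact paths are assumptions):

```nginx
# hypothetical: permanently redirect the outdated 2.0 docs to latest,
# preserving the rest of the path.
location ~ ^/docs/jdbc/en/2\.0/(.*)$ {
    return 301 /docs/jdbc/en/latest/$1;
}
```

A 301 also tells search engines to transfer the old URL's index entry to the redirect target.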
@msbt: Is it possible to delete specific pages from the index within google webmaster console?
@msbt: Is it possible to delete specific pages from the index within google webmaster console?
Sure, we can request URL removal for outdated content, but don't we want multiple versions available for older releases?
I believe it is fine to keep them available. However, I would vote for removing them from the Google index, because it should be sufficient for users to always be guided to the most recent version of the documentation when searching on Google.
Please check https://eu.startpage.com/do/dsearch?query=cratedb+jdbc vs. https://www.bing.com/search?q=cratedb+jdbc. Doesn't Bing have the "better" search results there?
@norosa: Do you have any objections to deleting outdated versions of the documentation from the Google index?
no objections. in fact, anything not currently included in the active versions dropdown should be purged from the RTD servers and from google, if possible. that stuff should be gone for good. we have rewrites in place to send visitors to the correct and more up-to-date versions
no objections
Fine. Let's proceed then and delete the outdated versions from everywhere we have access to.
I can no longer find any of the outdated documentation variants listed in the original post, so I am closing this now. Thanks for all your attention! Please let me know if you think something is still wrong.