Check every `is_robots_noindex` usage to ensure no bugs
leonidasmi opened this issue · comments
Leonidas Milosis commented
To understand the investigation that needs to happen, this is the current situation:
- In the builder of the homepage indexable, we check the value of the WP-native `blog_public` setting to set the `is_robots_noindex` value of the homepage indexable.
- But that `blog_public` setting is an int, so whether or not the "Discourage search engines from indexing this site" setting is checked, the `is_robots_noindex` of the homepage indexable will always be FALSE!
- That `is_robots_noindex` value is the source of truth for showing index or noindex in the robots meta tag.
- Luckily, we don't show index for sites that have the "Discourage search engines from indexing this site" setting checked, but solely because at the last moment we also filter the `is_robots_noindex` value in the indexable presentation. That filter sets the robots to noindex if the `blog_public` setting is false-y. This works in the presentation because we cast the setting to string before checking.
  - The really weird thing about this is that the first check in the builder of the homepage indexable used to work seamlessly (!), until we explicitly changed it way back.
- Now, even though the above case still works as expected, we still use `is_robots_noindex` around our codebase. Considering that:
  - the source of truth is invalid if the "Discourage search engines from indexing this site" setting is checked
  - it's a pretty important part of our product

  a deep dive on this is recommended, to ensure we're covered in all cases where `is_robots_noindex` is used.
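
The mismatch described above can be modelled in a few lines. This is a minimal sketch in Python rather than the plugin's PHP. It assumes the builder does a type-strict comparison of the raw option against the string `'0'`, while the presentation casts to string first. Both function names are hypothetical and not taken from the codebase.

```python
# Hypothetical model of the two checks; the names are illustrative only.
# Python's == between an int and a str is always False, which mirrors a
# type-strict comparison between an int and a string in PHP.

def builder_noindex(blog_public):
    """Builder-style check: compares the raw option value against the string '0'."""
    return blog_public == "0"


def presentation_noindex(blog_public):
    """Presentation-style check: casts the option to a string before comparing."""
    return str(blog_public) == "0"


# blog_public as an int: 0 means the "Discourage search engines" box is checked.
for value in (0, 1):
    print(value, builder_noindex(value), presentation_noindex(value))
# 0 False True
# 1 False False
```

Under these assumptions the builder never stores a noindex value for an int option, and only the late cast in the presentation produces the correct robots output.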
If nothing else, we probably need to change the homepage indexable handling so that it starts storing the right `is_robots_noindex` value (and also consider an upgrade routine to remedy past mishaps).
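
One possible shape for that fix, as a hedged Python sketch rather than plugin code: normalise the option before comparing, then recompute the stored value for the homepage indexable in a one-off upgrade step. The dict-based record and both function names are assumptions made for illustration, not the plugin's data model or upgrade API.

```python
# Hypothetical sketch: the record layout and function names are assumptions.

def is_noindex(blog_public):
    """Return True when the option says search engines are discouraged.

    Casting to str first makes the check work whether the option arrives
    as an int (0/1) or as a string ('0'/'1').
    """
    return str(blog_public) == "0"


def repair_homepage_indexable(indexable, blog_public):
    """Recompute the stored flag; return True if the record had to change."""
    correct = is_noindex(blog_public)
    if indexable.get("is_robots_noindex") == correct:
        return False
    indexable["is_robots_noindex"] = correct
    return True


# A record built while the site discouraged indexing, but stored as False.
homepage = {"is_robots_noindex": False}
changed = repair_homepage_indexable(homepage, blog_public=0)
print(changed, homepage["is_robots_noindex"])  # True True
```

Returning whether the record changed keeps the routine idempotent, so running the upgrade step a second time leaves already-correct records untouched.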