Empty rules should be ignored
1player opened this issue
I'm dealing with a server which has the following robots.txt file:
# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow:
# ---------------------------
# END YOAST BLOCK
This library assumes everything is disallowed:
iex(1)> {:ok, rules} = :robots.parse("User-agent: *\nDisallow:\n\n", 200)
%{"*" => {[], [""]}}
iex(2)> :robots.is_allowed("example/1.0", "/", rules)
false
This is incorrect. Google says its crawler completely ignores empty rules: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt#disallow
And Yoast, the SEO service that generated the robots file, explicitly shows that snippet as an example of an "allow all" rule: https://yoast.com/ultimate-guide-robots-txt/#syntax (see the section titled "The disallow directive")
The simplest fix would be to do what Google does: if an Allow or Disallow rule has no path, ignore the rule completely.
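To illustrate the proposed behavior (in Python for readability, since this is only a sketch and not the library's actual Erlang implementation), here is a minimal parser that drops Allow/Disallow directives with an empty path, so an empty `Disallow:` leaves everything allowed. The function names `parse_robots` and `is_allowed` are hypothetical, chosen to mirror the library's API shape:

```python
def parse_robots(text):
    """Parse robots.txt text into {user_agent: [(directive, path), ...]}."""
    rules = {}
    agents = []  # user-agents the current group applies to
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents = [value]  # simplification: one agent line per group
            rules.setdefault(value, [])
        elif field in ("allow", "disallow"):
            if not value:
                continue  # empty rule: ignore it completely, as Google does
            for agent in agents:
                rules.setdefault(agent, []).append((field, value))
    return rules

def is_allowed(agent, path, rules):
    """Longest-matching rule wins; no matching rule means allowed."""
    group = rules.get(agent) or rules.get("*") or []
    best = max(
        (rule for rule in group if path.startswith(rule[1])),
        key=lambda rule: len(rule[1]),
        default=None,
    )
    return best is None or best[0] == "allow"
```

With this sketch, the Yoast snippet parses to an empty rule list, so `is_allowed("example/1.0", "/", parse_robots("User-agent: *\nDisallow:\n\n"))` returns `True`, matching Google's documented behavior.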
Sorry for the delay, and thanks for submitting the issue. This should be fixed by #18. I will publish a release with the fix soon.
Thanks, you have saved me from spending a weekend learning Erlang to push a fix myself :) Much appreciated!