Sitemap should be user-agent independent
JanPetterMG opened this issue · comments
According to Google, Sitemap is a "non-group-member" record.
This means that no matter where a Sitemap line is placed in a robots.txt file, it should not be grouped under any user-agent; it is completely independent.
Source: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt#google-supported-non-group-member-records
Sitemap: http://example.com/sitemap.xml
User-agent: *
Disallow: /admin/
Sitemap: http://somesite.com/sitemap.xml
User-agent: Googlebot
Disallow: /private/
Sitemap: http://internet.com/sitemap.xml
User-agent: Yahoo
Disallow: /noaccess/
Sitemap: http://worldwideweb.com/sitemap.xml
Parsing the above robots.txt file returns only 2 sitemaps instead of 4 (bug):
require_once __DIR__ . '/vendor/autoload.php';
$parser = new RobotsTxtParser(file_get_contents('robots.txt'));
var_dump($parser->getSitemaps());
array(2) { [0]=> string(30) "http://example.com/sitemap.xml" [1]=> string(31) "http://somesite.com/sitemap.xml" }
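For reference, a parser that treats Sitemap as a non-group-member record would collect all four URLs. Here is a minimal sketch of that behavior in Python (the function name and regex are illustrative, not the library's actual implementation): it scans every line for a Sitemap directive and ignores the surrounding User-agent groups entirely.

```python
import re

ROBOTS_TXT = """\
Sitemap: http://example.com/sitemap.xml
User-agent: *
Disallow: /admin/
Sitemap: http://somesite.com/sitemap.xml
User-agent: Googlebot
Disallow: /private/
Sitemap: http://internet.com/sitemap.xml
User-agent: Yahoo
Disallow: /noaccess/
Sitemap: http://worldwideweb.com/sitemap.xml
"""

def get_sitemaps(robots_txt: str) -> list[str]:
    """Collect every Sitemap record, regardless of User-agent grouping."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Directive names are case-insensitive; the value runs to end of line.
        match = re.match(r"\s*sitemap\s*:\s*(\S+)", line, re.IGNORECASE)
        if match:
            sitemaps.append(match.group(1))
    return sitemaps

print(get_sitemaps(ROBOTS_TXT))
# → ['http://example.com/sitemap.xml', 'http://somesite.com/sitemap.xml',
#    'http://internet.com/sitemap.xml', 'http://worldwideweb.com/sitemap.xml']
```

With this approach, all four sitemaps from the example above are returned, which is the expected behavior.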