t1gor / Robots.txt-Parser-Class

PHP class for robots.txt parsing

Sitemap should be user-agent independent

JanPetterMG opened this issue · comments

According to Google, the Sitemap directive is a "non-group-member" record.
This means that no matter where a Sitemap line is placed in a robots.txt file, it should not be grouped under any user-agent; it is actually completely independent.
Source: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt#google-supported-non-group-member-records

Sitemap: http://example.com/sitemap.xml

User-agent: *
Disallow: /admin/
Sitemap: http://somesite.com/sitemap.xml

User-agent: Googlebot
Disallow: /private/
Sitemap: http://internet.com/sitemap.xml

User-agent: Yahoo
Disallow: /noaccess/
Sitemap: http://worldwideweb.com/sitemap.xml

The above robots.txt file yields only 2 sitemaps instead of 4 (bug):

require_once("/vendor/autoload.php");
$parser = new RobotsTxtParser(file_get_contents("robots.txt"));

var_dump($parser->getSitemaps());

Actual output (only 2 of the 4 sitemaps):

array(2) {
  [0]=> string(30) "http://example.com/sitemap.xml"
  [1]=> string(31) "http://somesite.com/sitemap.xml"
}
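For comparison, here is a minimal sketch (in Python, not the library's PHP API) of the spec-compliant behavior: Sitemap lines are collected file-wide, without regard to which User-agent group they appear in. The function name `extract_sitemaps` and the line-by-line regex approach are illustrative assumptions, not part of this library.

```python
import re

def extract_sitemaps(robots_txt: str) -> list[str]:
    """Collect every Sitemap directive, ignoring user-agent groups.

    Per Google's robots.txt documentation, Sitemap is a
    "non-group-member" record: it applies to the whole file no
    matter which User-agent block it appears in.
    """
    sitemaps = []
    for line in robots_txt.splitlines():
        # Strip comments, then match "Sitemap:" case-insensitively.
        line = line.split('#', 1)[0].strip()
        match = re.match(r'(?i)^sitemap\s*:\s*(\S+)', line)
        if match:
            sitemaps.append(match.group(1))
    return sitemaps

robots = """\
Sitemap: http://example.com/sitemap.xml

User-agent: *
Disallow: /admin/
Sitemap: http://somesite.com/sitemap.xml

User-agent: Googlebot
Disallow: /private/
Sitemap: http://internet.com/sitemap.xml

User-agent: Yahoo
Disallow: /noaccess/
Sitemap: http://worldwideweb.com/sitemap.xml
"""

print(extract_sitemaps(robots))  # all four URLs, not two
```

Applied to the robots.txt above, this returns all four sitemap URLs, which is the behavior the parser's `getSitemaps()` should match.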