Issue with rules containing both wildcards * and an end anchor $
webarchitect609 opened this issue
Bug Report
SUMMARY
The end-of-URL anchor $ is not supported as described in Google's robots.txt specification:
https://developers.google.com/search/reference/robots_txt
STEPS TO REPRODUCE
Save the following code in a file and run it:
<?php
require 'vendor/autoload.php';
$robotsTxtContent = <<<END
User-agent: *
Disallow: /*.jpg$
END;
$txtClient = new \vipnytt\RobotsTxtParser\TxtClient('http://example.com', 200, $robotsTxtContent);
var_dump($txtClient->userAgent('*')->isDisallowed('/image.jpg'));
var_dump($txtClient->userAgent('*')->isDisallowed('http://example.com/image.jpg'));
var_dump($txtClient->userAgent('*')->isDisallowed('http://example.com/foo/bar/image.jpg'));
EXPECTED RESULTS
bool(true) dumped three times
ACTUAL RESULTS
bool(false) dumped three times:
/usr/bin/php test.php
bool(false)
bool(false)
bool(false)
Process finished with exit code 0
The issue is isolated to rules containing both * (wildcard) and $ (end anchor). Rules containing only one of the two are unaffected (all kinds of other tests pass).
The bug is here: /src/Client/Directives/DirectiveClientTrait.php#L113
It seems that no other robots.txt parser I'm aware of handles this case either. Any help is appreciated!
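For reference, here is a minimal sketch (not the library's actual fix) of how a rule mixing a wildcard and an end anchor could be matched, by translating the robots.txt path pattern into a PCRE regex. The helper name robotsPatternToRegex is hypothetical:

```php
<?php
// Hypothetical sketch: translate a robots.txt path pattern into a PCRE
// regex, supporting both "*" (wildcard) and a trailing "$" (end anchor).
function robotsPatternToRegex(string $pattern): string
{
    // A trailing "$" anchors the match at the end of the URL path.
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Escape regex metacharacters, then restore "*" as ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return '#^' . $regex . ($anchored ? '$' : '') . '#';
}

// "/*.jpg$" becomes "#^/.*\.jpg$#", so only paths ending in ".jpg" match.
var_dump(preg_match(robotsPatternToRegex('/*.jpg$'), '/image.jpg'));         // int(1)
var_dump(preg_match(robotsPatternToRegex('/*.jpg$'), '/image.jpg?x=1'));     // int(0)
var_dump(preg_match(robotsPatternToRegex('/*.jpg$'), '/foo/bar/image.jpg')); // int(1)
```

With this translation the three URLs from the repro above would all be disallowed, because each path ends in .jpg.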
Fixed in version 2.0.1
Thank you for the bug report!