Suggestion: catch exceptions for unexpected sitemap XML content
woei66 opened this issue · comments
Sometimes fetching a sitemap XML returns an access-denied message like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>7F33HDMXW5ACA6XM</RequestId><HostId>fDtErV52NjvRWoigB8xew2jT1lHs/PILta/bNsoisgcjt7QFS4i1UQeKvV/4fMk56GSF7cGu398=</HostId></Error>
For this case, I suggest adding a try {} catch () block in the SitemapParser class to handle it:
try {
    $response = is_string($urlContent) ? $urlContent : $this->getContent();
} catch (Exceptions\SitemapParserException $e) {
    throw new Exceptions\SitemapParserException($e->getMessage());
} catch (Exceptions\TransferException $e) {
    throw new Exceptions\TransferException($e->getMessage());
}
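To make the case concrete, here is a minimal sketch of how calling code could recognise such an error body before treating it as a sitemap. The helper name looksLikeSitemap is hypothetical and not part of SitemapParser. It only inspects the root element with PHP's SimpleXML, assuming that an XML sitemap has a urlset or sitemapindex root, so it does not cover the other input formats the library accepts.

```php
<?php
// Hypothetical helper (not part of the SitemapParser library): returns true
// only when the XML root element is one the sitemap protocol defines.
function looksLikeSitemap(string $body): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return false; // not well-formed XML at all
    }

    // getName() returns the local name of the root element.
    return in_array($xml->getName(), ['urlset', 'sitemapindex'], true);
}

// Shortened copy of the access-denied body shown above.
$denied = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<Error><Code>AccessDenied</Code><Message>Access Denied</Message></Error>';

// Minimal valid sitemap, using example.com as a placeholder.
$valid = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>https://example.com/</loc></url></urlset>';

var_dump(looksLikeSitemap($denied)); // bool(false)
var_dump(looksLikeSitemap($valid));  // bool(true)
```

A check like this is only relevant when the server sends the error body with a 2xx status; it says nothing about how the library itself handles the response.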
Do you have a link?
If the web server is returning an HTTP 401 Unauthorized (or another non-2xx code), this should already be caught at line 234.
You can try this URL:
https://img-fnc.ebc.net.tw/EbcFnc/Rss/Sitemap/sitemap-201706.xml
curl --head "https://img-fnc.ebc.net.tw/EbcFnc/Rss/Sitemap/sitemap-201706.xml"
HTTP/2 403
content-type: application/xml
date: Sun, 11 Jul 2021 15:20:37 GMT
server: AmazonS3
x-cache: Error from cloudfront
via: 1.1 2231c949de9065c80ccf59ccb6e56be2.cloudfront.net (CloudFront)
x-amz-cf-pop: TPE50-C1
x-amz-cf-id: H9Yd2gpbqpQblLRw62aymSGTFIu0tzsc3kHGevODUADL07JkD_tNRw==
Have you tried catching \vipnytt\SitemapParser\Exceptions\TransferException or even \vipnytt\SitemapParser\Exceptions\SitemapParserException?
Rather than hiding or silencing HTTP errors during parsing, the parser reports them by throwing a TransferException.
Try this code:
use \vipnytt\SitemapParser;
$parser = new SitemapParser();
try {
    $parser->parse('https://img-fnc.ebc.net.tw/EbcFnc/Rss/Sitemap/sitemap-201706.xml');
} catch (SitemapParser\Exceptions\TransferException $e) {
    echo 'TransferException caught: '.$e->getMessage();
} catch (SitemapParser\Exceptions\SitemapParserException $e) {
    echo 'SitemapParserException caught: '.$e->getMessage();
}
Output:
TransferException caught: Unable to fetch URL contents
FYI: When parsing recursively using $parser->parseRecursive('https://example.com/sitemap.xml'), no such exceptions will be thrown. This is by design, so that a single failing URL does not abort the parsing of any nested URLs.
Replacing $parser->parse(...) with $parser->parseRecursive(...) in the above example will result in an empty output.
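Since a recursive run stays silent on failure, one option is to inspect the collected results afterwards. The following is a minimal sketch, not library code: nothingParsed is a hypothetical helper, and the commented usage assumes the parser's getSitemaps() and getURLs() accessors return arrays keyed by URL, as in the project README.

```php
<?php
// Hypothetical helper: true when a parse run collected neither nested
// sitemaps nor page URLs, which is what a silently failed fetch looks like.
function nothingParsed(array $sitemaps, array $urls): bool
{
    return count($sitemaps) === 0 && count($urls) === 0;
}

// Intended use with the library (assumes a Composer install; not run here):
//   $parser = new \vipnytt\SitemapParser();
//   $parser->parseRecursive('https://example.com/sitemap.xml');
//   if (nothingParsed($parser->getSitemaps(), $parser->getURLs())) {
//       echo 'Nothing was parsed';
//   }

// Stand-in data shaped like the parser's results (URL => tags).
$failed = nothingParsed([], []);
$ok = nothingParsed([], ['https://example.com/' => ['loc' => 'https://example.com/']]);

var_dump($failed); // bool(true)
var_dump($ok);     // bool(false)
```

An empty result can also mean the sitemap is genuinely empty, so this check only flags a run worth a closer look, for example by calling parse() on the same URL inside a try/catch.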
ok, thank you.