VIPnytt / SitemapParser

XML Sitemap parser class compliant with the Sitemaps.org protocol.

Suggestion: catch exceptions for unexpected sitemap XML content

woei66 opened this issue

Sometimes, fetching a sitemap returns an XML access-denied message instead of sitemap content, like the one below:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>7F33HDMXW5ACA6XM</RequestId><HostId>fDtErV52NjvRWoigB8xew2jT1lHs/PILta/bNsoisgcjt7QFS4i1UQeKvV/4fMk56GSF7cGu398=</HostId></Error>

For this case, I suggest adding a try {} catch () block in the SitemapParser class to handle it:

try {
    $response = is_string($urlContent) ? $urlContent : $this->getContent();
} catch (Exceptions\SitemapParserException $e) {
    throw new Exceptions\SitemapParserException($e->getMessage());
} catch (Exceptions\TransferException $e) {
    throw new Exceptions\TransferException($e->getMessage());
}
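The access-denied document above is well-formed XML, just not a sitemap. As a rough sketch of how such content could be recognised before it is handed to a parser, the root element can be compared against the two root elements defined by the Sitemaps.org protocol. The helper names `sitemapRootName` and `looksLikeSitemap` are hypothetical and not part of SitemapParser, and the check only covers XML sitemaps, not other formats.

```php
<?php
// Hypothetical helpers, not part of vipnytt/SitemapParser.

// Returns the name of the XML root element, or null if the input is not well-formed XML.
function sitemapRootName(string $xml): ?string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc === false ? null : $doc->getName();
}

// True only when the root element is one defined by the Sitemaps.org protocol.
function looksLikeSitemap(string $xml): bool
{
    return in_array(sitemapRootName($xml), ['urlset', 'sitemapindex'], true);
}

// Shortened version of the access-denied response from this issue.
$denied = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<Error><Code>AccessDenied</Code><Message>Access Denied</Message></Error>';

var_dump(looksLikeSitemap($denied)); // bool(false)
```

A caller could run a check like this on content it fetched itself and skip or log the URL when it returns false.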

Do you have a link?

If the web server is returning an HTTP 401 Unauthorized (or another non-2xx code), this should already be caught at line 234.

curl --head "https://img-fnc.ebc.net.tw/EbcFnc/Rss/Sitemap/sitemap-201706.xml"
HTTP/2 403
content-type: application/xml
date: Sun, 11 Jul 2021 15:20:37 GMT
server: AmazonS3
x-cache: Error from cloudfront
via: 1.1 2231c949de9065c80ccf59ccb6e56be2.cloudfront.net (CloudFront)
x-amz-cf-pop: TPE50-C1
x-amz-cf-id: H9Yd2gpbqpQblLRw62aymSGTFIu0tzsc3kHGevODUADL07JkD_tNRw==

Have you tried catching \vipnytt\SitemapParser\Exceptions\TransferException or even \vipnytt\SitemapParser\Exceptions\SitemapParserException?

HTTP errors are not hidden or silenced during parsing; they are reported by throwing a TransferException, which you can catch.

Try this code:

use \vipnytt\SitemapParser;

$parser = new SitemapParser();

try {
    $parser->parse('https://img-fnc.ebc.net.tw/EbcFnc/Rss/Sitemap/sitemap-201706.xml');
} catch (SitemapParser\Exceptions\TransferException $e) {
    echo 'TransferException caught: '.$e->getMessage();
} catch (SitemapParser\Exceptions\SitemapParserException $e) {
    echo 'SitemapParserException caught: '.$e->getMessage();
}

Output:

TransferException caught: Unable to fetch URL contents

FYI: When parsing recursively using $parser->parseRecursive('https://example.com/sitemap.xml'), no such exceptions will be thrown. This is also by design, so that a single failing URL does not stop the parsing of the other (nested) URLs.
Replacing $parser->parse(...) with $parser->parseRecursive(...) in the above example will result in empty output.
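If you want the same keep-going behaviour but still need to know which URLs failed, one option is to loop over the URLs yourself and record the exceptions. The following is only a sketch of that idea, not how parseRecursive() is implemented: `parseEach` is a hypothetical helper, and `$parseOne` stands in for a call such as `$parser->parse($url)`.

```php
<?php
// Hypothetical helper, not part of vipnytt/SitemapParser.
// Calls $parseOne for every URL and returns [url => error message] for the ones that threw.
function parseEach(array $urls, callable $parseOne): array
{
    $failed = [];
    foreach ($urls as $url) {
        try {
            $parseOne($url);
        } catch (Exception $e) {
            // Record the failure and continue with the remaining URLs.
            $failed[$url] = $e->getMessage();
        }
    }
    return $failed;
}

// Stand-in parser that fails for one URL, so the sketch runs without network access.
$failed = parseEach(
    ['https://example.com/a.xml', 'https://example.com/b.xml'],
    function (string $url): void {
        if ($url === 'https://example.com/b.xml') {
            throw new RuntimeException('Unable to fetch URL contents');
        }
    }
);

print_r($failed);
```

With the real parser, the closure would call `$parser->parse($url)`, and the catch could be narrowed to the library's TransferException and SitemapParserException.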

ok, thank you.