rockdaboot / mget

Multithreaded metalink/file/website downloader (like Wget) and C library

support sitemap files

rockdaboot opened this issue

Download the sitemap URLs listed in robots.txt (zipped and unzipped).
Parse these files with Mget's XML parser to fetch all URLs.
Respect additional information/schemas from 'urlset', e.g. http://www.google.com/schemas/sitemap-image/1.1.
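
The steps above can be sketched roughly as follows. This is an illustrative Python sketch using only the standard library, not Mget's C implementation; the function names `sitemap_urls_from_robots` and `urls_from_urlset` are hypothetical, and the only extension schema handled is the image namespace mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespaces: the first is from the sitemaps.org protocol, the second is the
# image extension schema named in this issue.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (assumes sitemap URLs carry no '#' fragment),
        # then split the field name from its value at the first colon.
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def urls_from_urlset(xml_text):
    """Return (page_urls, image_urls) found in a sitemap 'urlset' document."""
    root = ET.fromstring(xml_text)
    pages, images = [], []
    for url in root.iter("{%s}url" % SITEMAP_NS):
        loc = url.find("{%s}loc" % SITEMAP_NS)
        if loc is not None and loc.text and loc.text.strip():
            pages.append(loc.text.strip())
        # Image extension entries: <image:image><image:loc>...</image:loc>
        for img_loc in url.iter("{%s}loc" % IMAGE_NS):
            if img_loc.text and img_loc.text.strip():
                images.append(img_loc.text.strip())
    return pages, images
```

Matching the field name case-insensitively is a deliberate choice here, since robots.txt files in the wild vary in capitalization.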

See http://www.sitemaps.org/protocol.html for more information.

Mget now supports sitemap index files and sitemap files in XML 'sitemap' format (gzip compressed and uncompressed) and in plain text format. Scanning of RSS and Atom feeds, both as sitemap files and within HTML, will be supported soon.
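
As a rough illustration of how these formats can be told apart, here is a Python sketch (not Mget's actual code). The function name `read_sitemap` is hypothetical, gzip data is detected by its two-byte magic number, and anything that does not start with `<` is assumed to be a UTF-8 text sitemap with one URL per line.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def read_sitemap(data):
    """Classify raw sitemap bytes and return (kind, urls).

    kind is 'index' (URLs of further sitemaps), 'urlset' (page URLs)
    or 'text' (plain text sitemap, one URL per line).
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)

    if data.lstrip().startswith(b"<"):
        # Parse from bytes so the XML parser honours the declared encoding.
        root = ET.fromstring(data.lstrip())
        if root.tag == "{%s}sitemapindex" % SITEMAP_NS:
            kind, entry = "index", "sitemap"
        elif root.tag == "{%s}urlset" % SITEMAP_NS:
            kind, entry = "urlset", "url"
        else:
            raise ValueError("not a sitemap document: %s" % root.tag)
        locs = root.findall("{%s}%s/{%s}loc" % (SITEMAP_NS, entry, SITEMAP_NS))
        urls = [(loc.text or "").strip() for loc in locs]
        return kind, [u for u in urls if u]

    # Plain text format: skip blank lines, keep everything else as a URL.
    lines = data.decode("utf-8").splitlines()
    return "text", [line.strip() for line in lines if line.strip()]
```

A caller would typically fetch each URL returned for an 'index' result and pass the downloaded bytes through the same function again.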

Added parsing of RSS 2.0 and Atom 1.0 feeds.
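
For reference, extracting URLs from the two feed formats could look something like the Python sketch below. It is an assumption-laden illustration rather than Mget's implementation: `urls_from_feed` is a hypothetical name, RSS URLs are taken from each item's `link` element, and Atom URLs from the `href` of each entry's `link` element whose `rel` is absent or `alternate`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def urls_from_feed(xml_data):
    """Return the entry URLs of an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(xml_data)

    if root.tag == "rss":
        # RSS 2.0 elements are not namespaced: rss/channel/item/link.
        links = root.findall("channel/item/link")
        urls = [(link.text or "").strip() for link in links]
        return [u for u in urls if u]

    if root.tag == "{%s}feed" % ATOM_NS:
        # Atom 1.0: <entry><link href="..."/></entry>; a missing rel
        # attribute is treated the same as rel="alternate".
        links = root.findall("{%s}entry/{%s}link" % (ATOM_NS, ATOM_NS))
        return [
            link.get("href")
            for link in links
            if link.get("href") and link.get("rel", "alternate") == "alternate"
        ]

    raise ValueError("not an RSS 2.0 or Atom 1.0 feed: %s" % root.tag)
```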