getsitemap is a Python library that retrieves every URL listed in all of the sitemaps on a website.
This project may be useful if you are building a search crawler or a sitemap URL status code validator.
You can read the documentation for this project on Read the Docs.
To get started, install `getsitemap` with pip:
pip install getsitemap
To retrieve the URLs in a single sitemap, pass its URL to `get_individual_sitemap`:
import getsitemap
urls = getsitemap.get_individual_sitemap("https://jamesg.blog/sitemap.xml")
print(urls)
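For context, a sitemap is an XML document whose `<loc>` elements hold page URLs. The sketch below is not getsitemap's implementation. It is a minimal standard-library illustration of pulling `<loc>` values out of sitemap XML. The helper name `extract_locs` and the sample XML are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_locs(xml_text):
    """Return the text of every <loc> element in a sitemap XML string.

    Hypothetical helper for illustration; not part of getsitemap.
    """
    root = ET.fromstring(xml_text)
    locs = []
    # iter() walks the whole tree, so nested <url><loc> entries are found.
    for element in root.iter("{%s}loc" % SITEMAP_NS):
        if element.text and element.text.strip():
            locs.append(element.text.strip())
    return locs


# Made-up sample data; example.com is a placeholder domain.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(extract_locs(SAMPLE_SITEMAP))
```

In practice you would call `getsitemap.get_individual_sitemap` rather than parse the XML yourself; the sketch only shows the kind of data a sitemap contains.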
To retrieve the URLs from all of the sitemaps on a website, use `retrieve_sitemap_urls`:
import getsitemap
all_urls = getsitemap.retrieve_sitemap_urls("https://sitemap")
print(all_urls)
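As one possible follow-on, the retrieved URLs could feed a status code validator. The sketch below is an assumption-laden illustration, not part of getsitemap: `fetch_status` and `find_broken_urls` are hypothetical helpers, and because the exact shape of getsitemap's return values is not described here, `find_broken_urls` simply accepts any iterable of URL strings (flatten the library's output first if needed).

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL, or None if the request fails.

    Hypothetical helper. Redirects are followed, so the code returned is
    the status of the final response.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; keep their code.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failures, timeouts, and malformed URLs end up here.
        return None


def find_broken_urls(urls, get_status=fetch_status):
    """Map each URL whose status is not 200 to the status that was seen."""
    broken = {}
    for url in urls:
        status = get_status(url)
        if status != 200:
            broken[url] = status
    return broken


# Demonstration with stubbed statuses so no network access is needed.
fake_statuses = {
    "https://example.com/": 200,
    "https://example.com/missing": 404,
}
print(find_broken_urls(list(fake_statuses), get_status=fake_statuses.get))
```

Passing the status lookup in as a parameter keeps the checking logic testable without network access. Calling `find_broken_urls(urls)` with the default uses real HEAD requests.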
This library uses tox, pytest, and flake8 to maintain code quality.
To run the code quality checks, run the following command:
tox
License 👩‍⚖️
----------
This project is licensed under an MIT License.
We would love to have your help in improving getsitemap. Have an idea for a new feature or a bug to fix? Leave information in a GitHub Issue to start a discussion!
If you have
- capjamesg