Sitemap.xml is not being parsed correctly
mkantautas opened this issue
For example, with https://taxibambino.com/sitemap.xml, only 2 pages are being parsed.
This tool does not parse the sitemap.xml file; it creates one.
I just assumed it did, because simplecrawler parses sitemap directives by default, and as I understand it, simplecrawler is the core of this package. Anyway, the main issue seems to be with the site itself, which gives inconsistent results: one day the sitemap generator works (indexing all pages), and the next day it only catches the main page and the main page's sitemap.xml (because robots.txt links to it). Otherwise, it doesn't find sitemap.xml at all.
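For context, a robots.txt file can point crawlers at a sitemap with a `Sitemap:` line. The minimal Python sketch below shows the general idea of picking those lines out of a robots.txt body. It is not simplecrawler's implementation; the function name `sitemap_urls_from_robots` and the example.com URLs are hypothetical.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs of all Sitemap directives in a robots.txt body (sketch)."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop comments, which start with '#' in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        # Split "field: value" at the first colon only, since URLs contain colons.
        field, sep, value = line.partition(":")
        # Field names are matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt contents for illustration.
example = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""
print(sitemap_urls_from_robots(example))
```

Splitting on the first colon only keeps the `https://` scheme intact in the returned value.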
Actually, you are right. If the sitemap is linked in robots.txt, it should be parsed.
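Once a sitemap URL has been found, "parsing" it means reading the `<loc>` entries defined by the sitemaps.org protocol. Here is a minimal Python sketch using only the standard library, assuming the sitemap has already been downloaded as a string; it is not this package's or simplecrawler's code, and `parse_sitemap` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return (root kind, <loc> values) for a <urlset> or <sitemapindex> (sketch)."""
    root = ET.fromstring(xml_text)
    # The root tag looks like "{namespace}urlset"; keep only the local name.
    kind = root.tag.rsplit("}", 1)[-1]
    # Both <url> and <sitemap> children carry their address in a <loc> element.
    locs = [
        loc.text.strip()
        for loc in root.findall("./*/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return kind, locs


# Hypothetical sitemap contents for illustration.
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""
print(parse_sitemap(example))
```

If the root is a `sitemapindex`, each returned URL is another sitemap that would need to be fetched and parsed in turn.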