eliashaeussler / cache-warmup

🔥 PHP library to warm up caches of URLs located in XML sitemaps

Home Page: https://cache-warmup.dev/


[BUG] The script hangs after running for a long time

onlinebizsoft opened this issue · comments

PHP version

PHP8

Package version

2.3.0

Application

Phar

Operating system

Ubuntu 18

Current behavior

The script hangs after running for some hours. I have checked the server log as well; there is no new progress at all. It doesn't stop at the same point every time — each run it stops at a different point in the progress (but always after a few hours).

Expected behavior

The script should report the error or write it to a log so I can see why it stopped.

Steps to reproduce

This may happen when running with a large sitemap.

Additional context

(Screenshots of the crawl progress output were attached.)

P/S: the 4 failures were reported from the very beginning, so they are unrelated to the hang.

Hi @onlinebizsoft, to be honest, I've never tested this with such large sitemaps. Can you please add a -v option to the command to get more verbose output during crawling? That may help in debugging why it stopped. I'll probably come up with an option to log errors during crawling.

@onlinebizsoft I just released v2.4.0 of the library, which includes basic log handling. Just pass the --log-file errors.log option and all errors will be logged to the file errors.log. You can of course name the log file anything you like. Read more about it here: https://github.com/eliashaeussler/cache-warmup#--log-file
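For reference, a full invocation with logging and verbose output enabled might look like the sketch below. The Phar filename and sitemap URL are placeholders — adjust them to your setup:

```shell
# Warm up all URLs from the sitemap; crawl errors go to errors.log.
# -v adds verbose progress output, which is useful when diagnosing hangs.
php cache-warmup.phar "https://www.example.com/sitemap.xml" \
    --log-file errors.log \
    -v

# Afterwards, inspect the log for failed requests:
tail errors.log
```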

@eliashaeussler I'm running it now and will keep you updated on Monday.

@onlinebizsoft Have you by chance already tested cache warmup with logging enabled?

@eliashaeussler The log works, but I haven't hit the case where the script hangs yet. Let's keep this open for now, and I'll get back when I have more information.

Alright, will close the issue for the time being. If the problem occurs again, I'll be happy to reopen it for further investigation.

@eliashaeussler It happened again, and the log file doesn't contain any related error (it only has some messages for failed URLs, after which the script was still running fine). So somehow, when I leave the terminal running while I'm away from my desk, it runs for a few hours and then hangs. I can press Ctrl + C to halt the process normally.

@onlinebizsoft Could it be a server-side performance issue? You could try running cache warmup with fewer parallel requests by using --crawler-options '{"concurrency": 2}', for example. You could also add a short delay between each crawled URL with --crawler-options '{"request_options": {"delay": 1000}}'. See https://github.com/eliashaeussler/cache-warmup#crawler-configuration for more information about possible crawler options.
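The --crawler-options value must be valid JSON, which is easy to get wrong with shell quoting. A quick way to sanity-check the options before passing them to the tool (the option values here are illustrative; python3 is assumed to be available):

```shell
# Crawler options we intend to pass: lower concurrency plus a
# 1000 ms delay between requests.
OPTS='{"concurrency": 2, "request_options": {"delay": 1000}}'

# Validate the JSON first; json.tool fails loudly on malformed input.
echo "$OPTS" | python3 -m json.tool

# Then hand the same string to cache-warmup (Phar name is a placeholder):
# php cache-warmup.phar "https://www.example.com/sitemap.xml" --crawler-options "$OPTS"
```

Reusing one shell variable for both the validation step and the actual run avoids the two copies of the JSON drifting apart.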

I currently don't have a better idea of how to tackle this. I'll probably implement some "chunk size" handling in the future to limit the number of URLs crawled per iteration and continue crawling in subsequent runs.

@eliashaeussler Oh, now I can see why it happens. If the network has a problem or switches between different internet lines, the script hangs.

@onlinebizsoft Interesting, but that makes total sense. OTOH, I'd say that handling such cases is out of scope for the library. I'll close the issue.
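Editor's note: a possible mitigation for the network-drop hang described above, not suggested in the thread itself, is to set request timeouts via --crawler-options. Since request_options are forwarded to the underlying HTTP client (Guzzle, which supports timeout and connect_timeout), a stalled connection would then fail and be logged instead of blocking indefinitely. A sketch, with placeholder values:

```shell
# Fail any request that takes longer than 30 s in total, or 10 s to
# connect, rather than hanging on a dead connection after a network drop.
php cache-warmup.phar "https://www.example.com/sitemap.xml" \
    --crawler-options '{"request_options": {"timeout": 30, "connect_timeout": 10}}' \
    --log-file errors.log
```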