webchela ("web" + "chela") is a daemon for interacting with web pages through automated browsers (Chrome or Firefox).
Provide a plugin endpoint to other tool - gosquito.
- Accepts tasks from clients over single GRPC connection (control/data links).
- Combines tasks into batches, control how many browser instances/tabs should run in parallel.
- Splits fetched data into sized chunks (avoid transport limits).
- Fully controlled on client side (browser type/arguments/extensions etc.).
- Exposes server load to clients (client may skip busy one and switch to an idle server).
- Can utilize GPU with help of VirtualGL/TurboVNC for graphics offloading.
- Works in fully graphical mode (not native headless mode), exposes 590x/VNC ports per browser instance for visual inspection.
- Resizes browser window dynamically.
- Makes full page screenshots and/or specific page elements.
- Passing cookies to pages in JSON.
Start daemon (software rendering)
user@localhost / $ docker run -p 50051:50051 -ti --rm -v /dev/shm:/dev/shm ghcr.io/livelace/webchela:v1.8.2
2024-06-17 21:17:19,438 INFO supervisord started with pid 1
2024-06-17 21:17:20,440 INFO spawned: 'xorg' with pid 2
2024-06-17 21:17:20,442 INFO spawned: 'webchela' with pid 3
2024-06-17 21:17:20,470 WARN exited: xorg (exit status 1; not expected)
2024-06-17 21:17:20 webchela.config INFO Config sample was written successfully: /home/user/.webchela.toml
2024-06-17 21:17:20 webchela.server INFO webchela v1.8.2
Start daemon (GPU rendering)
user@localhost / $ docker run --privileged -p 50051:50051 -ti --rm -v /dev/shm:/dev/shm ghcr.io/livelace/webchela:v1.8.2
2024-06-17 21:19:55,179 INFO supervisord started with pid 1
2024-06-17 21:19:56,180 INFO spawned: 'xorg' with pid 7
2024-06-17 21:19:56,182 INFO spawned: 'webchela' with pid 8
2024-06-17 21:19:56 webchela.server INFO webchela v1.8.2
2024-06-17 21:19:57,782 INFO success: xorg entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-06-17 21:19:57,782 INFO success: webchela entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Use gosquito configuration example
[default]
#browser_type = "chrome"
#browser_extension = [] # crx files included into webchela package
#browser_type = "firefox"
#browser_extension = [] # xpi files included into webchela package
#browser_geometry = "1920x1080"
#browser_geometry = "dynamic" # window will be resized to page content
#browser_instance = 1 # amount of instances will be launched in parallel
#browser_instance_tab = 10
#browser_proxy = "http://1.2.3.4:3128"
#browser_proxy = "socks5://user:pass@1.2.3.4:1080"
#browser_retry_codes = []
#browser_retry_codes_tries = 1
#chrome_driver_path = "/usr/bin/chromedriver"
#chrome_extensions_dir = "<INSTALL_PATH>/extensions/chrome"
#chrome_path = "/usr/bin/google-chrome-stable"
#chrome_profile = "" # only one browser instance at time if set
#chrome_profiles_dir = "/tmp/webchela/chrome"
#firefox_driver_path = "/usr/logcal/bin/geckodriver"
#firefox_extensions_dir = "<INSTALL_PATH>/extensions/firefox"
#firefox_path = "/usr/bin/firefox"
#firefox_profile = "" # only one browser instance at time if set
#firefox_profiles_dir = "/tmp/webchela/firefox"
#chunk_size = "3M"
#cpu_load = 30 # browser is a heavy thing, be careful with limits
#keep_temp = false
#log_level = "DEBUG"
#mem_free = "1G" # browser is a heavy thing, be careful with limits
#page_size = "10M"
#page_timeout = 60
#screenshot_timeout = 30
#script_timeout = 30
#task_timeout = 600
[server]
#listen = "0.0.0.0:50051"
#workers = 10 # set a lower value if you experiencing issues
Making full page screenshots in Chrome may finish with error: "Message: disconnected: Unable to receive message from renderer"
Chromedriver has hardcoded timeout value for making page screenshots - 10 seconds, it could be not enough for really big screenshots. Use Firefox or wait for fix.