collective.taskqueue2

A task queue implementation for Plone 5.2/6 on Python 3, based on the Huey package.

See https://huey.readthedocs.io/en/latest/

Features

This package can be used as a nearly seamless replacement for collective.taskqueue. It does not interfere with WSGI or ZServer and should be compatible with most up-to-date Plone 5.2 and Plone 6.x installations. Its main purpose is to let you schedule asynchronous operations directly from your application code. In addition, because collective.taskqueue2 is built on Huey, you can also schedule periodic tasks in a cron-style manner.

collective.taskqueue2 supports multiple storage backends, including Redis, SQLite, in-memory, and filesystem. In most cases Redis is the preferred choice for production environments, while SQLite or in-memory storage is commonly used for development.

Installation

Install collective.taskqueue2 by adding it to your buildout::

[buildout]
...
eggs =
    collective.taskqueue2

and then running bin/buildout.

Configuration

Environment variable HUEY_CONSUMER

The HUEY_CONSUMER environment variable determines whether the current Plone/Zope instance acts as a consumer of the task queue. The values 1, True, true, and on mark the instance as a consumer; any other value means the instance is not a consumer.
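
For illustration only (a hedged sketch, not the package's actual code), the check boils down to something like:

import os

# Treat the documented truthy values as "consumer enabled"; everything else
# disables the consumer role for this instance.
IS_CONSUMER = os.environ.get("HUEY_CONSUMER", "") in ("1", "True", "true", "on")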

Environment variable HUEY_LOG_LEVEL

HUEY_LOG_LEVEL is an environment variable that configures the logging level of the collective.taskqueue2 package and the underlying Huey package, and thereby the verbosity of the log messages produced by the task queue. Typical values are the standard Python logging level names (DEBUG, INFO, WARNING, ERROR, CRITICAL), with DEBUG being the most verbose and CRITICAL the least.
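
As a hedged illustration of the effect (not the package's actual code; the fallback level used when the variable is unset is an assumption here):

import logging
import os

# "huey" is the logger name visible in the console output shown below.
level_name = os.environ.get("HUEY_LOG_LEVEL", "INFO").upper()
logging.getLogger("huey").setLevel(level_name)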

Environment variable HUEY_TASKQUEUE_URL

collective.taskqueue2 reads the HUEY_TASKQUEUE_URL environment variable to determine the task queue configuration. If the variable is not set, it falls back to the default sqlite:///tmp/huey_queue.sqlite. HUEY_TASKQUEUE_URL must be a string containing the URL of the desired task queue backend.

To use a different task queue configuration, set HUEY_TASKQUEUE_URL to a URL for the desired backend:

  • SQLite: HUEY_TASKQUEUE_URL=sqlite:///path/to/database.sqlite uses an SQLite database stored in the given file.
  • Redis: HUEY_TASKQUEUE_URL=redis://localhost:6379/0 uses Redis with the given host (localhost), port (6379), and database number (0).
  • Memory: HUEY_TASKQUEUE_URL=memory:// uses in-memory storage; no additional parameters are needed.
  • Filesystem: HUEY_TASKQUEUE_URL=file:///path/to/queue/folder uses file-based storage in the given folder.

Adjust the URLs to your environment, and make sure the variable is set before starting the instance.

The huey_taskqueue object created from this URL configuration is then used throughout the application for task queuing and processing.
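
A minimal sketch of how such a URL could be mapped to a Huey instance (the exact factory used by collective.taskqueue2 may differ; the backend classes shown are the stock ones shipped with Huey 2.x):

import os
from urllib.parse import urlparse

from huey import FileHuey, MemoryHuey, RedisHuey, SqliteHuey


def make_huey_from_url(url, name="collective.taskqueue2"):
    # Map the URL scheme to one of Huey's built-in storage backends.
    parsed = urlparse(url)
    if parsed.scheme == "redis":
        return RedisHuey(name, url=url)
    if parsed.scheme == "sqlite":
        return SqliteHuey(name, filename=parsed.path)
    if parsed.scheme == "memory":
        return MemoryHuey(name)
    if parsed.scheme == "file":
        return FileHuey(name, path=parsed.path)
    raise ValueError("Unsupported task queue URL: %s" % url)


huey_taskqueue = make_huey_from_url(
    os.environ.get("HUEY_TASKQUEUE_URL", "sqlite:///tmp/huey_queue.sqlite")
)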

Consumer configuration

The configuration options for the Huey consumer are defined in the consumer_options dictionary. These options can be overridden using environment variables. Here are the available options:

  • backoff: The backoff factor for retrying failed tasks.
  • check_worker_health: Whether to periodically check the health of the worker.
  • extra_locks: Additional locks to acquire during task execution.
  • flush_locks: Whether to flush locks after task execution.
  • health_check_interval: The interval (in seconds) for checking worker health.
  • initial_delay: The initial delay (in seconds) before processing tasks.
  • max_delay: The maximum delay (in seconds) for exponential backoff.
  • periodic: Whether to enable periodic tasks.
  • scheduler_interval: The interval (in seconds) for running periodic tasks.
  • worker_type: The type of worker to use (e.g., "thread" or "process").
  • workers: The number of worker threads or processes to use.
  • verbose: Whether to enable verbose logging.

Environment Variables

The configuration options can be overridden using environment variables. The environment variables should be prefixed with HUEY_. Here are some examples:

  • HUEY_WORKERS: The number of worker threads or processes.
  • HUEY_LOGFILE: The path to the log file.
  • HUEY_VERBOSE: Whether to enable verbose logging.
  • HUEY_WORKER_TYPE: The type of worker to use.
  • HUEY_PERIODIC: Whether to enable periodic tasks.
  • HUEY_SCHEDULER_INTERVAL: The interval (in seconds) for running periodic tasks.
  • HUEY_INITIAL_DELAY: The initial delay (in seconds) before processing tasks.
  • HUEY_MAX_DELAY: The maximum delay (in seconds) for exponential backoff.
  • HUEY_BACKOFF: The backoff factor for retrying failed tasks.
  • HUEY_HEALTH_CHECK_INTERVAL: The interval (in seconds) for checking worker health.
  • HUEY_CHECK_WORKER_HEALTH: Whether to periodically check the health of the worker.
  • HUEY_EXTRA_LOCKS: Additional locks to acquire during task execution.
  • HUEY_FLUSH_LOCKS: Whether to flush locks after task execution.

It is strongly recommended to keep the default configuration values and change them only if you know what you are doing. Please refer to the Huey documentation first to understand the configuration options and their impact.
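
As a hedged sketch of the override mechanism (the authoritative option names and default values live in collective.taskqueue2's consumer setup and in Huey itself; the defaults below are illustrative only):

import os

consumer_options = {
    "workers": 1,
    "worker_type": "thread",
    "periodic": True,
    "scheduler_interval": 1,
    "check_worker_health": True,
}

for key, default in list(consumer_options.items()):
    value = os.environ.get("HUEY_" + key.upper())
    if value is None:
        continue
    # Coerce the string from the environment to the type of the default.
    if isinstance(default, bool):
        consumer_options[key] = value.lower() in ("1", "true", "on")
    elif isinstance(default, int):
        consumer_options[key] = int(value)
    else:
        consumer_options[key] = value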

Console output

After installing collective.taskqueue2 in Plone, you should see the following output on the console (with HUEY_LOG_LEVEL=DEBUG and HUEY_CONSUMER=1 set):

2023-11-21 11:02:59,012 INFO    [huey.consumer:386][Thread-1 (run)] Huey consumer started with 1 thread, PID 76861 at 2023-11-21 10:02:59.012894
2023-11-21 11:02:59,012 INFO    [huey:77][MainThread] collective.taskqueue2: consumer thread started.
2023-11-21 11:02:59,013 INFO    [huey.consumer:389][Thread-1 (run)] Scheduler runs every 1 second(s).
2023-11-21 11:02:59,013 INFO    [huey.consumer:391][Thread-1 (run)] Periodic tasks are enabled.
Starting server in PID 76861.
2023-11-21 11:02:59,014 INFO    [huey.consumer:398][Thread-1 (run)] The following commands are available:
+ collective.taskqueue2.huey_tasks.dump_queue_stats
+ collective.taskqueue2.huey_tasks.schedule_browser_view

Using collective.taskqueue2 inside your Plone application code

The example demonstrates a common use case of starting a dedicated browser view asynchronously.

In this scenario, we schedule the browser view /magazine/@@debug-demo-view to run in the context of the portal object located at context_path.

The function takes the following parameters:

  • view_name: The name of the browser view to be executed asynchronously. It is important to specify the view name with a leading @@ symbol.
  • context_path: The path of the context object for the view within the Plone portal. It can be obtained by joining the physical path of the context object using "/".join(context.getPhysicalPath()).
  • site_path: The path to the root of the Plone portal.
  • username: The name of the user under which the view will be executed. It's important to exercise caution when using third-party code that may provide a username with higher privileges.
  • params: A Python dictionary of parameters that will be passed to the browser request. These parameters will be available in self.request.form within the browser view.

# bin/instance run scripts/huey_client.py

from datetime import datetime

from collective.taskqueue2.huey_tasks import schedule_browser_view

now = datetime.now().isoformat()
schedule_browser_view(
    view_name="debug-demo-view",
    context_path="/magazine",
    site_path="/magazine",
    username="admin",
    params=dict(foo="bar", bar="foo", meaning_of_life=42, now=now),
)

You may wrap the code above in a custom function that derives context_path, site_path, and username from the current calling context, for example:

from datetime import datetime

import plone.api

from collective.taskqueue2.huey_tasks import schedule_browser_view


def schedule_debug_view(context):
    now = datetime.now().isoformat()
    schedule_browser_view(
        view_name="debug-demo-view",
        context_path="/".join(context.getPhysicalPath()),
        site_path="/".join(plone.api.portal.get().getPhysicalPath()),
        username=plone.api.user.get_current().getId(),
        params=dict(foo="bar", bar="foo", meaning_of_life=42, now=now),
    )

Writing your own Huey tasks in your application

In case you have specific requirements beyond scheduling a browser view, you have the option to create your own Huey tasks. You can refer to the documentation at https://huey.readthedocs.io/en/latest/ for more details on creating custom tasks with Huey.

To use your custom Huey tasks effectively, it is important to register them during the startup phase of Plone and Zope. This ensures that your tasks are properly initialized and available for execution.

Inside your package foo.bar you may provide your own tasks in a file foo.bar/foo/bar/huey_tasks.py like

from collective.taskqueue2.huey_config import huey_taskqueue


@huey_taskqueue.task()
def my_task(*args, **kw):
    # do something useful here
    pass

and import huey_tasks.py e.g. inside foo.bar/foo/bar/__init__.py.
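
The cron-style scheduling mentioned under Features works the same way. Here is a hedged sketch of a periodic task, assuming the shared huey_taskqueue instance from collective.taskqueue2.huey_config and Huey's crontab helper (the task name and interval are made up for illustration):

from huey import crontab

from collective.taskqueue2.huey_config import huey_taskqueue


@huey_taskqueue.periodic_task(crontab(minute="*/15"))
def cleanup_every_15_minutes():
    # Executed by the consumer instance every 15 minutes; requires the
    # consumer to run with periodic tasks enabled (the default shown in the
    # console output above).
    pass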

Your own application (e.g. as part of an event listener) may call

from foo.bar.huey_tasks import my_task


def listen_event(event):
    context = event.context
    result = my_task(context=context, foo="bar", bar="42")
Please read the Huey documentation on result handling (in case you need to access the result for whatever reason).
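
A minimal sketch, assuming Huey's default result store is in use:

# my_task(...) returned a Result handle above; .get(blocking=True, timeout=30)
# waits up to 30 seconds for the consumer to execute the task and hand back
# its return value.
value = result.get(blocking=True, timeout=30)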

Security

The current implementation of collective.taskqueue2 is intended for internal environments where you have complete control over your code and dependencies. It is designed to be used within trusted environments.

It is important to note that a scheduled browser view call is executed under the specified username. This can introduce a significant security risk if third-party code outside your control uses collective.taskqueue2: an attacker could specify a common username such as "admin", which is typically associated with manager-level rights. Exercise caution in these situations.

A stronger security mechanism should be considered for future versions to address this potential vulnerability.

Browser view(s)

The package provides a browser view @@taskqueue-stats (on the Plone site root) that returns the current queue status as JSON:

{
  "pending": 0,
  "scheduled": 0
}

Logging

The package utilizes the standard Plone/Zope logger for logging purposes. Typically, the log information is stored in the var/log/instance.log file or a related file if using a ZEO setup.

Configuration of collective.taskqueue2 with a ZEO setup

In a ZEO setup, you need to determine which ZEO client(s) will serve as task queue consumers. This is done by setting HUEY_CONSUMER=1 in the environment of the relevant ZEO client(s).

Additionally, the HUEY_TASKQUEUE_URL must be configured for all ZEO clients that will add tasks to the task queue. It's important to ensure that all ZEO clients point to the same storage backend.

It is possible to have multiple consumers, where each consumer is responsible for executing a specific task. Having multiple consumers can be beneficial in certain situations. However, it's important to be aware that conflict errors can occur with ZEO clients, just like with any other ZEO setup. It's worth noting that collective.taskqueue2 does not provide any special support for handling conflict errors.

To do:

  • make consumer configuration configurable in huey_consumer.py

Authors

Andreas Jung info@zopyx.com for Università di Bologna/University of Bologna.

Project sponsor

collective.taskqueue2 was developed as a component of a Plone 6 migration project for Università di Bologna and has been made available as an open-source solution.

Contribute

License

The project is licensed under the GPLv2.
