tobymao / saq

Simple Async Queues

Home Page: https://saq-py.readthedocs.io/en/latest/

Dashboard doesn't show jobs

ekeric13 opened this issue · comments

I am having a lot of trouble getting jobs to show up in the dashboard

[Screenshot 2023-03-13: the saq dashboard showing no jobs]

Here is my setup:

from typing import Any, Collection

import saq
from saq.worker import start

# encoder/decoder (orjson-based), settings, redis, and WorkerFunction
# come from the app's own modules.


class Queue(saq.Queue):
    """[SAQ Queue](https://github.com/tobymao/saq/blob/master/saq/queue.py)

    Configures `orjson` for JSON serialization/deserialization if not otherwise configured.

    Parameters
    ----------
    *args : Any
        Passed through to `saq.Queue.__init__()`
    **kwargs : Any
        Passed through to `saq.Queue.__init__()`
    """

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        kwargs.setdefault("dump", encoder.encode)
        kwargs.setdefault("load", decoder.decode)
        kwargs.setdefault("name", "background-worker")
        super().__init__(*args, **kwargs)

    def namespace(self, key: str) -> str:
        """Make the namespace unique per app."""
        return f"{settings.app.slug}:{self.name}:{key}"


queue = Queue(redis)

...

queue_settings = {
    "queue": queue,
    "concurrency": 10,
}

def start_worker(functions: Collection[WorkerFunction]) -> None:
    """Start the saq worker and its web UI.

    Args:
        functions: Functions to be called via the async workers.
    """
    global queue_settings
    # Worker(queue, functions, concurrency=10) alone didn't seem to expose the
    # web UI, so saq's start() is used instead.
    queue_settings["functions"] = functions
    start("app.lib.worker.queue_settings", web=True, port=8080)

# Elsewhere, inside an async request handler:
res = await queue.enqueue(
    "initiate_training",
    data=data,
    key=initiate_training_namespace(order_id),
    timeout=30,
    retries=20,
    retry_delay=5,
    retry_backoff=True,
)

And these are my redis keys:

1) "starlite-pg-redis-docker:background-worker:stats:30d27d7c-c20b-11ed-adf6-0242ac130003"
2) "starlite-pg-redis-docker:background-worker:incomplete"
3) "saq:job:background-worker:initiate_training:2c7880db-90a3-4791-baa0-83ace2d0098b"
4) "starlite-pg-redis-docker:background-worker:stats"
5) "starlite-pg-redis-docker:background-worker:schedule"
6) "starlite-pg-redis-docker:background-worker:sweep"
7) "saq:job:background-worker:initiate_training:c673af4b-fcd5-43cc-9c66-36de903e51a6"

Do you see anything wrong with what I am doing?

Seems like you're doing some custom things that could be affecting it. Does it work if you don't have any overrides, especially the namespace one?

I'm able to see the UI on latest main, so I don't think this is an issue with saq, but more so an issue with what you're doing. Closing for now, happy to help you debug. My guess is that the issue is with you overriding namespace.

Yeah, I am using a modified version of this: https://github.com/starlite-api/starlite-pg-redis-docker/blob/main/app/lib/worker.py
(I wanted to get the web interface working, and just doing Worker() didn't seem to support that).

Yeah, avoiding overriding the namespace worked!
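For reference, this is what I ended up with (a minimal sketch, keeping the orjson setup from above but dropping the namespace() override so saq keeps its default key prefix):

class Queue(saq.Queue):
    """SAQ queue with orjson serialization and saq's default key namespace.

    Note: no namespace() override here; the dashboard didn't show jobs that
    were stored under the custom "{slug}:{name}:{key}" prefix.
    """

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        kwargs.setdefault("dump", encoder.encode)
        kwargs.setdefault("load", decoder.decode)
        kwargs.setdefault("name", "background-worker")
        super().__init__(*args, **kwargs)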

Also, how do I make it so the web UI is only accessible via username/password, other than with env variables?

  AUTH_USER     basic auth user, defaults to admin
  AUTH_PASSWORD basic auth password, if not specified, no auth will be used

Like, can I pass some of these values explicitly into extra_web_settings?

The only way is with env variables at the moment; look at create_app in web.py.
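For example, something like this in the worker process before calling start() (a sketch; the credentials are placeholders, and it assumes the env vars are read when the web app is created):

import os

# Placeholder credentials; per the docs above, AUTH_PASSWORD must be set
# for basic auth to be enabled at all.
os.environ.setdefault("AUTH_USER", "admin")
os.environ.setdefault("AUTH_PASSWORD", "change-me")

start("app.lib.worker.queue_settings", web=True, port=8080)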

If a job is continuously retrying and failing and I realize something is wrong with the associated data, do I just delete the key for that job:
"saq:job:background-worker:initiate_training:c673af4b-fcd5-43cc-9c66-36de903e51a6"? Or is it more complicated than that?

Essentially the admin UI is great, but it is read-only. I don't mind having to use the redis-cli for admin actions, but I don't know what actions I should take...

Given a job, you can call abort on it. You can also have a retry limit.

Yeah, I saw I can call abort on a job, but I am talking about after the fact. Like I queue up a job, and then an hour later I see in the logs that it is failing, and the reason it is failing is unrecoverable.
I need to delete that job.
I no longer have a reference to the job I queued, as it was from an endpoint that exited.

How would I abort that ongoing job? Is there not a way I can manually update redis to abort? The way I might change the consumer offset for a consumer group in Kafka if I wanted to skip a job that would just keep on retrying.

You need to keep a reference to jobs if you want to abort them.
You can get a job by its id with queue.job(job_key), then you can abort it.
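Roughly like this (a sketch; job_key is whatever was passed as key= when the job was enqueued):

# job_key is the same value passed as key= at enqueue time,
# e.g. initiate_training_namespace(order_id) in the setup above.
job = await queue.job(job_key)
if job is not None:
    await job.abort("unrecoverable input data, aborting manually")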

Okay, I created some API endpoints around that.

I am having issues using a redis connection pool with saq.

Is there any guidance on the max number of connections needed?

I am using a redis that only allows 25 connections, but when I try to limit the number of connections the client can open, saq fails to work:

app_dev  | ERROR:saq:Error processing job None
app_dev  | Traceback (most recent call last):
app_dev  |   File "/usr/local/lib/python3.11/site-packages/saq/worker.py", line 234, in process
app_dev  |     job = await self.queue.dequeue(self.dequeue_timeout)
app_dev  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/saq/queue.py", line 425, in dequeue
app_dev  |     if await self.version() < (6, 2, 0):
app_dev  |        ^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/saq/queue.py", line 138, in version
app_dev  |     info = await self.redis.info()
app_dev  |            ^^^^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 484, in execute_command
app_dev  |     conn = self.connection or await pool.get_connection(command_name, **options)
app_dev  |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 1525, in get_connection
app_dev  |     await connection.connect()
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 722, in connect
app_dev  |     await self.on_connect()
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 804, in on_connect
app_dev  |     auth_response = await self.read_response()
app_dev  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 940, in read_response
app_dev  |     response = await self._parser.read_response(
app_dev  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 387, in read_response
app_dev  |     raw = await self._buffer.readline()
app_dev  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 313, in readline
app_dev  |     await self._read_from_socket()
app_dev  |   File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 256, in _read_from_socket
app_dev  |     raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
app_dev  | redis.exceptions.ConnectionError: Connection closed by server.

Well, that depends on the number of workers you have. I would imagine that 25 connections with one worker should be enough.

Yeah, but I am also using redis for other things. Wondering if there is a good way to ballpark it... 10? 15? 20? etc.

For context, I am using a free redis that only allows 50 connections:
https://render.com/pricing

[Screenshot 2023-03-15: Render pricing page showing the free redis plan's connection limit]

So if I have 5 instances of the server running... I am trying to use 10 max connections each.
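This is roughly how I'm capping the pool per instance (a sketch; the URL is a placeholder):

from redis.asyncio import Redis

# 5 server instances x 10 connections each stays under the 50-connection plan limit.
redis = Redis.from_url("redis://localhost:6379", max_connections=10)
queue = Queue(redis)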

I also tried upstash redis, which allows 100 connections, and I have hit my daily limit of 10k commands already:
https://upstash.com/#section-pricing

Is SAQ long polling and continuously making redis commands?

edit: In the docs it says it avoids long polling, actually, so I'm not quite sure how I hit 10k commands already...

edit2:

In my own tests when running SAQ I am seeing the number of commands increase continuously:

(base) ➜  ~ dev-redis-cli info | grep total_commands_processed
total_commands_processed:474004
(base) ➜  ~ dev-redis-cli info | grep total_commands_processed
total_commands_processed:474060
(base) ➜  ~ dev-redis-cli info | grep total_commands_processed
total_commands_processed:474073
(base) ➜  ~ dev-redis-cli info | grep total_commands_processed
total_commands_processed:474088
(base) ➜  ~ dev-redis-cli info | grep total_commands_processed
total_commands_processed:474097
(base) ➜  ~ dev-redis-cli info | grep total_commands_processed
total_commands_processed:474118
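To figure out where those commands are going, I'm dumping the per-command counters like this (a sketch; INFO commandstats is a standard redis section, and the URL is a placeholder):

import asyncio

from redis.asyncio import Redis


async def dump_commandstats() -> None:
    # INFO commandstats shows per-command call counts, i.e. which commands
    # saq (and everything else on this redis) issues most often.
    redis = Redis.from_url("redis://localhost:6379")  # placeholder URL
    stats = await redis.info("commandstats")
    for command, data in sorted(stats.items(), key=lambda kv: -kv[1]["calls"]):
        print(command, data["calls"])
    await redis.close()


asyncio.run(dump_commandstats())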