viascom / nanoid-postgres

Nano ID for PostgreSQL


Parallel running of nanoid might get stuck

omris94 opened this issue · comments

Hey,
When I run nanoid on more than ~20 rows on my GCP Postgres instance (8 vCPUs), it can get stuck in the while loop. The problem was solved by marking the function as PARALLEL UNSAFE.

I opened a PR solving this issue 😄

thanks,
Omri
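
For reference, the workaround can be applied to an already-installed function without recreating it. This is a minimal sketch; the two-argument `nanoid(int, text)` signature is assumed from this repository's definition:

```sql
-- Mark the existing function as parallel unsafe so the planner
-- never runs it inside a parallel worker (signature assumed).
ALTER FUNCTION nanoid(int, text) PARALLEL UNSAFE;
```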

Hi @omris94,

Firstly, thank you for taking the time to open an issue and submitting a pull request simultaneously. Contributions like yours truly drive improvements in our project and motivate me.

I appreciate the details you gave me about the behaviour on GCP's Postgres instance. Addressing such concerns is crucial for the project's robustness, and I'm glad you identified a workaround with the PARALLEL UNSAFE indication.

While marking the function as PARALLEL UNSAFE is a valid workaround, I am also concerned about its limitations. There should be a solution that allows parallel execution, and I will analyse this further to seek an optimal algorithm.

I'll review the PR shortly. Again, thank you for being proactive and for your valuable input.

Cheers, Nikola

Hi @omris94,

Would you mind telling me which parameters you used to execute the nanoid() function? Have you defined any custom size or alphabet?

I prepared a test SQL script to reproduce the bug, but unfortunately it runs fine for me locally.

This is the test I wrote:

DO
$$
    DECLARE
        counter         int;
        numLoops int := 1000;
    BEGIN
        -- Default parameters
        FOR counter IN 1..numLoops
            LOOP
                RAISE NOTICE '%', nanoid();
            END LOOP;

        -- Size 12, default alphabet
        FOR counter IN 1..numLoops
            LOOP
                RAISE NOTICE '%', nanoid(12);
            END LOOP;

        -- Size 25, default alphabet
        FOR counter IN 1..numLoops
            LOOP
                RAISE NOTICE '%', nanoid(25);
            END LOOP;

        -- Default size (21), custom alphabet (only lowercase)
        FOR counter IN 1..numLoops
            LOOP
                RAISE NOTICE '%', nanoid(21, 'abcdefghijklmnopqrstuvwxyz');
            END LOOP;

        -- Size 15, custom alphabet (only numbers)
        FOR counter IN 1..numLoops
            LOOP
                RAISE NOTICE '%', nanoid(15, '0123456789');
            END LOOP;

        -- Size 17, custom alphabet (uppercase + numbers)
        FOR counter IN 1..numLoops
            LOOP
                RAISE NOTICE '%', nanoid(17, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');
            END LOOP;
    END
$$;

Would you execute my test SQL once without and once with your fix on your GCP Postgres instance and report the results on this issue? Adjust the numLoops variable to a number you are willing to run.

Thanks in advance!
Cheers, Nikola

Hey @nik-sta,
I used nanoid like this: nanoid(10, '23456789abcdefghijklmnopqrstuvwxyz'). I recreated the function without the UNSAFE indication, but for some reason, the method I used yesterday doesn't reproduce the issue and neither does your test 😞

Furthermore, I read that PARALLEL UNSAFE is already the default for functions, so I'm not sure what happened yesterday or why the function got stuck...

thanks,
Omri

Thanks @omris94, for your help.

The underlying gen_random_bytes function from the pgcrypto extension relies on the operating system for generating random values. In some contexts, especially when there's a high demand for random values, this could lead to performance issues or even blocking:

  • Continuous heavy use of random generation can be resource-intensive.
  • Some OS randomness sources, like /dev/random, can block when the kernel estimates there isn't enough entropy, potentially causing stalls.
  • In cloud or virtualised environments, the available entropy might differ from traditional physical systems.
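
The dependency can be seen directly in psql. This is a minimal sketch, assuming the pgcrypto extension is available on the instance:

```sql
-- nanoid's randomness ultimately comes from pgcrypto's gen_random_bytes,
-- which draws on the operating system's random source.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Request 16 random bytes; under entropy starvation, a call like this
-- is where a stall would surface.
SELECT gen_random_bytes(16);
```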

Given this understanding, and after a thorough analysis of the function's behaviour, I've decided to mark the function as PARALLEL SAFE in the base code. There are no concrete reasons to consider it inherently unsafe for parallel execution. However, marking it PARALLEL UNSAFE remains a cautious option in specific contexts to force serial execution, especially since the issue isn't consistently reproducible.
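
To verify which parallel-safety label a deployed version of the function carries, the pg_proc system catalog can be queried. A sketch, assuming the function is named nanoid:

```sql
-- proparallel: 's' = safe, 'r' = restricted, 'u' = unsafe (the default)
SELECT proname, proparallel
FROM pg_proc
WHERE proname = 'nanoid';
```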

Thanks for bringing this to my attention, and I hope this provides a clear path forward!

I will close this issue now. Feel free to re-open it anytime if you have new ideas or findings.

Cheers, Nikola

Thanks @nik-sta! That sounds great.
I just want to clarify that the symptoms didn't look like a deadlock or anything blocking-related. I noticed peak CPU utilization for far longer than would make sense (CPU usage graph attached). The original insert statement that got stuck contained fewer than 10k rows. After 30 minutes of waiting for the query to finish (with high CPU usage), I manually stopped it. Because of the CPU usage, I assumed we had got stuck in the while loop for one of the rows.

After I recreated the function with the unsafe indication (now I know it was probably meaningless) it worked like a charm and it took less than one second.

In the attached graph you can see when I started running the query (first CPU peak), and when I started doing experiments (second peak).

[attached image: CPU usage graph]

Hi @omris94, after our discussion, we improved the nanoid() function. While testing, we used your custom size and alphabet definition and discovered that your combination could be improved with a better additional-bytes factor (1.85 instead of the default 1.6). We added this option to the nanoid() function; it is now the third parameter. Check out the new release and the new README for more insights. I also added release documentation and a migration guide in case you plan to upgrade to the more recent version. Have a nice weekend! Cheers, Nikola
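
A call using the new factor might then look like this (a sketch; the exact parameter name and defaults are in the README, and 1.85 is the value mentioned above):

```sql
-- size 10, custom alphabet, additional-bytes factor 1.85
-- (assumed to be the new third parameter)
SELECT nanoid(10, '23456789abcdefghijklmnopqrstuvwxyz', 1.85);
```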