oracle / python-cx_Oracle

Python interface to Oracle Database now superseded by python-oracledb

Home Page:https://oracle.github.io/python-cx_Oracle

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Support for asyncio

kamikaze opened this issue · comments

Please add support for asyncio

I'll look into what the effort would be to do this. I've been looking at various asynchronous programming models recently so this request comes at a good time. :-)

Maybe this would help a bit... for an idea: https://github.com/MagicStack/asyncpg

@kamikaze whenever I needed to use asyncio I usually wrapped the cx_Oracle API into coroutines. This way I have created, e.g., table monitors. I would be curious to know if and where you think there should be places in the cx_Oracle API that should provide asyncio support directly :)

@P403n1x87 any example of what you did ? I'm quite interrested

@eLvErDe There are some different approaches that I take, depending on the problem at hand. A general one is to use asyncio.run_in_executor to make the DB operations non-blocking. One could then use asyncio.Queue to pass results around. Or one could also fetch the data in small batches until it is consumed entirely and create a simple generator. Something quick and dirty would be

    while follow:
        rows = list(db.log.fetch_all(
            order_by="id",
            where=f"id > :max_id",
            max_id=max_id
        ))
        if not rows:
            await asyncio.sleep(0.1)
        else:
            for row in rows:
                max_id = row.timestamp
                yield row

In this example, I'm polling the LOG table for updates and returning any new rows as they come in (some sort of logcat application where follow is set to True if one wants to follow the table as it grows). One thing to note is that the SQL query against the table is hidden inside db.log.fetch_all. This is equivalent to cursor.fetchall("select * from log where id > :max_id"). In this example I'm making use of the additional layer of abstraction introduced by Sibilla, but the general idea is the same regardless. I know that I can use fetch_all because this will fetch small batches anyway, but more generally you'd want to use fetch_many (if using Sibilla) or Cursor.fetchmany if using a plain Cursor object.

Hello,

Yeah I was thinking about running in a thread and use a Queue for getting messages but I though maybe there was something nicer ;-)

what a crap. no, I would like to have a pure asyncio implementation like asyncpg etc.

So do I, but atm you don't have better options

let's just switch to PostgreSQL \m/

still need support for asyncio, rn i'm using asyncio.run_in_executor()

Is there any update regarding this?

IIRC no one is working on this, @sharkguto if you want you can work on it, in fact diesel-rs/diesel#399 is in a stage similar to this: many people requested async support but nobody wants to do it.

@sharkguto, I have no current updates beyond the fact that I have been exploring support. I have seen similar performance improvements with my proof of concept. @cjbj is the product manager, though, so he can comment further.

@sharkguto Have you benchmarked cx_Oracle ?

@sharkguto Have you benchmarked cx_Oracle ?

Only with flask about 2 years ago, with version 6.4 i think. With starlette or fastapi not yet. But I can do it and provide the results for you soon ... thanks for your quickly response!

That would be interesting. But the key thing is does it already perform the way you need. [Update: check out https://cx-oracle.readthedocs.io/en/latest/user_guide/tuning.html]

[For fun, since it is a slightly different stack, look at https://blogs.oracle.com/timesten/is-the-python-cx_oracle-sql-driver-fast-enough-for-database-benchmarks]

Hi @cjbj ,

I started the benchmark project, but I got some issues to make it work properly... I am trying to use instantclient_19_8 from (https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html)[https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html]

without init_oracle_client with env variables:

image

setting init_oracle_client path:

image

error:

cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libnnz19.so: cannot open shared object file: No such file or directory". See https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html for help

But the lib is there, as following print screen

image

also i set the environment variables too

os.environ["PATH"] = os.environ["PATH"] + os.pathsep + "/opt/oracle/instantclient_19_8"
os.environ["LD_LIBRARY_PATH"] = "/opt/oracle/instantclient_19_8"
#os.environ["ORACLE_HOME"] = "/opt/oracle/instantclient_19_8"

Any help would be great! It is so painful to install cx_oracle... Thanks!

Hi @cjbj ,

I started the benchmark project, but I got some issues to make it work properly... I am trying to use instantclient_19_8 from (https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html)[https://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html]%5Bhttps://www.oracle.com/database/technologies/instant-client/linux-x86-64-downloads.html%5D)

without init_oracle_client with env variables:

image

setting init_oracle_client path:

image

error:

cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libnnz19.so: cannot open shared object file: No such file or directory". See https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html for help

But the lib is there, as following print screen

image

also i set the environment variables too

os.environ["PATH"] = os.environ["PATH"] + os.pathsep + "/opt/oracle/instantclient_19_8"
os.environ["LD_LIBRARY_PATH"] = "/opt/oracle/instantclient_19_8"
#os.environ["ORACLE_HOME"] = "/opt/oracle/instantclient_19_8"

Any help would be great! It is so painful to install cx_oracle... Thanks!

gustavo@terminator-T2900:/media/backup/git-projects/cx_oracle_vs_asyncpg$ export DPI_DEBUG_LEVEL=64
gustavo@terminator-T2900:/media/backup/git-projects/cx_oracle_vs_asyncpg$  cd /media/backup/git-projects/cx_oracle_vs_asyncpg ; /usr/bin/env /usr/bin/python3 /home/gustavo/.vscode/extensions/ms-python.python-2020.9.111407/pythonFiles/lib/python/debugpy/launcher 42907 -- -m benchx 
ODPI [71320] 2020-10-01 18:34:04.724: ODPI-C 4.0.2
ODPI [71320] 2020-10-01 18:34:04.724: debugging messages initialized at level 64
ODPI [71320] 2020-10-01 18:34:04.726: Context Parameters:
ODPI [71320] 2020-10-01 18:34:04.726:     Oracle Client Lib Dir: /opt/oracle/instantclient_19_8
ODPI [71320] 2020-10-01 18:34:04.726: Environment Variables:
ODPI [71320] 2020-10-01 18:34:04.726:     LD_LIBRARY_PATH => "/opt/oracle/instantclient_19_8"
ODPI [71320] 2020-10-01 18:34:04.726: load in parameter directory
ODPI [71320] 2020-10-01 18:34:04.726: load in dir /opt/oracle/instantclient_19_8
ODPI [71320] 2020-10-01 18:34:04.726: load with name /opt/oracle/instantclient_19_8/libclntsh.so
ODPI [71320] 2020-10-01 18:34:04.726: load by OS failure: libnnz19.so: cannot open shared object file: No such file or directory
ODPI [71320] 2020-10-01 18:34:04.726: load with name /opt/oracle/instantclient_19_8/libclntsh.so.19.1
ODPI [71320] 2020-10-01 18:34:04.726: load by OS failure: libnnz19.so: cannot open shared object file: No such file or directory
ODPI [71320] 2020-10-01 18:34:04.726: load with name /opt/oracle/instantclient_19_8/libclntsh.so.18.1
ODPI [71320] 2020-10-01 18:34:04.726: load by OS failure: libnnz19.so: cannot open shared object file: No such file or directory
ODPI [71320] 2020-10-01 18:34:04.726: load with name /opt/oracle/instantclient_19_8/libclntsh.so.12.1
ODPI [71320] 2020-10-01 18:34:04.726: load by OS failure: libnnz19.so: cannot open shared object file: No such file or directory
ODPI [71320] 2020-10-01 18:34:04.726: load with name /opt/oracle/instantclient_19_8/libclntsh.so.11.1
ODPI [71320] 2020-10-01 18:34:04.726: load by OS failure: libnnz19.so: cannot open shared object file: No such file or directory
ODPI [71320] 2020-10-01 18:34:04.726: load with name /opt/oracle/instantclient_19_8/libclntsh.so.20.1
ODPI [71320] 2020-10-01 18:34:04.726: load by OS failure: /opt/oracle/instantclient_19_8/libclntsh.so.20.1: cannot open shared object file: No such file or directory

@sharkguto, note that the environment variable LD_LIBRARY_PATH must be set prior to starting Python. You cannot set it while the process is running! In addition, init_oracle_client() is not properly supported on platforms other than Windows and macOS. On Linux you have to have LD_LIBRARY_PATH or ld.so.conf configuration set up before it will work.

@sharkguto, note that the environment variable LD_LIBRARY_PATH must be set prior to starting Python. You cannot set it while the process is running! In addition, init_oracle_client() is not properly supported on platforms other than Windows and macOS. On Linux you have to have LD_LIBRARY_PATH or ld.so.conf configuration set up before it will work.

Great!! Thanks for your support... i will do it

Hi @cjbj , I finished the benchmark project. git project . Any PR will be welcome :)

benchmark

1 connection

  • postgresql
gustavo@terminator-T2900:~$ wrk -c 1 -t 1 -d 30 http://localhost:8080/v1/postgres --latency
Running 30s test @ http://localhost:8080/v1/postgres
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.53ms  313.27us  20.25ms   98.63%
    Req/Sec   659.80     29.38   727.00     81.00%
  Latency Distribution
     50%    1.51ms
     75%    1.57ms
     90%    1.63ms
     99%    1.92ms
  19709 requests in 30.01s, 390.37MB read
Requests/sec:    656.65
Transfer/sec:     13.01MB
  • oracle
gustavo@terminator-T2900:~$ wrk -c 1 -t 1 -d 30 http://localhost:8080/v1/oracle --latency
Running 30s test @ http://localhost:8080/v1/oracle
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.23ms  110.12us   5.26ms   73.29%
    Req/Sec   813.84     44.07     0.97k    69.00%
  Latency Distribution
     50%    1.24ms
     75%    1.30ms
     90%    1.35ms
     99%    1.50ms
  24320 requests in 30.02s, 481.70MB read
Requests/sec:    810.23
Transfer/sec:     16.05MB

2 connections

  • postgresql
gustavo@terminator-T2900:~$ wrk -c 2 -t 1 -d 30 http://localhost:8080/v1/postgres --latency
Running 30s test @ http://localhost:8080/v1/postgres
  1 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.64ms  178.03us   8.22ms   84.92%
    Req/Sec     1.22k    62.89     1.36k    75.33%
  Latency Distribution
     50%    1.60ms
     75%    1.68ms
     90%    1.81ms
     99%    2.20ms
  36555 requests in 30.01s, 724.03MB read
Requests/sec:   1218.03
Transfer/sec:     24.13MB
  • oracle
gustavo@terminator-T2900:~$ wrk -c 2 -t 1 -d 30 http://localhost:8080/v1/oracle --latency
Running 30s test @ http://localhost:8080/v1/oracle
  1 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.35ms  135.20us   2.50ms   74.16%
    Req/Sec     1.49k    70.66     1.65k    72.00%
  Latency Distribution
     50%    1.33ms
     75%    1.41ms
     90%    1.51ms
     99%    1.78ms
  44362 requests in 30.01s, 0.86GB read
Requests/sec:   1478.03
Transfer/sec:     29.27MB
gustavo@terminator-T2900:~$

200 connections

  • postgres
gustavo@terminator-T2900:~$ wrk -c 200 -t 1 -d 30 http://localhost:8080/v1/postgres --latency
Running 30s test @ http://localhost:8080/v1/postgres
  1 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   131.74ms  198.55ms   1.99s    89.23%
    Req/Sec     2.89k   136.56     3.11k    96.00%
  Latency Distribution
     50%   28.92ms
     75%  153.79ms
     90%  363.36ms
     99%  992.43ms
  86342 requests in 30.01s, 1.67GB read
  Socket errors: connect 0, read 0, write 0, timeout 9
Requests/sec:   2877.00
Transfer/sec:     56.98MB
  • oracle
gustavo@terminator-T2900:~$ wrk -c 200 -t 1 -d 30 http://localhost:8080/v1/oracle --latency
Running 30s test @ http://localhost:8080/v1/oracle
  1 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   188.63ms   24.53ms 504.33ms   82.33%
    Req/Sec     0.94k   171.69     1.09k    92.83%
  Latency Distribution
     50%  188.91ms
     75%  192.81ms
     90%  201.95ms
     99%  271.89ms
  26227 requests in 30.02s, 515.19MB read
  Socket errors: connect 0, read 1992, write 0, timeout 0
  Non-2xx or 3xx responses: 215
Requests/sec:    873.58
Transfer/sec:     17.16MB
gustavo@terminator-T2900:~$ wrk -c 200 -t 1 -d 30 http://localhost:8080/v1/oracle --latency
Running 30s test @ http://localhost:8080/v1/oracle
  1 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   200.27ms   12.24ms 345.36ms   97.08%
    Req/Sec     1.00k    60.83     1.09k    92.00%
  Latency Distribution
     50%  199.77ms
     75%  201.68ms
     90%  205.34ms
     99%  229.34ms
  29825 requests in 30.01s, 589.83MB read
  Socket errors: connect 0, read 139, write 0, timeout 0
  Non-2xx or 3xx responses: 43
Requests/sec:    993.68
Transfer/sec:     19.65MB
gustavo@terminator-T2900:~$

Results

Is not a fair fight, but as I can see cx_oracle performs pretty good with low workload, better than databases(with asyncpg). On the other hand, cx_oracle driver got a lot of crashes, invalidate some tests, when have more connections

OCI-21500: no message, kgebse recursion failure
OCI-21500: internal error code, arguments: [kgegpa:parameter corruption], [], [], [], [], [], [], []

@sharkguto, you need to add the parameter threaded=True if the number of threads exceeds 1; otherwise, you will get such "interesting" errors as you noted!

I haven't examined your code, but this article is definitely relevant: https://blog.miguelgrinberg.com/post/ignore-all-web-performance-benchmarks-including-this-one. 😄 The fact of the matter is that for some workloads, async is definitely better and for others sync is better. I agree that it would be good to have an async Oracle driver -- but I can't say when such a driver will be created. Its on the list but I don't get to set priorities!

@sharkguto, you need to add the parameter threaded=True if the number of threads exceeds 1; otherwise, you will get such "interesting" errors as you noted!

I haven't examined your code, but this article is definitely relevant: https://blog.miguelgrinberg.com/post/ignore-all-web-performance-benchmarks-including-this-one. smile The fact of the matter is that for some workloads, async is definitely better and for others sync is better. I agree that it would be good to have an async Oracle driver -- but I can't say when such a driver will be created. Its on the list but I don't get to set priorities!

Hi @anthony-tuininga , thanks for your support!

I agree with you. I use a lot of async/await just for simple endpoints that run queries on database and return the result. On the other hand, I have other endpoint that need to manipulate images with pillow, so i put it in synchronous mode, running in parallel , and I get better response time.

Now my benchmark with fastapi is done. I am really happy with the latency distribution from cx_oracle, pretty good!

200 connections

  • postgres
gustavo@terminator-T2900:~$ wrk -c 200 -t 1 -d 30 http://localhost:8080/v1/postgres --latency
Running 30s test @ http://localhost:8080/v1/postgres
  1 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   141.39ms  209.19ms   1.99s    87.93%
    Req/Sec     2.90k    99.15     3.11k    88.67%
  Latency Distribution
     50%   28.65ms
     75%  182.29ms
     90%  394.89ms
     99%    1.00s
  86540 requests in 30.01s, 1.67GB read
  Socket errors: connect 0, read 0, write 0, timeout 17
Requests/sec:   2883.50
Transfer/sec:     57.11MB
gustavo@terminator-T2900:~$
  • oracle
gustavo@terminator-T2900:~$ wrk -c 200 -t 1 -d 30 http://localhost:8080/v1/oracle --latency
Running 30s test @ http://localhost:8080/v1/oracle
  1 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    79.87ms   62.10ms 511.67ms   78.35%
    Req/Sec     2.81k   345.50     3.37k    81.67%
  Latency Distribution
     50%   42.39ms
     75%  121.81ms
     90%  179.98ms
     99%  233.64ms
  83852 requests in 30.03s, 1.62GB read
Requests/sec:   2791.99
Transfer/sec:     55.30MB

@anthony-tuininga thanks again , and I hope someday we can have a async driver for python soon. Have a nice day :)

Is there any update on this?

@vikt0rs no news. Thanks for letting us know your interest.

Hi, I'm also very interested in this.

Hi, I would also be very thankful, if the asyncio support got prioritized on the enhancements list.

FWIW, some colleagues had huge problems moving a higher RPS (requests per second) application from on-premise (with Oracle) to the AWS cloud with PostgreSQL. Other ways it was not apples to apples is that the Cloud Architecture pre-decided on the instance types that would be enough for PostgreSQL, but on-premise, Oracle was a bare metal, non-virtual system.

We had to implement RDS Proxy to raise performance because Python doesn't multi-thread well and Django does not manage connections across processes as a true connection pool.

I was looking at this issue because the same colleague has suggested that switching to Golang might be better for higher RPS applications, especially those API driven, and I was comparing the asyncio capabilities and goroutines, and thinking as always about how this fits with the database layer.

To my last post - very early days yet, and my applications get to stay on premise with Oracle and PL/SQL for the future.

This issue will be tracked in the new driver: oracle/python-oracledb#6

It isn't async per se but the key benefits of escaping from the GIL and better managing RDBMS connections that we need. As both @sharkguto and @P403n1x87 mention, it can be done by wrapping queries in coroutines.

That means that @cjbj and @anthony-tuininga don't need to write a new driver (and since async drivers don't follow PEP 249) it seems like it would be a new driver if done at the extension level. I think a thin package that wraps cx_Oracle in an opinionated way (e.g. using connection pools all the time, and get a connection just to do the request) could be used.

I guess there is a problem there for cursors and fetching additional results, but I'll leave it to @anthony-tuininga to figure this out in the new driver.

asyncio support in python-oracledb has been announced: oracle/python-oracledb#258