oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle.

Home Page: https://oracle.github.io/python-oracledb


Async select not returning all rows

Subdued5455 opened this issue

  1. What versions are you using?
    Oracle DB 19.22.0.0.0
    platform.platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
    sys.maxsize > 2**32: True
    platform.python_version: 3.11.0rc1
    oracledb.version: 2.1.2
  2. Is it an error or a hang or a crash?
    An error.

  3. What error(s) or behavior are you seeing?
    When using async mode, fetches seem to end early. The issue occurs on every execution, though the exact number of rows returned is inconsistent. It works properly in synchronous mode. I am relatively new to Python, but I believe the basic test code is correct; I would be happy if this turns out to be a mistake on my part. I am connecting to a database I have no control over.

Here is my script call and output:
python simple_test.py
Password:
async starting...
async connection acquired...
async cursor created...
async getting count(1)...
async getting rows...
async Elapsed Time: 35.00 seconds
async COUNT(1): 9375506
async total_fetched_rows: 32802
async cursor.rowcount: 32802

  4. Does your application call init_oracle_client()?
    No.
  5. Include a runnable Python script that shows the problem.
import asyncio
import getpass
import time

import oracledb

ORACLE_SOURCE_DSN = ""
ORACLE_SOURCE_USER = ""
ORACLE_SOURCE_PASS = ""


async def async_test(table_name, pool):
    print("async starting...")
    start = time.time()
    total_fetched_rows = 0
    cursor_rowcount = 0
    async with oracledb.connect_async(user=ORACLE_SOURCE_USER,
                                      password=ORACLE_SOURCE_PASS,
                                      dsn=ORACLE_SOURCE_DSN) as connection:
        print("async connection acquired...")
        async with connection.cursor() as cursor:
            print("async cursor created...")
            print("async getting count(1)...")
            await cursor.execute(f"SELECT COUNT(1) FROM {table_name}")
            (row_count,) = await cursor.fetchone()
            print("async getting rows...")
            await cursor.execute(f"SELECT * FROM {table_name}")
            while True:
                rows = await cursor.fetchmany()
                if not rows:
                    break
                total_fetched_rows += len(rows)
            cursor_rowcount = cursor.rowcount
    elapsed = time.time() - start
    print(f"async Elapsed Time: {elapsed:04.2f} seconds")
    print(f"async COUNT(1): {row_count}")
    print(f"async total_fetched_rows: {total_fetched_rows}")
    print(f"async cursor.rowcount: {cursor_rowcount}")


async def main():
    table_name = ""
    await async_test(table_name, pool=None)


ORACLE_SOURCE_PASS = getpass.getpass()
asyncio.run(main())

I haven't seen this behavior myself, but I also don't see anything obviously wrong with your code. Can you supply a full test case? It is possible that the data being fetched is important in some fashion. Ideally you have a create table statement and a PL/SQL block (or Python script) that populates the table with the data. Then the script posted above can be run to demonstrate the issue.

As an aside, why are you using the release candidate for Python 3.11? :-)

I will work on seeing if I can recreate the issue in a database I have more permissions on, with data that is generated and shareable.

Your question re: Python 3.11 prompted me to go back and check. At some point while trying to figure this out, I tried the different versions of Python available in my package manager to see if that had any impact. I guess I just ended up on 3.11, and my version of Ubuntu ships a release candidate as its current Python 3.11:

Package: python3.11
Version: 3.11.0~rc1-1~22.04

Anyway, I will hopefully get back to you relatively soon. Thank you for all of your work on this!

I have a test case which works for me. I have dropped and recreated the table several times, and each time it has successfully reproduced the error; however, since the data load uses random data, I cannot guarantee it always will. I also do not know what part of the test case is actually important: I took one of the tables that was causing issues and used its basic schema, loading it with random data.
New Table:

create table ASYNC_TEST
(
  a  NUMBER(8) not null,
  b  VARCHAR2(6) not null,
  c  NUMBER(3) not null,
  d  VARCHAR2(5) not null,
  e  VARCHAR2(4) not null,
  f  VARCHAR2(5) not null,
  g  VARCHAR2(2),
  h  VARCHAR2(3),
  i  VARCHAR2(4),
  j  VARCHAR2(4),
  k  VARCHAR2(1),
  l  VARCHAR2(30),
  m  NUMBER(4),
  n  VARCHAR2(30),
  o  VARCHAR2(1),
  p  DATE not null,
  q  VARCHAR2(3),
  r  VARCHAR2(3),
  s  DATE,
  t  DATE,
  u  NUMBER(7,3),
  v  VARCHAR2(3),
  w  VARCHAR2(1),
  x  DATE,
  y  DATE,
  z  NUMBER(3),
  aa VARCHAR2(100),
  bb NUMBER(2),
  cc NUMBER(19) not null,
  dd NUMBER(19) not null,
  ee VARCHAR2(30),
  ff VARCHAR2(30),
  gg VARCHAR2(6),
  PRIMARY KEY(a,b,c)
);

Data Load:

INSERT INTO async_test
SELECT
dbms_random.value(0,99999999),
dbms_random.string('p',dbms_random.value(1,6)),
dbms_random.value(0,999),
dbms_random.string('p',dbms_random.value(1,5)),
dbms_random.string('p',dbms_random.value(1,4)),
dbms_random.string('p',dbms_random.value(1,5)),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',2) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',4) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',4) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',1) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,9999) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',1) END,
sysdate + dbms_random.value(-10000,10000),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
sysdate + dbms_random.value(-10000,10000),
sysdate + dbms_random.value(-10000,10000),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,9999) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',3) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',1) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN sysdate + dbms_random.value(-10000,10000) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN sysdate + dbms_random.value(-10000,10000) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,999) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',100) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.value(0,99) END,
dbms_random.value(0,9999999999999999999),
dbms_random.value(0,9999999999999999999),
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',30) END,
CASE WHEN dbms_random.value(0,100) < 80 THEN dbms_random.string('p',6) END
FROM DUAL
connect by LEVEL < 1000000;

This is some example output I got:
python simple_test.py
Password:
async starting...
async connection acquired...
async cursor created...
async getting count(1)...
async getting rows...
async Elapsed Time: 153.27 seconds
async COUNT(1): 999999
async fetch_count: 7416
async total_fetched_rows: 741402
async cursor.rowcount: 741402

I had also originally tried loading 10 million records. This takes a very long time though (at least on my server):
python simple_test.py
Password:
async starting...
async connection acquired...
async cursor created...
async getting count(1)...
async getting rows...
async Elapsed Time: 115.61 seconds
async COUNT(1): 9999999
async fetch_count: 5545
async total_fetched_rows: 554302
async cursor.rowcount: 554302

Anyway, I really hope you are able to reproduce.

I just tried your script with Python 3.12 and did not notice any difficulties and I ran it multiple times. Asyncio is still relatively new and there have been issues reported. Can you try with Python 3.12 and the newly released python-oracledb 2.2.0 and see if you are still able to reproduce in your setup? I also used the recently released Oracle Database 23ai but that shouldn't have any bearing on this issue!

Is the database local? Or is there significant latency between the client and the database?

I just tried with Python 3.12 and the latest oracledb 2.2.0 and I'm still getting the same issue. I'm running in a VirtualBox guest, and the database is not local. I have never noticed any significant latency to the database, but I have also never actually measured it. It is both geographically and network-topologically quite close.

I do a lot of Oracle development (SQL and PL/SQL), but I am not really much of an administrator. Are there any database parameters that could be of interest here? I will try a few more things and see if I can figure out anything else.

If I can figure out anything I will update here. Thank you again for all of your efforts on this!

@anthony-tuininga
Happened to me too; running the same query twice gives me different results.

Sometimes I get all rows, and sometimes not.

I am trying to fetch about 1M rows; after increasing the arraysize to 10_000 it seems to be fairly stable and returns all the rows.

@syniex, are you able to replicate this issue consistently? We have been unable to do so, so any help from you on determining what causes the issue would be appreciated. Can you supply the table structure and the number of rows?

@anthony-tuininga
Yes, the results are consistently wrong most of the time.

I can't provide the table structure since it's within a customer network.

But I can provide the following details:

  1. the table has 67 columns
  2. it has 4M rows
  3. it is partitioned by month

Is there anything else I can do to help?

What is needed is a test case that replicates the issue consistently. It may be very much data related or configuration related. @Subdued5455 provided a script that demonstrated the issue for him/her but unfortunately I was unable to replicate. So any further information on when these issues occur would be helpful!

I will try to make a reproducible example.
Is there an Oracle database I can run in a container?

@syniex A test case would be great. This issue is important for us to fix.

Try:

docker run -d -p 1521:1521 --name free -e ORACLE_PASSWORD=oracle -e APP_USER=scott -e APP_USER_PASSWORD=tiger -v oracle-volume:/opt/oracle/oradata gvenzl/oracle-free:slim

or look at https://github.com/oracle/python-oracledb/tree/main/samples/sample_container

Also see https://github.com/oracle/docker-images/tree/main/OracleDatabase

@cjbj
I can't seem to get a test case working with a local database.
How would you like to proceed?

In the production environment the database is remote.
The query takes about 34-35 seconds to return a result (the server has become somewhat slow lately).

Is there any other information I can provide?

@anthony-tuininga I could not produce a test case. Is there anything else I can do?

We have reviewed the code, but without a test case we cannot make progress on this. We'd love to resolve it, so if anyone can give us a test case, we would be very happy.

I just pushed some changes that may address this issue. If you are able to build from source you can see if that is indeed the case. This change will be part of python-oracledb 2.4.0 which is scheduled to be released soon.

@anthony-tuininga
I will try it tomorrow and update you with the results.

FYI, the changes were included in python-oracledb 2.4.0 which was just released. Please do let me know if that release resolves the issue for you!

Hi @anthony-tuininga Is there any version history that records the root cause of this issue? I didn't find much related information in the 2.4.0 release notes. Thanks!

@Weiyi-Chung since we couldn't ever reproduce the problem, we don't know what the issue is. However @anthony-tuininga spotted some areas for improvement in the async implementation while he was adding Pipelining support and there's a chance that the changes made in that feature fix this issue.