oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle.

Home Page: https://oracle.github.io/python-oracledb

oracledb memory problem

Looongtail opened this issue · comments

Discussed in #286

Originally posted by Looongtail January 26, 2024
I tried to get data from an Oracle database and convert it to a dataframe.
I've noticed that oracledb's cursor seems to consume a lot of memory and never gives it back.
Here is my environment info:
Oracle Database version: 19c
Oracle Client version: 19.20
python-oracledb version: 2.1

Table create query for testing

CREATE TABLE test_table (
    ID NUMBER,
    RandomColumn1 VARCHAR2(50),
    RandomColumn2 NUMBER,
    RandomColumn3 DATE
);

DECLARE
    v_counter NUMBER := 1;
BEGIN
    WHILE v_counter <= 10000000 LOOP
        INSERT INTO test_table (ID, RandomColumn1, RandomColumn2, RandomColumn3)
        VALUES (
            v_counter,
            DBMS_RANDOM.STRING('X', 10),
            ROUND(DBMS_RANDOM.VALUE(-100, 100)),
            TRUNC(SYSDATE - DBMS_RANDOM.VALUE(1, 365))
        );
        v_counter := v_counter + 1;
    END LOOP;
    COMMIT;
END;
/

Python code to check memory usage

import oracledb
import pandas as pd
import psutil

def memory_usage(message: str = 'debug'):
    p = psutil.Process()
    rss = p.memory_info().rss / 2 ** 20  # bytes to MB
    print(f'{message} memory usage {rss: 10.5f} MB')

memory_usage('init')
oracledb.init_oracle_client()
conn = oracledb.connect(user='*******', password='*******', dsn='*******')

cursor = conn.cursor()
cursor.arraysize = 10000
sql = 'select * from test_table'
data_lt = cursor.execute(sql).fetchall()
memory_usage('data fetch')

columns_lt = [column[0] for column in cursor.description]
data_df = pd.DataFrame(data_lt, columns=columns_lt)
memory_usage('make dataframe')

cursor.close()
conn.close()
memory_usage('close connection')
print(f'dataframe size : {data_df.info()}')

Result
C:\Users\SDS\AppData\Local\Programs\Python\Python310\python.exe D:\workspace\nscm_pre\z_local_run\test_oracle_memory.py
init memory usage 88.41797 MB
data fetch memory usage 2475.40625 MB
make dataframe memory usage 2781.08594 MB
close connection memory usage 2772.56641 MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000000 entries, 0 to 9999999
Data columns (total 4 columns):
 #   Column         Dtype
---  ------         -----
 0   ID             int64
 1   RANDOMCOLUMN1  object
 2   RANDOMCOLUMN2  int64
 3   RANDOMCOLUMN3  datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 305.2+ MB
dataframe size : None

Process finished with exit code 0

I'd like to free the memory used after making the dataframe.
When I save it to a file and delete the in-memory objects, the memory is released.
In addition, when the dataframe is loaded back from the file, memory usage is much smaller than before.
Could somebody help?
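
For reference, a minimal sketch of that save/reload round-trip, continuing the script above. The file format is an assumption; the original post doesn't say which was used.

data_df.to_pickle('test_table.pkl')          # persist the dataframe to disk (format assumed)
del data_df                                  # drop the in-memory copy
memory_usage('after save and delete')        # per the report above, RSS drops here

data_df = pd.read_pickle('test_table.pkl')   # reload from the file
memory_usage('after reload')                 # reportedly much smaller than the original peak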

I tried just performing the query without pandas involved and found that, after the connection is closed and the data destroyed, very little memory is retained. Even just importing pandas without actually using it added an additional 60 MB. So I'm not sure there is anything that needs to be done for oracledb.
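
A sketch of that pandas-free check, reusing the reporter's memory_usage helper and placeholder credentials:

import oracledb
import psutil

def memory_usage(message: str = 'debug'):
    rss = psutil.Process().memory_info().rss / 2 ** 20  # bytes to MB
    print(f'{message} memory usage {rss: 10.5f} MB')

memory_usage('init')
conn = oracledb.connect(user='*******', password='*******', dsn='*******')
cursor = conn.cursor()
cursor.arraysize = 10000
rows = cursor.execute('select * from test_table').fetchall()
memory_usage('data fetch')

del rows                     # destroy the fetched data
cursor.close()
conn.close()
memory_usage('after close')  # per the comment above, very little memory is retained here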

Note that you have two copies of the data: the first being the result of the fetch and the second the dataframe. Neither of those has been destroyed in your code, so it is unsurprising that the amount of memory has increased enormously!
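
A minimal sketch of the fix this implies, continuing the reporter's script: drop the intermediate fetchall() result once the dataframe has been built, so only one copy of the data remains. Note that CPython's allocator may retain some freed memory, so RSS may not fall all the way back.

import gc

data_df = pd.DataFrame(data_lt, columns=columns_lt)
del data_lt    # destroy the list of fetched row tuples
gc.collect()   # encourage CPython to release freed objects
memory_usage('after deleting fetch result')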

Closing - no activity.