oracledb memory problem
Looongtail opened this issue · comments
Discussed in #286
Originally posted by Looongtail January 26, 2024
I tried to fetch data from an Oracle database and convert it to a DataFrame.
I've noticed that oracledb's cursor seems to consume a lot of memory and never releases it.
This is my environment info:
Oracle version: 19c
Oracle client version: 19.20
oracledb lib version: 2.1
Table create query for testing
CREATE TABLE test_table (
ID NUMBER,
RandomColumn1 VARCHAR2(50),
RandomColumn2 NUMBER,
RandomColumn3 DATE
);
DECLARE
v_counter NUMBER := 1;
BEGIN
WHILE v_counter <= 10000000 LOOP
INSERT INTO test_table (ID, RandomColumn1, RandomColumn2, RandomColumn3)
VALUES (
v_counter,
DBMS_RANDOM.STRING('X', 10),
ROUND(DBMS_RANDOM.VALUE(-100, 100)),
TRUNC(SYSDATE - DBMS_RANDOM.VALUE(1, 365))
);
v_counter := v_counter + 1;
END LOOP;
COMMIT;
END;
/
Python code to check memory usage
import oracledb
import pandas as pd
import psutil
def memory_usage(message: str = 'debug'):
    p = psutil.Process()
    rss = p.memory_info().rss / 2 ** 20  # bytes to MB
    print(f'{message} memory usage {rss: 10.5f} MB')

memory_usage('init')
oracledb.init_oracle_client()
conn = oracledb.connect(user='*******', password='*******', dsn='*******')
cursor = conn.cursor()
cursor.arraysize = 10000
sql = 'select * from test_table'
data_lt = cursor.execute(sql).fetchall()
memory_usage('data fetch')
columns_lt = [column[0] for column in cursor.description]
data_df = pd.DataFrame(data_lt, columns=columns_lt)
memory_usage('make dataframe')
cursor.close()
conn.close()
memory_usage('close connection')
print(f'dataframe size : {data_df.info()}')
Result
C:\Users\SDS\AppData\Local\Programs\Python\Python310\python.exe D:\workspace\nscm_pre\z_local_run\test_oracle_memory.py
init memory usage 88.41797 MB
data fetch memory usage 2475.40625 MB
make dataframe memory usage 2781.08594 MB
close connection memory usage 2772.56641 MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000000 entries, 0 to 9999999
Data columns (total 4 columns):
 #  Column         Dtype
0 ID int64
1 RANDOMCOLUMN1 object
2 RANDOMCOLUMN2 int64
3 RANDOMCOLUMN3 datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 305.2+ MB
dataframe size : None
Process finished with exit code 0
I'd like to free the memory used once the DataFrame has been made.
When I save it to a file and delete the objects, the memory is released.
In addition, memory usage is much smaller than before when the DataFrame is loaded back from a file.
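A likely reason for that last point: the fetched rows are Python tuples holding one boxed object per cell, while a DataFrame loaded from a file stores each numeric column as one contiguous array. A rough illustration of the per-cell overhead (synthetic values, not the original table):

```python
import sys
import numpy as np

n = 100_000

# One boxed Python int per cell, as in a list of fetched row tuples.
boxed = list(range(n))
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)

# The same values packed into a contiguous int64 array, which is how
# pandas stores a numeric column internally.
packed_bytes = np.arange(n, dtype=np.int64).nbytes

print(f'boxed:  {boxed_bytes / 2**20:.1f} MB')
print(f'packed: {packed_bytes / 2**20:.1f} MB')
```

The boxed form costs roughly 28 bytes per small int plus the list's own pointer storage, versus 8 bytes per value in the packed array.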
Could somebody help?
I tried just performing the query without pandas involved and found that after the connection is closed and the data destroyed, very little memory is retained. Even just importing pandas, without actually using it, added an extra 60 MB. So I'm not sure there is anything that needs to be done in oracledb.
Note that you have two copies of the data: the first is the result of the fetch and the second is the data frame. Neither of those has been destroyed in your code, so it is unsurprising that the amount of memory has increased enormously!
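To reclaim that first copy, the fetched list can be dropped as soon as the DataFrame is built, since pandas copies the tuple data into its own column arrays. A minimal sketch, using synthetic rows in place of the Oracle fetch (in the original script this would be data_lt):

```python
import gc
import pandas as pd

# Synthetic stand-in for cursor.fetchall(); in the original script this
# would be data_lt, the list of row tuples fetched from Oracle.
rows = [(i, f'val{i}', i % 100) for i in range(100_000)]

df = pd.DataFrame(rows, columns=['ID', 'RANDOMCOLUMN1', 'RANDOMCOLUMN2'])

# The DataFrame holds its own copy of the data, so the fetched list can
# be deleted and its memory handed back to the allocator.
del rows
gc.collect()

print(len(df))  # the frame is intact after the source list is gone
```

Note that CPython's allocator does not always return freed memory to the operating system right away, so RSS as reported by psutil can stay elevated even after the objects themselves are gone.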
Closing - no activity.