schemacrawler / SchemaCrawler

Free database schema discovery and comprehension tool

Home Page: http://www.schemacrawler.com/

Large Oracle Scan performance issue

jseaman-idata opened this issue

Description

I'm using a version of SC that is about a year old, but I don't think this particular approach has changed. When processing constraints, SC appears to pull in all constraints regardless of which tables the inclusion regex selects. For a few of the databases we scan, that is tens of millions of rows or more. Processing the constraints takes longer than 23 hours after the query returns its results to SC. I don't know the full time it would take, because the target databases are only up for 23 hours a day, so the scan ends in failure. Is there a way to optimize this process, add a table filter, or do this part in batches? Even with a decent amount of RAM and a powerful multi-core VM, it does not get through the data very fast.
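To make the "table filter" request concrete, here is a minimal sketch of dropping constraint rows for excluded tables as they are read, so they are never held in memory. It is not SchemaCrawler's code: `ConstraintRowFilter`, `ConstraintRow`, and `keepIncluded` are hypothetical names, and matching the regex against `SCHEMA.TABLE` is an assumption about how the inclusion rule would be applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ConstraintRowFilter {

  // Hypothetical row shape for illustration; not a SchemaCrawler class.
  static final class ConstraintRow {
    final String schema;
    final String table;
    final String constraintName;

    ConstraintRow(String schema, String table, String constraintName) {
      this.schema = schema;
      this.table = table;
      this.constraintName = constraintName;
    }
  }

  // Keeps only rows whose "SCHEMA.TABLE" name fully matches the inclusion pattern.
  // Rows for excluded tables are skipped as they are read and never collected.
  static List<ConstraintRow> keepIncluded(Iterable<ConstraintRow> rows, Pattern tableInclusion) {
    List<ConstraintRow> kept = new ArrayList<>();
    for (ConstraintRow row : rows) {
      String fullName = row.schema + "." + row.table;
      if (tableInclusion.matcher(fullName).matches()) {
        kept.add(row);
      }
    }
    return kept;
  }
}
```

A filter like this still transfers every row from the database, so it would reduce memory and processing time on the client but not the cost of the query itself.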

How to Reproduce

No response

Relevant log output

No response

SchemaCrawler Version

16.15.4

Java Version

OpenJDK 11

Operating System and Version

Windows Server 2019

Relational Database System and Version

Oracle recent versions (several different targets)

JDBC Driver and Version

ojdbc10

@jseaman-idata - is this specifically for foreign key constraints? I looked at the other code for other types of constraints, and it seems ok.

This is from the logs when the process times out. We close everything, and this is where SchemaCrawler reacts to everything ending:
schemacrawler.crawl.TableConstraintRetriever.retrieveTableConstraintDefinitions Could not retrieve check constraints

    java.sql.SQLRecoverableException: IO Error: Socket read interrupted

So I'm not sure which constraints were queried last, but the Oracle connection had been idle for 18 hours.

@jseaman-idata - you can selectively disable table constraints if you do not need them, while still getting indexes, primary keys and foreign keys. Please let me know if you would like to do this.

Unfortunately we need them. It is a ton of data to crunch through at once. It would greatly increase the number of queries to do it a table at a time if we are grabbing a whole DB, but it may be faster that way, especially for limited grabs. Perhaps as an option to do it one way or the other?
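As a middle ground between one query for the whole database and one query per table, the table names could be grouped into batches, with one query per batch. The sketch below only illustrates that idea and is not how SchemaCrawler works today. `ConstraintQueryBatcher`, `partition`, and `buildQuery` are hypothetical names, and querying Oracle's `ALL_CONSTRAINTS` view directly is an assumption. The cap of 1000 reflects the `IN`-list limit that many Oracle releases enforce (ORA-01795).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ConstraintQueryBatcher {

  // Many Oracle releases reject IN lists with more than 1000 expressions (ORA-01795).
  static final int MAX_IN_LIST = 1000;

  // Splits the table names into consecutive batches of at most batchSize names.
  static List<List<String>> partition(List<String> tableNames, int batchSize) {
    if (batchSize < 1 || batchSize > MAX_IN_LIST) {
      throw new IllegalArgumentException("batchSize must be between 1 and " + MAX_IN_LIST);
    }
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < tableNames.size(); i += batchSize) {
      int end = Math.min(i + batchSize, tableNames.size());
      batches.add(new ArrayList<>(tableNames.subList(i, end)));
    }
    return batches;
  }

  // Builds a parameterized query for one batch: one bind for the owner,
  // then one bind per table name in the batch.
  static String buildQuery(int tableCount) {
    if (tableCount < 1) {
      throw new IllegalArgumentException("tableCount must be at least 1");
    }
    String placeholders = String.join(", ", Collections.nCopies(tableCount, "?"));
    return "SELECT OWNER, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE "
        + "FROM ALL_CONSTRAINTS WHERE OWNER = ? AND TABLE_NAME IN (" + placeholders + ")";
  }
}
```

Each batch would then be bound to a `PreparedStatement` and run in turn. The batch size controls the trade-off raised above: larger batches mean fewer round trips, while smaller ones keep each result set small for limited grabs.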

@jseaman-idata getting table constraint definitions is going to be very expensive. You are failing in TableConstraintRetriever.retrieveTableConstraintDefinitions. Do you need check constraint definitions? You can disable just that piece.

I need the constraints, but not the defs. I will try turning that off and seeing if that makes a difference. Thanks!

Closing, since turning off TableConstraintRetriever.retrieveTableConstraintDefinitions made a difference in performance.