Large Oracle Scan performance issue

Question

jseaman-idata opened this issue a year ago

Description

I'm using a version of SchemaCrawler (SC) that is a year or so old, but I don't think this particular approach has changed. It seems that when processing constraints, SC pulls in all constraints regardless of which tables the regex includes. For a few of the databases we are scanning, that is tens of millions of rows or more. Once the query returns, SC takes longer than 23 hours just to process the constraints. I don't know the full time it would take, because the target databases are only up for 23 hours a day, so the scan ends with a failure. Is there a way to optimize this process, add a table filter, or do this part in batches? Even with a decent amount of RAM and a powerful multi-core VM, it's not crunching through the data very fast.
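To make the "table filter" request concrete, here is a minimal sketch of dropping constraint rows for excluded tables as they are read, before any further processing. It is not SchemaCrawler code: `ConstraintFilterSketch`, `ConstraintRow`, and `keepIncluded` are hypothetical names, and matching the inclusion regex against `SCHEMA.TABLE` is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from SchemaCrawler.
final class ConstraintFilterSketch {

  /** Minimal stand-in for one row returned by a constraints query. */
  static final class ConstraintRow {
    final String schema;
    final String table;
    final String constraintName;

    ConstraintRow(String schema, String table, String constraintName) {
      this.schema = schema;
      this.table = table;
      this.constraintName = constraintName;
    }

    String fullTableName() {
      return schema + "." + table;
    }
  }

  /** Keeps only rows whose SCHEMA.TABLE name fully matches the inclusion pattern. */
  static List<ConstraintRow> keepIncluded(Iterable<ConstraintRow> rows, Pattern tableInclusion) {
    List<ConstraintRow> kept = new ArrayList<>();
    for (ConstraintRow row : rows) {
      if (tableInclusion.matcher(row.fullTableName()).matches()) {
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<ConstraintRow> rows = new ArrayList<>();
    rows.add(new ConstraintRow("HR", "EMPLOYEES", "EMP_SALARY_CK"));
    rows.add(new ConstraintRow("SALES", "ORDERS", "ORD_QTY_CK"));

    // Only constraints on tables in the HR schema survive the filter.
    for (ConstraintRow row : keepIncluded(rows, Pattern.compile("HR\\..*"))) {
      System.out.println(row.fullTableName() + " " + row.constraintName);
    }
  }
}
```

Filtering on the client still transfers every row from the database, so this would only help if the time is spent after the query returns, as described above. Pushing the filter into the SQL itself would reduce the transfer as well.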

How to Reproduce

No response

Relevant log output

No response

SchemaCrawler Version

16.15.4

Java Version

OpenJDK 11

Operating System and Version

Windows Server 2019

Relational Database System and Version

Oracle, recent versions (several different targets)

JDBC Driver and Version

ojdbc10

Sualeh Fatehi · Answer 1 · Wed May 10 2023 05:10:24 GMT+0800 (China Standard Time)

@jseaman-idata - is this specifically for foreign key constraints? I looked at the other code for other types of constraints, and it seems ok.

jseaman-idata · Answer 2 · Wed May 10 2023 05:30:08 GMT+0800 (China Standard Time)

This is from the logs. When the process times out, we close everything, and this is where SchemaCrawler reacts to everything ending:
schemacrawler.crawl.TableConstraintRetriever.retrieveTableConstraintDefinitions Could not retrieve check constraints

    java.sql.SQLRecoverableException: IO Error: Socket read interrupted

So I'm not sure which constraints were queried last, but the Oracle connection had been idle for 18 hours.

Sualeh Fatehi · Answer 3 · Wed May 10 2023 05:31:42 GMT+0800 (China Standard Time)

@jseaman-idata - you can selectively disable table constraints if you do not need them, while still getting indexes, primary keys and foreign keys. Please let me know if you would like to do this.

jseaman-idata · Answer 4 · Wed May 10 2023 06:01:12 GMT+0800 (China Standard Time)

Unfortunately, we need them. It is a ton of data to crunch through at once. Doing it a table at a time would greatly increase the number of queries if we are grabbing a whole DB, but it may be faster that way, especially for limited grabs. Perhaps there could be an option to do it one way or the other?
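A middle ground between one query for everything and one query per table is to query in batches of table names. The sketch below only shows the batching and the SQL text. It is not how SchemaCrawler works, the class and method names are hypothetical, and it assumes the constraints are read from Oracle's `ALL_CONSTRAINTS` view. The batch size is capped at 1000 because Oracle rejects `IN` lists with more expressions than that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not part of SchemaCrawler.
final class ConstraintBatchSketch {

  /** Oracle allows at most 1000 expressions in a single IN list. */
  static final int MAX_IN_LIST = 1000;

  /** Splits the items into consecutive batches of at most batchSize elements. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0 || batchSize > MAX_IN_LIST) {
      throw new IllegalArgumentException("batchSize must be between 1 and " + MAX_IN_LIST);
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  /** Builds a parameterized query for one batch: one owner plus tableCount table names. */
  static String buildQuery(int tableCount) {
    if (tableCount <= 0 || tableCount > MAX_IN_LIST) {
      throw new IllegalArgumentException("tableCount must be between 1 and " + MAX_IN_LIST);
    }
    String placeholders = String.join(", ", Collections.nCopies(tableCount, "?"));
    return "SELECT OWNER, TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE"
        + " FROM ALL_CONSTRAINTS"
        + " WHERE OWNER = ? AND TABLE_NAME IN (" + placeholders + ")";
  }

  public static void main(String[] args) {
    List<String> tables = List.of("EMPLOYEES", "DEPARTMENTS", "JOBS");
    for (List<String> batch : partition(tables, 2)) {
      // Each batch would be bound to a PreparedStatement: owner first, then table names.
      System.out.println(batch + " -> " + buildQuery(batch.size()));
    }
  }
}
```

With this shape, a whole-database scan issues roughly one query per thousand tables, while a limited grab of a few tables needs only a single small query.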

Sualeh Fatehi · Answer 5 · Wed May 10 2023 06:10:41 GMT+0800 (China Standard Time)

@jseaman-idata getting table constraint definitions is going to be very expensive. You are failing in TableConstraintRetriever.retrieveTableConstraintDefinitions. Do you need check constraint definitions? You can disable just that piece.

jseaman-idata · Answer 6 · Wed May 10 2023 06:26:01 GMT+0800 (China Standard Time)

I need the constraints, but not the defs. I will try turning that off and seeing if that makes a difference. Thanks!

Sualeh Fatehi · Answer 7 · Sun May 21 2023 02:34:14 GMT+0800 (China Standard Time)

Closing, since turning off TableConstraintRetriever.retrieveTableConstraintDefinitions made a difference in performance.