Slow performance for large datasets on 7.5.1 (via Spring Data 4.1.0)

Question

mdmm13 opened this issue 2 years ago · comments

Situation: we've got a dev server (4 cores, 16GB memory) that's running a single-instance ArangoDB (3.11.7) and a Spring Boot application that queries ArangoDB via Spring Data 4.1.0. We're posting here instead of in Spring Data because we use the Spring template, which passes through to the Java driver directly (see below).

Complication: we've got a query that returns ~6 MB of JSON. That exact query in the admin interface returns in 0.5s. The same query via the Java driver below takes 14s. Whether it runs from the admin interface or from Spring, CPU and memory usage barely reach 15% each, and the Arango dashboard/metrics show the same, so it's not a spec issue.

watch.start("aql");
ArangoCursor<FindAll> cursor = ops.query(query, bindVars, options, FindAll.class);
watch.stop();

(Note: FindAll is a POJO that includes Arango-annotated classes)

Questions:

How can we debug this properly?
How can we increase parsing performance? 550 lines should not take 13 seconds, regardless of size.

UPDATE:

Setting batchSize to 1 and 1000 - with batchSize=1 the query call itself returns in the same time as in the admin console (0.6s), and the 13-second delay shifts to the cursor.asListRemaining() method.
Setting 'RawJson' as the return type - 0.8s total (of which 0.6s is the query), so the performance hit is in the conversion, though it's hard to imagine 550 rows (~6 MB) taking 14 seconds.
Created duplicates of all Arango-annotated classes without annotations - 14s down to 2s.
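The batchSize experiment above amounts to timing two phases separately: opening the cursor and draining it. A minimal sketch of that measurement, using only the JDK, is shown here. PhaseTimer and Timed are hypothetical names introduced for illustration and are not part of the ArangoDB driver or Spring Data; the supplier stands in for the ops.query(...) call and the loop stands in for cursor.asListRemaining().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper (not a driver or Spring Data class): times opening a
// cursor separately from draining it, so the slow phase becomes visible.
final class PhaseTimer {

    /** Rows read from the cursor plus the nanoseconds spent in each phase. */
    static final class Timed<T> {
        final List<T> rows;
        final long openNanos;
        final long drainNanos;

        Timed(List<T> rows, long openNanos, long drainNanos) {
            this.rows = rows;
            this.openNanos = openNanos;
            this.drainNanos = drainNanos;
        }
    }

    static <T> Timed<T> time(Supplier<Iterator<T>> openCursor) {
        long t0 = System.nanoTime();
        Iterator<T> cursor = openCursor.get();   // stand-in for ops.query(...)
        long t1 = System.nanoTime();

        List<T> rows = new ArrayList<>();
        while (cursor.hasNext()) {               // stand-in for cursor.asListRemaining()
            rows.add(cursor.next());
        }
        long t2 = System.nanoTime();

        return new Timed<>(rows, t1 - t0, t2 - t1);
    }
}
```

Assuming the cursor returned by ops.query can be used as a java.util.Iterator, a call along the lines of PhaseTimer.time(() -> ops.query(query, bindVars, options, FindAll.class)) would report the two durations side by side. Note that with a large batchSize part of the conversion work may already happen while the cursor is opened, which is why batchSize=1 made the split clearer.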

==> does anyone have input on how to improve the deserialization to the original annotated classes?

Michele Rastelli · Answer 1 · Fri Feb 16 2024 17:33:55 GMT+0800 (China Standard Time)

This could happen if the FindAll entity has fields linking to other documents (or edges), i.e. fields annotated with @Ref, @From, @To, @Relations. In that case the linked objects are fetched eagerly during deserialization. If so, setting the annotation parameter lazy = true would load them lazily.
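To illustrate why eager resolution can dominate the runtime, here is a small JDK-only model of the two strategies. It is a sketch under assumptions, not Spring Data's implementation: EagerVsLazyDemo, fetchLinked and Lazy are hypothetical names, and the model assumes each resolved link costs one separate lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of eager vs lazy link resolution; it counts lookups
// instead of talking to a database.
final class EagerVsLazyDemo {

    static final AtomicInteger fetches = new AtomicInteger();

    // Stand-in for one lookup of a linked document (assumed cost: one fetch).
    static String fetchLinked(int id) {
        fetches.incrementAndGet();
        return "linked-" + id;
    }

    // Defers the lookup until get() is first called, then caches the value.
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public synchronized T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    // Eager: every row resolves its link while the row is being built.
    static int eagerFetchCount(int rows) {
        fetches.set(0);
        List<String> linked = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            linked.add(fetchLinked(i));
        }
        return fetches.get();
    }

    // Lazy: links are resolved only for the first `touched` rows that are read.
    static int lazyFetchCount(int rows, int touched) {
        fetches.set(0);
        List<Lazy<String>> linked = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            final int id = i;
            linked.add(new Lazy<>(() -> fetchLinked(id)));
        }
        for (int i = 0; i < touched; i++) {
            linked.get(i).get();
        }
        return fetches.get();
    }
}
```

In this model, 550 rows with one link each trigger 550 lookups when resolved eagerly, and none when resolved lazily until a link is actually read. Entities with several linked fields, or links that themselves contain links, would multiply the eager count further.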

mdmm13 · Answer 2 · Mon Feb 19 2024 19:48:33 GMT+0800 (China Standard Time)

Thank you @rashtao - interesting that it'd eagerly load on deserialization instead of first actual use. Is there a way we can set everything as lazy by default?

Michele Rastelli · Answer 3 · Tue Feb 20 2024 05:32:17 GMT+0800 (China Standard Time)

Currently the default is eager and there is no way to change it globally, so you need to set lazy = true for each usage of the annotations above.

mdmm13 · Answer 4 · Wed Feb 28 2024 01:17:25 GMT+0800 (China Standard Time)

Understood, thank you.

This would be worth a feature request for a general driver option going forward, because it affects read performance heavily.