arangodb / arangodb-java-driver

The official ArangoDB Java driver.

Repository from GitHub: https://github.com/arangodb/arangodb-java-driver

Slow performance for large datasets on 7.5.1 (via Spring Data 4.1.0)

mdmm13 opened this issue

Situation: we've got a dev server (4 cores, 16 GB memory) running a single-instance ArangoDB (3.11.7) and a Spring Boot application that queries ArangoDB via Spring Data 4.1.0. We're posting here instead of in the Spring Data repository because we use the Spring template, which passes queries directly through to the Java driver (see below).

Complication: we've got a query that returns ~6 MB of JSON. That exact query returns in 0.5s in the admin interface; the same query via the Java driver (below) takes 14s. Whether via the admin interface or Spring, CPU and memory usage barely touch 15% each, and the ArangoDB dashboard/metrics show the same, so it's not a hardware-spec issue.

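```java
// ops: Spring Data ArangoOperations template; watch: a stopwatch such as Spring's StopWatch.
// Only the ops.query(...) call is timed here.
watch.start("aql");
ArangoCursor<FindAll> cursor = ops.query(query, bindVars, options, FindAll.class);
watch.stop();
```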

(Note: FindAll is a POJO that includes Arango-annotated classes)
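One way to approach the debugging question below is to time fetching and converting as separate phases, instead of timing only the query call. The following is a minimal, dependency-free sketch; `PhaseTimer`, its `run` method and the `Timed` record are hypothetical names for illustration, not part of the ArangoDB driver or Spring Data. In this issue, `fetch` might wrap the `ops.query(...)` call and `convert` might be `cursor.asListRemaining()`, bearing in mind that part of the conversion may already happen inside `query()` (see the batchSize experiment in the update).

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper (not part of the ArangoDB driver or Spring Data):
// times a "fetch" phase and a "convert" phase separately.
final class PhaseTimer {

    /** Result of a timed two-phase run, with durations in nanoseconds. */
    public record Timed<T>(T value, long fetchNanos, long convertNanos) {
        public double fetchMillis() { return fetchNanos / 1_000_000.0; }
        public double convertMillis() { return convertNanos / 1_000_000.0; }
    }

    public static <R, T> Timed<T> run(Supplier<R> fetch, Function<R, T> convert) {
        long t0 = System.nanoTime();
        R raw = fetch.get();            // e.g. running the query
        long t1 = System.nanoTime();
        T value = convert.apply(raw);   // e.g. draining the cursor / mapping rows
        long t2 = System.nanoTime();
        return new Timed<>(value, t1 - t0, t2 - t1);
    }

    public static void main(String[] args) {
        // Stand-in phases: "fetch" returns raw strings, "convert" parses them.
        Timed<List<Integer>> t = run(
                () -> List.of("1", "2", "3"),
                raw -> raw.stream().map(Integer::parseInt).toList());
        System.out.printf("fetch=%.3f ms, convert=%.3f ms, rows=%d%n",
                t.fetchMillis(), t.convertMillis(), t.value().size());
    }
}
```

With real data, a convert time that dwarfs the fetch time points at the mapping step rather than at the server.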

Questions:

  1. How can we debug this properly?
  2. How can we improve parsing performance? Converting 550 rows should not take 13 seconds, regardless of their size.
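Dividing the reported figures by the row count gives a rough per-row picture, assuming the ~6 MB is spread evenly over the 550 rows:

$$
\frac{13\,000\ \text{ms}}{550\ \text{rows}} \approx 23.6\ \text{ms per row},
\qquad
\frac{6\ \text{MB}}{550\ \text{rows}} \approx 11\ \text{kB per row}
$$

Roughly 24 ms for about 11 kB of JSON per row is far slower than plain parsing should be on an otherwise idle CPU, which hints that the time goes into waiting (for example on extra I/O) rather than computing. This is an inference from the numbers, not a measurement.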

UPDATE:

  1. Setting batchSize to 1 and to 1000: with batchSize=1, the query itself returns in the same time as in the admin console (0.6s), and the 13-second delay shifts to the cursor.asListRemaining() method.
  2. Setting 'RawJson' as the return type: 0.8s total (of which 0.6s is the query), so the performance hit is in the conversion, though it's hard to imagine 550 rows (~6 MB) taking 14 seconds.
  3. Creating annotation-free duplicates of all Arango-annotated classes: 14s down to 2s.
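The batchSize observation in item 1 is consistent with a cursor that converts each batch into the target class as soon as that batch is fetched: with a large batch, all 550 rows are converted before `query()` returns; with batchSize=1, almost all of them are converted while draining. The toy model below illustrates that assumption only. `ToyCursor` and its methods are hypothetical (only the name `asListRemaining` mirrors the driver's method), and it does not reproduce the driver's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model (assumption, not the driver's code): each batch is converted as soon
// as it is fetched. The constructor fetches the first batch, standing in for the
// query call; later batches are fetched while draining.
final class ToyCursor<T> {
    private final List<String> rows;   // raw rows still "on the server"
    private final int batchSize;
    private final Function<String, T> converter;
    private final List<T> buffer = new ArrayList<>();
    private int nextIndex = 0;
    private int converted = 0;

    ToyCursor(List<String> rows, int batchSize, Function<String, T> converter) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.rows = rows;
        this.batchSize = batchSize;
        this.converter = converter;
        fetchBatch(); // the "query" returns the first batch, already converted
    }

    /** Fetches and converts the next batch; returns false when nothing is left. */
    private boolean fetchBatch() {
        if (nextIndex >= rows.size()) return false;
        int end = Math.min(nextIndex + batchSize, rows.size());
        for (int i = nextIndex; i < end; i++) {
            buffer.add(converter.apply(rows.get(i)));
            converted++;
        }
        nextIndex = end;
        return true;
    }

    /** Fetches and converts all remaining batches, then returns every buffered row. */
    public List<T> asListRemaining() {
        while (fetchBatch()) { /* keep fetching */ }
        List<T> all = new ArrayList<>(buffer);
        buffer.clear();
        return all;
    }

    public int convertedSoFar() { return converted; }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        for (int i = 0; i < 550; i++) raw.add(Integer.toString(i));
        for (int batchSize : new int[] {1, 1000}) {
            ToyCursor<Integer> c = new ToyCursor<>(raw, batchSize, Integer::parseInt);
            int atQuery = c.convertedSoFar();
            int total = c.asListRemaining().size();
            System.out.println("batchSize=" + batchSize + ": converted at query time="
                    + atQuery + ", after draining=" + total);
        }
    }
}
```

For 550 rows, `main` reports 1 conversion at query time with batchSize=1 and all 550 with batchSize=1000, mirroring where the delay showed up in the measurements above.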

==> Does anyone have suggestions for speeding up deserialization into the original annotated classes?

This could happen if the FindAll entity has fields linking to other documents (or edges), i.e. fields annotated with @Ref, @From, @To or @Relations. In such a case, the linked objects are fetched eagerly during deserialization, which means additional requests to the database while each row is converted. If this is the case, setting the annotation parameter lazy = true would load them lazily instead.
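To make the eager/lazy difference concrete, here is a toy model in which every resolver call stands in for one extra database lookup. It assumes one linked document per row and uses a memoizing wrapper in place of whatever mechanism Spring Data ArangoDB actually uses for lazy = true; `RefResolutionDemo`, `Ref` and `convert` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of eager vs lazy reference resolution (not Spring Data's code).
// Each resolver call stands in for one extra database lookup.
final class RefResolutionDemo {

    /** A reference resolved either immediately or on first access (then memoized). */
    public static final class Ref<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        private Ref(Supplier<T> loader) { this.loader = loader; }

        static <U> Ref<U> eager(Supplier<U> loader) {
            Ref<U> r = new Ref<>(loader);
            r.get(); // resolve right away, i.e. during conversion
            return r;
        }

        static <U> Ref<U> lazy(Supplier<U> loader) {
            return new Ref<>(loader); // nothing fetched until get() is called
        }

        public T get() {
            if (!loaded) { value = loader.get(); loaded = true; } // memoize
            return value;
        }
    }

    /** Converts ids into references, resolving them eagerly or lazily. */
    public static List<Ref<String>> convert(List<String> ids, boolean lazy,
                                            Function<String, String> resolver) {
        List<Ref<String>> out = new ArrayList<>();
        for (String id : ids) {
            Supplier<String> loader = () -> resolver.apply(id);
            Ref<String> ref = lazy ? Ref.lazy(loader) : Ref.eager(loader);
            out.add(ref);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 550; i++) ids.add("doc/" + i);

        AtomicInteger eagerCalls = new AtomicInteger();
        convert(ids, false, id -> { eagerCalls.incrementAndGet(); return "resolved " + id; });

        AtomicInteger lazyCalls = new AtomicInteger();
        List<Ref<String>> lazyRefs =
                convert(ids, true, id -> { lazyCalls.incrementAndGet(); return "resolved " + id; });
        int lazyBeforeAccess = lazyCalls.get();
        lazyRefs.get(0).get(); // first access triggers exactly one lookup
        lazyRefs.get(0).get(); // memoized: no further lookup

        System.out.println("eager lookups=" + eagerCalls.get()
                + ", lazy before access=" + lazyBeforeAccess
                + ", lazy after one access=" + lazyCalls.get());
    }
}
```

With 550 rows, `main` reports 550 lookups in eager mode, 0 in lazy mode before any access, and 1 after the first reference is read twice. If each real lookup costs a network round trip, the eager case scales with rows times links per row, which is why lazy loading can matter so much here.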

Thank you @rashtao. Interesting that it loads eagerly during deserialization instead of on first actual use. Is there a way to make everything lazy by default?

Currently the default is eager and there is no way to change it globally, so you need to set lazy = true for each usage of the annotations above.

Understood, thank you.

This would be worth a feature request for a general driver option going forward, since it heavily affects read performance.