latitude-dev / latitude

Developer-first embedded analytics

Home Page: https://latitude.so

Query results are completely stored in memory at once, which can crash the server if they're too large.

csansoon opened this issue

Issue Description

When running a query, Latitude executes it on the source and holds the entire result in memory before returning it. This is a problem with large datasets: if the result exceeds the heap space available to Node, the process runs out of memory and the server crashes.

Example: querying a 1M-row table from DuckDB, with Node configured to use a maximum heap of 4GB:

Screen.Recording.2024-06-18.at.12.20.06.mov

Example: the same query, with Node running on its default settings (the default maximum heap is about 2GB):

Screen.Recording.2024-06-18.at.12.46.25.mov
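
For anyone reproducing the two runs above, the heap ceiling can be checked from inside the process. This is only a sketch: `heapLimitMiB` is a hypothetical helper name, not part of Latitude, and the exact default limit depends on the Node version and the machine.

```typescript
import { getHeapStatistics } from "node:v8";

// Hypothetical helper: returns the V8 heap size limit of the
// current process, converted from bytes to MiB.
function heapLimitMiB(): number {
  return getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log(`heap limit: ${heapLimitMiB().toFixed(0)} MiB`);
```

Starting Node with `--max-old-space-size=4096` (the value is in MiB) should raise the reported limit to roughly 4GB, which corresponds to the first example.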

Proposed solution

The only viable solution I can think of is to avoid loading the entire result set into memory at once. This means changing our whole query results infrastructure: instead of returning the actual results, connectors should return a stream. The stream, created in the connector, would run the query in batches and yield the results of each batch, requesting the next batch only after the previous one has been consumed.
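
A minimal sketch of that idea, assuming a connector can fetch one page of rows at a time. All names here (`Row`, `FetchBatch`, `streamQuery`, `makeFakeSource`, `countRows`) are hypothetical and not Latitude's actual connector API; a real connector would more likely rely on a driver-level cursor than on offsets.

```typescript
// Hypothetical types: not Latitude's actual connector API.
type Row = Record<string, unknown>;

// Fetches one page of rows. A real connector might translate this into
// LIMIT/OFFSET, a database cursor, or a driver-level stream.
type FetchBatch = (offset: number, limit: number) => Promise<Row[]>;

// Yields the result set batch by batch. The next batch is requested only
// when the consumer asks for it, so this function holds one batch at a time.
async function* streamQuery(
  fetchBatch: FetchBatch,
  batchSize: number,
): AsyncGenerator<Row[], void, void> {
  let offset = 0;
  while (true) {
    const batch = await fetchBatch(offset, batchSize);
    if (batch.length === 0) return;
    yield batch;
    if (batch.length < batchSize) return; // short batch: source exhausted
    offset += batch.length;
  }
}

// In-memory stand-in for a data source, used only for demonstration.
function makeFakeSource(totalRows: number): FetchBatch {
  return async (offset, limit) => {
    const end = Math.min(offset + limit, totalRows);
    const rows: Row[] = [];
    for (let i = offset; i < end; i++) rows.push({ id: i });
    return rows;
  };
}

// Consumer: counts rows without ever holding the full result in memory.
async function countRows(source: FetchBatch, batchSize: number): Promise<number> {
  let count = 0;
  for await (const batch of streamQuery(source, batchSize)) {
    count += batch.length;
  }
  return count;
}
```

Because the generator only advances when the `for await` loop asks for the next value, a slow consumer naturally throttles the source, and peak memory is bounded by the batch size rather than by the size of the table.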

While this approach would prevent memory overload, it would also make all queries slower.