tediousjs / tedious

Node TDS module for connecting to SQL Server databases.

Home Page: http://tediousjs.github.io/tedious/


Performance loss with queries that return a long list of rows

lpelicioli opened this issue · comments

commented

Question

Hi,
is it possible that a query that returns a long list of rows could block/slow down program execution until the result is complete?
I ask this question because I have such a problem.
Is this a bug or is it a code mishandling problem on my part?

Relevant Issues and Pull Requests

This depends on a bunch of factors I think. What's your packet size you are using? Are you using a TLS connection?

Do you have a reproducible test case for this or are you at the start of investigating this?

commented

Thanks for the response.
I've honestly had this problem since Monday; in fact I even posted a question on Stack Overflow.
It wasn't until last night that I began to suspect the library.
I ran several tests that let me isolate the problem to the code where I run the query.
The packet size is set to 4096 and TLS is disabled.

commented

When the query is launched, after a few minutes of activity Visual Studio Code notifies me that an out-of-memory exception occurred:

(image: screenshot of the out-of-memory notification)

@lpelicioli Can you reproduce this consistently? We'd be happy to look into this, but it's extremely hard to do so without a reproducible test case.

Based on the code posted in your stackoverflow question, it looks like you're loading an extremely large resultset from the database and holding all of that data in memory. That's not going to work, independent of your use of tedious or any other database server / database library.

If you e.g. simply want to export all the data from the database, tedious supports that. Response data is streamed from SQL Server by default (that's when a row event is emitted on the request object), but you then need to stream that data to the HTTP response as well, not collect everything in an object.

commented

Can you reproduce this consistently?

I'll have to work on it. In my next comment I will send you the info to reproduce it.

it looks like you're loading an extremely large resultset from the database and holding all of that data in memory. That's not going to work

Yes, you are right, I am loading about 157,099,449 rows, with this structure:

(images: screenshots of the table structure)

I also tried sending the rows off in batches of 200, but that doesn't seem to have solved the problem.
The program slows down and the "out of memory" exception just takes longer to occur (in this case I didn't wait for it to happen).

you need to then stream that data to the http response as well, not collect everything in an object.

I honestly don't know how to do that... If you think this could be the cause of the problem, I will try to solve it first.
For the moment I am simply avoiding saving the information in memory, but the result is the same
(the program slows down and the "out of memory" exception takes longer).

commented

Done.
Here you can find a test to reproduce it:
https://github.com/lpelicioli/TestTedious

Thanks for providing the test case. Before I try to reproduce this locally, let's take a step back.

You mentioned that you're loading 157,099,449 rows.

For the table structure you specified, here's the number of bytes per column per row (if all the columns are non-null):

  • ID: 4 bytes
  • AppID: 1 byte
  • DateTime: 8 bytes
  • Type: 1 byte
  • SubType: 1 byte
  • MachineID: 2 bytes
  • AppEventID: 4 bytes

So the total data just for the number of rows you're trying to load is 157,099,449 × (4 + 1 + 8 + 1 + 1 + 2 + 4) bytes = 3,299,088,429 bytes ≈ 3.3 gigabytes.
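As a quick sanity check, the arithmetic above can be reproduced directly (row count and per-column byte sizes taken from the list above, assuming all columns are non-null):

```javascript
// Back-of-the-envelope check of the raw row data size.
const rows = 157099449;
// ID + AppID + DateTime + Type + SubType + MachineID + AppEventID
const bytesPerRow = 4 + 1 + 8 + 1 + 1 + 2 + 4;
const totalBytes = rows * bytesPerRow;
console.log(totalBytes);                            // 3299088429
console.log((totalBytes / 1e9).toFixed(2) + ' GB'); // "3.30 GB"
```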

This ~3.3 GB does not even include the network overhead of the TDS protocol (the network protocol used to communicate with SQL Server). Furthermore, when this data is read from the network stream and converted into JavaScript values by tedious, the in-memory size will be a multiple of that, because JavaScript has no dedicated integer types; numbers are usually stored as 64-bit floating point values.

I just want to raise the point that your approach/application design is not going to work, even if there is no issue in tedious itself. You can't hold multiple gigabytes of data in memory in Node.js (there is a hard memory limit that I don't remember right now, but you should be able to find it via Google), and attempting to turn that into a JSON string would again require at least a multiple of that in memory just to hold the resulting string data.

As you mentioned that there also seems to be an issue even when none of the row data is kept in memory, I will see if I can reproduce that over the coming days, and I'm grateful that you shared your test code. Still, I'd suggest you go back to the drawing board and try to figure out whether you can find a different solution approach.

commented

You can't hold multiple gigabytes of data in memory in Node.js, and attempting to turn that into a JSON string would again require at least a multiple of that in memory just to hold the resulting string data.

I agree. Most likely the fact that I save the data in JSON format accentuates the problem, and the out-of-memory exception is caused by this mistake on my part.
As soon as I find the time I will try to fix it.


Since I'm using this library for work I'm interested in being notified about the resolution status of this problem (if it's a library problem).

If you need clarification, I'm at your disposal.
Thanks again

Luca

Since I'm using this library for work I'm interested in being notified about the resolution status of this problem (if it's a library problem).

I don't think this is a problem in tediousjs, really. 😅 Is it okay if I go ahead and close this issue?

commented

I honestly haven't had a chance to look at it again.
I noticed that I am not using the latest version, and this might be the problem.
I am closing the issue for the moment... if I find more concrete evidence I will reopen it.
Thank you

Luca