segmentio / analytics-node

The hassle-free way to integrate analytics into any node application.

Home Page: https://segment.com/libraries/node


Areas for Performance Improvements

kanishk1 opened this issue

Hey team!
I wanted to start a discussion about two areas where the client's performance could be improved.

Message ID Generation

The client currently generates each messageId from an md5 hash of the message combined with a UUID.

Comments in the source explain why md5 is needed (in browsers, uuid would fall back to Math.random(), which isn't great). However, this issue no longer exists in uuid@8.0.0 and above, and this package currently uses v8.3.2.

Hashing every message and generating a UUID for it can be fairly expensive, and there's no way to override messageId generation with my own implementation. Would the team be open to dropping the md5 of the message contents and using just a UUID, or to exposing an API that lets consumers generate their own messageIds?
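A minimal sketch of both options, assuming Node's built-in `crypto` module (`crypto.randomUUID()` is available from Node 14.17). The `node-<md5>-<uuid>` format used for the current approach is an approximation, and `generateMessageId` is a hypothetical option name, not an existing analytics-node setting.

```javascript
const crypto = require('crypto');

// Approximation of the current approach: serialize and md5 the whole
// message, then append a random UUID. Every message pays for
// JSON.stringify plus a hash.
function md5AndUuidMessageId(message) {
  const digest = crypto
    .createHash('md5')
    .update(JSON.stringify(message))
    .digest('hex');
  return `node-${digest}-${crypto.randomUUID()}`;
}

// Proposed: a UUID alone, with no per-message serialization or hashing.
function uuidMessageId() {
  return `node-${crypto.randomUUID()}`;
}

// Hypothetical hook: `generateMessageId` is NOT an existing analytics-node
// option. It lets a consumer supply its own generator and keeps any
// messageId the caller already set.
function assignMessageId(message, options = {}) {
  const generate = options.generateMessageId || uuidMessageId;
  if (!message.messageId) {
    message.messageId = generate(message);
  }
  return message;
}

const msg = assignMessageId({ event: 'Signed Up', userId: '123' });
console.log(msg.messageId);
```

Either option removes the per-message hash. The hook also lets consumers with their own ID scheme (for example, IDs used for deduplication upstream) pass those IDs through unchanged.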

Checking queue size

Every time a message is enqueued, the client checks the size of the entire queue by calling JSON.stringify() on each queued message and adding the lengths together to approximate the total size in bytes. This is costly, and it gets worse as more messages are added to the queue.
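To show why the cost grows with the queue, here is a hypothetical sketch of a check that re-serializes every queued message on each enqueue. The class and field names are made up for illustration and this is not the library's actual code. After n enqueues it has made n(n+1)/2 JSON.stringify calls, so 100 messages cost 5,050 calls.

```javascript
// Counts how many times a message is serialized, to make the cost visible.
let stringifyCalls = 0;

// String length is used as a rough stand-in for byte size.
function serializedLength(message) {
  stringifyCalls += 1;
  return JSON.stringify(message).length;
}

// Hypothetical queue that recomputes its full size on every enqueue.
class NaiveQueue {
  constructor(maxQueueSize) {
    this.maxQueueSize = maxQueueSize;
    this.queue = [];
  }

  enqueue(message) {
    this.queue.push(message);
    // Re-serializes every queued message: O(n) work per enqueue.
    const totalSize = this.queue.reduce((sum, m) => sum + serializedLength(m), 0);
    return totalSize >= this.maxQueueSize; // true means "time to flush"
  }
}

const q = new NaiveQueue(Infinity); // never flushes, so the queue keeps growing
for (let i = 0; i < 100; i++) q.enqueue({ event: 'Item Viewed', index: i });
console.log(stringifyCalls); // 5050 = 100 * 101 / 2
```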

I've looked at external libraries and implementations that might make this easier, but each comes with its own problems. A simple fix for now could be to store the queue's current size (totalQueueSize) and, whenever a message is added, stringify only the new message and add its size to totalQueueSize. That way, enqueue performance won't degrade as the queue grows.
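A minimal sketch of the proposed fix. It assumes the queue flushes once `totalQueueSize` reaches a hypothetical `maxQueueSize` threshold, and that the running total is reset whenever the queue is flushed. Apart from `totalQueueSize`, the names are illustrative and are not analytics-node's API.

```javascript
// Hypothetical queue that keeps a running size instead of recomputing it.
class IncrementalQueue {
  constructor(maxQueueSize, onFlush) {
    this.maxQueueSize = maxQueueSize;
    this.onFlush = onFlush;
    this.queue = [];
    this.totalQueueSize = 0;
  }

  enqueue(message) {
    // Only the new message is stringified; cost no longer depends on queue length.
    this.totalQueueSize += JSON.stringify(message).length;
    this.queue.push(message);
    if (this.totalQueueSize >= this.maxQueueSize) this.flush();
  }

  flush() {
    const batch = this.queue;
    // Reset the running total together with the queue so it stays accurate.
    this.queue = [];
    this.totalQueueSize = 0;
    this.onFlush(batch);
  }
}

const flushed = [];
const iq = new IncrementalQueue(100, (batch) => flushed.push(batch.length));
// Each message serializes to 33 characters, so every 4th enqueue crosses 100.
for (let i = 0; i < 10; i++) iq.enqueue({ event: 'Item Viewed', index: i });
console.log(flushed, iq.queue.length, iq.totalQueueSize); // [ 4, 4 ] 2 66
```

Each enqueue now does work proportional only to the size of the new message. The trade-off is that the running total must be kept in sync wherever messages leave the queue.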

I'm keen to hear your feedback on these ideas. It would be great to make these improvements, since they could bring meaningful performance benefits to all consumers.