segmentio / analytics-node

The hassle-free way to integrate analytics into any node application.

Home Page: https://segment.com/libraries/node

Events are not sent in a Vercel serverless function

andreaminieri opened this issue · comments

Hi everybody, I'm using analytics-node 5.1.0 in an SSR application deployed on Vercel.

I wrote an API that is deployed as Vercel serverless functions.
I had some trouble making it work, and this is probably related to #245. These are the steps I followed for one function.

This is what I had initially:

app.post('/api/test', async function (req, res) {
  analytics.track({...},
    function (err, batch) {
      if (err) {
        console.error(`Error occurred in track call: ${err}`)
      }
      console.log(`Flushed from track call: ${batch}`)
    }
  )

  console.log('Sending back HTTP 200...')

  res.status(200).json({
    message: 'Signal to segment sent.',
  })
})

This works locally, and in my server console I get:

Sending back HTTP 200...
Flushed from track call: undefined
Flushed from track call: [object Object]

But it doesn't work on Vercel; in the serverless function log I got just:

Sending back HTTP 200...

The track call is not flushed and no event is sent to Segment. This is similar to what is explained in #245: probably the Vercel serverless function finishes and kills all threads before the queue is actually flushed. Following the approach in #245, I updated the function as follows:

app.post('/api/test', async function (req, res) {
  analytics.track({...},
    function (err, batch) {
      if (err) {
        console.error(`Error occurred in track call: ${err}`)
      }
      console.log(`Flushed from track call: ${batch}`)
    }
  )

  await analytics.flush(function (err, batch) {
    if (err) {
      console.error(`Error occurred in flush call: ${err}`)
    }
    console.log('Flushed from flush call.')
  })

  console.log('Sending back HTTP 200...')

  res.status(200).json({
    message: 'Signal to segment sent.',
  })
})

The first thing I noticed is that VSCode gives me a warning about using await on analytics.flush: 'await' has no effect on the type of this expression.
I'm using analytics-node 5.1.0, so flush should return a promise. Maybe this warning is due to the type definitions for the function; I'm not sure whether it has any effect on the actual async behavior. Anyway, this still works locally, and the log is:

Sending back HTTP 200...
Flushed from flush call.
Flushed from track call: undefined
Flushed from track call: [object Object]

But it still doesn't work on Vercel; the serverless function log I got is:

Sending back HTTP 200...
Flushed from flush call.

Am I doing something wrong here?

Anyway, I found a workaround to make it work. I wrote an async version of track that returns a promise:

const asyncTrack = (payload) => {
  const promise = new Promise((resolve) => {
    analytics.track(payload, function (err, batch) {
      if (err) {
        console.error(`Error occurred in track call: ${err}`)
      }
      console.log(`Flushed from track call: ${batch}`)
      resolve()
    })
  })
  return promise
}

and I used it in my function as follows:

app.post('/api/test', async function (req, res) {
  const trackPayload = {...}
  await asyncTrack(trackPayload)
  
  console.log('Sending back HTTP 200...')

  res.status(200).json({
    message: 'Signal to segment sent.',
  })
})

It works both locally and on Vercel; the log, as expected, is:

Flushed from track call: undefined
Flushed from track call: [object Object]
Sending back HTTP 200...

yo1dog commented

Similar to other issues here, this is due to #309

flush does not guarantee that all inflight messages are sent before calling the given callback. Instead, flush simply sends a batch of queued messages and waits for only that batch's response before the callback. ... This is contrary to common expectations that a "flush" function completely empties the buffer to the destination. Further, this means there is no way to know when both all queued and all inflight messages are fully sent (fully flushed).

The reason your first attempt at a fix (the version with the explicit flush call) did not work is that Analytics always flushes the first message, which removes it from the queue. So the callback provided to your explicit flush call is invoked immediately, because the queue is already empty. The response is then given and the function exits before the track message has had time to be sent to Segment's server. This is evidenced by the ordering of the console output.

Your second attempt (the asyncTrack workaround) works because you rely on the callback provided to the track call instead. However, make sure you still explicitly call flush() (no need to await it) or set flushAt: 1. Otherwise, a message could sit in the queue for up to the default 10 seconds before being flushed. (This does not apply to the first track/identify/etc. call, because the first one is always flushed immediately, as described above.)
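
As a minimal sketch of the flushAt: 1 alternative (YOUR_WRITE_KEY is a placeholder, not taken from this issue):

const Analytics = require('analytics-node')

// With flushAt: 1 every message is flushed as soon as it is enqueued,
// so the per-message track callback fires once that message has been sent.
const analytics = new Analytics(YOUR_WRITE_KEY, { flushAt: 1 })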

Also, the reason your callback is called twice (Flushed from track call is printed twice) is #308. To work around it, you can use the promise to de-bounce, since a promise can only be resolved or rejected once:

const asyncTrack = (payload) => {
  return new Promise((resolve, reject) => {
    analytics.track(payload, err => (err ? reject(err) : resolve()))
  })
  .catch(err => {
    if (err) {
      console.error(`Error occurred in track call: ${err}`)
    }
    console.log(`Flushed from track call.`)
  });
}
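
For completeness, a hedged sketch of how this de-bounced asyncTrack could be used in the handler, following the advice above to trigger an explicit flush() without awaiting it (the {...} payload placeholder mirrors the earlier snippets):

app.post('/api/test', async function (req, res) {
  // Enqueue the event; the returned promise resolves once the track callback fires.
  const tracked = asyncTrack({...})

  // Kick off a flush so the message does not sit in the queue; no need to await it.
  analytics.flush()

  // Awaiting the track callback is what indicates the message was actually sent.
  await tracked

  console.log('Sending back HTTP 200...')

  res.status(200).json({
    message: 'Signal to segment sent.',
  })
})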

Also watch out for #310.

I'm also using Vercel Serverless Functions, and this issue is causing big headaches and a large loss of time for me/us. Would love to see a fix! (As others have stated, flush is not working as expected.)

I agree with @yo1dog that this is related to issue #309, but my suggestion is a bit different:

  1. create the Analytics object with a queue big enough for your messages; the default of 20 seems good enough given your example.
  2. mark the queue as flushed to avoid sending the first message immediately
  3. await the flush

Here is an untested example:

const analytics = new Analytics(YOUR_WRITE_KEY, { flushAt: Infinity, flushInterval: Infinity });
analytics.flushed = true; //  see https://github.com/segmentio/analytics-node/blob/master/test.js line 30

app.post('/api/test', async function (req, res) {
  analytics.track(... )

  await analytics.flush(...)

  console.log('Sending back HTTP 200...')

  res.status(200).json({
    message: 'Signal to segment sent.',
  })
})
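
If the installed version's flush does not actually return a usable promise (the VSCode warning mentioned earlier in this thread), a hedged variant is to wrap flush's callback in a promise yourself. This sketch assumes the same analytics instance configured above (flushAt and flushInterval set to Infinity, flushed set to true) and keeps the {...} payload placeholder:

const flushAll = () =>
  new Promise((resolve, reject) => {
    analytics.flush((err, batch) => (err ? reject(err) : resolve(batch)))
  })

app.post('/api/test', async function (req, res) {
  analytics.track({...})

  await flushAll()

  console.log('Sending back HTTP 200...')

  res.status(200).json({
    message: 'Signal to segment sent.',
  })
})
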
yo1dog commented

Queue size is not the only internal trigger for a flush. The queue is also flushed at an interval (though you could configure your Analytics class with flushInterval: Infinity, flushAt: Infinity) and also when the queue byte size crosses a threshold (though you could configure the undocumented maxQueueSize, you risk exceeding API limits). Further, the internal flush triggers are undocumented, unpredictable, and could change without notice. So it is not feasible to simply assume your explicit flush is the only one that will occur. Not to mention that timing and/or API limits may require multiple flushes to occur.

@yo1dog yeah! I see what you're saying, and I agree!
I should have stated my assumptions clearly [1], or even better, configured flushInterval and flushAt to Infinity.
Furthermore, I have updated my example with explicit flushInterval and flushAt settings.

[1] Assumptions

  1. only very few small messages are sent (maxQueueSize)
  2. there are no other time-consuming tasks (flushInterval)

Closing, as the solution offered here will solve the issue: #309 (comment)