On-Demand Service stops tracking uptime after the restart



binlab opened this issue 2 years ago

I have an issue similar to traefik-ondemand-plugin#23.
The issue occurs when traefik-ondemand-service crashes and restarts, or is simply redeployed, while services are up. After that, traefik-ondemand-service stops tracking the uptime of all those services, even though it is expected to keep tracking them.
Would it be possible to initialize tracking at startup, using the timeout configured for each service?

Alexis Couvreur · Answer 1 · Fri Apr 29 2022 16:13:33 GMT+0800 (China Standard Time)

The service tracks the state of the services in memory, which means everything is lost on restart.

Do you think it should be kept in persistent storage?

If so, how would you implement the updates? They happen as often as there are HTTP calls.

I think it should sync to a file every 5 s or so. I might implement that in a future release; what do you think?

Stanislas Bruhière · Answer 2 · Mon May 02 2022 23:37:24 GMT+0800 (China Standard Time)

Hey @acouvreur,
We discussed this today, and I think it would be a great improvement!
Sometimes the container restarts and the service loses its state.

I think periodically writing to a file would scale better than writing on every HTTP request. It would also make the implementation easier.

Maybe we could make the sync interval configurable via an environment variable.

Alexis Couvreur · Answer 3 · Tue May 03 2022 03:00:37 GMT+0800 (China Standard Time)

I'm looking for an in-memory key-value store that can be marshalled to and unmarshalled from JSON.

The library I'm currently using is https://github.com/workshop-depot/tinykv

I might fork it to add the relevant features if I don't find a suitable library.

https://github.com/tidwall/buntdb might be overkill, and it does not provide a callback on eviction.

Alexis Couvreur · Answer 4 · Sun May 08 2022 07:56:40 GMT+0800 (China Standard Time)

You can try it out now; take a look at #30.

Feedback welcome!

Stanislas Bruhière · Answer 5 · Tue May 10 2022 16:03:16 GMT+0800 (China Standard Time)

Nice!
I'll give it a try today.

Stanislas Bruhière · Answer 6 · Wed May 11 2022 23:18:01 GMT+0800 (China Standard Time)

Overall it's working very well!
I'm getting some errors:
fatal error: concurrent map iteration and map write,
I don't really know if it's linked to your changes, though. We encountered 5 restarts in one day.
It might be caused by another change, since I upgraded from version 1 to 1.8.0-beta.1.

I've been using the file to monitor the state of the service. I previously worked on generating Prometheus metrics for other projects. Do you think it would be a good idea to expose metrics from this service? Alternatively, a separate container could read the state file and expose these metrics.

Alexis Couvreur · Answer 7 · Wed May 11 2022 23:38:07 GMT+0800 (China Standard Time)

Thanks for the feedback. Can you provide more information about the context of this log?
fatal error: concurrent map iteration and map write,

It's probably my fault, though 😆 I'll check my fork of the library: https://github.com/acouvreur/tinykv

I think exposing Prometheus metrics is a good idea; they can be exposed directly by this service.

Stanislas Bruhière · Answer 8 · Thu May 12 2022 16:19:41 GMT+0800 (China Standard Time)

Sure, here is the end of the logs before a restart:
ods-error.log

Alexis Couvreur · Answer 9 · Thu May 12 2022 16:25:01 GMT+0800 (China Standard Time)

Pushed a fix to beta:
96547ce

acouvreur/tinykv@45fc921

Stanislas Bruhière · Answer 10 · Thu May 12 2022 16:44:10 GMT+0800 (China Standard Time)

Upgrading to v1.8.0-beta.2, then!
I'll let you know if I encounter other issues.