acouvreur / traefik-ondemand-service

Traefik ondemand service for the traefik ondemand plugin

Home Page: https://pilot.traefik.io/plugins/605afbdba5f67ab9a1b0e53a/containers-on-demand

On-Demand Service stops tracking uptime after the restart

binlab opened this issue

I ran into an issue similar to traefik-ondemand-plugin#23.
The issue happens when traefik-ondemand-service crashes and reloads, or is simply redeployed, while services are up. After that, traefik-ondemand-service stops tracking the uptime of all services, which I suppose is expected.
Is it possible to add an initialization step at startup that begins tracking each service with its configured timeout?

The service tracks the state of each service in memory, which means everything is lost upon restart.

Do you think it should be kept in persistent storage?

If so, how would you implement such updates, given that they happen on every HTTP call?

I think it should be a sync to file every 5s or so. I might implement that in a future release. What do you think?

Hey @acouvreur.
We discussed this matter today, and I think this would be a great improvement!
Sometimes the container restarts and the service loses its state.

I think a periodic write to file would scale better than a write on every HTTP request. It would also make the implementation easier.

Maybe we could make the sync interval configurable via an env variable.

I'm looking for an in-memory key-value store that can be marshalled to and unmarshalled from JSON.

The lib I'm currently using is https://github.com/workshop-depot/tinykv

I might fork it to add the relevant features if I don't find a suitable lib.

https://github.com/tidwall/buntdb might be overkill, and it does not provide a callback on eviction.

You can try it out now, take a look at #30

Feedback welcome.

Nice!
I'll give it a try today.

Overall it's working very well!
I'm getting some errors, though:
fatal error: concurrent map iteration and map write
I don't really know if it's linked to your changes. We encountered 5 restarts in one day.
It might be caused by another change, as I updated from version 1 to 1.8.0-beta.1.

I've been using the file to monitor the state of the service. I previously worked on generating Prometheus metrics for other projects. Do you think it would be a good idea to expose metrics for this service? We could also have another container that reads the state file and exposes these metrics.

Thanks for the feedback. Can you provide more info about the context of this log?
fatal error: concurrent map iteration and map write

It's probably because of me though 😆 I'll check my fork of the lib https://github.com/acouvreur/tinykv

I think exposing Prometheus metrics is a good idea; they can be exposed directly by this service.

Sure, here is the end of the logs before a restart:
ods-error.log

Upgrading to v1.8.0-beta.2 then!
I'll let you know if I encounter other issues.