boundary / folsom

Expose Erlang Events and Metrics

Table ownership differences can leave folsom inconsistent

russelldb opened this issue

The history metric creates a new ets table whenever a new metric is created. The owner of that table is the process that called folsom_metrics:new_history(Name). The folsom table, however, is owned by the folsom supervisor. If the process that owns the history table exits, the history table is destroyed with it, but the entry in the folsom metrics table remains.
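To illustrate the ownership behaviour, here is a minimal sketch in plain Erlang. It is not folsom code: the module name `ets_owner_demo` and the table name `history_demo` are made up for the example. It creates a table in a short-lived process and checks what `ets:info/1` reports once that process has exited.

```erlang
-module(ets_owner_demo).
-export([run/0]).

%% Hypothetical demo, not folsom code: an ETS table without an heir
%% goes away when the process that created it exits.
run() ->
    Parent = self(),
    Pid = spawn(fun() ->
        %% The spawned process becomes the owner of the table.
        InnerTid = ets:new(history_demo, [public, ordered_set]),
        Parent ! {table, InnerTid},
        receive stop -> ok end
    end),
    Tid = receive {table, T} -> T end,
    Owner = ets:info(Tid, owner),
    %% Monitor the owner so we know when it has really exited.
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    %% ets:info/1 returns undefined for a table that no longer exists.
    {Owner =:= Pid, ets:info(Tid)}.
```

Any index entry that still points at `Tid` after this is stale, which is the inconsistency described above.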

Folsom is then in an inconsistent state. Using folsom_metrics_histogram_ets to create (and therefore own) the table would probably help. Ideally, folsom should have a single process that owns all ets tables, so that they stay consistent: a crash of that process takes them all away, and they are insulated from crashes in calling processes. Better still would be to implement something like the strategy in this article: http://steve.vinoski.net/blog/2011/03/23/dont-lose-your-ets-tables/
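A rough sketch of that kind of strategy, using the standard `heir` option of `ets:new/2` together with `ets:give_away/3`. This is only an illustration under assumptions, not a proposal for the actual folsom code: the module name `ets_heir_sketch`, the table name `heir_demo`, and the `handed_over` / `returned` tags are all invented here.

```erlang
-module(ets_heir_sketch).
-export([start/0, manager/0]).

%% Hypothetical sketch: a manager process creates the table, names
%% itself as heir, and hands the table to a worker. If the worker
%% dies, the table comes back to the manager with its data intact.
start() ->
    spawn(?MODULE, manager, []).

manager() ->
    process_flag(trap_exit, true),
    Tid = ets:new(heir_demo,
                  [named_table, public, set, {heir, self(), returned}]),
    hand_over(Tid).

hand_over(Tid) ->
    Worker = spawn_link(fun worker/0),
    %% The worker receives {'ETS-TRANSFER', Tid, ManagerPid, handed_over}.
    true = ets:give_away(Tid, Worker, handed_over),
    wait(Worker).

wait(Worker) ->
    receive
        %% Sent to the heir when the owning worker terminates.
        {'ETS-TRANSFER', Tid, Worker, returned} ->
            hand_over(Tid);
        %% Exit signals from linked workers are only drained here.
        {'EXIT', _Pid, _Reason} ->
            wait(Worker)
    end.

worker() ->
    receive
        {'ETS-TRANSFER', _Tid, _From, handed_over} ->
            %% A real worker would do its work here; this one just
            %% holds the table until it is stopped or killed.
            receive stop -> ok end
    end.
```

Killing the current owner (for example `exit(ets:info(heir_demo, owner), kill)`) should leave the rows in place and move ownership to a freshly spawned worker.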

I'm raising this as a request for comments before I factor such a strategy into folsom. Opinions?

Steve's post seems like a good setup and makes sense to me. Yet another item for the folsom to-do list. I'll have some time soon if you would like to collaborate on this and the race condition issues.

Yes. That would be good. When we get 1.2 out I've got some time to work on folsom.

At zotonic we recently ran into this issue too. Indeed, all of folsom's tables should be owned by one process.

Automatic bookkeeping of metrics can be implemented by monitoring the process which creates the new metric. When that process dies folsom can do the bookkeeping without any problem.
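Something along these lines, as a sketch only. The `metric_janitor` module and its `watch/2` function are hypothetical, and the cleanup action is passed in as a fun so the sketch does not assume anything about folsom's internals; in folsom it would presumably remove the metric's entry from the metrics table.

```erlang
-module(metric_janitor).
-behaviour(gen_server).

-export([start_link/1, watch/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical sketch: monitor the process that created a metric and
%% run a cleanup fun for that metric when the process goes down.
start_link(Cleanup) when is_function(Cleanup, 1) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Cleanup, []).

%% Register Pid as the creator of the metric called Name.
watch(Name, Pid) ->
    gen_server:call(?MODULE, {watch, Name, Pid}).

init(Cleanup) ->
    %% State: the cleanup fun plus a map of monitor ref => metric name.
    {ok, {Cleanup, #{}}}.

handle_call({watch, Name, Pid}, _From, {Cleanup, Refs}) ->
    Ref = erlang:monitor(process, Pid),
    {reply, ok, {Cleanup, Refs#{Ref => Name}}}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({'DOWN', Ref, process, _Pid, _Reason}, {Cleanup, Refs}) ->
    case maps:take(Ref, Refs) of
        {Name, Rest} ->
            %% The creator died: do the bookkeeping for its metric.
            Cleanup(Name),
            {noreply, {Cleanup, Rest}};
        error ->
            {noreply, {Cleanup, Refs}}
    end;
handle_info(_Other, State) ->
    {noreply, State}.
```

Each monitor is tracked by its reference, so a process that created several metrics gets one cleanup call per metric.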

This is the crash we get from time to time:

2013-03-20 17:02:52.468 [error] emulator Error in process <0.32714.199> on node 'zotonic001@Lamma' with exit value:
{badarg,[{ets,delete,[26940620904],[]},{folsom_sample_exdec,delete_and_rescale,4,
[{file,"src/folsom_sample_exdec.erl"},{line,122}]},{folsom_sample_exdec,rescale,5,[{file,"src/folsom_sample_exdec.erl"},
{line,106}]},{folsom_sample_exdec..

Anyone have an interest in tackling this one?

Just ran into this one myself! I think I'll hold off using histories for now. I'm going to just stick the few things I wanted it for in a queue in a gen_server, but that's not ideal...

Looking at the table viewer, it seems the Tids are changing, but the histories index isn't getting updated at the same time.

Test message, ignore me.

Folsom has moved, please resubmit your issue at https://github.com/folsom-project Thanks!