sourcegraph / appdash

Application tracing system for Go, based on Google's Dapper.

Home Page: https://sourcegraph.com

Replace AggregateStore with InfluxDBStore

slimsag opened this issue

What is AggregateStore?

AggregateStore is the most complex of the Appdash storage backends. Unlike MemoryStore which just collects traces, AggregateStore both collects traces and aggregates some data to provide some useful stats within the Appdash Dashboard page (like slowest average trace, etc).

Why we should use InfluxDBStore instead

It became clear to me after @bg451's comment that I have not done a great job conveying the overall direction or problems of storage backends in Appdash. The major issues seen today are:

  • MemoryStore is very simple / lightweight, but doesn't support the Dashboard page (no aggregated data / metrics about traces). It's good, for example, in testing or within a lightweight CLI application.
  • AggregateStore has a number of serious problems:
    • After about 130k traces for a given name (e.g. HTTP route), it becomes so slow that it can no longer store traces at all. In fact, this caused a few serious memory leaks to crop up in production for us, albeit in unrelated code.
    • It is extremely complex: the implementation is extremely convoluted for what appears, at a high level, to be a simple operation. But don't take my word for it, just check out the high level overview.
    • Implementing more features, like more complex or exact queries, would be nearly impossible to manage given the code complexity.
    • It is so slow that the Dashboard "time range selection" bar appears broken in real applications (the load time is that bad).

In contrast, InfluxDBStore:

  • Has the potential to be significantly more performant than AggregateStore.
  • Like AggregateStore, can be embedded within your Go process entirely (no external InfluxDB setup is required). This is the default setup.
  • Can in the future support connecting to an externally hosted InfluxDB server (enabling clusters, etc).
  • Is a real time-series database, supporting complex queries in an SQL-like language. This will let us make the Dashboard even cooler and answer more important questions about your application's performance in the future.

Conclusion

For the above reasons, and after much hard thought, I can only conclude that keeping AggregateStore would slow us down by making the codebase more complex, and would mislead new users into using it and then thinking Appdash isn't for real-world work.

The intent is to bring this project forward for all Appdash users, and make app tracing better than ever before. I don't take the decision to remove existing code in an incompatible way lightly, but do find this to be the best path forward.