influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics

Home Page: https://influxdata.com

"list series" has gotten slow.

Dieterbe opened this issue · comments

list series /regex/ used to return in 150~300ms for me.
nowadays it's around 480ms.
and list series now takes about 860ms. not sure what this used to be, i have some old notes saying 2-3s but those may be outdated.

anyway this probably has something to do with #830: we now sort everytime we request the series list.
i recently upgraded to rc5 so that seems to make sense.

it's important that list series and list series /regex/ return speedily.
for example graphite-influxdb gets user requests, then needs to figure out which series it needs, for which it executes list series /regex/ and then queries influxdb for data. It caches the list outputs, but even so, with a cold cache (new graph) it can take several seconds before we even know which series to use (because the user supplies multiple expressions).
I can imagine other monitoring systems being built on top of influx will need similar things.

so how can we make this as fast as it can be?
maybe influx can keep the sorted list of series readily available for querying?

A cache of series nodes would be great. I get a timeout every time I try to run a list series or anything else with many sub-nodes.

👍 for cache of series nodes.

It would be great to have a cache of series nodes; right now, even if I revert this commit, it's still a bit slow (the initial list series in graphite takes something like 16 seconds, a good drop from 40, but still too much).

I find it surprising that list series is taking that long, even with the sorting. How many series do you guys have? Could this be the overhead of json serialization?

$ influx-cli -db graphite <<< "list series" | wc -l
188366

Not so much in fact. It's only 790339 here (including shard spaces).

i also currently have 200k, but i have switched to another solution. i'll try influxdb again when it is stable and fast, perhaps.

Yup, the problem is that 700k is not a big number. I've seen a lot more series in a single database (around 30 million) on a single machine.

p.s. for me the problem is semi-solved by reverting the commit (more than 2x the performance there) and making the cache more aggressive. Though it's not that good, and the cache can expire :(

With 200k series, what are your expectations for running list series? How
fast is fast enough? Are you doing a query where you return all 200k
series? Is this running over an internet connection? How long are the
series names and the size of the raw data?

On Thu, Sep 4, 2014 at 1:10 PM, Kenterfie notifications@github.com wrote:

i have also currently 200k, but i have switch to another solution. i try
influxdb again, when it is stable and fast perhaps.


Reply to this email directly or view it on GitHub
#884 (comment).

The query shouldn't take longer than 5s for 200k. Connected over 1Gbit ethernet. The average size of a series name is 40-50 characters.

I'd say <100ms for both list series and list series /regex/; the latter is the more common case for me. I connect from localhost for my tests to rule out transmission times.

There's no such thing as an absolute time for an operation; I don't understand what <100ms means here. Especially for regex list series, that's an O(n) operation, or at least that's how it's implemented today. This doesn't really matter, since we have to iterate through all series names anyway to create the resulting series object, so I don't see a way around the operation being O(n). I benchmarked the list series operation on my local machine and here are the numbers I got:

with sorting

200 -> 876  ms
700 -> 3281 ms
700 -> 3384 ms

w/o sorting

700 -> 2393 ms

As you can see from the with-sorting section, the operation scales linearly with the number of series. Getting rid of sorting shaves roughly a second off the operation's time. From the profile I got, it looks like most of the time is spent doing json marshaling. We have two options here: one is to use a faster json encoder that doesn't rely on reflection; the other is to offer an option to return the list in text format with a user-specified delimiter. Thoughts?
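
As an aside on the delimiter idea: a minimal sketch of what a plain-text listing endpoint could look like, assuming a hypothetical /list_series route and handler name (illustration only, not InfluxDB's actual handler code):

package main

import (
	"net/http"
	"strings"
)

// seriesNames stands in for whatever the metastore would return;
// it is hardcoded here purely for illustration.
var seriesNames = []string{"servers.host1.cpu", "servers.host1.mem", "servers.host2.cpu"}

// listSeriesText writes the series list as plain text with a caller-chosen
// delimiter, sidestepping reflection-based JSON encoding entirely.
func listSeriesText(w http.ResponseWriter, r *http.Request) {
	delim := r.URL.Query().Get("delimiter")
	if delim == "" {
		delim = "\n"
	}
	w.Header().Set("Content-Type", "text/plain")
	w.Write([]byte(strings.Join(seriesNames, delim)))
}

func main() {
	http.HandleFunc("/list_series", listSeriesText)
	http.ListenAndServe(":8087", nil)
}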

I guess I'm mostly surprised by how long the serialisation takes. As I also noted in the OP, list series /with some regex/ is faster than plain list series, so regex matching a string is faster than json encoding it? I'd love to see your benchmark with a regex-filtering scenario as well (filtering down to a small, known subset, say 1~10%).

if we have a large number of things that are all of the same, known type, then using a better json encoder will probably make a lot of difference. At work we also sometimes write a custom json encoder (and decoder, actually) when we know the format of the inputs and outputs, because it makes a big difference compared to a standard encoder that has to account for all possible scenarios.
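
To make that concrete, here is a rough sketch of a hand-rolled encoder for a list of series names built on a bytes.Buffer, in the spirit of what's being discussed (encodeSeriesList is a made-up name, not an InfluxDB function):

package main

import (
	"bytes"
	"fmt"
	"os"
	"strconv"
)

// encodeSeriesList hand-writes the JSON array for a list of series names
// instead of going through reflection-based json.Marshal. strconv.Quote
// handles quotes and backslashes; a production encoder would still need
// JSON-compliant \u escapes for non-printable characters.
func encodeSeriesList(names []string) []byte {
	var buf bytes.Buffer
	buf.WriteByte('[')
	for i, name := range names {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.WriteString(strconv.Quote(name))
	}
	buf.WriteByte(']')
	return buf.Bytes()
}

func main() {
	names := []string{"servers.host1.cpu", `stats.foo\bar`}
	os.Stdout.Write(encodeSeriesList(names)) // ["servers.host1.cpu","stats.foo\\bar"]
	fmt.Println()
}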

Any news?

I don't know Go too well, but I've tried to hack on influx a bit. I've replaced json.Marshal(SerializeSeries(...)) with a function that builds the json directly (a bytes.Buffer with the data written straight into it). It doesn't help a lot, though. It seems that the main performance issue is that the data is sorted inside metastore/store.go and also inside SerializeSeries.

As I don't need sorted data, I've got "list series" to run in 1.1s on 1 million series. Without this, the same query took 3.2s.

So still, the main performance problem is sorting. With different json marshaling, performance can also be improved, but not by much.

Anyway, there is still a need to implement a series name cache, because performance is still too slow either way.

Could the sorting not be made optional? Personally I need the sorting, but I understand all the comments here about the need for speed. Perhaps something like this (in the fashion of SQL, which influxdb seems to loosely follow):

list series order by asc

and without the "order by" it would skip the sorting and return faster.
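
For illustration, skipping the sort unless the query asks for ordering could look roughly like this (listSeries and the orderAsc flag are hypothetical names, not influxdb's actual API):

package main

import (
	"fmt"
	"sort"
)

// listSeries returns the known series names, sorting them only when the
// query asked for ordering. Skipping the sort saves the O(n log n) pass
// on large series sets.
func listSeries(names []string, orderAsc bool) []string {
	out := make([]string, len(names))
	copy(out, names)
	if orderAsc {
		sort.Strings(out)
	}
	return out
}

func main() {
	names := []string{"servers.b.cpu", "servers.a.cpu"}
	fmt.Println(listSeries(names, false)) // unsorted, fast path
	fmt.Println(listSeries(names, true))  // "order by asc" path
}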

My point was that the problem is not only with JSON marshaling. There are 3 parts where it loses speed:

  1. Marshaling (10-15%)
  2. Sorting (30%, done twice - when getting data from metastore and before passing to json encoder)
  3. Querying the data (55-60%).

So it won't be enough to speed up just one of these things. One of the most obvious ways to improve the situation is to store the list of series (sorted or unsorted, it doesn't matter) somewhere in memory and do all the operations on that, instead of querying the backends for series.
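
A bare-bones sketch of that in-memory idea, assuming a cache that is kept up to date as series are created (none of these names exist in influxdb; this only illustrates the approach):

package main

import (
	"fmt"
	"regexp"
	"sort"
	"sync"
)

// seriesCache keeps the full list of series names in memory so that
// "list series" and "list series /regex/" never have to hit the backend.
type seriesCache struct {
	mu    sync.RWMutex
	names []string // kept sorted, so responses need no per-query sort
}

// Add inserts a name at its sorted position if it isn't already present.
func (c *seriesCache) Add(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	i := sort.SearchStrings(c.names, name)
	if i < len(c.names) && c.names[i] == name {
		return // already present
	}
	c.names = append(c.names, "")
	copy(c.names[i+1:], c.names[i:])
	c.names[i] = name
}

// List returns all names, or only those matching re when re is non-nil.
func (c *seriesCache) List(re *regexp.Regexp) []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if re == nil {
		return append([]string(nil), c.names...)
	}
	var out []string
	for _, n := range c.names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	c := &seriesCache{}
	c.Add("servers.host2.cpu")
	c.Add("servers.host1.cpu")
	fmt.Println(c.List(nil))                          // full, already-sorted list
	fmt.Println(c.List(regexp.MustCompile(`host1`))) // regex-filtered list
}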

@vladimir-smirnov-sociomantic how did you get these numbers? Also, the sorting in SerializeSeries is a no-op, since list series returns a single time series. That leaves the sorting in the coordinator, which according to my numbers contributes about a second (more or less). We can definitely make the sorting optional, something along the lines of what @sanga mentioned earlier.

@jvshahid
This is what I've done: https://gist.github.com/vladimir-smirnov-sociomantic/4c8bad1185258bf86aa7

Values: I've created 1000008 series (can be rounded to 1 million, though I accidentally created 8 series too many :)) in influxdb (2 points each). After that I ran 'time curl -G 'http://localhost:8086/db/test/series?u=test&p=test&pretty=false' --data-urlencode "q=list series" >/dev/null'. With the patch above I get 1.3s on the first run; over 3 runs the avg is 1.1s (lowest 0.6, highest 1.3). For 10 runs (excluding the cold run) the avg is 0.7 (highest 0.9, lowest 0.6).

With vanilla influxdb 0.8.2 and the same DB it's very consistent between runs, and the avg time for 10 queries is 2.7s, with the highest 3s and the lowest 2.45.

If I only change SerializeSeries (leaving sorting in place): 1.9s avg, highest 2.2s, lowest 1.7s.

If I only remove sorting but leave JSON marshaling in place: 1.55s avg, highest 1.65, lowest 1.3.

So proper numbers will be:
Marshaling: ~30%
Sorting: ~43%

Without sorting and with custom marshaling 'list series' will be almost 4x faster.

Testing HW is a simple desktop: an i5-3470S with the performance governor, 8GB RAM; the db is on an HDD (a cheap Seagate ST500DM002).

UPD: modified the paste link, fixed a small bug. Got these values:
Column name - which patches are applied to that version. Vanilla - 0.8.2 without patches. Marshaling - only the JSON marshaling modified. Sorting - only sorting disabled. Marshaling+Sorting - the full patch. All times in seconds (what time reports for the curl).

        Vanilla Marshaling  Sorting     Marshaling+Sorting
        3.021   2.08        1.653       0.967
        2.774   2.1713      1.616       0.601
        2.668   1.73        1.57        0.624
        2.459   2.03        1.315       0.924
        2.754   2.067       1.634       0.878
        2.71    1.746       1.586       0.588
        2.684   2.033       1.559       0.892
        2.406   1.757       1.324       0.601
        2.716   2.023       1.821       0.858
        2.925   2.022       1.764       0.83
AVG     2.71    1.97        1.58        0.78
STDEV   0.18    0.16        0.16        0.15
MAX     3.021   2.1713      1.821       0.967
MIN     2.406   1.73        1.315       0.588

Using the benchmark for ListSeries (make integration_test only=SingleServerSuite.BenchmarkListSeries verbose=on benchmark=on) I got the following values:
Vanilla:

PASS: single_server_test.go:230: SingleServerSuite.BenchmarkListSeries        20        2754023298 ns/op

START: single_server_test.go:42: SingleServerSuite.TearDownSuite
PASS: single_server_test.go:42: SingleServerSuite.TearDownSuite 0.098s

OK: 1 passed
--- PASS: Test (88.68 seconds)
PASS
ok      github.com/influxdb/influxdb/integration        88.700s

Marshaling:

PASS: single_server_test.go:230: SingleServerSuite.BenchmarkListSeries        20        2262612770 ns/op

START: single_server_test.go:42: SingleServerSuite.TearDownSuite
PASS: single_server_test.go:42: SingleServerSuite.TearDownSuite 0.099s

OK: 1 passed
--- PASS: Test (78.81 seconds)
PASS
ok      github.com/influxdb/influxdb/integration        78.838s

Sorting:

PASS: single_server_test.go:230: SingleServerSuite.BenchmarkListSeries        50        2093408024 ns/op

START: single_server_test.go:42: SingleServerSuite.TearDownSuite
PASS: single_server_test.go:42: SingleServerSuite.TearDownSuite 0.109s

OK: 1 passed
--- PASS: Test (138.24 seconds)
PASS
ok      github.com/influxdb/influxdb/integration        138.294s

Sorting + Marshaling:

PASS: single_server_test.go:230: SingleServerSuite.BenchmarkListSeries        50        1617170577 ns/op

START: single_server_test.go:42: SingleServerSuite.TearDownSuite
PASS: single_server_test.go:42: SingleServerSuite.TearDownSuite 0.100s

OK: 1 passed
--- PASS: Test (114.17 seconds)
PASS
ok      github.com/influxdb/influxdb/integration        114.194s

unfortunately I can't compile influxdb from scratch right now because I can't build rocksdb, but your patch looks really neat @vladimir-smirnov-sociomantic, I wish I could compile and run it. This looks like a fairly easy quick win to boost list series performance. Any chance we can get this in, @jvshahid?

uhm, this ticket is about list series being slow.
hope @vladimir-smirnov-sociomantic's patch gets merged soon....

It will still be slow, just a bit better.

@jvshahid @pauldix if you think that my patches (or parts of them) can be merged, I can make a PR with them. Just say whether you'll need only the marshaling change or the sorting disabled as well.

There are some changes proposed by @pauldix in #1059. Depending on the final cut of that proposal, this story may or may not be relevant. We will probably push back on this story for a little bit.

i didn't see anything in #1059 about speeding up listing of series, did i miss something? either way #1059 is about major refactoring that will probably take a while to get to.
To make influxdb usable as graphite backend, a solution/workaround for this problem would be welcome much sooner. Not to troll or fuel a negative discussion, but to explain how important this is for graphite users: we haven't run influxdb at $dayjob since these problems. I guess I could switch back to running v0.8.0-rc.4, but before I went down that road I wanted to see if this problem would be addressed. just disabling the sorting again/making it configurable or applying @vladimir-smirnov-sociomantic's patch would go a long way with fairly minimal work.

hey @vladimir-smirnov-sociomantic when i run your patch I get the error invalid character 'x' in string escape code.
after some debugging i noticed that your code sometimes puts \ instead of \\:

[g1 ~]$ diff official.txt patched.txt 
(...)
< "servers.domU-foo.diskspace.\\x2f.byte_avail"]
---
> "servers.domU-foo.diskspace.\x2f.byte_avail"]
(...)
< "stats.dfbar\\cli\\command\\consumer\\foo.age"]
---
> "stats.dfbar\cli\command\consumer\foo.age"]

should be an easy fix, will fix it unless you beat me to it :)
other than that, the json output is the same format as the official one, so nice work!

Yup, it could be that; I haven't got any metrics with a '\' or with unicode symbols, so it's probably a bug.
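
The escaping difference is easy to reproduce in a few lines of Go; this only demonstrates the bug, it is not the actual patch:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// A series name containing a literal backslash, as in the diff above.
	name := `servers.domU-foo.diskspace.\x2f.byte_avail`

	// Writing the raw name between quotes yields invalid JSON:
	// `\x` is not a legal JSON escape, hence the "invalid character 'x'" error.
	naive := `"` + name + `"`

	// Proper escaping (here via encoding/json for the single value)
	// doubles the backslash, matching the official encoder's output.
	escaped, _ := json.Marshal(name)

	fmt.Println(naive)           // "servers.domU-foo.diskspace.\x2f.byte_avail"
	fmt.Println(string(escaped)) // "servers.domU-foo.diskspace.\\x2f.byte_avail"
}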

my results:
with 433k series.

so not bad :) next up is just removing all series that have a \ in them and using your pure patch, should be a bit faster still.

EDIT: my numbers are using influx-cli which uses standard influx client, which does json decode as well. i should redo this with pure wget to eliminate json decode time.

The underlying series structure is going to change completely for v0.9.0, so this is no longer actionable. Closing it out.

@toddboom AFAIK this is something that 0.9 doesn't address, because this is due to the json encoding, cacheless list iteration & regex checking, and sorting of series. is 0.9 going to address any of these? will it keep a sorted list in memory perhaps?

I think it's better to leave this open until benchmarks prove that it's really fixed. Because @Dieterbe is right: it's about things that will still be there even if the series struct is completely rewritten.

@Dieterbe v0.9.0 is effectively going to be a complete rewrite. Also, because of tags, you should end up with far fewer overall series. For the sake of managing actionable issues on our end, I'd prefer to keep this closed and reopen if necessary once v0.9.0 is out.

@toddboom if there is no caching and the result is still available only as json (with the same json marshaling library), it'll still be a problem. You just push the limits a bit further: if not 1 million, then 10 million series will be a problem in terms of performance.

Though, of course, it's up to you to decide whether you need a separate issue for that. Just be prepared for the problem to still be there, because its cause is still in the code.

One question is, do you need to be returning entire series set in the result?

The model for 0.9.0 will be that you have series names and you have tags and their values. If you do list series, you're likely to only get thousands or maybe tens of thousands of results. If you get the values for a tag, it's the same.

Basically, you should be able to drill down to what you're looking for without getting 100k+ series names back in a single result.

But we'll also be looking at how to efficiently render the result set JSON.

@pauldix for graphite, returning the entire series set is used to display all the metrics the system has. So yes, I still need that. And I don't think that tags will help me much. The only thing I need to filter out is the shards that were used for retention, but if I have 200k distinct metrics, I'll still get 200k records from that query. If I've got 1 million, I'll get 1 million.

Good to hear about JSON.

I'm not sure why you would have 200k series names though. You end up having this in Graphite because it forces you to encode metadata (i.e. stuff you put in tags) into the series name. This will not be the preferred way to do things in Influx.

Let's go back to the beginning. What is your use case? Why do you need a list of series? What is the question you need answered (stated in regular English)?

I'd like to use influxdb as a Graphite backend. One of the use cases where InfluxDB currently doesn't perform well is getting the list of all available metrics (for dashboards). It's done with a "list series" query. When you have 200 hosts with 1k metrics each, you'll get 200k metrics as the result of list series. Right now this query is very, very slow. And 200k is not even near the limit. On the initial listing it performs a query like 'list series /.*/'. As for why there are 200k series: if you've got 200k independent time series of data, they end up as 200k series in influx, and it won't be possible to reduce this with tags, right?

If influx's graphite plugin is also modified to use tags, maybe that will help make list series queries faster.

You can't visualize 200k time series on a dashboard (obviously). So this begs the question, what query needs to be answered to draw a dashboard? For example, if you want all hosts in a datacenter, then that's a query that can be done. Or if you want all series for a given host for the mysql service, then that's a query that can be answered.

The point is that the query language is there to filter down the result set. You should never be streaming the entirety of the metadata set down to a client.

As for the graphite plugin, it can be modified to use tags. But it will have to make assumptions about your series names. The most likely one I can think of is to rip apart the name like this:

(tagKey.tagValue)*seriesName

That is, you have 0 or more tag-value pairs followed by the series name. This is how I've seen most people structuring Graphite names.
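
A rough sketch of what splitting a name under that convention might look like (splitGraphiteName is hypothetical and not part of influxdb's graphite plugin):

package main

import (
	"fmt"
	"strings"
)

// splitGraphiteName applies the (tagKey.tagValue)*seriesName convention
// described above: leading dot-separated nodes are read as key/value pairs
// and the final node becomes the series name.
func splitGraphiteName(metric string) (series string, tags map[string]string) {
	nodes := strings.Split(metric, ".")
	tags = make(map[string]string)
	// Pair up nodes until only the series name is left.
	for len(nodes) >= 3 {
		tags[nodes[0]] = nodes[1]
		nodes = nodes[2:]
	}
	return strings.Join(nodes, "."), tags
}

func main() {
	series, tags := splitGraphiteName("dc.us-east.host.web01.cpu_load")
	fmt.Println(series, tags) // cpu_load map[dc:us-east host:web01]
}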

What are you feeding data into Influx with? What's pushing the Graphite protocol? The best thing to do would be to modify the collectors to actually push up tag style data.

I'm feeding data with collectd right now.

Ok, even if it's possible to get only hosts, only metric groups, etc., it's still possible to have a lot of hosts, a lot of metrics inside a group, etc. Yeah, it won't be 200k, but currently several of my hosts have approx. 10k individual series in one of their groups.

ok, so we'll want to make sure that things that have cardinality in the
tens of thousands are able to return results quickly

On Wed, Nov 26, 2014 at 8:28 AM, Vladimir Smirnov notifications@github.com
wrote:

I'm feeding data with collectd right now.

Ok, even if it's possible to get only hosts, only metric groups, etc. It's
still possible to have a lot of hosts, a lot of metrics inside group etc.
Yeah, it won't be 200k, but currently several of my hosts have approx. 10k
individual series in one of it's groups.


Reply to this email directly or view it on GitHub
#884 (comment).

graphite itself doesn't support a tagging system though. so when using influxdb as a backend for graphite we're stuck with string keys.

That is you have 0 or more tag value pairs followed by the series name. This is how I've seen most
people structuring Graphite names.

I don't think you can automatically spot where the tags are. The dimensions (nodes in the graphite metric string) that you want to segregate or aggregate by (those are the ones that should become tags) are not always at the end (though they are often second to last and earlier); they are sometimes at the beginning, but more often start at the 2nd or 3rd node.
i'm skeptical about an approach that automatically converts graphite strings into series+tags for stock graphite setups; i don't believe this is feasible, so i would just store them in influx the way they are in graphite.

to your point, the queries that graphite-influxdb invokes have a filter to narrow down (i.e. a regex), this is the most common case. that said, i think there are some cases for getting the entire list: for example some graphite dashboards do this to build an index, graph-explorer does this to run metrics2.0 plugins, and I personally sometimes do it to count how many series i have (in influx-cli list series | wc -l) to validate my stuff is working correctly.

Will have to agree with @Dieterbe here.

In our testing we are finding that InfluxDB query performance (select, not list series) where many data points are returned is entirely CPU limited, with the bottleneck appearing to be serialisation and/or sorting, as with this issue.

The API refactor will not change this, InfluxDB will still be returning the same amount of data even after the refactor.

Our use case is querying a day's worth or more of 1-second-sampled metric data from Grafana in order to display a dashboard. The data itself is Graphite metric series ingested directly by InfluxDB.

One day's worth of 1-second-resolution data means 86400 (time, value) data point tuples to be returned by InfluxDB per metric name. Two months' worth of 60-second-resolution data (the default graphite metric resolution) amounts to the same number of datapoints and identical response times.

On a 4-CPU 2.8GHz Xeon E312xx, InfluxDB takes ~3sec to serialise 86400 data points, with linear scaling as the number of days requested/datapoints returned increases.

1 day:

2015-01-12 17:19:32,004 - DEBUG - Sending request - http://10.206.77.194:80/render?from=01:00_20150106&until=01:00_20150107&target=stats.amers.alpha-us1-cell.rtd-cph-idsi.us1i-cphidsi01.ids.perf.inUpdateRate&format=json
2015-01-12 17:19:35,285 - INFO - Query duration - 3.281203 sec. Datapoints: 86401
2015-01-12 17:19:35,392 - DEBUG - Sending request - http://10.206.77.194:80/render?from=01:00_20150106&until=01:00_20150107&target=<graphite series>&format=json
2015-01-12 17:19:38,690 - INFO - Query duration - 3.298818 sec. Datapoints: 86401

2 days:

2015-01-12 17:18:15,764 - DEBUG - Sending request - http://10.206.77.194:80/render?from=01:00_20150106&until=01:00_20150108&target=<graphite series>&format=json
2015-01-12 17:18:22,712 - INFO - Query duration - 6.948422 sec. Datapoints: 172801

3 days:

2015-01-12 17:20:19,836 - DEBUG - Sending request - http://10.206.77.194:80/render?from=01:00_20150106&until=01:00_20150109&target=<graphite series>&format=json
2015-01-12 17:20:32,875 - INFO - Query duration - 13.39656 sec. Datapoints: 259201

4 days:

2015-01-12 17:20:35,064 - DEBUG - Sending request - http://10.206.77.194:80/render?from=01:00_20150106&until=01:00_20150110&target=<graphite series>&format=json
2015-01-12 17:20:54,630 - INFO - Query duration - 19.565301 sec. Datapoints: 345601

We can provide the script used to perform these queries if that would be useful.

The underlying InfluxDB queries are normal select time,value from <series name> where time > X and time < Y order asc queries generated by the graphite_influxdb handler.

As this issue is about list series being slow, should I make a new issue for read query performance?

@pkittenis The changes for v0.9.0 will include much more than an API refactor. The entire codebase has been rewritten, and the introduction of indexed tags will eliminate the need for such a proliferation of series names. In light of that, we're going to keep this issue closed. There may still be serialization overhead we'll want to address down the road, but I think that's best left for a separate issue once the new codebase has been released and profiled.

My other question is why are you returning 86400 data points for graphing?
You can't visualize that much raw data. You should be using rollup
intervals. This means you should be returning anywhere from 200-1000 data
points for a given series that you're visualizing.

That being said, we're working on performance enhancements across the board
(API serialization included)

On Mon, Jan 12, 2015 at 2:10 PM, Todd Persen notifications@github.com
wrote:

@pkittenis https://github.com/pkittenis The changes for v0.9.0 will
include much more than an API refactor. The entire codebase has been
rewritten, and introduction of indexed tags will eliminate the need for
such a proliferation of series names. In light of that, we're going to keep
this issue closed. There may still be serialization overhead we'll want to
address down the road, but I think that's best left for a separate issue
once the new codebase has been released and profiled.


Reply to this email directly or view it on GitHub
#884 (comment).

@pauldix for graphing purposes you need to get the list of all series, except for the retention scheme.

And as for 86400 data points for graphing... well, I know people who are doing that and they say it's useful (at my current work, one of the managers graphs all the client-related stats on one graph; it's thousands of lines, and he says he can see when something bad happens and that he needs all those thousands of lines).

Thanks for the update, will wait for 0.9.0 to retest.

While I would agree that rollup intervals should be used, the queries themselves are generated by grafana via the graphite_influxdb handler. The queries are not something we're running manually.

Any grafana dashboard with an influxdb backend will generate those queries and try to retrieve 86400 data points for the default 1-minute-resolution graphite metric series if the time range spans 2 months or more.


Getting a list of all datasets is probably the most basic operation in any DBMS. If this is a serious project, then pagination is a must. I hope this feature will be added soon.

This is fixed in 0.9.0-rc7. Actually since before then. For example you can do:

SHOW SERIES LIMIT 10 OFFSET 20
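
With that, a client can walk the whole set page by page rather than pulling it in one response. The sketch below only builds the paginated queries; the /query endpoint with db and q parameters is my understanding of the 0.9 HTTP API, so verify it against your version:

package main

import (
	"fmt"
	"net/url"
)

// Build paginated SHOW SERIES queries so a client never has to fetch the
// entire series list in a single response.
func main() {
	const pageSize = 10000
	base := "http://localhost:8086/query"
	for offset := 0; offset < 30000; offset += pageSize { // 3 pages for illustration
		q := fmt.Sprintf("SHOW SERIES LIMIT %d OFFSET %d", pageSize, offset)
		fmt.Println(base + "?db=graphite&q=" + url.QueryEscape(q))
	}
	// A real client would GET each URL and stop once a page comes back empty.
}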

Oh, that's good news :)

Is it also possible to paginate results from queries such as SELECT * FROM /.*/ LIMIT 1?
It is too expensive (in terms of time, bandwidth and memory) when there are hundreds of thousands of series :(

for those using influx 0.8 as a graphite backend:
i added support to graphite-influxdb for using elasticsearch to query metric metadata, bypassing influxdb's list series. before: 400~800ms; via ES i get <50ms most of the time, with a few outliers, but in all cases (median, upper 90th, upper) at least as good as influx.
(chart: influx-vs-es response-time comparison)
(see https://github.com/vimeo/graphite-influxdb/blob/de5dd7f37c2174bee6b7be860e31cf9635571337/get-series-influxdb-vs-es.png for more numbers)