wa0x6e / ResqueBoard

ResqueBoard is analytics software for PHP Resque. Monitor your workers' health and job activity in real time.

Cube Dependency

mtotheikle opened this issue · comments

It seems that Cube is seeing very little development now, and since the npm module "websocket-server" has been unpublished, I can no longer install this project because Cube fails to install.

Has anyone else had this problem recently? Any plans to move away from Cube?

See square/cube#149 for more information.

I was able to install websocket-server by hand. Just git clone the https://github.com/miksago/node-websocket-server repository into node_modules (in cube).

Given the current situation with Cube, I certainly agree with the idea of moving away from it. Any ideas on what to replace it with? Off the top of my head, the only thing I can think of is StatsD + ratchetphp to handle websockets.

commented

The reason I went with Cube was not really websockets, but all the data aggregation done behind the scenes. I'm not really familiar with StatsD, but the replacement should be backward compatible, so that the existing database can be kept.

StatsD does not have its own native datastore but instead uses several different backends. The best candidate for ResqueBoard is the MongoDB backend: https://github.com/dynmeth/mongo-statsd-backend.

commented

Do you know if StatsD can also compute metrics, like the number of events occurring between two dates?

> Do you know if StatsD can also compute metrics, like the number of events occurring between two dates?

I am pretty sure the mongo schema supports this. Any chance you could point me to where these queries are? I can take a quick look and figure out which backend(s) would work.

commented

I don't know Node.js well enough to understand how Cube works, but from what I saw, it uses MongoDB's server-side JavaScript to run functions and compute the metrics. Also, ResqueBoard does not really query MongoDB directly; it uses the Cube API for most of the work.

There are two things ResqueBoard expects from the Cube database:

  • A database of events: what happens and when it happens (so we can fetch a list of jobs sorted by time). I think that's how StatsD works, so no surprise there. With Cube, these events are stored in an [EVENT_NAME]_events collection, and each entry looks like:
 {
   "_id": ObjectId("503c4a613bab703704000148"),
   "d": {
     "worker": "KAMISAMA-MAC.local:987",
     "level": NumberInt(200)
   },
   "t": ISODate("1970-01-01T12:33:32.0Z")
 }
  • A database of metrics. When fetching the number of jobs occurring within a timeframe, it does not query MongoDB with a complex find() and a lot of filters; it just queries an [EVENT_NAME]_metrics collection, where all metrics are already computed offline. Each entry looks like:
 {
   "_id": {
     "e": "sum(got)",
     "l": NumberInt(86400000),
     "t": ISODate("2013-04-22T00:00:00.0Z")
   },
   "i": false,
   "v": 25609
 }

e: name of the metric => sum of all the got events
l: timeframe => 86400000 ms = 1 day
t: time of the metric
v: value of the metric

So it reads: there were 25609 got events in the 24 hours preceding 2013-04-22T00:00:00.0Z.
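As a rough illustration of reading such an entry, here is a minimal Python sketch. It assumes the metric document has already been fetched and converted to plain Python types (NumberInt as int, ISODate as datetime); `describe_metric` is a hypothetical helper, not part of Cube or ResqueBoard.

```python
from datetime import datetime, timedelta, timezone

# Plain-Python stand-in for the metric document shown above.
metric = {
    "_id": {
        "e": "sum(got)",                                  # metric expression
        "l": 86400000,                                    # timeframe in milliseconds
        "t": datetime(2013, 4, 22, tzinfo=timezone.utc),  # time of the metric
    },
    "i": False,
    "v": 25609,                                           # precomputed value
}


def describe_metric(doc):
    """Return a one-line, human-readable summary of a Cube-style metric."""
    key = doc["_id"]
    span = timedelta(milliseconds=key["l"])
    return "{} = {} over {} (t = {})".format(
        key["e"], doc["v"], span, key["t"].isoformat()
    )


print(describe_metric(metric))
```

The point is only that the reader never aggregates anything: the value in `v` is used as stored, and `l` and `t` say which window it belongs to.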

All this data is computed offline, so nothing is computed when it is queried via the Cube API. These metrics are where Cube really shines, and there is a metric for each timeframe (day, hour, minute).

The Cube replacement should have a similar structure, where most of the metrics are computed offline, and computed fast enough to be synced with the realtime events.
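To make the "computed offline" idea concrete, here is a small Python sketch of rolling raw event timestamps up into fixed-size buckets ahead of query time. It is not how Cube does it internally; `bucket_start` and `rollup` are hypothetical names, and keying each bucket by its start time is a choice made for this sketch that may not match how Cube labels its metrics.

```python
from collections import Counter
from datetime import datetime, timezone

# Tier sizes in milliseconds for the day/hour/minute timeframes.
TIERS_MS = {"minute": 60_000, "hour": 3_600_000, "day": 86_400_000}


def bucket_start(ts, tier_ms):
    """Floor a timezone-aware datetime to the start of its bucket."""
    epoch_ms = int(ts.timestamp() * 1000)
    floored = epoch_ms - (epoch_ms % tier_ms)
    return datetime.fromtimestamp(floored / 1000, tz=timezone.utc)


def rollup(event_times, tier_ms):
    """Count events per bucket; these counts are what would be stored."""
    return Counter(bucket_start(ts, tier_ms) for ts in event_times)


events = [
    datetime(2013, 4, 22, 10, 15, 30, tzinfo=timezone.utc),
    datetime(2013, 4, 22, 10, 45, 0, tzinfo=timezone.utc),
    datetime(2013, 4, 22, 11, 0, 0, tzinfo=timezone.utc),
]
hourly = rollup(events, TIERS_MS["hour"])
daily = rollup(events, TIERS_MS["day"])
```

A replacement would run something like this continuously as events arrive, so that a dashboard query becomes a lookup by (metric, tier, time) rather than a scan over raw events.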

commented

It seems like StatsD can provide the type of granularity needed. But its MongoDB structure differs from Cube's, so migrating to StatsD would mean losing all of Cube's data.

@Kamisama

> It seems like StatsD can provide the type of granularity needed. But its MongoDB structure differs from Cube's, so migrating to StatsD would mean losing all of Cube's data.

I will attempt to create a branch of ResqueBoard that uses the StatsD MongoDB schema. I think replicating the websocket functionality that Cube has will be the hardest part. Long polling might be an option if we assume that there will not be many clients using the interface.

Once it seems to be working I could try and provide a migration script that translates Cube's metrics to the StatsD schema.

commented

Graphite seems to come with a visualization library, which is not needed in this case. StatsD plus another websocket framework seems like a good combo. Since StatsD already runs on Node.js, the websocket framework should preferably also run on Node.js.

The advantage of Graphite is that it already handles aggregation of time series data like Cube does, but yes, it does include a lot of other components that are unnecessary.

commented

From what I learned from the StatsD documentation, it also seems to do data aggregation.

StatsD does aggregation; that is its whole purpose. What it does not do is log errors or text in any form. These are the data types it supports: https://github.com/etsy/statsd/blob/master/docs/metric_types.md.

The actual failure counts, success counts, memory usage, cpu usage and even the amount of time each job takes can be recorded though.
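For reference, here is a minimal Python sketch of what recording those numbers could look like on the wire. StatsD accepts plain-text datagrams of the form `<name>:<value>|<type>` over UDP (port 8125 by default), where the type is `c` for counters, `ms` for timers, `g` for gauges, and `s` for sets. The metric names below are made up for illustration, and `statsd_line` and `send` are hypothetical helpers, not part of any StatsD client.

```python
import socket

VALID_TYPES = ("c", "ms", "g", "s")  # counter, timer, gauge, set


def statsd_line(name, value, metric_type):
    """Format one StatsD metric as <name>:<value>|<type>."""
    if metric_type not in VALID_TYPES:
        raise ValueError("unknown StatsD metric type: %r" % (metric_type,))
    return "{}:{}|{}".format(name, value, metric_type)


def send(lines, host="127.0.0.1", port=8125):
    """Send several metrics in one UDP datagram, separated by newlines."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto("\n".join(lines).encode("ascii"), (host, port))
    finally:
        sock.close()


# Hypothetical metric names for a Resque worker.
batch = [
    statsd_line("resque.jobs.failed", 1, "c"),
    statsd_line("resque.jobs.duration", 320, "ms"),
    statsd_line("resque.worker.memory", 5242880, "g"),
]
```

Calling `send(batch)` would fire the three metrics at a local StatsD daemon. Note that none of these lines can carry a log message or a stack trace, which is the limitation described above.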

I have a branch that works with Monolog's MongoDB handler instead of Cube's: https://github.com/pwhelan/ResqueBoard/tree/nocube.

Monolog's MongoDB log format (if tweaked correctly) is almost exactly the same as the Cube format except for two things:

  • All the data is under 'context' instead of 'd'.
  • It uses a single collection for all events where 'context.type' defines the type of event.
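The two differences above could be bridged with a small mapping step. The Python sketch below is only an illustration of the idea, not code from the nocube branch: `monolog_to_cube` is a hypothetical helper, the timestamp is assumed to live under a `datetime` key in the Monolog record, and dropping `type` from the payload is a choice made here.

```python
def monolog_to_cube(record):
    """Map a Monolog-style record to (collection_name, cube_style_document).

    Assumes the event payload is under 'context', the event type under
    'context.type', and the timestamp under 'datetime'.
    """
    context = dict(record["context"])  # copy so the input is not mutated
    event_type = context.pop("type")   # the type becomes the collection name
    doc = {"d": context, "t": record["datetime"]}
    return "{}_events".format(event_type), doc


record = {
    "context": {"type": "got", "worker": "KAMISAMA-MAC.local:987", "level": 200},
    "datetime": "2013-04-22T00:00:00Z",
}
collection, doc = monolog_to_cube(record)
```

Going the other way (hiding the difference at query time rather than rewriting documents) would use the same mapping in reverse: filter the single collection on `context.type` instead of picking a collection by name.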

Statistics will have to be handled separately for this branch. The branch does require changing the formatter for the Monolog MongoDB connection to allow an infinite nesting level. It also does not represent the final direction I want to take.

Here was my general idea for the future of this branch:

  • Abstract or hide the difference in the schema between Cube versus plain Monolog (MongoDB).
  • Make the source for stats configurable between either Cube (decided by the Mongo config...) or MongoDB (backed by StatsD).
  • Use Long-Polling for the realtime stats (and query them in PHP either from Cube or StatsD).

Using long polling should make it possible to avoid a websocket server and also make it compatible with most browsers. Performance is important, but I think taking a bit of a hit on the connection should be fine for admins. I am assuming here that most systems will only have a few admins using RB for monitoring.
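To show the shape of the long-polling idea, here is a minimal in-process Python sketch with no HTTP layer. Each client remembers a cursor (how many events it has already seen) and each poll blocks until something newer exists or a timeout expires. `EventFeed` is a hypothetical class for illustration only; a real implementation would sit behind an HTTP endpoint and read from Cube or StatsD.

```python
import threading


class EventFeed:
    """In-memory event list that pollers can wait on."""

    def __init__(self):
        self._events = []
        self._cond = threading.Condition()

    def publish(self, event):
        """Append an event and wake up every waiting poller."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def poll(self, cursor, timeout):
        """Block until events past `cursor` exist or `timeout` seconds pass.

        Returns (new_events, new_cursor); new_events is empty on timeout.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > cursor, timeout=timeout)
            new_events = self._events[cursor:]
            return new_events, cursor + len(new_events)


feed = EventFeed()
feed.publish("job started")
events, cursor = feed.poll(0, timeout=0.1)        # returns immediately
idle, same_cursor = feed.poll(cursor, timeout=0.05)  # times out, nothing new
```

The cost is one held-open request per connected admin, which is why the approach only looks reasonable under the small-audience assumption stated above.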

I was also planning to make stats optional (only enabled when a separate Mongo connection/collection for stats is defined or when Cube is used). I am in no way thinking of abandoning this feature, though. I also plan to add CPU usage statistics, using gettimeofday/posix_times to calculate it and then submitting it to StatsD.
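The CPU usage calculation can be sketched as follows. This uses Python's `os.times()`, which exposes the same kind of counters as `posix_times` (user time, system time, elapsed wall time), rather than the PHP functions named above; `cpu_percent` is a hypothetical helper and the formula is one common way to define the figure, not necessarily what the branch will use.

```python
import os


def cpu_percent(before, after):
    """CPU usage between two os.times() samples, as a percentage.

    Index 0 is user time, index 1 is system time, index 4 is elapsed
    wall-clock time, all in seconds.
    """
    cpu = (after[0] - before[0]) + (after[1] - before[1])
    elapsed = after[4] - before[4]
    if elapsed <= 0:
        return 0.0  # no measurable wall time between the samples
    return 100.0 * cpu / elapsed


# Synthetic samples: 1.5 s of CPU time over 3 s of wall time.
sample_before = (1.0, 0.5, 0.0, 0.0, 10.0)
sample_after = (2.0, 1.0, 0.0, 0.0, 13.0)
usage = cpu_percent(sample_before, sample_after)
```

In a worker, the two samples would come from calling `os.times()` before and after a job, and the resulting percentage would be a natural fit for a StatsD gauge.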

I'll start working on my branch. Feel free to comment or suggest anything.

commented

Seems interesting, I'm looking forward to it.

I have it working at the moment without the logs tab and without statistics. The major snags are:

  • The Monolog Init library needs to be patched so that the MongoHandler uses a fully nested schema (otherwise it serializes the data into JSON at about the third nesting level).
  • Monolog 1.12+ needs to be installed; I believe the current minimum is 1.5.
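The first snag is easier to see with a toy example. The Python sketch below mimics a depth-limited normalizer: nested structures are copied as-is down to a maximum depth, and anything deeper is collapsed into a JSON string. This is not Monolog's code, and the depth values are illustrative; it only shows why a nesting limit makes the stored documents hard to query.

```python
import json


def normalize(value, max_depth, depth=0):
    """Copy nested dicts/lists; past max_depth, collapse to a JSON string."""
    if isinstance(value, (dict, list)):
        if depth >= max_depth:
            return json.dumps(value, sort_keys=True)
        if isinstance(value, dict):
            return {k: normalize(v, max_depth, depth + 1) for k, v in value.items()}
        return [normalize(v, max_depth, depth + 1) for v in value]
    return value


record = {"context": {"job": {"args": {"id": 7}}}}

limited = normalize(record, max_depth=2)   # 'job' becomes a string
full = normalize(record, max_depth=10)     # structure preserved
```

With the limit in place, `limited["context"]["job"]` is the string `{"args": {"id": 7}}`, so a query on `context.job.args.id` can no longer match. Raising the limit (the "fully nested schema") keeps the fields addressable.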

Next weekend I can finalize my branch, without the advanced statistics. I'd like that to be finalized before tackling the advanced stuff. I have been using it successfully at work on our test servers.

> I was able to install websocket-server by hand. Just git clone the https://github.com/miksago/node-websocket-server repository into node_modules (in cube).

How exactly did you do this? After git cloning this repository in cube/node_modules, I still get the same error on "npm install"...

edit on 17/04/2014

Someone suggested that I remove websocket-server from the package.json file after installing it manually, but then another error occurs:

...
npm http 200 https://registry.npmjs.org/wordwrap/-/wordwrap-0.0.2.tgz
npm ERR! Error: shasum check failed for /root/tmp/npm-4389-jKgT7sg0/1429259818152-0.3462741563562304/tmp.tgz
npm ERR! Expected: 500d26d883ddc8e02f2c88011627636111c105c5
npm ERR! Actual: 72b0e88de3feeb269db2effe14e95751b031ab04
npm ERR! at /usr/local/node/lib/node_modules/npm/node_modules/sha/index.js:38:8
...

The same error remains after replacing "websocket-server": "1.4.04" with "node-websocket-server": "1.1.4" (0.0.1 fails) in package.json, as a possible fix from https://github.com/square/cube/pull/149/files

Bummer :(

Try this:
cd node_modules/
git clone https://github.com/miksago/node-websocket-server
mv node-websocket-server websocket-server
cd ..
npm install cube

Thanks a lot for the pointer, @maxcanna!

After battling a lot of new errors (corrupted downloads and one runtime error), I got it up and running by using the most recent version of NodeJS (instead of the one specified in an outdated installation script I was using as reference).

@pwhelan
@maxcanna
@Sieberkev
Thanks very much!

awesome!

I'll push what I have when I can. Nothing out of the ordinary though.

@pwhelan Any update?

@Techbrunch most of my work has been implemented in https://github.com/pwhelan/ResqueBoard/tree/cft. Feel free to test it out. Note that you must configure fresque or your resque workers to log to MongoDB with full recursion (otherwise resque-ex/monolog only logs objects that have 2 levels of nesting).

I also did a lot of work to use the fresque.ini configuration directly instead of the config in RB.