uberVU / elasticboard

Dashboard that aggregates relevant metrics for Open Source projects.

Home Page: http://elasticboard.mihneadb.net/landing.html

Create es river that brings data into elasticboard

mishu- opened this issue

Need

The current implementation relies on parsing dump files that a cron job updates. It would be nice to have a cleaner way to bring GitHub data into the es database.

Proposed Solution

Create an es river (http://www.elasticsearch.org/blog/the-river/) that pulls data from GitHub directly into es.
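Elasticsearch rivers were later deprecated (in 1.5) and removed (in 2.0), so a small external process feeding the bulk API is a common alternative. The minimal sketch below, assuming the modern `_bulk` NDJSON format without mapping types, builds a bulk request body from a list of GitHub event dicts. The index name `elasticboard-events` and the function `build_bulk_body` are hypothetical. Using the GitHub event `id` as the document `_id` means that re-sending an event overwrites it instead of duplicating it.

```python
import json
from typing import Any, Iterable, Mapping


def build_bulk_body(
    events: Iterable[Mapping[str, Any]], index: str = "elasticboard-events"
) -> str:
    """Serialize GitHub events into an Elasticsearch _bulk request body (NDJSON).

    The default index name is a hypothetical placeholder.
    """
    lines = []
    for event in events:
        # The event id doubles as the document _id, so indexing the same
        # event twice replaces the first copy instead of adding a second one.
        lines.append(json.dumps({"index": {"_index": index, "_id": event["id"]}}))
        lines.append(json.dumps(event))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be sent as a POST to `/_bulk` with `Content-Type: application/x-ndjson`.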

Notes

This issue is a stub.

It is worth mentioning that there are two ways to bring data in:

  1. subscribe a listener (a webhook) to GitHub's API for the given repo
  2. poll the events every delta T for new events and discard duplicates

Option 1 requires action from someone with access rights to the repo (even if it is a public repo); option 2 does not.
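Option 2 can be sketched as a loop that fetches the latest events every delta T and keeps only the ids it has not seen yet. This is a minimal sketch, assuming each event dict carries GitHub's string `id` field and that the API lists events newest first; `select_new_events`, `poll`, and the injected `fetch`/`handle` callables are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, List, Set

Event = Dict[str, Any]


def select_new_events(events: List[Event], seen_ids: Set[str]) -> List[Event]:
    """Keep events whose id is not in seen_ids, record them, return them oldest first."""
    fresh: List[Event] = []
    for event in events:
        if event["id"] not in seen_ids:
            seen_ids.add(event["id"])
            fresh.append(event)
    # Assumes the API lists events newest first; reverse for chronological order.
    fresh.reverse()
    return fresh


def poll(
    fetch: Callable[[], List[Event]],
    handle: Callable[[List[Event]], None],
    delta_t: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `cycles` fetch rounds, waiting delta_t seconds between consecutive rounds."""
    seen: Set[str] = set()  # grows without bound; a real poller might persist or trim it
    for i in range(cycles):
        handle(select_new_events(fetch(), seen))
        if i < cycles - 1:
            sleep(delta_t)
```

In practice `fetch` would request the repo's events endpoint and `handle` would index the new batch; `sleep` is injectable so the loop can be exercised without actually waiting.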

Can you please provide the documentation links for both 1 and 2?

  1. http://developer.github.com/v3/repos/hooks/
  2. http://developer.github.com/v3/activity/events/ (ctrl-f for 300 :) )
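For option 1, current GitHub documentation describes signing each webhook delivery with HMAC-SHA256 over the raw request body, sent in the `X-Hub-Signature-256` header as `sha256=<hexdigest>` (the legacy `X-Hub-Signature` header carries a SHA-1 variant). A receiver should check that signature before indexing the payload. A minimal sketch, with `verify_github_signature` as a hypothetical helper name and the shared secret assumed to be configured on the hook:

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a webhook payload against the value of the X-Hub-Signature-256 header."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters matched.
    return hmac.compare_digest(expected, signature_header)
```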

Part of the email conversation with GitHub:

Hey Mihnea,

> How do you suggest I get the events that I haven't seen so far? Using a timestamp? The GitHub Archive scraper collects lots of events (I'm guessing more than 300), so there should be a way, right?

As I mentioned before, we can currently provide a history of only up to 300 events. If you need to collect more than that, the only way is to periodically fetch events from the API and store them locally. I'm guessing that the (unofficial) GitHub Archive project is doing exactly that: polling our API at a high frequency to pick up all events. If you need to go further back in history right now, there is no workaround except querying the archive project.
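Storing the fetched events locally could be as simple as an append-only JSON Lines file whose ids are reloaded on restart, so duplicates are skipped across runs. A minimal sketch; the file layout and the helper names `load_seen_ids` and `append_events` are assumptions, not part of elasticboard.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Set


def load_seen_ids(path: str) -> Set[str]:
    """Rebuild the set of stored event ids from a JSON Lines file (empty if missing)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["id"] for line in f if line.strip()}


def append_events(path: str, events: Iterable[Dict[str, Any]], seen_ids: Set[str]) -> int:
    """Append events whose id is not stored yet; return how many lines were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for event in events:
            if event["id"] in seen_ids:
                continue  # already stored in this run or a previous one
            f.write(json.dumps(event) + "\n")
            seen_ids.add(event["id"])
            written += 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.jsonl")
        append_events(path, [{"id": "1"}, {"id": "2"}], load_seen_ids(path))
        # Simulate a restart: ids come back from the file, so "2" is not stored twice.
        print(append_events(path, [{"id": "2"}, {"id": "3"}], load_seen_ids(path)))  # prints 1
```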

> By doing a simple check (cat | sort | uniq | wc -l), I found that there are indeed just 300 unique events. However, your API didn't return a "last" page link; it let my script keep polling.

Oops, sorry about that! I see that our documentation says we will return a "last" link, but in fact we don't. I'll see if we can correct that. Thanks for the report!
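Since the "last" link cannot be relied on, a client can instead follow `rel="next"` in the `Link` response header until it disappears, with a hard page cap as a safety net. The default of 10 pages is an assumption (300 events at GitHub's default of 30 items per page). A minimal sketch with hypothetical names; `get` is an injected callable returning a page's items and its raw `Link` header value, so the walker can be tested without network access.

```python
import re
from typing import Any, Callable, Dict, List, Optional, Tuple

# Matches one `<url>; rel="name"` entry of an HTTP Link header.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value: Optional[str]) -> Dict[str, str]:
    """Map rel names such as "next" and "last" to their URLs."""
    return {rel: url for url, rel in LINK_RE.findall(value or "")}


def walk_pages(
    get: Callable[[str], Tuple[List[Any], Optional[str]]],
    first_url: str,
    max_pages: int = 10,
) -> List[Any]:
    """Follow rel="next" links, stopping when none is present or after max_pages."""
    items: List[Any] = []
    url = first_url
    for _ in range(max_pages):
        batch, link = get(url)
        items.extend(batch)
        next_url = parse_link_header(link).get("next")
        if next_url is None:
            break
        url = next_url
    return items
```

The cap keeps a script from polling forever even if the server keeps advertising a "next" page.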

Glad you were able to figure out what was going on! Let me know if you have any other questions or feedback.

Cheers,
Ivan