microsoft / ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

Add support for Elastic Search as a store

jeffmcaffer opened this issue

Elasticsearch seems like an interesting choice of store technology. It would give live, queryable access to the crawled data.

Not sure what this might involve, but I would be interested in discussing it and seeing what sort of resources I might be able to provide (I work @ Elastic).

Great @markwalkom. It should not be that bad. Basically the store API has about 5-10 methods like upsert, get, list, ... all basic point or list queries.
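
To make the shape of such a store concrete, here is a minimal in-memory sketch in Python. It is only an illustration: the class name `InMemoryStore`, the `type`/`id` document fields, and the method signatures are assumptions made for this example, not ghcrawler's actual store API. An Elasticsearch-backed store would presumably replace the dictionary with index, get-by-id, and search requests.

```python
from typing import Any, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a crawler store (not ghcrawler's real API)."""

    def __init__(self) -> None:
        # Documents keyed by "type:id"; a real store would hold these remotely.
        self._docs: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def _key(doc_type: str, doc_id: str) -> str:
        return f"{doc_type}:{doc_id}"

    def upsert(self, doc: Dict[str, Any]) -> None:
        # Insert the document, or overwrite an existing one with the same type and id.
        self._docs[self._key(doc["type"], doc["id"])] = dict(doc)

    def get(self, doc_type: str, doc_id: str) -> Dict[str, Any]:
        # Point query: raises KeyError if the document does not exist.
        return dict(self._docs[self._key(doc_type, doc_id)])

    def list(self, doc_type: str) -> List[Dict[str, Any]]:
        # List query: return copies of all documents of the given type.
        return [dict(d) for d in self._docs.values() if d["type"] == doc_type]


store = InMemoryStore()
store.upsert({"type": "repo", "id": "1", "name": "ghcrawler"})
store.upsert({"type": "repo", "id": "1", "name": "ghcrawler (updated)"})
store.upsert({"type": "org", "id": "1", "name": "microsoft"})
print(store.get("repo", "1")["name"])
print(len(store.list("repo")))
```

The second upsert overwrites the first because both share the same type and id, which is the behaviour a point-keyed store would need regardless of the backing technology.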

What are you doing with ghcrawler? Perhaps I can help guide you.

Hi @markwalkom,

I have a basic implementation here: craigez@4540ca1 (and here: https://source.codeaurora.org/external/qostg/ghcrawler/commit/?h=develop&id=4540ca1283d30aa0483a9ca9adf398e0ba41772a)

It's licensed under the MIT license, but I've been unable to get permission to sign the Microsoft CLA, which is needed to get the contribution merged upstream 😢 . Additionally, we haven't really tested it much, as we will probably go in a different direction for our metrics, but we are still looking at options.

that's awesome, thanks @craigez! pity the licensing is causing problems though :(

It's not the licensing, MIT is fine, just the CLA to Microsoft.

I'll poke on this again and see if I can get it resolved soon.

It seems like https://github.com/yougov/mongo-connector could be useful for syncing the data to Elasticsearch.