datamade / how-to

📚 Doing all sorts of things, the DataMade way

Github Actions Self-Hosted Runners

fgregg opened this issue

Background

We've been having a great time using GitHub Actions as a scraping platform. But for intensive scrapes on private repos, the pricing is pretty unfavorable ($0.008/minute).
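To make the pricing concern concrete, here is a rough sketch of the arithmetic. Only the $0.008/minute rate comes from this issue; the workload (a 6-hour scrape run daily) and the function name `monthly_cost` are hypothetical, and any per-job rounding or included free minutes are ignored.

```python
# Rate quoted in this issue for private repos (USD per runner minute).
GITHUB_RATE_PER_MINUTE = 0.008


def monthly_cost(minutes_per_run: float, runs_per_month: int,
                 rate_per_minute: float = GITHUB_RATE_PER_MINUTE) -> float:
    """Estimate monthly spend, ignoring free minutes and per-job rounding."""
    return minutes_per_run * runs_per_month * rate_per_minute


# Hypothetical workload: one 6-hour (360-minute) scrape per day for 30 days.
cost = monthly_cost(minutes_per_run=360, runs_per_month=30)
print(f"${cost:.2f} per month")  # prints "$86.40 per month"
```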

We might be able to get the best of both worlds by setting up a self-hosted runner.

Here are some resources for setting one up:

Proposal

Try setting up a self-hosted runner that has much better pricing.

Deliverables

An approved approach for setting up a self-hosted runner, or a decision that this is not worth doing.

Timeline

Two investment days.

2 cents on self-hosting runners:

I built a service for running self-hosted GitHub Actions runners in the cloud with zero maintenance, and it's free for open source: https://cirun.io/

It's on-demand, which means you only pay your cloud provider for the time you're using it.
All you need is a simple .cirun.yml configuration file; here is a demo.
It's also available on GitHub Marketplace: https://github.com/marketplace/cirun-io

hi @aktech, what is the pricing for private repos?

Hey @fgregg, it is free at the moment for private repositories as well; we're in the process of defining the pricing. In a nutshell, it will be a flat price per month based on the number of private repositories using it.

I noticed you're using it with DigitalOcean, with the runner being up for 15-20 minutes. I would not recommend DigitalOcean for short jobs, because it doesn't have per-minute billing: whether you use a machine for one minute or one hour, you'll be charged the same. Other cloud providers have per-minute billing.
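A small sketch of why billing granularity matters for short jobs, assuming usage is rounded up to whole billing units. The hourly rate of $0.06 and the function name `billed_cost` are made up for illustration; actual provider pricing and rounding rules may differ.

```python
import math


def billed_cost(runtime_minutes: float, hourly_rate: float,
                granularity_minutes: int) -> float:
    """Cost when usage is rounded up to whole billing units."""
    units = math.ceil(runtime_minutes / granularity_minutes)
    return units * granularity_minutes * hourly_rate / 60


HOURLY_RATE = 0.06  # hypothetical price in USD per hour

# A 20-minute job billed by the hour costs a full hour;
# billed by the minute, it costs a third of that.
per_hour = billed_cost(20, HOURLY_RATE, granularity_minutes=60)
per_minute = billed_cost(20, HOURLY_RATE, granularity_minutes=1)
print(f"hourly billing: ${per_hour:.3f}, per-minute billing: ${per_minute:.3f}")
```

With these assumed numbers, the same job is three times more expensive under hourly billing, and the gap widens as jobs get shorter.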

So i tried a few of the solutions on https://github.com/jonico/awesome-runners

  • actions-runner-controller/actions-runner-controller was promising, but it requires setting up and operating a kubernetes cluster, which is not part of our standard practice, and the service providers i looked at (aws and digital ocean) all have some constant costs, which are pretty expensive: $70-$100 a month. the scraping that is prompting this research also did not complete with anywhere near the success rate it has on github actions.

  • philips-labs/terraform-aws-github-runner looks like a good setup, but the site that we are trying to scrape has a blanket IP block for ec2 instances, so AWS is a no-go.

the other runners listed were much less developed.

right now, github's own action runners are looking like the more economical path.

i think the last thing to try is to see if @aktech might be willing to consult on our needs and see if cirun.io might be a good fit.

Hey @fgregg I would be happy to help, you can also use GCP or Azure with Cirun.io (If EC2 is blocked). Feel free to drop me a mail at amit@cirun.io to schedule a call.

cirun + azure spot instances is looking very promising.

with cirun + azure spot instances, this is worth doing. when i write the doc for #212, i will give directions on using these runners.