ipfs / infra

Tools and systems for the IPFS community

VMs for benchmark work

Elexy opened this issue

I've been enlisted to automate benchmarking for js-ipfs and I'd like to get my hands on a VM with reliable cpu/mem to test benchmarking strategies.

Actually, now that we have a better idea of what we're building, I need 3 VMs.

  1. VM with performance guarantees (as best as possible) to run all benchmarks
  2. VM with performance guarantees (as best as possible) to run any tests inside Clinic
  3. VM or container platform with persistent storage to run the runner for both of the above and InfluxDB/Grafana

ping @eefahy @kyledrake

Can we get into more specifics please?

  • What's your timeline?
  • What sort of performance guarantees do you need?
  • Are you sure a VM is suitable for the testing?
  • What access to the machine do you need? SSH only?
  • How performant does storage need to be and how big?
  • Re: 3, are you asking for a Kubernetes cluster or a VM where you can install Docker?

Hi @eefahy

  • timeline: asap
  • The performance should be as consistent as possible, meaning CPU, memory and disk performance should be consistent between benchmark runs. In the case of AWS, it seems c5d.xlarge instances offer consistent CPU performance without any bursting. The physically attached NVMe SSD should offer disk performance without any interference.
  • A dedicated physical machine would perform more consistently, if that is even an option. A VM will be a good place to start. The VMs can even be started/stopped automatically when required, as long as that process takes < 30s.
  • SSH access only is enough
  • The benchmark VMs only need a couple of GB of storage; the OS disk is probably big enough. It doesn't have to be backed up either.
  • Re: 3, we'll be running InfluxDB/Grafana/runner.js in containers. Where these containers run is your choice. It can be a VM, ECS (or similar) or a Kubernetes cluster. The requirement here is that we can access the dashboard and the runner over HTTPS and that we can store the DB backups somewhere like an S3 bucket with a lifecycle policy.
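
To make "consistent between benchmark runs" measurable, one option is to repeat the same benchmark several times and look at the relative spread of the timings. The following is only a minimal sketch, not part of the actual runner: the function names and the 5% threshold are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(durations):
    """Return stdev / mean for a list of run durations (in seconds)."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to measure spread")
    average = mean(durations)
    if average == 0:
        raise ValueError("mean duration is zero")
    return stdev(durations) / average


def is_consistent(durations, threshold=0.05):
    """True if the relative spread stays within the (assumed) threshold."""
    return coefficient_of_variation(durations) <= threshold
```

Running the same benchmark a handful of times on a candidate machine and feeding the durations into `is_consistent` would give a rough signal of whether bursting or noisy neighbours are affecting the results.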

The benchmark VMs only need a couple of GB of storage; the OS disk is probably big enough. It doesn't have to be backed up either.

Do account for tests that move 100GB to 1TB of data, if possible.

@daviddias I get your point. We can add huge disks; if cost is not an obstacle, let's add them.
If we want to move 1TB, would we need that space for each node plus the original data?
We do need to spec out those tests and what exactly we're benchmarking there, i.e. disk speed vs. js-ipfs speed.

You need space for the 1TB of data, 1TB for each node and much more space for the overhead of creating the Graph Data Structure. Thinking more about this, perhaps starting with benchmarks for 10GB and 100GB might be wiser. 1TB might take too much time right now to be practical and useful.
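
The sizing rule above (the original data, one copy per node, plus overhead for building the graph) can be put into a rough back-of-the-envelope calculation. This is a sketch only: the thread does not quantify the overhead, so `overhead_factor` is a purely hypothetical parameter, as is the function name.

```python
def estimate_disk_gb(dataset_gb, nodes, overhead_factor=1.0):
    """Estimate total disk space in GB for one transfer benchmark.

    dataset_gb      -- size of the original data set
    nodes           -- number of nodes that each hold a full copy
    overhead_factor -- ASSUMED extra space per node for building the graph,
                       as a fraction of the data set size (not from the thread)
    """
    if dataset_gb < 0 or nodes < 0 or overhead_factor < 0:
        raise ValueError("all inputs must be non-negative")
    source = dataset_gb
    per_node = dataset_gb * (1 + overhead_factor)
    return source + nodes * per_node
```

With the assumed factor of 1.0, a 10GB data set on 2 nodes comes to 50GB and a 100GB data set to 500GB, which is one way to see why starting below 1TB is easier on disk budgets.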

@Elexy Thanks so much for the info. I see we have a meeting scheduled for tomorrow but wanted to give some options per your requirements:

Bare metal servers are an option. We use Packet.net for those needs. It would seem that this machine is a good fit. Would that work?

Unfortunately, we do not currently have a suitable k8s or ECS cluster for this so it will be a VM/Packet machine, depending on the sizing needs. What are the sizing needs for that machine?

We will need to sort out https for serving those endpoints, which I will look into. Making an S3 bucket for backups sounds good.
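
For the backup bucket, the lifecycle policy requested earlier in the thread could look roughly like the rule set below. It is a sketch under assumptions: the prefix, rule ID and 30-day retention are hypothetical values, and nothing here reflects how the bucket was actually configured.

```python
def backup_lifecycle_rules(prefix="influxdb-backups/", expire_after_days=30):
    """Build an S3 lifecycle configuration that expires old DB backups.

    Both arguments are illustrative defaults, not values from the thread.
    """
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-old-db-backups",      # hypothetical rule name
                "Filter": {"Prefix": prefix},       # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }

# With boto3, this dict would be passed as the LifecycleConfiguration argument
# of s3_client.put_bucket_lifecycle_configuration(Bucket=..., ...).
```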

Hi @eefahy,

Thanks for that.

Bare metal for the 2 test machines would be awesome. The specs of that machine look fine to me, if IPFS is supposed to run on commonly available consumer hardware (@daviddias?). Can we get Debian or Ubuntu on that?

A VM to run the runner/db/dashboard in containers should be spec'd like this:

>= 4 CPUs
>= 16GB RAM
1TB ssd

As for the endpoint: the *ipfs CI should be able to hit an endpoint with a POST request, so it can (and should) be fairly locked down.
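
One common way to lock down a POST-only webhook like this is to have the caller sign the request body with a shared secret and reject anything that does not verify. The sketch below illustrates that idea with the Python standard library; it is not the actual runner. The secret, the `X-Signature` header name and the port are all assumptions, and TLS is assumed to be terminated in front of it.

```python
import hashlib
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared secret; in practice it would come from the environment.
SHARED_SECRET = b"change-me"


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of body under secret."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # "X-Signature" is an assumed header name, not a CI convention.
        signature = self.headers.get("X-Signature", "")
        if verify_signature(SHARED_SECRET, body, signature):
            self.send_response(202)  # accepted: a benchmark run would start here
        else:
            self.send_response(403)  # reject unsigned or wrongly signed requests
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the endpoint until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Since only `do_POST` is defined, other HTTP methods are refused by the base handler, which keeps the exposed surface small.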

Per our conversation:

  • 1 Ubuntu VM with a local private key
  • public ssh key loaded into authorized_keys for nearform access
  • 1 Ubuntu bare metal machine with the public ssh key from the VM user loaded into authorized_keys

Once the CI platform is decided on, we will need to mint a cert for the VM to field a webhook from CI to kick off tests.

@Elexy Do you have a preference on the IOPS or throughput for the 1TB SSD? Can we get away with provisioning a gp2 EBS volume for this or do you need the io1 flavor?

These machines have been made. I went with a gp2 SSD Volume. Let me know if that needs an upgrade. Once I get a nearform public key, I'll add that to the VM

Thanks.
here is the pub key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDRZsdV936RrBC3ZK3wmmP4dgj8cNSD6WdChMs2Ic+A+h5HALTSkK0Z7ri1pPbbSf6OzD5NwGByhXIwT34RPZjWRs008cv8z4x4bYV+wBlUQRaVAYQDxag+5522cg35nHqgONHmyuutBjU4lxRlq4qyACbI5sI7CDg0J9fMEyAK8hj8ADWM3dZFe+kA9bUh3LLyCUfhiSnGX7xmtQRqXHzU+c0ZPFARSbRakMUcGBRb7P5CnQkIwZqyGIpAbV4/wjDA6jn+whPyOx3lHxpxyzHtjoGra6j+j3KHcC2i1+PaGYK8hGRckcVNr6jsH86HzFkwmP+KMJQ2nkHSVMoJdYAt elexy@Alexs-MacBook-Pro.local

Alright this is done.

ubuntu@63.33.104.238 VM
ubuntu@147.75.33.155 Bare Metal machine

Ping me when we know what CI looks like and I can sort out the webhook to VM stuff

@eefahy I am assuming this dashboard should be made available on a domain as well. Can you take care of that?

Also, I need ports 80 and 443 opened in the vNet to expose the service and get the Let's Encrypt handshake to work.

UFW config on the VM:

ubuntu@ip-172-31-44-15:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: reject (incoming), allow (outgoing), deny (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
22/tcp                     LIMIT IN    Anywhere                   # Limit ssh access
80/tcp                     ALLOW IN    Anywhere                   # Allow http access
443/tcp                    ALLOW IN    Anywhere                   # Allow https access
22/tcp (v6)                LIMIT IN    Anywhere (v6)              # Limit ssh access
80/tcp (v6)                ALLOW IN    Anywhere (v6)              # Allow http access
443/tcp (v6)               ALLOW IN    Anywhere (v6)              # Allow https access

/cc @daviddias

@eefahy I'd like to set up CI/CD for this project as well. Can you add it to CircleCI or GitLab CI?

@Elexy I've enabled CircleCI.

Ports 80 and 443 are now open on the VM. Not sure what domain to use for this. Thoughts, @daviddias or @alanshaw?

We decided to put it on cloud.ipfs.team yesterday for the time being, so something like benchmarks.cloud.ipfs.team?

OK - turns out that *.cloud.ipfs.team is tied up so I made benchmarks.ipfs.team resolve to the EIP on the VM. Let me know if that needs to change.

@Elexy I'm seeing a Let's Encrypt script on the VM, so I assume that you can take care of the cert for that new domain?

@eefahy re - cert: yes, I will. Thanks for the domain!

@eefahy I previously deleted my comment about needing sudo, but I have now hit that wall again with no way around it.
The benchmark bare metal machine needs git installed and possibly some more things. We need to git clone js-ipfs there and check out a specific commit to run the benchmarks against.

Could the ubuntu user have sudo rights, like on the VM?

This is a blocker for me now.
/cc @alanshaw @mcollina

The ubuntu user already had sudo privs but I updated it to not require a password. If we know what sort of stuff needs to be installed on the bare metal machine, I'm happy to add that to its user_data script instead which feels a little more secure. Thoughts?

@eefahy Thanks for the quick reply. Let's add it to user_data once it's stabilized. We're still discovering stuff here.

@Elexy Cool, sounds good. I'm closing this issue now. If other things come up or you're ready for user_data stuff then open a new issue please. Thanks!