Implement robust evaluation pipeline in EvalAI
RishabhJain2018 opened this issue
Project Title: Implement robust evaluation pipeline in EvalAI
Description:
Currently, the submission worker that evaluates a challenge requires manual scaling. Moreover, real-time logging and metrics monitoring for the submission worker aren't available to challenge hosts. Challenge organizers have also often requested the ability to test their competition package (evaluation scripts, etc.) locally before uploading it to EvalAI, which would also reduce the assistance required from the platform maintainers. The goal of this project is to write a robust test suite for the submission worker and port the worker to AWS Fargate to set up auto-scaling and logging. The tasks also include giving challenge hosts control over the submission worker from the UI, in terms of starting, stopping and restarting it.
Deliverable:
- Add filtering in Django backend using https://django-filter.readthedocs.io/en/latest/guide/rest_framework.html
- Add unit tests for the submission worker
- Add integration tests for the submission worker
- Shift the worker container to AWS Fargate for auto-scaling
- Provide naming to the worker containers running for different challenges
- Add Slack notification to inform the challenge host/EvalAI admin when a worker is killed
- Add feature to get the updated auth token before launching workers on the server for different challenges
- Add feature to automatically launch/restart worker containers when:
  - Create a container when the challenge is approved by EvalAI admin
  - Restart containers when a challenge host updates evaluation script or annotations
- Add feature to start/stop/restart worker container from UI.
  - For challenge host
  - EvalAI admin
- Add feature to request EvalAI Admin for increasing number of workers from 1 to x
- If remote evaluation is enabled for a challenge, add the capability of storing the challenge host's AWS keys and launching AWS instances from their account to set up the worker Docker container that will evaluate the submissions
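For readers thinking about the Fargate deliverable, here is a minimal sketch of what a per-challenge worker task definition could look like. It only builds the keyword arguments in the shape that boto3's `ecs.register_task_definition` accepts and does not call AWS. The function name, the environment variable names, the log group path and the CPU/memory sizes are assumptions made for illustration, not something taken from the EvalAI codebase.

```python
import json


def build_worker_task_definition(challenge_pk, queue_name, image, region="us-east-1"):
    """Build kwargs shaped for boto3's ecs.register_task_definition.

    All names below (family pattern, env vars, log group) are hypothetical.
    """
    family = f"worker-challenge-{challenge_pk}"
    return {
        "family": family,
        "networkMode": "awsvpc",  # Fargate tasks require awsvpc networking
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",  # 0.5 vCPU
        "memory": "1024",  # 1 GB, a valid pairing with 0.5 vCPU
        # An executionRoleArn is also needed for the awslogs driver on
        # Fargate; it is omitted here because it is account-specific.
        "containerDefinitions": [
            {
                "name": family,  # one distinct name per challenge
                "image": image,
                "essential": True,
                "environment": [
                    {"name": "CHALLENGE_PK", "value": str(challenge_pk)},
                    {"name": "CHALLENGE_QUEUE", "value": queue_name},
                ],
                # Ship container stdout/stderr to CloudWatch Logs.
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": f"/evalai/{family}",
                        "awslogs-region": region,
                        "awslogs-stream-prefix": "worker",
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    task_def = build_worker_task_definition(42, "challenge-42-queue", "example/worker:latest")
    print(json.dumps(task_def, indent=2))
    # With AWS credentials configured, this could then be registered with:
    #   boto3.client("ecs").register_task_definition(**task_def)
```

Keeping the definition as plain data makes it easy to unit test without AWS access, which fits the testing deliverables above.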
Extended Goals:
- Add feature to display logs of the worker container on UI
Mentors: Ram Ramrakhya (@Ram81), Rishabh Jain (@RishabhJain2018), Deshraj (@deshraj)
Skills: Python, Django, Django Rest Framework, AWS, Docker
Skill Level: Hard
Get started: Try to fix some issues in EvalAI (note that there are some issues labeled with GSOC-2019)
Tutorials:
a) Docker
b) AWS-Fargate
Important Links:
- EvalAI Website: evalai.cloudcv.org
- EvalAI Github repository: Cloud-CV/EvalAI
- EvalAI Docs: http://evalai.readthedocs.io/en/latest
- GSoC Proposal Template: Cloud-CV/GSoC-Ideas/wiki/GSOC-2019-Proposal-Template
- Gitter Channel: gitter.im/Cloud-CV
- Mailing list: groups.google.com/forum/#!forum/cloudcv
@deshraj @RishabhJain2018 @Ram81 this idea looks amazing. I am looking forward to working on this issue.
Awesome! Looking forward to your GSOC Proposal @kurianbenoy :)
Hi Rishabh,
This is Neeraj from IIT Bhubaneswar. I would like to contribute to this project; please guide me on how to proceed.
Hi @RishabhJain2018 I wish to contribute to EvalAI in GSoC 2019. Can you help me get started?
Thanks
@RishabhJain2018 @Ram81 I am interested in this idea. I would love to work for GSOC 2019 on this issue. I am familiar with Django and have a basic idea about Docker.
@RishabhJain2018 @Ram81 @deshraj This is exciting and I am interested in the idea! To get familiar with the requirements, can we go ahead and make PRs relevant to this? (As is recommended in the UI ideas for GSoC.)
Hey @RishabhJain2018 @deshraj @Ram81 I'm very excited to work on this issue. I'm good at Python, have worked with AWS, and am familiar with Docker. This is my first GSoC and your guidance would help me get started.
Hi, @GrayR00t @mrkarna @navneel99 @KhalidRmb @anunay999 Thanks for your interest in the project. Looking forward to your GSoC Proposal.
Can you help me get started?
@navneel99 Please start by setting up EvalAI on your local machine and then start solving good-first-issue or GSOC-2019 issues.
To get familiar with the requirements, can we go ahead and make PRs relevant to this?
@KhalidRmb Yes.
@RishabhJain2018 @Ram81 @deshraj Sounds interesting! I would like to work on this project 👍
Hi, @shiv6146 Looking forward to your GSoC proposal!
Hi! Had some queries, and it would be very helpful if the mentors can help me navigate them.
- Regarding scaling the workers from 1 to x by the challenge host: I gather it is necessary for docker-based challenges where the host wants to test the submission against diverse environments with different requirements in each worker container.
  - In which case, the host must specify the configuration of each additional worker environment through the UI itself. (Of course, the new workers will be integrated with the pre-existing SQS queue for the challenge and will change the leaderboard data accordingly, reflecting the metrics returned by the evaluation in a pre-defined standard format.)
- Coming to non docker-based challenges, the main concern to scale the workers is speed and submission bottlenecks? Because the worker evaluates the submissions sequentially, running workers in parallel (with the same configurations) is faster from the host's perspective. Is that correct or did I miss something?
Regarding shifting to Fargate:
- Is only the submission worker to be shifted or the Django container along with it? Because based on incoming traffic to the Django app, Fargate will autoscale the task (which could be based on a single task definition of 2 containers: Django & the Worker).
- If scaling the worker alone, we could define a new task definition only for the worker and use deploy.sh scale through boto3.
Could you please help clarify the situation here?
@RishabhJain2018
Regarding the task Provide naming for worker containers for different challenges, there already is a mechanism for that in the deploy.sh file here:
Could you please provide some clarity regarding the task?
@RishabhJain2018 @deshraj @Ram81 Could you please take a look at these, and the doubts I've asked on Gitter? The proposal deadline is very near. Thanks!
Hi @KhalidRmb,
I gather it is necessary for docker-based challenges where the host wants to test the submission against diverse environments with different requirements in each worker container.
What if a challenge host wants to parallelize the submission processing for the non-docker based challenges?
Coming to non docker-based challenges, the main concern to scale the workers is speed and submission bottlenecks? Because the worker evaluates the submissions sequentially, running workers in parallel (with the same configurations) is faster from the host's perspective. Is that correct or did I miss something?
I didn't get what you meant by speed here but the idea is to parallelize the submission processing near the challenge end date so that more people can submit to the challenge.
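To make the parallelization idea concrete, here is a small sketch in which several identical workers drain one shared queue of submissions. It uses an in-process `queue.Queue` and threads purely as a stand-in for the challenge's SQS queue and worker containers; `evaluate` and `run_workers` are hypothetical names, and the scoring is a placeholder.

```python
import queue
import threading


def evaluate(submission_id):
    # Placeholder for running the challenge's evaluation script.
    return {"submission": submission_id, "status": "finished"}


def run_workers(submission_ids, num_workers):
    """Drain one shared queue with `num_workers` identical workers."""
    pending = queue.Queue()
    for submission_id in submission_ids:
        pending.put(submission_id)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                submission_id = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to evaluate
            result = evaluate(submission_id)
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    finished = run_workers(range(10), num_workers=3)
    print(len(finished), "submissions evaluated")
```

Each submission is taken off the queue by exactly one worker, so adding workers shortens the backlog without changing how any single submission is evaluated.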
Is only the submission worker to be shifted or the Django container along with it?
For now, we're focussing on the worker container only.
If scaling the worker alone, we could define a new task definition only for the worker and use deploy.sh scale through boto3.
Could you please help clarify the situation here?
I'd like to see the complete approach in your proposal. Also, I've answered your query.
Regarding the task Provide naming for worker containers for different challenges, there already is a mechanism for that in the deploy.sh file here:
Yes, it is already there. But Docker doesn't allow running two containers with the same name on a single machine, so a fix for this will be needed as part of the deliverable.
I didn't get what you meant by speed here but the idea is to parallelize the submission processing near the challenge end date so that more people can submit to the challenge.
That is exactly what I meant. Thanks.
@RishabhJain2018 What level would you say this project is? How good should I be with the stack mentioned? What should I really know before I start working on it? I know it's GSoC-related, but I was thinking of giving it a try for fun and experience.