Implement robust evaluation pipeline in EvalAI

RishabhJain2018 opened this issue 6 years ago

Project Title: Implement robust evaluation pipeline in EvalAI

Description:

Currently, the submission worker that evaluates challenge submissions requires manual scaling. Moreover, real-time logging and metrics monitoring for the submission worker aren't available to challenge hosts. Also, a feature often requested by challenge organizers is the ability to test their competition package (evaluation scripts, etc.) locally before uploading it to EvalAI; this would also reduce the assistance required from the platform maintainers. The goal of this project is to write a robust test suite for the submission worker and port it to AWS Fargate to set up auto-scaling and logging. The tasks also include giving challenge hosts control over the submission worker from the UI: starting, stopping, and restarting it.
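To illustrate the local-testing idea, here is a minimal sketch of a harness a host might run before uploading a competition package. It assumes the host's evaluation script exposes an `evaluate(test_annotation_file, user_submission_file, phase_codename, **kwargs)` function that returns a dict with a `result` list of `{split_name: {metric_name: number}}` entries, which is roughly how EvalAI challenge templates are laid out; `run_local_check` and `toy_evaluate` are hypothetical names, not part of EvalAI.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict


def run_local_check(
    evaluate: Callable[..., Dict[str, Any]],
    annotation_file: str,
    submission_file: str,
    phase_codename: str,
) -> Dict[str, Any]:
    """Run a host's evaluate() once and sanity-check its output.

    Hypothetical helper: the expected output shape, a "result" list of
    {split_name: {metric_name: number}} dicts, is an assumption.
    """
    for path in (annotation_file, submission_file):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)

    output = evaluate(annotation_file, submission_file, phase_codename)

    if not isinstance(output, dict) or "result" not in output:
        raise ValueError("evaluate() must return a dict with a 'result' key")
    for entry in output["result"]:
        # Each entry should map exactly one split name to its metrics.
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"malformed result entry: {entry!r}")
        split, metrics = next(iter(entry.items()))
        if not isinstance(metrics, dict):
            raise ValueError(f"metrics for {split!r} must be a dict")
        for name, value in metrics.items():
            if not isinstance(value, (int, float)):
                raise ValueError(f"{split}.{name} is not numeric: {value!r}")
    return output


def toy_evaluate(test_annotation_file, user_submission_file, phase_codename, **kwargs):
    """Toy evaluation script (hypothetical): exact-match accuracy over JSON keys."""
    with open(test_annotation_file) as f:
        truth = json.load(f)
    with open(user_submission_file) as f:
        preds = json.load(f)
    correct = sum(1 for key, label in truth.items() if preds.get(key) == label)
    return {"result": [{"test_split": {"accuracy": correct / len(truth)}}]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ann = os.path.join(tmp, "annotations.json")
        sub = os.path.join(tmp, "submission.json")
        with open(ann, "w") as f:
            json.dump({"img1": "cat", "img2": "dog"}, f)
        with open(sub, "w") as f:
            json.dump({"img1": "cat", "img2": "cat"}, f)
        print(run_local_check(toy_evaluate, ann, sub, "dev"))
        # prints {'result': [{'test_split': {'accuracy': 0.5}}]}
```

Running the same checks locally that the worker would rely on lets a host catch a malformed return value before it ever reaches the submission queue.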

Deliverable:

Extended Goals:

Add feature to display logs of the worker container on UI
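As a rough sketch of where worker logs could come from for such a UI, assume the worker runs as a Fargate task whose container output is shipped to CloudWatch Logs (for example via the `awslogs` log driver). The helper below pages through one log stream with the CloudWatch Logs `GetLogEvents` API; `fetch_worker_log_lines` is a hypothetical name, and the client is passed in so it can be tested without AWS.

```python
from typing import Any, Dict, List, Optional


def fetch_worker_log_lines(
    logs_client: Any,
    log_group: str,
    log_stream: str,
    max_lines: int = 500,
) -> List[str]:
    """Read up to max_lines messages from one CloudWatch Logs stream, oldest first.

    logs_client is expected to behave like boto3.client("logs").
    """
    lines: List[str] = []
    token: Optional[str] = None
    while len(lines) < max_lines:
        kwargs: Dict[str, Any] = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,
        }
        if token is not None:
            kwargs["nextToken"] = token
        response = logs_client.get_log_events(**kwargs)
        lines.extend(event["message"] for event in response["events"])
        next_token = response["nextForwardToken"]
        # At the end of the stream, the same forward token is returned again.
        if next_token == token:
            break
        token = next_token
    return lines[:max_lines]
```

In a deployment you would pass `boto3.client("logs")`; the log group and stream names are placeholders that depend on how the task's logging is configured. A Django view could then return these lines to the frontend.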

Mentors: Ram Ramrakhya @Ram81, Rishabh Jain @RishabhJain2018, Deshraj @deshraj

Skills: Python, Django, Django Rest Framework, AWS, Docker

Skill Level: Hard

Get started: Try to fix some issues in EvalAI (note that some issues are labeled GSOC-2019).

Tutorials:

a) Docker
b) AWS Fargate

Important Links:

EvalAI Website: evalai.cloudcv.org
EvalAI Github repository: Cloud-CV/EvalAI
EvalAI Docs: http://evalai.readthedocs.io/en/latest
GSoC Proposal Template: Cloud-CV/GSoC-Ideas/wiki/GSOC-2019-Proposal-Template
Gitter Channel: gitter.im/Cloud-CV
Mailing list: groups.google.com/forum/#!forum/cloudcv

Kurian Benoy · Answer 1 · Thu Feb 07 2019 09:48:38 GMT+0800 (China Standard Time)

@deshraj @RishabhJain2018 @Ram81 This idea looks amazing. I am looking forward to working on this issue.

Rishabh Jain · Answer 2 · Thu Feb 07 2019 17:07:00 GMT+0800 (China Standard Time)

Awesome! Looking forward to your GSOC Proposal @kurianbenoy :)

Neeraj Singh · Answer 3 · Sat Feb 23 2019 03:44:55 GMT+0800 (China Standard Time)

Hi Rishabh,
This is Neeraj from IIT Bhubaneswar. I would like to contribute to this project; please guide me on how to get started.

Manish Ranjan Karna · Answer 4 · Sat Feb 23 2019 15:51:59 GMT+0800 (China Standard Time)

Hi @RishabhJain2018, I wish to contribute to EvalAI in GSoC 2019. Can you help me get started?
Thanks

Navneel Mandal · Answer 5 · Mon Mar 04 2019 05:24:56 GMT+0800 (China Standard Time)

@RishabhJain2018 @Ram81 I am interested in this idea and would love to work on it for GSoC 2019. I am familiar with Django and have a basic understanding of Docker.

Khalid Riyaz · Answer 6 · Fri Mar 08 2019 00:53:21 GMT+0800 (China Standard Time)

@RishabhJain2018 @Ram81 @deshraj This is exciting, and I am interested in the idea! To get familiar with the requirements, can we go ahead and make PRs relevant to this? (As is recommended in the UI ideas for GSoC.)

Anunay Aatipamula · Answer 7 · Tue Mar 12 2019 01:13:22 GMT+0800 (China Standard Time)

Hey @RishabhJain2018 @deshraj @Ram81, I'm very excited to work on this issue. I'm good at Python, have worked with AWS, and am familiar with Docker. This is my first GSoC, and your guidance would help me get started.

Rishabh Jain · Answer 8 · Tue Mar 12 2019 03:35:20 GMT+0800 (China Standard Time)

Hi, @GrayR00t @mrkarna @navneel99 @KhalidRmb @anunay999 Thanks for your interest in the project. Looking forward to your GSoC Proposal.

Rishabh Jain · Answer 9 · Tue Mar 12 2019 03:37:16 GMT+0800 (China Standard Time)

> Can you help me how to get started?

@navneel99 Please start by setting up EvalAI on your local machine, and then start solving issues labeled good-first-issue or GSOC-2019.

> To get familiar with the requirements, can we go ahead and make PRs relevant to this?

@KhalidRmb Yes.

Shiva shankar · Answer 10 · Wed Mar 20 2019 20:00:01 GMT+0800 (China Standard Time)

@RishabhJain2018 @Ram81 @deshraj Sounds interesting! I would like to work on this project 👍

Rishabh Jain · Answer 11 · Tue Mar 26 2019 04:07:55 GMT+0800 (China Standard Time)

Hi, @shiv6146 Looking forward to your GSoC proposal!

Khalid Riyaz · Answer 12 · Sat Mar 30 2019 21:05:46 GMT+0800 (China Standard Time)

Hi! I have some questions, and it would be very helpful if the mentors could help me work through them.

Regarding scaling the workers from 1 to x by the challenge host: I gather this is necessary for Docker-based challenges where the host wants to test a submission against diverse environments, with different requirements in each worker container.
- In that case, the host must specify the configuration of each additional worker environment through the UI itself. (Of course, the new worker(s) will be integrated with the challenge's existing SQS queue and will update the leaderboard data accordingly, reflecting the metrics returned by the evaluation in a predefined standard format.)
Coming to non-Docker-based challenges, is the main concern behind scaling the workers speed, i.e. avoiding submission bottlenecks? Since the worker evaluates submissions sequentially, running workers in parallel (with the same configuration) is faster from the host's perspective. Is that correct, or did I miss something?

@RishabhJain2018 @deshraj @Ram81
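The parallelism argument above can be illustrated with a small simulation. This is only a sketch: an in-process `queue.Queue` stands in for the challenge's SQS queue, threads stand in for separate worker containers with the same configuration, and `evaluate_submission` and `run_workers` are hypothetical names. Each submission is taken off the shared queue by exactly one worker.

```python
import queue
import threading
import time
from typing import Dict, List


def evaluate_submission(submission_id: int) -> float:
    """Stand-in for a real evaluation: sleeps to mimic work, returns a dummy score."""
    time.sleep(0.05)
    return submission_id * 0.1


def run_workers(submission_ids: List[int], num_workers: int) -> Dict[int, float]:
    """Drain a shared queue with num_workers identically configured workers."""
    pending: "queue.Queue[int]" = queue.Queue()
    for sid in submission_ids:
        pending.put(sid)

    results: Dict[int, float] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                sid = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            score = evaluate_submission(sid)
            with lock:
                results[sid] = score

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    ids = list(range(8))
    for n in (1, 4):
        start = time.perf_counter()
        run_workers(ids, n)
        print(f"{n} worker(s): {time.perf_counter() - start:.2f}s")
```

With one worker the eight submissions are processed back to back; with four, the wall-clock time drops roughly fourfold, while each individual submission takes just as long.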

Khalid Riyaz · Answer 13 · Sun Mar 31 2019 03:54:15 GMT+0800 (China Standard Time)

Regarding shifting to Fargate:

Is only the submission worker to be moved, or the Django container along with it? I ask because Fargate would autoscale the task based on incoming traffic to the Django app (the task could be based on a single task definition of two containers: Django and the worker).
If we scale the worker alone, we could define a new task definition only for the worker and perform the deploy.sh scale step through boto3.
Could you please help clarify the situation here?
@RishabhJain2018
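As a sketch of the worker-only option, assume each challenge's worker runs as its own ECS service on Fargate (one task definition for the worker only). Scaling, starting, and stopping then reduce to changing the service's `desiredCount` with the ECS `UpdateService` API, and a restart can use its `forceNewDeployment` flag. The function names are hypothetical, and the client is passed in (in practice `boto3.client("ecs")`).

```python
from typing import Any, Dict


def set_worker_count(ecs_client: Any, cluster: str, service: str, count: int) -> int:
    """Set how many worker tasks the service should run; returns the new desiredCount."""
    if count < 0:
        raise ValueError("count must be non-negative")
    response = ecs_client.update_service(
        cluster=cluster, service=service, desiredCount=count
    )
    return response["service"]["desiredCount"]


def start_worker(ecs_client: Any, cluster: str, service: str) -> int:
    return set_worker_count(ecs_client, cluster, service, 1)


def stop_worker(ecs_client: Any, cluster: str, service: str) -> int:
    # Scaling to zero stops all worker tasks but keeps the ECS service itself.
    return set_worker_count(ecs_client, cluster, service, 0)


def restart_worker(ecs_client: Any, cluster: str, service: str) -> Dict[str, Any]:
    # forceNewDeployment makes ECS replace the running tasks with fresh ones.
    response = ecs_client.update_service(
        cluster=cluster, service=service, forceNewDeployment=True
    )
    return response["service"]
```

Keeping these as thin wrappers means the same functions could back start/stop/restart buttons in the host UI; cluster and service names are placeholders.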

Khalid Riyaz · Answer 14 · Sun Mar 31 2019 23:05:57 GMT+0800 (China Standard Time)

Regarding the task "Provide naming for worker containers for different challenges", there is already a mechanism for that in the deploy.sh file here:

https://github.com/Cloud-CV/EvalAI/blob/82dfe27173893fcb3c8ed5b3e09dcd42d365b211/scripts/deployment/deploy.sh#L79

Could you please provide some clarity regarding the task?

Khalid Riyaz · Answer 15 · Thu Apr 04 2019 03:49:28 GMT+0800 (China Standard Time)

@RishabhJain2018 @deshraj @Ram81 Could you please take a look at these, and the doubts I've asked on Gitter? The proposal deadline is very near. Thanks!

Rishabh Jain · Answer 16 · Mon Apr 08 2019 15:54:51 GMT+0800 (China Standard Time)

Hi @KhalidRmb,

> I gather this is necessary for Docker-based challenges where the host wants to test a submission against diverse environments, with different requirements in each worker container.

What if a challenge host wants to parallelize submission processing for non-Docker-based challenges?

> Coming to non-Docker-based challenges, is the main concern behind scaling the workers speed, i.e. avoiding submission bottlenecks? Since the worker evaluates submissions sequentially, running workers in parallel (with the same configuration) is faster from the host's perspective. Is that correct, or did I miss something?

I didn't get what you meant by speed here, but the idea is to parallelize submission processing near the challenge end date so that more people can submit to the challenge.
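One way to make "parallelize near the end date" concrete is to size the worker pool from the queue backlog rather than from the calendar. The sketch below is a hypothetical policy: it reads the SQS attribute `ApproximateNumberOfMessages` (via `GetQueueAttributes`, client passed in, in practice `boto3.client("sqs")`) and asks for enough workers that each has at most `per_worker` queued submissions, clamped to a host-chosen range. The thresholds are assumptions; ECS Service Auto Scaling policies are an alternative to hand-rolled logic like this.

```python
import math
from typing import Any


def desired_workers(
    backlog: int, per_worker: int, min_workers: int = 1, max_workers: int = 10
) -> int:
    """Workers needed so each has at most per_worker queued submissions, clamped."""
    if per_worker <= 0:
        raise ValueError("per_worker must be positive")
    needed = math.ceil(backlog / per_worker)
    return max(min_workers, min(max_workers, needed))


def queue_backlog(sqs_client: Any, queue_url: str) -> int:
    """Approximate number of submissions waiting in the challenge's SQS queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


if __name__ == "__main__":
    for backlog in (0, 12, 500):
        print(backlog, "->", desired_workers(backlog, per_worker=5))
```

Near a deadline the backlog grows and the computed count rises toward `max_workers`; afterwards it falls back to `min_workers`. The result could then be applied as the worker service's desired task count.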

> Is only the submission worker to be moved, or the Django container along with it?

For now, we're focussing on the worker container only.

> If we scale the worker alone, we could define a new task definition only for the worker and perform the deploy.sh scale step through boto3.
> Could you please help clarify the situation here?

I'd like to see the complete approach in the proposal. Also, I've answered your query.

> Regarding the task "Provide naming for worker containers for different challenges", there is already a mechanism for that in the deploy.sh file here:

Yes, it is already there. But Docker doesn't allow running two containers with the same name on a single machine, so a fix for this will be needed in the deliverable.
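A minimal sketch of such a fix, assuming a hypothetical naming scheme `worker_challenge_<pk>_<n>`: pick the first name with that prefix that is not already used on the host. The list of existing names could come from something like `docker ps -a --format "{{.Names}}"`; both the scheme and the helper name are illustrative, not EvalAI's current behavior.

```python
from typing import Iterable


def unique_worker_name(challenge_pk: int, existing_names: Iterable[str]) -> str:
    """Return the first worker_challenge_<pk>_<n> name not already in use on the host."""
    taken = set(existing_names)
    index = 1
    while True:
        name = f"worker_challenge_{challenge_pk}_{index}"
        if name not in taken:
            return name
        index += 1
```

Because the challenge's primary key is part of the name, workers for different challenges never collide, and a numeric suffix allows several workers for the same challenge on one machine.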

Khalid Riyaz · Answer 17 · Mon Apr 08 2019 15:58:30 GMT+0800 (China Standard Time)

> I didn't get what you meant by speed here, but the idea is to parallelize submission processing near the challenge end date so that more people can submit to the challenge.

That is exactly what I meant. Thanks.

Ahmed Samir · Answer 18 · Thu May 09 2019 05:24:21 GMT+0800 (China Standard Time)

@RishabhJain2018 What difficulty level would you say this project is? How proficient should I be with the stack mentioned? What should I really know before I start working on it? I know it's GSoC-related, but I was thinking of giving it a try for fun and experience.