joeljeske / karma-parallel

A Karma JS Framework to support sharding tests to run in parallel across multiple browsers

How to configure any particular test suites to be distributed to different shards in the most efficient way?

skhalipski opened this issue · comments

  • **I'm submitting a ...**

    • bug report
    • feature request
  • Do you want to request a feature or report a bug?
    It would be nice to add the possibility to configure particular test suites to be distributed to different test shards. For example, we have 4-5 test suites which take quite long to execute. It would be nice if we could configure these test suites to be distributed across different shards. If we have 3 shards and 5 long-running test suites, we would like to be sure that one shard executes 2 long-running test suites, the second shard 2 long-running test suites, and the third shard the remaining long-running test suite.

  • What is the current behavior?
    For now we can influence it via the test suite name / length of the description, but that doesn't guarantee the expected behavior, because the naming of other tests can change as well and that affects these long-running test suites.

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

  • What is the expected behavior?
    We have 4-5 test suites which take quite a long time to execute. Quite often they are executed by the same shard if we don't resort to naming tricks. We would like to be sure that particular test suites will be distributed appropriately between all shards. For example, it could be done via specific naming in the test suite description, like it was done for the [always] prefix previously. Imagine that if a test suite description starts with '[Id=[unique-id]] Test suite description', then all test suites with the same unique-id would be distributed across different shards in the most efficient way.

  • Please tell us about your environment:

  • version: 2.0.0-beta.X
  • Browser: [all | Chrome XX | Firefox XX | IE XX | Safari XX | Mobile Chrome XX | Android X.X Web Browser | iOS XX Safari | iOS XX UIWebView | iOS XX WKWebView ]
  • Language: [all | TypeScript X.X | ES6/7 | ES5 | Dart]
  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

I think it can be done in a way similar to the [always] prefix. We could have something like a [shard:1] prefix, and then distribute across shards by shardNumber % shardsCount.
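A minimal sketch of how such a [shard:N] prefix could be interpreted, assuming a hypothetical helper that receives the describe description, the current shard index, and the total shard count (none of these names exist in karma-parallel; this is only the modulo idea written out):

```js
// Hypothetical helper: decide whether a describe block tagged with a
// [shard:N] prefix should run on the current executor.
function shouldRunOnThisShard(description, shardIndex, shardCount) {
  const match = /\[shard:(\d+)\]/.exec(description);
  if (!match) {
    return null; // no prefix: fall back to the default strategy
  }
  const requestedShard = parseInt(match[1], 10);
  // Wrap around so a tag like [shard:4] still maps onto 3 executors.
  return requestedShard % shardCount === shardIndex;
}

// With 3 executors, '[shard:4] slow suite' would run only on shard 1.
shouldRunOnThisShard('[shard:4] slow suite', 1, 3); // => true
```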

Interesting. This would be very valuable for better organizing the test suites, specifically for suites with specs that vary greatly in execution time.

I cannot think of any other use case for indicating which shard to run a spec in, except to better distribute the load. If that is the case, I wonder if a more direct approach would be better suited.

Perhaps specs could indicate a relative “weight” or “time” such that this project could attempt to evenly distribute all the specs. Perhaps all specs could have a weight of 1 by default, but that could be changed by adding the substring [weight: 10] to the description, similar to [always].
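A sketch of how such weights might be balanced, assuming the [weight: N] substring idea above and a simple greedy "assign the heaviest remaining suite to the lightest shard" pass; neither the tag nor the helper names below exist in this project:

```js
// Parse an optional [weight: N] tag from a describe description; default to 1.
function parseWeight(description) {
  const match = /\[weight:\s*(\d+)\]/.exec(description);
  return match ? parseInt(match[1], 10) : 1;
}

// Greedily assign each suite to the currently lightest shard so that the
// total weight per shard stays roughly even.
function distributeByWeight(descriptions, shardCount) {
  const loads = new Array(shardCount).fill(0);
  const assignment = new Map();
  // Heaviest suites first gives the greedy pass a better chance of balancing.
  const sorted = [...descriptions].sort((a, b) => parseWeight(b) - parseWeight(a));
  for (const description of sorted) {
    const shard = loads.indexOf(Math.min(...loads));
    assignment.set(description, shard);
    loads[shard] += parseWeight(description);
  }
  return assignment; // Map of description -> shard index
}
```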

It could be tricky to figure out the relative weight, but I also think it would be tricky to directly indicate which shard a spec should run in, especially since it would be configured in the source code while the number of executors is dynamic.

Thoughts?

You are right about the initial goal of the requested feature: better test distribution. Ideally, all shards should finish in more or less the same time. In our project we can't achieve this: the fastest shard takes ~3 min, the slowest ~5-6 min. Recently added long-running tests landed on the slowest shard and its execution time increased to 8-9 min, while the fastest shard is still at ~3 min. We tried different test suite naming and could more or less mitigate it, but this solution is not stable, because newly added/refactored tests can break this hard-won distribution. Additionally, we still have shard execution times ranging from 3 min to 7 min.

Your idea with weight or time would fix the test distribution issue for the developers/teams who are interested in it and give more or less equal test execution time in every shard. In my view, both solutions (weight or time) would work, but the time approach would be more accurate; plus, we always have the exact test execution time from the appropriate test reporter, so it would be very easy to introduce.

I like this idea, but it seems very cumbersome to require marking each spec with a weight or time. It would be prone to mis-configuration due to the lack of real insight into how long each spec takes relative to the others.

It would seem difficult as well, but I wonder if it would be more useful to install a timing mechanism and save the results to a local file for use during the next run. It would be much more automatic and efficient with the least amount of user configuration, but it would also likely produce new results on every run, causing frequent diffs in the file that would be unwanted and annoying to track in git. Of course this timings file could be git-ignored, but then CI would be inefficient on every run. Perhaps if there were a lot of leeway in the timings file (for example, only changing the file if a timing is off by >= 20%), it could produce generally consistent, deterministic results and not produce lots of diffs.

Also, timings would have to be converted to relative weights so that they would be somewhat consistent, even when running on different hardware.
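A sketch of what that conversion and leeway check could look like, assuming per-suite timings (in ms) have already been collected by some reporter; normalizing against the fastest suite and the 20% threshold are assumptions, not existing behavior:

```js
// Convert absolute timings (ms) to relative weights so results stay
// comparable across different hardware: the fastest suite gets weight 1.
function timingsToWeights(timings) {
  const fastest = Math.min(...Object.values(timings));
  const weights = {};
  for (const [suite, ms] of Object.entries(timings)) {
    weights[suite] = Math.max(1, Math.round(ms / fastest));
  }
  return weights;
}

// Only rewrite the stored weights when some suite drifted by >= 20%,
// so small run-to-run noise does not produce constant diffs in git.
function shouldUpdateWeights(stored, fresh, leeway = 0.2) {
  return Object.keys(fresh).some((suite) => {
    const previous = stored[suite];
    if (previous === undefined) return true; // new suite: must be recorded
    return Math.abs(fresh[suite] - previous) / previous >= leeway;
  });
}
```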

Do you have any thoughts?

If we are speaking about an internal mechanism that produces a good distribution from previously calculated metrics (execution time), that is the best solution. Of course, in that case weights should be used instead of time, and a mechanism to convert time to weights should be introduced. A few details to take into account:

  • It would be nice if the change threshold for updating the previous metrics (20% by default) could be configured via the configuration.
  • If there are any focused tests (fdescribe or fit), the calculated test distribution should be ignored.
  • If the calculated test distribution is missing, the current approach with the configured strategy (round-robin or description length) should be used.
  • If the shard count has changed, the previously calculated test distribution should be discarded as well, falling back to the default strategy.

I think, taking all these details into account, karma-parallel will still work perfectly as it does now. But the main question regarding such advanced functionality is the complexity and time to implement it. What do you think?

P.S. Regarding weight vs. time in general, I think weights are the preferable solution, because the same tests are run in different hardware environments (devs' PCs and CI) and weights will be more accurate. I was wrong in my previous comment that time is the best metric.

Yea, I would think that an automatic solution would be the best but I think there would be the following issues:

  • Need to track the file in git to optimize CI scenarios
  • If tracked in git, there would be potential conflicts when merging in branches with changed unit tests
  • If tracked in git, there would have to be a mechanism to remove noise from the calculated weights to avoid changing the file too frequently.
  • Potentially longer to implement.

I do think that your initial approach of adding a group name to a set of tests such that they are distributed would also work. I do have a couple of concerns about that:

  • What would be the best name for this: "id", "group", "distribution group"?
  • Currently this project only looks at the top-level describes and distributes them accordingly. That means that only a single shard executes a given top-level describe. If you have a number of expensive specs inside that single top-level describe, this project will still not be able to distribute those tests (see the sketch below). Do you think that would be a problem?
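To make that limitation concrete: splitting one expensive top-level describe into several top-level describes is what gives the framework something to distribute. The suite names here are only illustrative:

```js
// Before: a single top-level describe, so all of its slow specs
// always end up on the same shard.
describe('ReportService', () => {
  it('renders the yearly report', () => { /* slow */ });
  it('renders the quarterly report', () => { /* slow */ });
});

// After: separate top-level describes can be sharded independently.
describe('ReportService (yearly)', () => {
  it('renders the yearly report', () => { /* slow */ });
});
describe('ReportService (quarterly)', () => {
  it('renders the quarterly report', () => { /* slow */ });
});
```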

In my view, all the mentioned issues with git are not critical, because in that case we get very good dynamic distribution. But complexity and time to implement are the main questions here.

With regard to top-level describe blocks - it is OK for us, and it is exactly how karma-parallel works now. We locate our long-running tests in different *.spec files.

Regarding naming - I am not sure, but I like group or distribution group.

I have added support for a custom shard strategy function that you can configure in the karma conf. It will run for each top-level describe block and it is expected to return true or false to determine if the describe block should run in the current executor instance. Please read the notes in the README for more info.
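For context, a hedged sketch of what wiring this up in karma.conf.js might look like, combining it with the [shard:N] idea from earlier in the thread. The exact option names and the fields passed to the strategy function are written from memory and should be verified against the README:

```js
// karma.conf.js (sketch; verify option names and fields against the README)
module.exports = function (config) {
  config.set({
    frameworks: ['parallel', 'jasmine'],
    parallelOptions: {
      executors: 3,
      shardStrategy: 'custom',
      // Assumed to run in the browser for each top-level describe block and
      // return whether that block should execute on the current shard.
      customShardStrategy: function (opts) {
        const match = /\[shard:(\d+)\]/.exec(opts.description || '');
        if (match) {
          // Tagged suites are pinned to a shard, wrapping around executors.
          return parseInt(match[1], 10) % opts.executors === opts.shardIndex;
        }
        // Untagged suites: simple round-robin using a counter on window.
        window.__describeCount = (window.__describeCount || 0) + 1;
        return window.__describeCount % opts.executors === opts.shardIndex;
      }
    }
  });
};
```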

@joeljeske thanks! We will look at it soon and try to integrate it into our build process. I will keep you updated about the status and any issues, if there are any.