cucumber / cucumber-jvm

Cucumber for the JVM

Home Page: https://cucumber.io

Support concurrent executions of scenarios

whiskeysierra opened this issue · comments

Currently it's only possible to execute feature classes in parallel. If you have independent scenarios (and you really should), it would be nice to run them concurrently. The current codebase does not support this: the reporting is inherently sequential, and there is some state in the Runtime class (skipNextStep and scenarioResult). I would happily contribute a patch, but this would require a major change to some of the internal structure.

Any update on this one? This missing feature currently stops me from using cucumber-jvm.

Am I right that it's not even possible to achieve parallel execution of features without some clumsy hacks?

Nobody has submitted any pull requests for this, so I'm afraid the answer is no.

👍 I have been carrying this issue around with me for quite a while now. Especially for UI tests it is absolutely necessary to run tests in parallel. I've seen several attempts to make this possible, from slicing feature files into single-scenario files (within a build job) with massive JVM forking, to runners with ugly reflection hacks that replace thread-unsafe code in Runners and Runtimes. None of them were very satisfying. I tried to build a JUnit Runner around it, but there are some internal design flaws that absolutely prevent one from efficiently executing cucumber scenarios in parallel. The core problem is IMO that the Runtime and Backend both couple expensive operations, like class and glue loading, with the state of a single executing scenario (or world). Splitting this up a bit, and treating Runtimes as a factory of ExecutionContexts in which a single scenario can be run, would make this much easier. Additionally, reporters (at least JSON and XML) are not thread safe and would need an interface overhaul or need to be proxied, such that reporting events are recorded during parallel test execution and replayed in serial when a scenario execution is done.

I'd really like to spend some time on this, but it would, in one way or another, introduce changes in the core APIs. That's why I'd kindly ask if such changes would be welcome at all :)

I broke the hell out of core back in the day making the Jruby stuff work. I don't believe anyone would be opposed to seeing code. You'd just have to be willing to accept feedback and possibly change the code even more.

Hear, hear!

It would be a big, but welcome change.

In the next months I intend to refactor the internals to use Gherkin3 and in order to avoid merge hell I'd like to see parallelism added after this.

Two thumbs up @danielwegener. I really appreciate the proposed changes. If there's anything I can support you with, just let me know. I'll close #664 so we track the parallel scenario executions in one issue - and this one looks more promising!

Good to hear that I am not alone with this requirement :) I found some time to investigate the code a bit and tried to push pieces around.
I made the following observations and will try to summarize what I learned about the core concepts and what could be done to go parallel:

How things are

  • Multiple scenarios can be executed in serial using one Runtime with multiple backends and multiple formatters from a single entry point (CLI or TestTool-Integration)
  • A Runtime is instantiated with immutable RuntimeOptions parameters and loads its backends using reflection. At the same time it serves as the state holder for an executing Scenario.
  • A Backend is a kind of glue provider for a certain technology binding, with a costly initialization while loading glue and hooks. At the same time it serves as the state holder for each executing scenario's world. A Runtime can host multiple backends at the same time.
    • Most backends expose methods that a user can use to add glue code after initialization (at an arbitrary time, maybe somewhere during test execution)
  • A World is the state of an executing scenario in a certain backend. An executing scenario has its state distributed across the Runtime and possibly across all its backends (if they are stateful)
  • HookDefinitions are backend specific Glue and mostly implemented as delegates that close over their backend (and therefore its World-state)
  • Glue is a mutable repository for all possible bindings from a gherkin step to executable code (Runnables)
  • Stats and UndefinedStepsTrackers are aggregates of the execution of one or more executed scenarios. Important: Their state can be aggregated commutatively (i.e. their state can be created by summing up each scenario's execution stats in arbitrary order).
  • Reporters and Formatters can, by their interfaces, process one scenario at a time.
  • CucumberFeatures encapsulate Gherkin ASTs into a cucumber model of features/scenarios/examples/steps. They are mostly immutable, which is great.
  • Probably expensive operations (thus: things that you don't want to repeat for every scenario) are:
    • 'Runtime#loadBackends'
    • 'Backend.loadGlue'
    • 'RuntimeOptions.formatter', '.stepDefinitionReporter', '.summaryPrinter'
    • 'RuntimeOptions.cucumberFeatures'!
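The observation that Stats can be aggregated commutatively is what makes parallel collection safe: per-scenario results can be merged in any order. A minimal sketch of that idea follows. `ScenarioStats` is a hypothetical, simplified stand-in and not the actual cucumber-jvm Stats class.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical immutable stand-in for a statistics aggregate (not the real cucumber-jvm Stats). */
final class ScenarioStats {
    static final ScenarioStats EMPTY = new ScenarioStats(0, 0, 0);

    final int passed;
    final int failed;
    final int undefined;

    ScenarioStats(int passed, int failed, int undefined) {
        this.passed = passed;
        this.failed = failed;
        this.undefined = undefined;
    }

    /** Merging is associative and commutative, so the order of scenarios does not matter. */
    ScenarioStats merge(ScenarioStats other) {
        return new ScenarioStats(
                passed + other.passed,
                failed + other.failed,
                undefined + other.undefined);
    }

    /** Reduces per-scenario stats on multiple threads; safe because merge has no shared state. */
    static ScenarioStats total(List<ScenarioStats> perScenario) {
        return perScenario.parallelStream().reduce(EMPTY, ScenarioStats::merge);
    }

    public static void main(String[] args) {
        ScenarioStats t = total(Arrays.asList(
                new ScenarioStats(3, 0, 0),
                new ScenarioStats(1, 1, 0),
                new ScenarioStats(0, 0, 2)));
        System.out.println(t.passed + " passed, " + t.failed + " failed, " + t.undefined + " undefined");
    }
}
```

Because the aggregate is immutable and merged by value, no locking is needed, which is the property the bullet above relies on.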

Goals

  • Let Cucumber-JVM execute multiple scenarios in parallel safely and efficiently (i.e. on multiple threads)
  • Even if inefficient, let Cucumber-JVM execute multiple runtimes in parallel safely (i.e. avoid stateful statics)
  • Avoid big changes
  • Do not introduce avoidable new concepts and don't break with familiar concepts
  • New API should be explicit and clear about its concurrency-behavior

Non-Goals

  • Support asynchronous glue (i.e. glue that returns futures/promises and does not rely on running in the same thread during one scenario execution), although this would be awesome

Ideas

The current major problem is the strong coupling between mutable resources. Although we could try to make things safe by sprinkling synchronization barriers and API conventions with javadocs like 'please do not touch this after initialization', it would still be hard to get things right. I'd suggest a refactoring that clearly separates the following phases:

  1. Loading backends, glue and plugins
  2. Loading features
  3. Execution of scenarios

Possible changes

  • Runtime and Backend should do all the heavy lifting (loading backends, glue, plugins) at construction time and be immutable afterwards. This allows them to be safely shared across independent/parallel executions.
    • Alternative: Runtime and Backend should be split into RuntimeFactory/-Builder and BackendFactory/-Builder (which do the expensive operations) and the things they create (Runtimes/Backends), which are cheap to construct and may carry around the state of a scenario execution.
    • Alternative: Move all scenario-dependent state into the Scenario/ScenarioImpl. Each Backend stores its world within the Scenario object (maybe within a kind of Context-Map like in the EE/Spring world).
  • Split Glue into two interfaces: Glue and GlueBuilder. GlueBuilder is used during the expensive phase of Backend initialization, while Glue is an immutable view over an already initialized backend. Backends that provide an API for adding steps at arbitrary times at runtime might need to be rethought.
    • Alternative: A Backend's glue might still be mutable, but changes to the glue are copy-on-write operations and each started scenario execution keeps an immutable view of the glue as it was at the time of its instantiation.
    • Alternative: Just trust in the user that he will not change glue while scenarios are running
  • Formatters and Notifiers should be thread safe and decide on their own if they can handle scenarios executed in parallel (e.g. a console formatter could just print progress out of order, as it arrives, or use backspace printing to render the incremental progress of all scenarios executing in parallel). Thus the Formatter method Formatter.startOfScenarioLifeCycle should rather return a ScenarioReporter on which all further reporting for that scenario happens.
    • We might add an AbstractSerializingFormatter that records scenario reporting events and serializes/synchronizes them as if the scenarios had been executed in serial (#0854dd is a good example). This might be necessary for most reports that write directly to a file (XML, JSON).
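The record-and-replay idea for formatters could look roughly like the sketch below. All names (`SerialSink`, `ScenarioRecorder`, `runAll`) are hypothetical and only illustrate the technique; this is not the cucumber-jvm Formatter API. Each scenario buffers its events privately and flushes them as one contiguous block when it finishes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SerializingReportSketch {

    /** Hypothetical stand-in for a file-backed report (JSON, XML); not thread safe by itself. */
    static final class SerialSink {
        private final List<String> lines = new ArrayList<>();
        void write(String line) { lines.add(line); }
        List<String> lines() { return lines; }
    }

    /** Buffers the events of one scenario and replays them in one block on close. */
    static final class ScenarioRecorder implements AutoCloseable {
        private final String scenario;
        private final SerialSink sink;
        private final List<String> buffer = new ArrayList<>();

        ScenarioRecorder(String scenario, SerialSink sink) {
            this.scenario = scenario;
            this.sink = sink;
        }

        void step(String keyword, String status) {
            buffer.add(scenario + ": " + keyword + " -> " + status);
        }

        @Override
        public void close() {
            // Only the replay is synchronized, so scenarios never interleave in the sink.
            synchronized (sink) {
                for (String event : buffer) sink.write(event);
            }
        }
    }

    /** Runs each scenario on a pool thread; the step results here are placeholders. */
    static SerialSink runAll(List<String> scenarios, int threads) {
        SerialSink sink = new SerialSink();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String scenario : scenarios) {
            pool.submit(() -> {
                try (ScenarioRecorder recorder = new ScenarioRecorder(scenario, sink)) {
                    recorder.step("Given", "passed");
                    recorder.step("When", "passed");
                    recorder.step("Then", "passed");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        for (String line : runAll(Arrays.asList("A", "B", "C"), 3).lines()) {
            System.out.println(line);
        }
    }
}
```

The order of the scenario blocks still depends on which scenario finishes first; a report that needs a stable feature order would have to sort or hold back blocks before writing.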

I'd be glad to hear your thoughts on this. Since code doesn't lie, prototyping happens here: https://github.com/danielwegener/cucumber-jvm/tree/wip/630-parallel

Fantastic analysis @danielwegener - I couldn't have done it better myself.

What are your thoughts on doing this in parallel with Gherkin3? I'm worried that both streams of work will take a while and touch a lot of the same code.

I plan to rip out the gherkin2 code entirely and modify everything that depends on it. The cucumber.runtime.model package will most likely go away. Formatter and Reporter interfaces will be affected too.

@aslakhellesoy It sounds as if Gherkin3 will lead to a major release of cucumber-jvm anyway, and that would make breaking changes from this parallel feature easier to justify. I would be fine with developing this parallel feature on top of your gherkin3 branch (keeping the commit history small and rebasing regularly should not be such a pain). IMO it would be great if we could put both features into one major release, so users would have to make a possibly painful migration only once (seeing cucumber-jvm as a product). This may lead to some organizational coordination effort for the roadmap, but after working with forked test executions for over a year now, I am not really in a rush :)

So my suggestion would be: You develop gherkin3 on your branch (keeping it on top of master), and I'll try to keep the parallel feature on top of your branch (although I don't know how/if rebasing is practiced in this project). Eventually you merge to master and we can see if the parallel feature is mature enough to put it into the same major release. If not, well then later :)

Anything new on this issue? I have never contributed to a project such as this but am willing to help in any way that I can - this would be a great capability!

Hi @ereber! Glad you picked up this issue. I had almost forgotten about it. In my initial work towards a solution I stumbled across so much entangled state and so many side effects that I came to the conclusion that it's not really doable without major API changes. These changes would probably affect not only cucumber-jvm but also tools like IntelliJ's cucumber-java integration (and friends), which would then need to support multiple versions of the cucumber-jvm API.
In particular, the integrations of non-Java languages like JRuby, Jython and Groovy are hard to move away from global world objects, and the formatter chain is a huge topic of its own.
I still think it would be a huge benefit to have a cucumber implementation that runs tests in parallel, but I doubt it is important enough to break the whole API.

I have another ongoing attempt: I am trying to change the synchronous execution (Runtime.runStep/runHook, CucumberScenario.run/CucumberScenarioOutline.run/CucumberFeature.run) into a trampoline that returns a partial execution plus deferrable side effects on Formatters and Reporters. This decouples the step execution from the actual reporting of results and allows an executor to choose at which level it wants to parallelize (feature, outline, scenario or never).
This is a minimally invasive attempt that will mostly only change the internal runtime API (and the SPI for test frameworks). But it will only work if the glue instances provided by the ObjectFactories are stateless or thread-safe (without scenario-scoped glue). However, I hope this attempt will point in a direction that helps with further steps to align glue code DI with this execution model.

@danielwegener
Your approach is still lacking some aspects that need to be addressed (at some point in time):

  1. You need to give the user / feature-writer some means of control on parallel-execution.
    For example, the user can tag a feature with the @sequential tag to mark that its scenarios should not be executed in parallel.
  2. If you are not only testing software, but real systems (hardware+software) you need to have some capability/resource-management solution, because hardware resources are limited and a Scenario may require a certain hardware resource to be run. As long as you execute that sequentially, this will not be a problem because you have only one runner.
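The tag-based control from point 1 amounts to partitioning the work before it reaches the executor. A minimal sketch, assuming a simplified, hypothetical `Feature` model (a name plus its tags) rather than the real cucumber-jvm classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SequentialTagSketch {

    /** Hypothetical, simplified feature model: a name plus its Gherkin tags. */
    static final class Feature {
        final String name;
        final Set<String> tags;

        Feature(String name, String... tags) {
            this.name = name;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /** Names of features that may be handed to a parallel executor. */
    static List<String> parallelizable(List<Feature> features) {
        List<String> names = new ArrayList<>();
        for (Feature f : features) {
            if (!f.tags.contains("@sequential")) names.add(f.name);
        }
        return names;
    }

    /** Names of features tagged @sequential, which must run one after another. */
    static List<String> sequentialOnly(List<Feature> features) {
        List<String> names = new ArrayList<>();
        for (Feature f : features) {
            if (f.tags.contains("@sequential")) names.add(f.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.asList(
                new Feature("checkout", "@sequential"),
                new Feature("search"),
                new Feature("login", "@smoke"));
        System.out.println("parallel:   " + parallelizable(features));
        System.out.println("sequential: " + sequentialOnly(features));
    }
}
```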

This approach was discussed in Cukenhagen, when I described my solution to Jarl Friis and he pointed some problems with point 2 above out. My architecture for the parallel-runner also looks slightly different (as far as I can tell). It uses a three-layered architecture (in a map/reduce style):

Executor-Frontend ==> Executor-Backend ==> Worker+     (many workers) 

Frontend:

  • Splits up work into work-items (Features, Scenarios) that can be executed in parallel
  • Delegates work-items to Backend for execution
  • Collects finished work-items with results, etc. from Backend
  • Provides visible results with one (or more) Formatter(s)
  • Provides progress hints (how much is executed, how much is still to do)
  • Generic, can be reused with different backends

Backend:

  • Knows how to create worker(s) to execute work-items in parallel (as pool of workers)
  • Delegates work-items to workers
  • Collects results from workers
  • Informs Frontend about finished work-items and results
  • Extension-Point/Plugin: Can be replaced by another Backend strategy
  • Coupled with Frontend in same process

Worker:

  • May run in another process or on another host (compared to Frontend/Backend pair)
  • Runs features or scenarios (similar to the cucumber program)
  • Provides results back to Backend

This approach allows you to start with a multi-processing executor (or Backend) and add other parallel-execution backends in the future (like a Cloud-Executor, a Cluster-Executor, etc.). In addition, it should simplify some of your multi-threading problems related to formatters, etc., because the formatters no longer need to be thread safe.
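The three layers described above can be sketched in-process as follows. Everything here is hypothetical naming for illustration (`Worker`, `Backend`, `ThreadPoolBackend`, `Frontend`), with work items and results reduced to strings; a thread pool stands in for the pool of workers that could equally be processes or remote hosts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LayeredExecutorSketch {

    /** Worker: runs one work item and hands a result back. */
    interface Worker {
        String run(String workItem);
    }

    /** Backend: owns the workers and the parallelism strategy; meant to be replaceable. */
    interface Backend {
        List<String> execute(List<String> workItems);
    }

    /** One possible backend strategy: a fixed pool of threads in the same process. */
    static final class ThreadPoolBackend implements Backend {
        private final int workers;
        private final Worker worker;

        ThreadPoolBackend(int workers, Worker worker) {
            this.workers = workers;
            this.worker = worker;
        }

        @Override
        public List<String> execute(List<String> workItems) {
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            try {
                List<Callable<String>> tasks = new ArrayList<>();
                for (String item : workItems) tasks.add(() -> worker.run(item));
                List<String> results = new ArrayList<>();
                // invokeAll returns futures in the same order as the submitted tasks.
                for (Future<String> f : pool.invokeAll(tasks)) results.add(f.get());
                return results;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                pool.shutdown();
            }
        }
    }

    /** Frontend: delegates work items and reports the results from a single thread. */
    static final class Frontend {
        private final Backend backend;

        Frontend(Backend backend) {
            this.backend = backend;
        }

        List<String> run(List<String> features) {
            return new ArrayList<>(backend.execute(features));
        }
    }

    public static void main(String[] args) {
        Backend backend = new ThreadPoolBackend(2, item -> item + ": passed");
        List<String> report = new Frontend(backend).run(
                Arrays.asList("login.feature", "search.feature"));
        for (String line : report) System.out.println(line);
    }
}
```

Since only the Frontend touches the report, the formatter side stays single-threaded, which is the simplification the comment above points out.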

Hi @jenisys , cheers for your feedback - good to get the discussion rolling :) . My code attempt mentioned above is already kind of outdated and I learned some more things.

I think whether tests should be executed in parallel should not be defined in the Gherkin model but rather in the runner (e.g. via @CucumberOptions). Cucumber scenarios themselves should (as far as I understand it) be independent and as such, conceptually, always be parallelizable. The decision to actually run tests in parallel is rather a technical one, and Gherkin tags are afaik not meant to technically configure the test execution.

Resource management is a good point (e.g. if I only have one Selenium browser driver instance, it does not make sense to run tests in parallel). If we assume that these resources are provided as glue instances by the backends, we could pool glue instances (by type), where each glue instance is capable of executing one step at a time (and may maintain state while a scenario is executed). However, not all Backends or ObjectFactories have these requirements (either because they are stateless or because instances are cheap to create per scenario). But the Backend, or each backend's glue API, should indeed be able to express that it can only create a limited number of instances.

I don't really get your idea about the architectural split into frontend, backends and workers. Could you elaborate, or do you have any link (discussion, code) you could point me to? Do I understand it right that it is about distributing execution to multiple workers? If so, and since you mentioned map&reduce, do you have any plans for how to distribute the actual test bytecode to workers (thinking about Hadoop/Spark, this is horrible to get right and pretty expensive)?

Instead of modeling a network of workers who actively pass around their work and results, I'd just create a "work-plan" (an executable graph of features/scenarios) once and let an executorService pull steps from this graph and allocate matching glue instances as long it has capacity (i.e. available glue code instances and unused threads.)
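A minimal sketch of that pull model, combined with the pooled glue instances mentioned earlier: an ExecutorService runs scenarios, and each one must borrow a glue instance from a bounded queue first. `Glue`, `GluePoolSketch` and `runPlan` are hypothetical names, and the scenario body is a placeholder sleep standing in for IO-bound steps.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class GluePoolSketch {

    /** Hypothetical stateful glue object, e.g. a wrapper around one browser driver. */
    static final class Glue {
        final int id;
        Glue(int id) { this.id = id; }
    }

    final AtomicInteger inUse = new AtomicInteger();
    final AtomicInteger maxInUse = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();

    /** Runs the given number of scenarios; concurrency is capped by threads and by glue instances. */
    void runPlan(int scenarios, int threads, int glueInstances) {
        BlockingQueue<Glue> gluePool = new ArrayBlockingQueue<>(glueInstances);
        for (int i = 0; i < glueInstances; i++) gluePool.add(new Glue(i));

        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int s = 0; s < scenarios; s++) {
            // Returning null makes this lambda a Callable, so checked exceptions are allowed.
            executor.submit(() -> {
                Glue glue = gluePool.take(); // blocks until an instance is free
                try {
                    int now = inUse.incrementAndGet();
                    maxInUse.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // placeholder for steps that mostly wait on IO
                    completed.incrementAndGet();
                } finally {
                    inUse.decrementAndGet();
                    gluePool.put(glue); // hand the instance back for the next scenario
                }
                return null;
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GluePoolSketch sketch = new GluePoolSketch();
        sketch.runPlan(8, 4, 2);
        System.out.println(sketch.completed.get() + " scenarios completed, at most "
                + sketch.maxInUse.get() + " glue instances in use at once");
    }
}
```

With four threads but only two glue instances, two threads simply wait in `take()`, which is how a limited resource such as a single browser driver would throttle the run without any scenario-level configuration.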

Sure, it depends on what you test, but most cucumber usages I've seen talk to a Selenium grid or REST endpoints (and in fact spend most of their time waiting for IO). In such environments, the actual test execution code was never the bottleneck. As soon as we can run scenarios on an ExecutorService, I'd bet we won't have any problems running, let's say, 100 scenarios in parallel (100 threads waiting for IO is not really much for an application server).
The next step (and it would not be that much further) would be to support async (e.g. CompletionStage-based) glue (using async HTTP clients or the async Selenium API), which would make big thread pools (and their scheduling and memory overhead) unnecessary.

But one step after another!

The point about reporters and progressIndicators is also very good. There are some reporters (like the junit reporter) which are only interested in progress events in a timely manner (like "scenario A step 3 failed") and others which really expect to receive a sequence of invocations which must, even if executed concurrently, be serialized before they are written to the wire (like console-output, where single scenarios steps from different scenarios should probably not overlap each other; or even json-reports which are expected to have even a stable order between features (sorted by filename?)).

@aslakhellesoy Are there any objections against upgrading cucumber-jvm to java7 (maybe android)? I consider using the ForkJoin/RecursiveTask-Framework for scheduling/concurrent scenario execution.

@danielwegener I'm convinced that concurrent testing should be provided by the actual framework, e.g. JUnit, i.e. cucumber shouldn't prevent it, but it shouldn't implement it either. See for example https://github.com/jhorstmann/zuchini, it's a rewrite that supports JUnit parallel executions (not saying that a rewrite is necessary, but it helps my case).

@whiskeysierra Yes and no. cucumber.api.cli.Main is (kind of) a test-framework that is part of cucumber-jvm and imo should provide concurrent execution (it is used by the intellij cucumber-java runner). But I agree, the actual concurrency mechanism should not leak into the runner SPI used by junit, testng and friends. Ideally I'd prefer to use java.util.concurrent.CompletableFuture but depending on java8 is possibly not really an option.

Has the parallelism been added? The issue is still open.

I think it is still open. My latest attempt (https://github.com/danielwegener/cucumber-jvm/commits/wip/630-parallel) required too many refactorings (and changes in the client/non-Java APIs), so I finally abandoned it.

+1 - this would be awesome. My tests are currently up to 1.5 hrs.

If your tests are that slow you should fix your architecture.

https://skillsmatter.com/skillscasts/8567-testable-software-architecture

@aslakhellesoy hey buddy, the tier of tests that cucumber meets in my situation is running tests that were done manually before. Your test speed note is really geared for unit testing and integration testing. This layer of testing I have proves the rigor of the system and, due to the application this system is designed for, cannot be run faster than roughly a minute for a scenario. These scenarios COULD RUN IN PARALLEL and my stuff would be done in a minute, thus the desire to have cucumber support this out of the box.

This all makes less sense given that you have left this ticket open since 2014.

And, for the record, I really like cucumber. It actually documents my tests quite well.

@nwertzberger @aslakhellesoy is teasing you, well done for taking that so well and not feeding his trolling :)

It's actually pretty rare to hear of a use-case where parallel tests are all that valuable. In my experience, most people with slow tests also have a lot of internal system state that relies on not running several tests at once.

This is a big piece of work, and there are quite a few other things in the way of it right now. Thanks for adding your support for this, hopefully it will motivate someone to pick this up and help out with the steps to get us there.

@mattwynne i am going to assume everyone else is doing what i just did today after @aslakhellesoy jokingly closed this ticket (and, as of this writing, it's still closed).
I broke out my junit integration to target subsets of my cucumber tests. One junit class per feature, and, for the longer features, annotation dances to target subsets of scenarios.
I can then use the surefire maven plugin to run it in parallel with its forking feature.
It's annoying, and I need to be diligent to make sure that i keep this up, but it works well enough.

I think the reason this ticket was open for such a long time is that it (imo) requires a big refactoring of the whole cucumber-jvm architecture and its language bindings, executor bindings and (possibly) tool integrations. The state is just too scattered (see elaboration above).

@mattwynne I almost took the bait. Still wondering why this issue is closed.

And I have to disagree. It's not always the case that tests are slow, but just that there are a lot of them (and that they are run against different environments (browsers)). And it's rather the case that you actually have cheap CI clusters with elastic worker nodes, ad-hoc docker composite deployments and Selenium grids that could bring your whole browser regression test suite down from 30 minutes to 3 minutes -- if the test executor were willing to run tests in parallel without hogging too many resources on its own (forking, no parallel executions, no support for async await).

Yeah @danielwegener, I think you're probably talking more of a re-write than a refactoring TBH. That's what we went through with the Ruby codebase, and it took @tooky and I about 18 months altogether.

Working with the test-case abstraction (rather than being coupled to the Gherkin document structure) is the key, but it breaks all the formatters, and so the herd of yaks grows and grows.

I agree that this would be hugely valuable to people if we could do it. Right now many of us on the core team are hustling hard to get http://cucumber.io/pro out the door, which means we're neglecting our open-source responsibilities in the short term. Medium term we hope it will provide us with a steady revenue stream that enables us to give more of our time back to the community again, and we'll be able to invest in bigger pieces of work like this.

@mattwynne Thanks for the clarification. I wish you the best going pro :). Fortunately Java is not Ruby (and I do not think it would take 18 months), so once you find some time to give some feedback or discuss possible strategies, just drop a ping here and I'd be glad to join.

@danielwegener We do not want to do any of this before lifting Cucumber-JVM to Gherkin v4.0 with the introduction of the Gherkin compiler and Pickles (which in some sense mimics the rewrite of the Ruby codebase).

If at all interested, we've been battling with the same problem and ended up writing a maven plugin that executes multiple cucumber threads and combines the test reports.

https://github.com/eu-evops/cucumber-runner-maven-plugin

@sponte that's interesting, thanks for the link. I use a Jenkins scripted pipeline with parallel support to run multiple CLI runners for cucumber. It has the advantage of scaling across different nodes if required. The Jenkins plugin for cucumber grabs all the JSON files and generates a single report for all of them.

You might also be interested in: https://github.com/temyers/cucumber-jvm-parallel-plugin (have not used this myself, but was mentioned by others)

Is there a plugin like cucumber-jvm-parallel-plugin that can be used with Gradle and doesn't interfere with Serenity living documentation?

Not yet. There might be when cucumber can execute pickles in parallel:

#1357

@mpkorstanje I've nearly finished my changes, just have to review the tests I've written to ensure appropriate coverage, also am going to double check some of yours that you wrote to ensure no cross over etc...

Looks like I'm going to have a fun merge too based on your last few commits 😭

No problem! Let me do the merge. I made this bed.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.