JetBrains / teamcity-dotnet-plugin

TeamCity plugin for .NET projects

Failing tests reported as success

lxalln opened this issue · comments

We just had a failed build go through TeamCity and get deployed, because TeamCity continued after tests failed.

5 tests failed in total, and the build log says:

Process finished with positive exit code 1 (some tests have failed). Reporting step success as all the tests have run.

What?! Surely any test failing is enough to fail the step?! Since when was 'all the tests have run' a measure of success??

@lxalln, the step was successful because the tests were started and the process finished successfully. But what is the state of the build?

@NikolayPianikov the build is failed, but all subsequent steps ran, which means the "Push build to Octopus" step ran... even though the tests failed!

Just configure each step like this:
[image: Options]

I suppose that would work, even if I don't agree that step should be marked as success.

Is this a recent change? We've been relying on TeamCity to fail on unit test failure for years.

This is the original behavior. For most runners, a non-zero exit code from the process means that the run was not successful. For runners like NUnit, VSTest, dotnet test, msbuild /t:VSTest, and dotnet vstest, a positive exit code is just the number of failed tests. See this thread for details.
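The distinction described in this comment can be modeled in a few lines. This is only a sketch of the behavior as stated here, not TeamCity's actual code; the function name `step_status` and the flag `exit_code_counts_failures` are hypothetical.

```python
def step_status(exit_code: int, exit_code_counts_failures: bool) -> str:
    """Classify a finished step the way the comment above describes.

    Hypothetical model for illustration, not TeamCity's implementation.
    """
    if exit_code == 0:
        return "success"
    if exit_code_counts_failures and exit_code > 0:
        # Test runners: a positive code is read as "N tests failed",
        # so the step itself is still reported as successful.
        return "success"
    # Generic runners: any non-zero exit code means the step failed.
    return "failure"


print(step_status(1, exit_code_counts_failures=True))   # success
print(step_status(1, exit_code_counts_failures=False))  # failure
```

Under this model the same exit code 1 produces two different step results, which is the source of the surprise reported at the top of the thread.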

@lxalln anyway I've created the issue. You could vote for it

I've voted for that issue but I can't understand the logic behind the original behaviour. If a test fails you want the step to fail and then the whole build to fail. It's a simple binary decision and I'd imagine that's pretty much the default behaviour everyone would want and expect.

@badgerparade, the default is: when any test fails, it changes the state of the build but does not stop the build. For cases when a build contains a lot of tests and some of them are flaky, this is the most suitable behaviour from the TeamCity team's point of view. But a user can explicitly specify the option "Only if build status is successful" to skip a build step when the state of the build is not successful.
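To make the difference between the two execution policies concrete, here is a simplified simulation. The policy labels are the ones quoted in this thread; the `Step` class, its fields, and `run_build` are hypothetical and only approximate the behavior described here.

```python
from dataclasses import dataclass
from typing import List

# Labels as quoted in this thread; everything else is a hypothetical model.
IF_PREVIOUS_STEPS_OK = "If all previous steps finished successfully"
ONLY_IF_BUILD_OK = "Only if build status is successful"


@dataclass
class Step:
    name: str
    policy: str
    step_ok: bool = True         # status the step itself reports
    build_problem: bool = False  # e.g. failed tests mark the build as failed


def run_build(steps: List[Step]) -> List[str]:
    """Return the names of the steps that execute under this simplified model."""
    all_steps_ok = True
    build_ok = True
    executed = []
    for step in steps:
        if step.policy == IF_PREVIOUS_STEPS_OK and not all_steps_ok:
            continue  # skipped: an earlier step reported failure
        if step.policy == ONLY_IF_BUILD_OK and not build_ok:
            continue  # skipped: the build as a whole is already failed
        executed.append(step.name)
        all_steps_ok = all_steps_ok and step.step_ok
        build_ok = build_ok and step.step_ok and not step.build_problem
    return executed


# A test step that reports success but marks the build as failed.
tests = Step("Run tests", IF_PREVIOUS_STEPS_OK, step_ok=True, build_problem=True)

print(run_build([tests, Step("Push build to Octopus", IF_PREVIOUS_STEPS_OK)]))
# ['Run tests', 'Push build to Octopus']
print(run_build([tests, Step("Push build to Octopus", ONLY_IF_BUILD_OK)]))
# ['Run tests']
```

In this model the deployment step only stops when it checks the build status rather than the status of the preceding steps.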

This behavior cannot be changed easily because customers are accustomed to it. But in any case you should create an issue in our tracker and vote for it.

Flaky tests are not a reason to change default platform behavior. Flaky tests are a reason for a user to fix flaky tests, or for the platform to provide a non-default workaround.

I agree with @robertglickman. We upgraded to TC 2018.2 from TC 10.x and discovered that nasty change of default behavior.

If the old behavior was a bug regarding some documentation I could understand but at least it should have been communicated properly.

It's not very nice to change a standard platform behavior that used to work that way for years.

@NikolayPianikov the build is failed, but all subsequent steps ran, which means the "Push build to Octopus" step ran... even though the tests failed!

Exactly what happened to us

Sorry, but executing the next build step after tests fail is ridiculous. This cannot be "Default" behavior, since when we add a new build step, execution step is set to "If all previous steps finished successfully" by default. It seems to suggest that the build should not proceed if tests fail, which obviously turned out to be a wrong assumption.

This is a pitfall and I'm sure we are not the only ones who were caught off guard.

We are using the ".NET CLI" plugin to run our tests. The failure condition is set to: Fail build if: at least one test failed. So why should it still execute the next step?

Update: I fixed it by adding an additional failure condition, as below, instead of the proposed workaround.
[image]

This is because for us changing Execute Step to "Only if build status is successful" for the step below the "Test" step did not make much sense. Especially since we may reorder or add/delete the build steps later.

I just need to chip in here! A process returning a non-zero exit code should be interpreted as failing. A testrunner returning a non-zero exit code should be interpreted as a failure. When tests are failing, the build is not ok, and should not be released.

You can release even though your tests are failing, but then you should mark the next steps to continue even if the tests are failing. The default should be a safe default, not an unsafe one.

I have no idea how you're justifying this behavior. Are you releasing a new version of TeamCity even though all tests are failing? What would you say if all your vendors started shipping code where all tests fail because the products are broken? Who actually wants this behavior? Who're having stakes in keeping this clearly wrong default? How on earth can we change this so you don't break a lot of production environments and cause millions of dollars in damage?

I would categorize this as a fatal flaw in the product which must be hotfixed immediately, but it doesn't seem like anything is happening. Don't you care that your product is breaking production environments?

Please explain to me why we who complain about this are wrong, and I'll eat my hat, apologize for being a moron, and crawl away in shame. If you're unable to give a good explanation, please change this behavior.

PS: We noticed this due to a bug in the configuration which caused a test failure even though there was no error. I would have been quite angry if anything bad happened.

This is not a specific dotnet CLI runner behavior. This is the usual behavior for all runners of this type (NUnit tests, Visual Studio Tests, IntelliJ IDEA Project ...). I would like to agree with you that each failing test should be a reason to fail the build, and in an ideal world the logic would work that way, but there are some cons to this, and reasons why I have not made this fix right now:

  • if I changed this behaviour, the dotnet CLI runner would work differently from other runners of this type
  • if I change this behaviour for all runners in TeamCity, how many users will come to us: 1K, 1M?
  • if your project has more than 100k tests and 15% of them are UI tests (emulated button clicks), what is the chance that all 100k tests are successful?
  • and the most important thing: if your build configuration contains 40 steps and 20 of them each run a group of tests and (for instance) the first test step fails, do you want to run the tests of the other 19 groups, or should we stop the build immediately and stay ignorant about the state of the other tests? Or should we specify different behaviour depending on the build runner type, so that we run all tests but do not deploy?
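The last option in the list above, running every test group but not deploying when any of them fails, could look roughly like this. It is a minimal sketch with hypothetical names (`run_pipeline`, and callables that return a failed-test count); it does not reflect how TeamCity schedules steps.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(test_groups: Dict[str, Callable[[], int]]) -> Tuple[List[str], bool]:
    """Run every test group, then decide whether deployment may proceed.

    Each callable returns the number of failed tests in its group
    (a hypothetical stand-in for a real test step).
    """
    failed_groups = []
    for name, run_group in test_groups.items():
        # Keep going after a failure so later groups still report their state.
        if run_group() > 0:
            failed_groups.append(name)
    may_deploy = not failed_groups
    return failed_groups, may_deploy


groups = {"unit": lambda: 0, "ui": lambda: 3, "integration": lambda: 0}
failed, may_deploy = run_pipeline(groups)
print(failed, may_deploy)  # ['ui'] False
```

The point of the sketch is that "keep running tests" and "block the deployment" are separate decisions, so neither has to be given up for the other.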

I hope you understand me. I'm on your side ... 😥

I can add a TeamCity "internal property" which can be set on a build configuration to stop the build if there are failed tests in the dotnet CLI runner.

What should we do if the command 'dotnet test' was specified using a wildcard like *.Tests, to run the tests for all projects whose names end in Tests in one build step?

if I change this behaviour for all runners in TeamCity, how many users will come to us: 1K, 1M?

Do you believe there are such numbers that rely on "failing tests are ok"? We've been using TC for 8 years on projects here before hitting this case, and in our case, we consider it a catastrophic failure.

if your project has more than 100k tests and 15% of them are UI tests (emulated button clicks), what is the chance that all 100k tests are successful?

If you have tests which are ok to fail, I bet you would group them, tag them or otherwise handle them to not trigger an error. I do not feel this is a valid argument in the discussion.

and the most important thing: if your build configuration contains 40 steps and 20 of them each run a group of tests and (for instance) the first test step fails, do you want to run the tests of the other 19 groups, or should we stop the build immediately and stay ignorant about the state of the other tests?

I have to admit that we don't have any configurations with many steps running tests of which some should be allowed to fail, so I don't have any experience with such builds. That said, if I ever would have a test that should be allowed to fail without triggering a failure, I would handle such cases.


I feel the current behavior is a real wtf, and a bad default. Knowing what would break for existing builds is difficult unless you could get some telemetry or so for the cases -- but even seeing this happen, you wouldn't know whether a user is in the group that wants this behavior or in the group that thinks of it as a bug.

I fully understand you're reluctant to change a default behavior and that could upset a lot of people as the current behavior is upsetting a lot of people, but are there other measures you can take to avoid this? Can a build step with a testrunner get a big warning sign to make people aware of this?

Could we perhaps get a setting "treat test failures as build failures"?

I agree with others - this is very unintuitive, dangerous and hard to spot.

In my case the configuration is:

  • Failure conditions: one of build steps exited with an error, at least one test failed
  • All steps are configured with: execute if all previous steps finished successfully
  • I use NUnit runner for running tests

The build displays as follows:

  • Build status is 'Failed'
  • In the build log NUnit step is red and says 'Process exited with code 2'

So my build failed, my step failed and the process still ran all the steps and deployed broken software.
I only noticed this because failed builds took a suspiciously long time to finish even with broken tests.

If you're saying that a test step with non zero, positive exit code is by definition successfully run (which on its own is dodgy IMO), at the least I'd expect to see it in yellow, not red.
I'd also expect a config setting on the test runner to fail it on test failures.

Just for a voice on the opposite side: I do rely on builds continuing when tests fail, because I want to run all the tests (there are several build steps that each run a different suite of tests). It's annoying if failures in one of the early suites prevent later suites from running.

I also have a couple of tests that are deliberately left in a failure state, because it acts as a reminder to fix something before a particular milestone -- but until that milestone the rest of the build functions as intended and can be used just fine. (It's documentation-related, if that helps understand why.)

But then, I also never deploy to production at the end of a build -- mostly it just leaves an artifact for QA to look at the next day, for some it will deploy to a QA server automatically, but that's all. So a bad build "finishing" is less dire anyway.

Just for a voice on the opposite side: I do rely on builds continuing when tests fail, because I want to run all the tests

I think this is a common feature and I don't think anyone is arguing against this option. The problem is the default behaviour which I think should always be erring on the side of caution. Let's say the pipeline stops on the first failed test by default. You will then investigate, find out that some steps didn't get run and if you want them to run you will change the configuration accordingly. On the other hand, let's say the pipeline continues to run on a test failure by default. You may not notice this behaviour for months and even deploy broken code. I'd argue that this is the case here (given the test runner step succeeds by default and 'Execute if all previous steps finished successfully' is the default criterion).

This is exactly what happened to me. We deployed a broken build to production even though the unit tests failed; we assumed that the test runner would stop the build if any tests failed. Thankfully we had a blue/green setup so could roll back easily. We've now moved away from TeamCity. There are many reasons why we did it, but this odd default behaviour was one of them.