Allow marking an example as should-fail

Question

petertseng opened this issue 8 years ago

As an extension to multiple examples in #395, we'd like to be able to add examples that we can mark as expected to fail the test.

We can use this to make sure that our test cases properly reject some mistaken solutions that we expect them to reject. The example I gave earlier: in accumulate, we have an inefficient example noted at https://github.com/exercism/xhaskell/blob/master/exercises/accumulate/src/Example.hs#L15 and we say that it should fail the test suite. But this was never automatically verified, and indeed for the longest time that example did not even compile, until #209.

The things to figure out here are:

How will it be specified? e.g. the name of the directory contains SHOULDFAIL? Or something else?
After it is specified, ideas on implementation? Something to do with checking the exit status of stack test, of course. We'll have to keep in mind that we usually have set -e on, but this time we have to invert the exit code of stack test. Maybe something can do that for us. Maybe simply the ! operator will work.
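
One caveat on the ! idea: POSIX shells ignore set -e for a pipeline that begins with !, so a bare "! stack test" line would not abort the script even when the tests unexpectedly pass. A minimal sketch that makes the inversion explicit instead; expect_failure is a hypothetical helper name, and false stands in for a failing stack test run.

```shell
#!/bin/sh
set -e

# expect_failure CMD [ARGS...]
# Hypothetical helper: returns 0 when CMD exits non-zero, 1 when CMD exits zero.
# Running CMD as an `if` condition keeps `set -e` from aborting on its failure.
expect_failure() {
    if "$@"; then
        echo "expected failure, but command succeeded: $*" >&2
        return 1
    fi
    return 0
}

# Stand-in for `stack test`: `false` plays a failing test suite.
expect_failure false
echo "should-fail example was rejected as expected"
```

Because expect_failure itself returns non-zero when the command succeeds, calling it as a plain statement lets set -e stop the script in exactly the case we care about.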

rbasso · Answer 1 · Thu Oct 13 2016 09:10:14 GMT+0800 (China Standard Time)

First, I think we should clarify what it is to fail. I think that we have 3 distinct cases:

Everything compiles and the example solution passes the tests. (success)
Everything compiles and the example solution fails the tests. (fail)
The pair (example solution, test suite) fails compilation. (error)

If I were to choose how to mark the expected results, I would go for folders named like this:

examples/fail-something/
examples/error-something/
examples/success-something/

Any other prefix or the lack of a sub-folder should make Travis fail the tests.
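
The naming rule above could be checked with something like the following sketch. The function name expected_result_for is hypothetical; only the three prefixes come from the proposal.

```shell
#!/bin/sh

# expected_result_for DIR
# Prints success, fail, or error based on the prefix of DIR's last path component.
# Returns 1 for any other name so that the caller (e.g. the CI script) can fail.
expected_result_for() {
    name=$(basename "$1")
    case "$name" in
        success-*) echo success ;;
        fail-*)    echo fail ;;
        error-*)   echo error ;;
        *)
            echo "unrecognised example folder: $name" >&2
            return 1
            ;;
    esac
}

expected_result_for examples/fail-something/   # prints: fail
```

Passing the bare examples folder (no sub-folder) falls through to the last branch, which covers the "lack of a sub-folder" case as well.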

To implement this, it would be necessary to separate building from testing.
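
Separating the two phases could look roughly like this. It is only a sketch: classify_example is a hypothetical name, the commands are passed in as strings so the logic can be exercised without Stack, and the stack invocation shown in the comment is an assumption about how the build-only step might be spelled.

```shell
#!/bin/sh

# classify_example BUILD_CMD TEST_CMD
# Runs BUILD_CMD first; if it fails, the result is "error".
# Otherwise runs TEST_CMD; its exit status decides between "success" and "fail".
# Both arguments are command strings evaluated by `sh -c`.
classify_example() {
    if ! sh -c "$1" >/dev/null 2>&1; then
        echo error
    elif sh -c "$2" >/dev/null 2>&1; then
        echo success
    else
        echo fail
    fi
}

# Assumed real-world invocation (not run here):
#   classify_example 'stack build --test --no-run-tests' 'stack test'
classify_example true false   # prints: fail
```

Running the build as its own step is what lets a compile failure be reported as error rather than being lumped together with a failing test run.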

Another thing that may be a problem is that, because we are substituting files in the same project, the cache for the examples seems to always be overwritten. Maybe we'll have to cache each of the examples separately to keep recompilation times reasonable.

All this will probably make the .travis.yml considerably more complex.

Peter Tseng · Answer 2 · Fri Oct 14 2016 00:37:38 GMT+0800 (China Standard Time)

The idea seems good to me.

> Another thing that may be a problem is that, because we are substituting files in the same project, the cache for the examples seems to be always overwritten.

Ah, I see. Yes, that might be needed. It appears I missed that a cached folder is being used as .stack-work for each exercise. Now it looks like there should indeed be one for each example.

> All this will probably make the .travis.yml considerably more complex.

Sometimes I am told it is not wise to write too-complex logic in shell scripts and it makes me wonder if we should go back to having something like https://github.com/exercism/xhaskell/commits/master/_test/check-exercises.hs . Sure, the complexity would still be somewhere (we only change where and what language it is), but maybe it is more desirable to keep the complexity there.

rbasso · Answer 3 · Fri Oct 14 2016 00:49:37 GMT+0800 (China Standard Time)

> Sometimes I am told it is not wise to write too-complex logic in shell scripts and it makes me wonder if we should go back to having something like https://github.com/exercism/xhaskell/commits/master/_test/check-exercises.hs

I was working on something like that, but I still have a lot to learn before opening a PR.

At first I thought it was going to be easy, but I discovered that Stack doesn't like my NixOS environment, so I'll have to spend a few more days preparing a Debian virtual machine for development.

Peter Tseng · Answer 4 · Wed Oct 19 2016 15:38:51 GMT+0800 (China Standard Time)

Should I assume you are working on it and therefore I shouldn't, or might I give it a try sometime?

rbasso · Answer 5 · Wed Oct 19 2016 16:02:52 GMT+0800 (China Standard Time)

I got stuck in my solution and right now I don't have the time to continue, so please please please do it! 👍

The hardest part seems to be how to separate the .stack-work caches. I came up with two ideas to solve it:

Move the example to ./src and package.yaml to ./.
Create a new folder (possibly in the user's home folder) for each pair (exercise/example) and copy everything needed there before testing.

In both cases we have to link each .stack-work to a different subfolder in ./foldercache?.

I think that the second solution is better because it would avoid any present or future problem with the global stack cache. I ran a few tests and it seems that the first solution works too, but sharing exactly the same path for distinct builds feels wrong...
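
A rough sketch of the second idea, under stated assumptions: prepare_workdir and its four arguments are hypothetical, an example folder is assumed to hold files that can simply be overlaid on the exercise root, and CACHE_ROOT should be an absolute path so the symlink resolves from anywhere.

```shell
#!/bin/sh

# prepare_workdir EXERCISE_DIR EXAMPLE_DIR WORK_ROOT CACHE_ROOT
# Copies the exercise into a fresh folder, overlays the example's files, and
# symlinks .stack-work to a cache folder unique to the (exercise, example) pair.
# Prints the path of the prepared folder.
prepare_workdir() {
    exercise=$(basename "$1")
    example=$(basename "$2")
    work="$3/$exercise/$example"
    cache="$4/$exercise/$example"

    mkdir -p "$work" "$cache"
    cp -R "$1"/. "$work"/            # exercise files (tests, stack config, ...)
    cp -R "$2"/. "$work"/            # example files laid over the exercise
    rm -rf "$work/.stack-work"       # drop any copied or stale build folder
    ln -s "$cache" "$work/.stack-work"
    echo "$work"
}
```

Each pair then builds in its own path with its own cache, so nothing is shared between builds except whatever Stack keeps globally.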

Anyway...it is your call! 😄

Peter Tseng · Answer 6 · Mon Jan 23 2017 18:48:14 GMT+0800 (China Standard Time)

I may extract some of the current functionality contained in .travis.yml out into a shell script in bin/ or something. The reason is that I would really like to have something that lets me automatically test a single example locally. My general idea is to have:

test-example.sh - give it the path to an example, and it will check that it has the desired result (success, error, fail).

(If desired, we could also have scripts such as test-all-examples-for-exercise and test-all-exercises; I'm unsure. But test-example is the big one I want.)
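
If the per-exercise variant turns out to be wanted, it could be a thin loop around the single-example script. A sketch only: test_all_examples is a hypothetical name, and the checker is passed in as an argument so that it can be test-example.sh or anything else that exits 0 when an example behaves as expected.

```shell
#!/bin/sh

# test_all_examples EXERCISE_DIR CHECKER
# Runs CHECKER once per sub-folder of EXERCISE_DIR/examples and keeps going
# after a mismatch, so a single run reports every broken example.
# Returns non-zero if any check failed or if no example folder exists.
test_all_examples() {
    failures=0
    count=0
    for example in "$1"/examples/*/; do
        [ -d "$example" ] || continue    # the glob matched nothing
        count=$((count + 1))
        if "$2" "$example"; then
            echo "ok:       $example"
        else
            echo "MISMATCH: $example"
            failures=$((failures + 1))
        fi
    done
    [ "$count" -gt 0 ] && [ "$failures" -eq 0 ]
}

# Assumed usage (paths are illustrative, not run here):
#   test_all_examples exercises/accumulate bin/test-example.sh
```

Running the checker inside an if keeps set -e, if the caller has it on, from stopping at the first mismatch.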

I know we got rid of _test in #203, but I think being able to easily test an example locally would be helpful for me.

The template issues for new tracks suggest that tests:

be runnable locally: https://github.com/exercism/x-template/blob/master/TRAVIS#L3
allow testing one exercise at a time: https://github.com/exercism/x-template/blob/master/TRAVIS#L11

So it seems reasonable for us to follow the suggestions.

One more argument for having this easy script: It makes the completion of #398 quite trivial: Just say "Run the script" in the README.

rbasso · Answer 7 · Wed Jan 25 2017 13:58:31 GMT+0800 (China Standard Time)

I like the idea of being able to test an example/exercise easily! 👍

Peter Tseng · Answer 8 · Tue Nov 27 2018 07:25:36 GMT+0800 (China Standard Time)

Note that I have not yet found any other track that does this. I imagine I would have pointed to any other examples I was aware of when opening this issue.

I don't think I've looked particularly hard since then, so that information might be out of date.

I would gladly defend this idea regardless of what any other tracks are doing, though.