ClassCastException when using Combinators during shrinking
SimY4 opened this issue · comments
Testing Problem
ClassCastException when using Combinators and arbitraries producing elements of the type hierarchy.
java.lang.ClassCastException: class mypackage.MyTypeLike$Any cannot be cast to class mypackage.MyTypeLike (mypackage.MyTypeLike$Any and mypackage.MyTypeLike are in unnamed module of loader 'app')
at net.jqwik.engine.properties.arbitraries.combinations.DefaultCombinator5.lambda$combineFunction$0(DefaultCombinator5.java:33)
at net.jqwik.engine.properties.shrinking.CombinedShrinkable.createValue(CombinedShrinkable.java:25)
at net.jqwik.engine.properties.shrinking.CombinedShrinkable.value(CombinedShrinkable.java:21)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
at net.jqwik.engine.properties.shrinking.AbstractSampleShrinker.lambda$shrink$2(AbstractSampleShrinker.java:52)
Suggested Solution
Very brief analysis points to the lack of variance in functional combinator methods. i.e.:
<R> Arbitrary<R> flatAs(F5<T1, T2, T3, T4, T5, Arbitrary<@NotNull R>> flatCombinator)
<R> Arbitrary<R> as(Combinators.F5<T1, T2, T3, T4, T5, R> combinator)
Combinators.Combinator5<T1, T2, T3, T4, T5> filter(Combinators.F5<T1, T2, T3, T4, T5, Boolean> filter)
<R> Function<List<Object>, R> combineFunction(Combinators.F5<T1, T2, T3, T4, T5, R> combinator)
should probably be relaxed to:
<R> Arbitrary<R> flatAs(F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, Arbitrary<? extends R>> flatCombinator)
<R> Arbitrary<R> as(Combinators.F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, ? extends R> combinator)
Combinators.Combinator5<T1, T2, T3, T4, T5> filter(Combinators.F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, Boolean> filter)
<R> Function<List<Object>, R> combineFunction(Combinators.F5<? super T1, ? super T2, ? super T3, ? super T4, ? super T5, ? extends R> combinator)
@SimY4 Thanks for reporting. Can you add a short example reproducing the phenomenon?
First thought: Changing type signatures without changing the implementation shouldn’t be able to get rid of a CCE if it already compiled before. But I may be wrong.
@SimY4 I couldn't reproduce it on my own. Waiting for your input then.
@jlink Sorry, took me a while to pin it down. I'm still trying to create a minimal reproducible example for this bug. But my current understanding of the problem is:
I have an entities with identity equality defined like so:
abstract class Common {
private final String value;
protected Common(String value) { this.value = value; }
public boolean equals(Object o) { return this == o || (o instanceof Common && value.equals(((Common) o).value)); }
public int hashCode() { return value.hashCode(); }
}
class This extends Common {
static final This INSTANCE = new This();
This() { super(""); }
}
class That extends Common {
static final That INSTANCE = new That();
That() { super(""); }
}
This and That types are different but their identity is the same, therefore:
Arbitraries.just(This.INSTANCE)
and Arbitraries.just(That.INSTANCE)
identity is also the same. And any derivative from both of these arbitraries will have the same identity.
So, when it comes to memorization that's built-in inside Jqwik, this right here:
jqwik/engine/src/main/java/net/jqwik/engine/facades/Memoize.java
Lines 28 to 34 in 1e45490
won't tell them apart. So one can be substituted with another. And that would break the runtime type checker.
Based on the sample above, I would say equals
is not well-defined.
What is the use case for having different classes that are equal in that way?
Have you considered adding getClass() == o.getClass()
check to the Common
implementation, and, getClass().hashCode()
to Common#hashCode
?
PS. I wonder if it would make sense to avoid caching Arbitraries.just
in jqwik. In other words, Arbitraries.just
should be a trivial generator, so it might be faster to just return the generator instead of trying to create a cache key, lookup the value in the cache, and keep the cache in-memory.
@vlsi you might be right. But discovering that my somewhat broken equality is being used as a cache key also came as a surprise to me since I wasn't expecting jqwik to do that. I'd probably consider Arbitraries.just
and Arbitraries.of
as potentially not giving stable equality to rely on.
But the problem when goes higher because:
record Product(Common p1, Common p2) {}
Combinators.combine(arbCommon(), arbCommon()).as(Product::new)
are also the same now
Caching is done on a heuristical basis and requires reasonable equality implementation. In a way this is a hack, but I haven't found a better solution yet. If caching was done on an identity basis it wouldn't have the intended effect.
is there a configuration option to disable it? Maybe I can add that in
Nope. Convince me that it would be a good idea to make caching configurable.
The problem is that without generator caching, many non-trivial value generators will become VERY slow.
In other words, configuration would have to happen on a case by case basis.
Something like arbitrary.dontCacheGenerators()
.
Convince me that it would be a good idea to make caching configurable.
WDYT of making the cache configurable (e.g. to allow more or less memory for the caching)?
At the same time, configuring the cache to 0
would be a reasonable way of disabling the cache, which might be helpful to see if the caching introduces differences (e.g. if there are objects with weird equality), and it might be helpful for benchmarking.
In other words, configuration would have to happen on a case by case basis.
Something like arbitrary.dontCacheGenerators().
The key issue here is that users supply Arbitraries.just(weridThing)
, and they, well, reasonably expect the arbitrary would produce the exactly the same thing they passed in the constructor.
I doubt they will know when it is the right time to call .dontCacheGenerators()
, so I would consider that as a workaround only :-/
Even though handling Arbitraries.just
specially does not seem to resolve all the possible cases of using arbitraries based on objects with weird equality, the following makes sense for me:
- Avoid caching generators for
Arbitraries.just
. We probably need to test performance, however, I would expect that creating the generator every time or caching it intoArbitraries.just
would be better from the perf point of view - Use the supplied
just
value for equality only for well-known classes likejava.lang.String
,java.lang.Integer
,java.lang.Class
,enum
, and so on. Treat all other objects as "unknown", so use identity forJustArbitrary#equals and hashCode
. Then, if people build trees ofArbitrary.randomOf(Arbitrary.just(weirdThing), Arbitrary.just(weirdThing))
, then the existing caching would work just fine as it won't trustjust(weirdThing)
.
Of course, the caching would be sub-optimal, however, it might be a reasonable compromise between reasonably hard to support code, reasonably easy to use (less caching surprises), and reasonably fast to execute.
WDYT?
Combinators.combine(arbCommon(), arbCommon()).as(Product::new)
are also the same now
@SimY4 , would you clarify the issue? I don't get it :-/
- Avoid caching generators for
Arbitraries.just
. We probably need to test performance, however, I would expect that creating the generator every time or caching it intoArbitraries.just
would be better from the perf point of view
The simplest solution for this special case is to use identity hashing for just(..)
. This is a single-line change. But that doesn't solve the general problem. Already Arbitraries.create(..)
would fail with a broken equals implementation.
Frankly, I'm not convinced it's time well spent to handle anomalies like that. There are probably dozens or hundreds of similar cases all over jqwik.
Some afterthoughts that I had lately:
What's the effectiveness of this cache anyway? If I'm just by chance defining equality for some of my arbitrary instances.
What's the identity of Arbitrary<Function<A, A>>
? I've discovered some shady reflection magic that tries to compute identity of a function by hashing its inner fields. But stateless functions will all resolve as identical.
What's the effectiveness of this cache anyway?
jqwik relies on the cache for cases like frequencyOf(List<Tuple2<Integer, Arbitrary<T>>> frequencies)
.
In order to prepare a generator for such arbitrary, jqwik
needs to calculate cumulative upper bounds, so the generation could be like "draw a uniform random and binary search the value according to frequencies". The cache does work and it improves perf in my case (==if I disable the cache, the number of tries/second reduces).
See
The preparation step has non-trivial overhead, so creating generators always does not sound good.
I've discovered some shady reflection magic
I love that term :-). jqwik is 50% reflection magic. Some of it is shady.
So you're right. My counter-argument is: It (mostly) works for the use cases I've seen so far.
@vlsi I'm sorry, I may be overstepping here, but I don't see the need for what you just described. Isn't frequency about weighing your arbitraries and randomly selecting one based on that? I can't think of a reason for you to explode and weigh individual items.
The frequency method you've shown is about frequencies of values and not arbitraries, but I'm too new to this codebase to maybe I'm not seeing what you're showing me. I still can't see how caching of arbitraries is useful. Are you trying to save on heap space?
I still can't see how caching of arbitraries is useful. Are you trying to save on heap space?
Weighted generators require preprocessing which is CPU-intensive. Caching enables running the pre-processing once rather than "for every generated item".
I still can't see how caching of arbitraries is useful. Are you trying to save on heap space?
Generators are cached, not arbitraries - based on the arbitrary which creates them.
It's not about saving heap space but creation time. In addition to the example from @vlsi, there are many more. One prominent example is recursive arbitraries. Creating a new generator on each recursion step can be very time-consuming. Same holds for nested combinators. Without this kind of caching (or another approach to speed this up) quite a few use cases get out of reasonable performance bounds.
All that said, is there a reason that the supposedly broken implementation of equals()
cannot be fixed?
Is there another issue hiding underneath that I'm missing?
Just wanted to include this for future references, just in case we'd land on some outcomes from these discussions. I've fixed my equalities and hashcodes and now they aren't universal like they used to. Most of the CCE
exceptions are gone but some persisted and they happen in a different place in jqwik:
java.lang.ClassCastException: class mypackage.MyType cannot be cast to class mypackage.MyOtherType (mypackage.MyType and mypackage.MyOtherType are in unnamed module of loader 'app')
at net.jqwik.engine.properties.arbitraries.EdgeCasesSupport.lambda$filter$7(EdgeCasesSupport.java:111)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
at net.jqwik.engine.properties.arbitraries.EdgeCasesSupport.filter(EdgeCasesSupport.java:113)
at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
at net.jqwik.api.arbitraries.ArbitraryDecorator.edgeCases(ArbitraryDecorator.java:62)
at net.jqwik.engine.properties.arbitraries.ArbitraryFilter.edgeCases(ArbitraryFilter.java:38)
at net.jqwik.engine.properties.RandomizedShrinkablesGenerator.lambda$resolveEdgeCases$3(RandomizedShrinkablesGenerator.java:126)
Sorry, it was my bad with the last one.
Sorry, it was my bad with the last one.
You mean it's not a jqwik issue?
@jlink I can twist it that it is. 😀 The last one came from ArbitraryConfiguratorBase
's contract. Basically, the contract says that I should define a class like this:
class Configurator extends ArbitraryConfiguratorBase {
public Arbitrary<MyType> configure(Arbitrary<MyType> arb, MyAnnotation ann) { ... }
}
This looks like it a correct configurator. Except it isn't because you also shouldn't forget to overrideboolean acceptTargetType(TypeUsage)
with:
MyType.class.isAssignableFrom(targetType.getRawType());
without it it can mess up all annotated arbitraries.
But it's not relevant to the original issue, so it's irrelevant to the initial issue with cache collisions.
@jlink I can twist it that it is. 😀 The last one came from
ArbitraryConfiguratorBase
's contract. Basically, the contract says that I should define a class like this:class Configurator extends ArbitraryConfiguratorBase { public Arbitrary<MyType> configure(Arbitrary<MyType> arb, MyAnnotation ann) { ... } }This looks like it a correct configurator. Except it isn't because you also shouldn't forget to override
boolean acceptTargetType(TypeUsage)
with:MyType.class.isAssignableFrom(targetType.getRawType());without it it can mess up all annotated arbitraries.
This is exactly what I discovered yesterday when looking into "your" other issue: #493
At the time of implementing ArbitraryConfiguratorBase
I forgot (or was too lazy to) doing it correctly. Technical debt.