generated arbitrary list from fixed inputs
drobert opened this issue
Testing Problem
In complicated data sets (e.g. for Spark or similar) we often need to establish the 'bounds' or 'universe of allowed IDs' so that we can produce a semi-arbitrary data set that still has a high (or perfect) likelihood of joining together safely.
Consider testing a system that combines ad clicks with ad campaigns, with data types looking something like this (note this is an intentionally simplistic example):
// imagine getters/setters exist, etc.
class AdClick {
    private long adId;
    private Instant clickTime;
}
class Ad {
    private long id;
    private long campaignId;
}
class Campaign {
    private long id;
    private BigDecimal costPerClick;
    private BigDecimal budget;
}
Processing might join all ad clicks against all ads against campaigns to produce the total cost (clicks * costs per click) vs the budget for each campaign at some point in time. One important test case is the 'happy path' where all ad clicks correspond to an ad and all ads correspond to a campaign.
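To make the join concrete, here is a minimal plain-Java sketch of the processing described above. It is only an illustration: the class name CampaignSpend, the method totalCostByCampaign, and the simplified immutable versions of the data types are assumptions made for this example (clickTime is left out), not code from the system under test.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CampaignSpend {
    // Simplified, hypothetical stand-ins for the data types above (clickTime omitted).
    static final class AdClick {
        final long adId;
        AdClick(long adId) { this.adId = adId; }
    }
    static final class Ad {
        final long id;
        final long campaignId;
        Ad(long id, long campaignId) { this.id = id; this.campaignId = campaignId; }
    }
    static final class Campaign {
        final long id;
        final BigDecimal costPerClick;
        final BigDecimal budget;
        Campaign(long id, BigDecimal costPerClick, BigDecimal budget) {
            this.id = id;
            this.costPerClick = costPerClick;
            this.budget = budget;
        }
    }

    /** Joins clicks -> ads -> campaigns and sums the cost per click for each campaign id. */
    static Map<Long, BigDecimal> totalCostByCampaign(
            List<AdClick> clicks, List<Ad> ads, List<Campaign> campaigns) {
        Map<Long, Long> campaignIdByAdId = new HashMap<>();
        for (Ad ad : ads) {
            campaignIdByAdId.put(ad.id, ad.campaignId);
        }
        Map<Long, BigDecimal> costPerClickByCampaignId = new HashMap<>();
        for (Campaign campaign : campaigns) {
            costPerClickByCampaignId.put(campaign.id, campaign.costPerClick);
        }

        Map<Long, BigDecimal> totals = new HashMap<>();
        for (AdClick click : clicks) {
            Long campaignId = campaignIdByAdId.get(click.adId);
            if (campaignId == null) {
                continue; // click without a known ad: not part of the happy path
            }
            BigDecimal costPerClick = costPerClickByCampaignId.get(campaignId);
            if (costPerClick == null) {
                continue; // ad without a known campaign: not part of the happy path
            }
            totals.merge(campaignId, costPerClick, BigDecimal::add);
        }
        return totals;
    }
}
```

Each total can then be compared against the matching Campaign's budget. On the happy path neither `continue` branch is ever taken, which is exactly the property the generated data needs to guarantee.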
Mechanically, I think the approach would generally be to:
- produce Arbitrary<List<Campaign>> (some arbitrary list of campaigns, comprising the 'universe' of known campaigns)
- produce one or more Ad for each Campaign
- produce one or more AdClick for each Ad
(alternatively, produce all known campaign ids and all known ad ids up-front and then generate the full objects from there).
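The steps above can be sketched without any property-testing library, just to show the shape of the data they should produce. This is an assumption-laden illustration: it uses java.util.Random instead of jqwik arbitraries (so there is no shrinking or edge-case handling), and the class HappyPathData, its fields, and the size limits are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class HappyPathData {
    final List<Long> campaignIds = new ArrayList<>();
    final List<long[]> ads = new ArrayList<>();      // each entry is {adId, campaignId}
    final List<Long> clickAdIds = new ArrayList<>(); // the adId of each generated click

    /** Generates campaigns first, then ads per campaign, then clicks per ad. */
    static HappyPathData generate(Random random) {
        HappyPathData data = new HappyPathData();
        long nextAdId = 1;
        int campaignCount = 1 + random.nextInt(5);   // 1..5 campaigns (arbitrary limit)
        for (long campaignId = 1; campaignId <= campaignCount; campaignId++) {
            data.campaignIds.add(campaignId);
            int adCount = 1 + random.nextInt(3);     // at least one ad per campaign
            for (int a = 0; a < adCount; a++) {
                long adId = nextAdId++;
                data.ads.add(new long[] {adId, campaignId});
                int clickCount = 1 + random.nextInt(4); // at least one click per ad
                for (int c = 0; c < clickCount; c++) {
                    data.clickAdIds.add(adId);
                }
            }
        }
        return data;
    }
}
```

Because every id is handed down from the level above, every click refers to an existing ad and every ad to an existing campaign. The question in this issue is how to express the same top-down dependency with arbitraries.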
In either case, at some point we would be inside a flatMap operation holding a List<Campaign>, and we would need to create at least one AdGroup for each Campaign. I don't see an approach that looks much different than this:
Arbitrary<List<Campaign>> arbCampaigns = ...;
arbCampaigns.flatMap(campaigns -> {
    // note: this is List<Arbitrary<T>> rather than Arbitrary<List<T>>.
    // semantically, it feels reasonable: the list isn't really arbitrary, but
    // each element within the list is arbitrary except for the corresponding campaign id
    List<Arbitrary<AdGroup>> arbAdGroups =
        campaigns.stream()
            .map(Campaign::getId)
            .map(campaignId -> Arbitraries.longs().map(adId -> new AdGroup(adId, campaignId)))
            .collect(Collectors.toList());
    // and a similar flatMap for the list of arbitrary ad clicks
    // (still missing: a way to turn arbAdGroups into the Arbitrary that flatMap must return)
});
Suggested Solution
I think it would be useful to have a built-in mechanism to go from List<Arbitrary<T>> to Arbitrary<List<T>> (and/or from Stream<Arbitrary<T>>). Something like:
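// I picked 'sequence' as the common FP name for such an operation
public static <T> Arbitrary<List<T>> sequence(List<Arbitrary<T>> in) {
    // Stream.collect cannot replace the accumulated Arbitrary, so fold with reduce instead
    return in.stream()
        .reduce(
            Arbitraries.<List<T>>just(new ArrayList<>()),
            // copy the list on each step; List.add returns a boolean, not the list
            (arbList, arb) -> arbList.flatMap(l -> arb.map(e -> {
                List<T> next = new ArrayList<>(l);
                next.add(e);
                return next;
            })),
            (l1, l2) -> l1.flatMap(l1p -> l2.map(l2p -> {
                List<T> merged = new ArrayList<>(l1p);
                merged.addAll(l2p);
                return merged;
            }))
        );
}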
The above example would then look something like:
Arbitrary<List<Campaign>> arbCampaigns = ...;
arbCampaigns.flatMap(campaigns -> {
    List<Arbitrary<AdGroup>> tmpArbAdGroups =
        campaigns.stream()
            .map(Campaign::getId)
            .map(campaignId -> Arbitraries.longs().map(adId -> new AdGroup(adId, campaignId)))
            .collect(Collectors.toList());
    Arbitrary<List<AdGroup>> arbAdGroups = sequence(tmpArbAdGroups);
    return arbAdGroups;
});
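For readers who want to see the 'sequence' idea in isolation, here is a self-contained sketch that runs without jqwik. The Gen interface is a hypothetical, heavily simplified stand-in for Arbitrary (a function from Random to a value, with no shrinking), introduced only for this example; it is not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

final class SequenceSketch {
    /** Hypothetical stand-in for an arbitrary: produces a value from a source of randomness. */
    interface Gen<T> {
        T sample(Random random);

        default <R> Gen<R> map(Function<? super T, ? extends R> f) {
            return random -> f.apply(sample(random));
        }

        default <R> Gen<R> flatMap(Function<? super T, Gen<R>> f) {
            return random -> f.apply(sample(random)).sample(random);
        }

        static <T> Gen<T> just(T value) {
            return random -> value;
        }
    }

    /** Turns a list of generators into a generator of lists, keeping element order. */
    static <T> Gen<List<T>> sequence(List<Gen<T>> gens) {
        Gen<List<T>> result = Gen.<List<T>>just(new ArrayList<>());
        for (Gen<T> gen : gens) {
            Gen<List<T>> soFar = result;
            // Copy instead of mutating, so the shared empty seed list is never modified.
            result = soFar.flatMap(list -> gen.map(element -> {
                List<T> copy = new ArrayList<>(list);
                copy.add(element);
                return copy;
            }));
        }
        return result;
    }
}
```

The resulting list always has one element per input generator, in the same order, which is what the per-campaign generation needs. The copying costs quadratic time in the list length, which seems acceptable for test data but is a trade-off of this sketch.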
I haven't gone through your problem in detail (yet). Have you looked at ListArbitrary.mapEach(..) and ListArbitrary.flatMapEach(..)?
A somewhat simpler example using flatMapEach for what I think you want to do:
@Property(tries = 10)
void addAgesToFixedListOfUsers(@ForAll("users") List<User> users) {
    System.out.println(users);
    // Assertions.assertThat(users).hasSize(1);
}
@Provide
Arbitrary<List<User>> users() {
    Arbitrary<String> names = Arbitraries.strings().alpha().ofLength(5);
    ListArbitrary<User> users = names.map(User::new).list().ofMinSize(1).ofMaxSize(5);
    return users.flatMapEach((allUsers, user) -> {
        IntegerArbitrary ages = Arbitraries.integers().between(0, 100);
        return ages.map(age -> {
            user.age = age;
            return user;
        });
    });
}
static class User {
    String name;
    int age = -1;
    public User(String name) {
        this.name = name;
    }
    @Override
    public String toString() {
        return "User{name='" + name + '\'' + ", age=" + age + '}';
    }
}
One could argue that there should be a simpler form for
return users.flatMapEach((allUsers, user) -> {
    IntegerArbitrary ages = Arbitraries.integers().between(0, 100);
    return ages.map(age -> {
        user.age = age;
        return user;
    });
});
Especially since allUsers is not needed in this case.
For example:
IntegerArbitrary ages = Arbitraries.integers().between(0, 100);
return users.combineEach(ages, (user, age) -> {
    user.age = age;
    return user;
});