Solution refactor

Question

Solution refactor

matthieu-vergne opened this issue 4 years ago · comments

After discussing it in other threads, here is a plan to improve Solution in an iterative way. Basically, it is a sequence of refactoring and deprecation tasks, without removing anything to not break existing code, including for the jMetal users.

Each deprecated element must be documented with the @deprecated Javadoc tag to tell how to migrate. Since we have several deprecation steps, it may be that an old deprecation suggest to migrate to another (more recent) deprecated stuff. That is not an issue because this newly deprecated stuff should have its own suggestion. The user should eventually reach a non-deprecated component after some migration steps.

Some steps of this plan, like creating a class, are atomic: they are done in one shot. Other steps, like replacing methods, are way heavier but can be done progressively, like few methods at a time. This is what makes the whole plan progressive: it can be stopped and resumed at any point.

Generalyze from the many `XxxSolution` extensions to the unique `Solution<Xxx>`

The first step is to simplify the types hierarchy by going back to the generic interface Solution. We can rely on its attributes to store additional data:

Create an interface like this one (although I would probably rename it Property or alike) to read/write additional values from/in solutions : ed79bba
Provide an implementation for each additional property brought by extensions of Solution, like I did here for DoubleSolution: 1215981
Progressively refactor all the components to use these objects instead of the specific methods of the interfaces, using these implementations by default. As discussed earlier, I will favor factory methods over constructors.
After having no specific dependency anymore, replace all the XxxSolution by the more generic Solution<Xxx>

Allow the use of custom solution generators

At this point, we don't need the extensions anymore, but we are still generating solutions from the solutions themselves (copy) and from problems (createSolution), so these extensions are still in use. We must first allow their replacement:

Generalize the no-arg instantiators (random, problem) by Supplier objects, with the current components (random, problem) used as default. Again, I will favor factory methods over constructors.
Generalize the copies, which take another solution in argument, by UnaryOperator<Solution<T>>, maybe making our on interface for readability, like SolutionCopier<T>.copy(solution). We can use Solution.copy() as default again.

Use of a generic `Solution` implementation

Now that all components can be extensively customized on their solution instanciations, let's exploit it:

Create a proper generic implementation of Solution, like here but still with the primitive arrays, abstraction is for later: deca799
For each component having a factory method using legacy instantiators as defaults (problem/copy), deprecate the method and add one with this implementation as default instead (replace problem/copy with relevant solution factory methods).
Replace all the calls to the deprecated method by the new one.
Deprecate all the Solution extensions and their implementations. Since they are only used in deprecated factory methods & constructors, no additional migration work should be necessary here.
Deprecate Solution.copy(), since they are replaced by solution factory methods.

Replace primitive arrays by abstractions

At this point, we primarily rely on the generic Solution implementation but it still uses arrays, so let's change that:

Extends the Solution interface to provide abstractions, like here: 1a81cb0
Create a new generic implementation of Solution optimized for abstractions, like here: deca799
For each component using the "primitive" version as default, refactor the component to use the new methods instead and replace the default by the "abstracted" version. At this point, those using the deprecated constructors which still rely on the old Solution extensions will have an additional conversion, although it should remain small if we smartly use streams/iterators where relevant.
Deprecate the "primitive" implementation once no component uses it anymore.
Deprecate the "primitive" methods of Solution, since they have been replaced by the abstracted ones.

Simplify `Solution` uses

Since get/set specific variables can be done through the already available list, no need for the additional getter and setter. Same thing for the others. So:

At this point, those using the deprecated constructors which still rely on the old Solution extensions will be enforced to use the abstraction, so having a systematic additional conversion for objectives and constraints. However, this is the last run before the new major version, so it is really time to migrate now from this deprecated stuff.

New major version

If not done yet, it would be wise to plan a new major version here. By releasing the current state as the last minor version for the current major, people would just have to update to it to get all the deprecation warnings and go through them for proper migration.

By removing all the deprecated stuff and releasing as the next major version, those having migrated can just replace by the new major version with no problem. Only the major version breaks, so if people complain about breaking stuff a relevant suggestion would be to go back to the previous version and fix all the warnings first, so we don't have to spend time helping everyone. Of course, this only applies for those up to date with jMetal 5. So expect people on older version to require more help from us. But if we make this deprecation steps a habit, then we should significantly decrease the support for the next versions.

Worth noting, after the major version is released, all the deprecated stuff is gone. Especially the annoying sets of constructors plugged to Solution extensions. So it opens the possibility to generalize all the relevant components to deal with more types. Indeed, for now we have only replaced extensions by their generic counter parts, but never generalized further. Once this plan is complete, we can think about further generalizations.

Matthieu Vergne · Answer 1 · Fri Jul 24 2020 00:24:01 GMT+0800 (China Standard Time)

@ajnebro What do you think about this plan? May I start with it? Or more time to think about it?

Antonio J. Nebro · Answer 2 · Fri Jul 24 2020 15:25:09 GMT+0800 (China Standard Time)

I have to think. The main reason is that I'm working with some colleagues in techniques to accelerate the execution of NSGA-II, and we achieve significant time reductions: https://jmetal.readthedocs.io/en/latest/mnds.html. If changing the objectives in the solutions from arrays to list have a significative negative impact in performance, all our effort could be in vain.

Matthieu Vergne · Answer 3 · Fri Jul 24 2020 17:15:44 GMT+0800 (China Standard Time)

Please remember that ArrayList is a List backed by an array. Surely, using it should not change much. You may add some more indirections (methods calls) but these are optimized by the JVM along the run. And a good abstraction should be applicable without changing much the algorithm. Since you divide the time by more than 2, abstractions should not make you go back there. It should be like a few percent more time at worst, otherwise there is something going wrong.

The link you provide has a dead link for NSGAIIWithMNDSRankingExample and I don't find it on master. Do you have a branch with it? I may provide an refactoring with proper abstraction for you to compare.

Antonio J. Nebro · Answer 4 · Fri Jul 24 2020 18:12:13 GMT+0800 (China Standard Time)

That class is one of those moved to the jmetal-experimental package: https://github.com/jMetal/jMetal/blob/master/jmetal-experimental/src/main/java/org/uma/jmetal/component/example/multiobjective/nsgaii/NSGAIIWithMNDSRankingExample.java

I see that I have to review the documentation and check the links.

Matthieu Vergne · Answer 5 · Fri Jul 24 2020 18:22:51 GMT+0800 (China Standard Time)

I didn't have this one imported yet. I will check that.

Matthieu Vergne · Answer 6 · Fri Jul 24 2020 18:46:45 GMT+0800 (China Standard Time)

Could you please share the required files, like resources/referenceFrontsCSV/DTLZ2.3D.csv?

Antonio J. Nebro · Answer 7 · Fri Jul 24 2020 22:13:10 GMT+0800 (China Standard Time)

That file should be in that folder. I've have just run the program and everything seems ok.

Matthieu Vergne · Answer 8 · Fri Jul 24 2020 22:17:33 GMT+0800 (China Standard Time)

I only have the irace folder there, not referenceFrontsCSV.

Antonio J. Nebro · Answer 9 · Fri Jul 24 2020 22:19:11 GMT+0800 (China Standard Time)

The resources folder is in the root directory of the project.

Matthieu Vergne · Answer 10 · Fri Jul 24 2020 22:20:35 GMT+0800 (China Standard Time)

ah, this is because you have it in the parent module, not in the module of the class.

Matthieu Vergne · Answer 11 · Fri Jul 24 2020 22:21:17 GMT+0800 (China Standard Time)

when you run the class through Eclipse, it looks for it in the experimental project

Antonio J. Nebro · Answer 12 · Fri Jul 24 2020 22:43:02 GMT+0800 (China Standard Time)

The contents of the resources folder was not too long ago located in the resources folders of each Maven sub-project. I was really frightened when I saw that the jMetal project size was more than 100GB.

I decided to move them to the project root directory.

Antonio J. Nebro · Answer 13 · Fri Jul 24 2020 22:45:41 GMT+0800 (China Standard Time)

Related to this, I created a GitHub project to store all this information: https://github.com/jMetal/Mooses

I even thought about removing the resources folder from the jMetal project and to include a link to this project.

Matthieu Vergne · Answer 14 · Sat Jul 25 2020 00:56:38 GMT+0800 (China Standard Time)

I just created #412 which abstracts all the objectives methods:

all the legacy methods are deprecated to be replaced by objectives(): https://github.com/jMetal/jMetal/pull/412/files#diff-64b1ca7fbc2e5e7f31ba68f5db784218
getNumberOfObjectives() > objectives().size()
getObjective(i) > objectives().get(i)
setObjective(i, v) > objectives().set(i, v)
getObjectives() > objectivesArray()

All the replacements I did are naive. For instance, I didn't factor several calls to objectives() even if it was obviously relevant. I just replaced everything in a quick and dirty way. So the performance loss is greater than what it should be with a proper refactoring. I replaced ALL the calls to the legacy methods by the new ones, everywhere, just to be sure I didn't miss one.

The last case is a replacement by a method which always fail, so that I can run the example, see where it explodes, and refactor just this part by using objectives(). I only need it there:
https://github.com/jMetal/jMetal/pull/412/files#diff-4ae1ed0286a35111626070d6400355af

-      System.arraycopy(solutionSet.get(i).getObjectives(), 0, population[i], 0, m);
+      List<Double> objectives = solutionSet.get(i).objectives();
+      for(int j = 0 ; j < m ; j++) {
+		population[i][j] = objectives.get(j);
+      }

I didn't test anything. I just ran the example to ensure it executes until the end. I let you confirm it does what you want.

I did it the dirty way, so only this example should work. All the rest should explode. The point was to make it work for this example for a quick comparison.

I didn't find any mention of constraints there, so I didn't bother refactoring them. Please tell me if I need to.

Matthieu Vergne · Answer 15 · Sat Jul 25 2020 01:06:57 GMT+0800 (China Standard Time)

With 10 executions on my machine, the array version shows 941ms in average, while the list version is 1078ms in average. So 15% of loss. I will see if I can do better.

Matthieu Vergne · Answer 16 · Sat Jul 25 2020 01:16:42 GMT+0800 (China Standard Time)

And here we are:

-      System.arraycopy(solutionSet.get(i).getObjectives(), 0, population[i], 0, m);
+      Iterator<Double> iterator = solutionSet.get(i).objectives().iterator();
+      for(int j = 0 ; j < m ; j++) {
+		population[i][j] = iterator.next();
+      }

Instead of getting a specific index everytime, which imposes a check on the index you provide, just use an iterator to get them as they come. Average time: 946ms, so roughly the same thing than your current version.

Matthieu Vergne · Answer 17 · Sat Jul 25 2020 01:19:09 GMT+0800 (China Standard Time)

And as a remainder, this is just one replacement. All the others are naive replacements with no proper factoring. So it should be considered a worse case.

Please run your own comparison (with various parameters) to see if there is an impact (and confirm the code is correct).

Antonio J. Nebro · Answer 18 · Mon Jul 27 2020 21:52:04 GMT+0800 (China Standard Time)

Some comments.

I like this approach (for objectives, variables, and constraints):

Refactor the objectives

Add Solution.objectives() to return the list of objectives

Replace all the calls to Solution.getObjective() by objectives().get()

Replace all the calls to Solution.setObjective() by objectives().set()

Replace all the calls to Solution.getNumberOfObjectives() by objectives().size()

Deprecate all the legacy objective methods

Constraints are as important as objectives, so they should be managed in the same way.

About computing time overheads:

With 10 executions on my machine, the array version shows 941ms in average, while the list version is 1078ms in average. So 15% of loss.

This would be acceptable to me (what I don't want is to go from 1 second to 5 or 10 seconds), but:

Instead of getting a specific index everytime, which imposes a check on the index you provide, just use an iterator to get them as they come. Average time: 946ms.

is still better, as there is no differences in practical terms.

So ... let's go ahead :).

Matthieu Vergne · Answer 19 · Tue Jul 28 2020 01:18:23 GMT+0800 (China Standard Time)

\{^o^}/

Then you will see pull request coming.

Antonio J. Nebro · Answer 20 · Tue Jul 28 2020 15:42:39 GMT+0800 (China Standard Time)

A comment about the SolutionAttribute interface. I have just deprecated it. Some of its implementations were already deprecated and I would prefer to adopt a different approach. As is, we are promoting adding classes like these:

public class Fitness<S extends Solution<?>> extends GenericSolutionAttribute<S, Double> {
}

public class DistanceToSolutionListAttribute extends GenericSolutionAttribute<Solution<?>,Double> {
}

Matthieu Vergne · Answer 21 · Wed Jul 29 2020 01:00:07 GMT+0800 (China Standard Time)

Indeed, it seems to me that the "attribute" notion is split between the attributes map of the solution and the attribute classes implementing this interface. Having such a split of the same responsibility (managing a specific attribute) introduces confusion, so I am not a fan either.

Managing attributes out of the solution seems to me not recommended, since it would need a dedicated map to relate an attribute to its solution. Such a map can be costly to read with thousands of solutions, and since we create and discard solution on the fly, we would need to properly manage the cleaning of the map. There is some facilities like WeakHashMap but it would clearly look like a hack rather than a proper design.

Rather, I would focus on the attributes map of the solution.

Now, you can go with it in several ways:

using a simple map like now, easy but since it can store anything the value must be of type Object, which requires to cast the data properly upon reading it. Annoying, but beside the cast I don't see any disadvantage with it.
using an heterogeneous map, the idea being that the key stores the type you need through a generics such that when you get the value it can be automatically casted with this generics.

I like the second approach because it allows you to factor the casts, but also to show better the intentions behind the code. For example, you can imagine something like that:

Attribute<Double> distanceAttribute = Attribute.create();//For type safety: Attribute.create(Double.class);
solution.setAttribute(distanceAttribute, 123.0);
Double distance = solution.getAttribute(distanceAttribute);

or with the future design:

Attribute<Double> distanceAttribute = Attribute.create();//For type safety: Attribute.create(Double.class);
solution.attributes().put(distanceAttribute, 123.0);
Double distance = solution.attributes().get(distanceAttribute);

The attributes map would not be a standard map. It would be for example an implementation of this:

interface AttributeMap {
  <T> void put(Attribute<T> attribute, T value);
  <T> T get(Attribute<T> attribute);

  static class Attribute<T> {
    // Some private code relevant to the attribute

    static public <T> Attribute<T> create() {/*create an attribute*/}
    static public <T> Attribute<T> create(Class<T> c) {/*create a type-safe attribute*/}
  }
}

Something like that.
What do you think?

Matthieu Vergne · Answer 22 · Wed Jul 29 2020 01:09:44 GMT+0800 (China Standard Time)

As a side note, type-safety only allows a stronger type check. Generics are enough to use the right type when writing code. Type-safety allows you to enforce the check at runtime, which is helpful if you do nasty stuff with generics or if you expect external stuff to come with wrong types (you can fail fast with the type check instead of waiting that something explodes somewhere).

I personally don't expect jMetal to do nasty things with generics (and I fix them when I find something like that). And since we only consume objects that are generated during our runs, it seems odd to expect to read data of the wrong type. Since it is an experimental lib, we should not care about hacking attacks either. So I don't see any reason to enforce type safety so far. We can go with the simple one and later introduce type safety when we need it.

Antonio J. Nebro · Answer 23 · Wed Jul 29 2020 15:28:43 GMT+0800 (China Standard Time)

using a simple map like now, easy but since it can store anything the value must be of type Object, which requires to cast the data properly upon reading it. Annoying, but beside the cast I don't see any disadvantage with it.

Let's take this approach. It is simple as, as you mention, using casting is not a big problem.

Matthieu Vergne · Answer 24 · Wed Jul 29 2020 16:03:52 GMT+0800 (China Standard Time)

OK. I guess that if we tend to use more and more attributes, the second approach will become more interesting, but we can do it later.

Antonio J. Nebro · Answer 25 · Fri Feb 19 2021 19:24:11 GMT+0800 (China Standard Time)

I have just refactorized the Solution interface:

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  double[] objectives() ;
  double[] constraints() ;
  Map<Object,Object> attributes() ;

  Solution<T> copy() ;
}

This is a first step in the idea of replacing the objectives and constraints as lists instead of arrays. I'm considering the next step, thinking in replacing the Solution interface by a class similar to this:

public class Solution<T> {
  private final List<T> variables;
  private final List<Double> objectives;
  private final List<Double> constraints;
  private final Map<Object, Object> attributes;

  public List<T> variables() {
    return variables;
  }

  public List<Double> objectives() {
    return objectives;
  }

  public List<Double> constraints() {
    return constraints;
  }

  public Map<Object, Object> attributes() {
    return attributes;
  }

  /**
   * Constructor
   *
   * @param numberOfVariables
   * @param numberOfObjectives
   * @param numberOfConstraints
   */
  public Solution(int numberOfVariables, int numberOfObjectives, int numberOfConstraints) {
    attributes = new HashMap<>();

    variables = new ArrayList<>(numberOfVariables);
    for (int i = 0; i < numberOfVariables; i++) {
      variables.add(i, null);
    }

    objectives = new ArrayList<>(numberOfObjectives);
    for (int i = 0; i < numberOfObjectives; i++) {
      objectives.add(0.0);
    }

    constraints = new ArrayList<>(numberOfConstraints);
    for (int i = 0; i < numberOfConstraints; i++) {
      constraints.add(0.0);
    }
  }

  /**
   * Constructor. The list of variables are shallow copied.
   *
   * @param variables
   * @param objectives
   * @param constraints
   * @param attributes
   */
  public Solution(
      List<T> variables,
      List<Double> objectives,
      List<Double> constraints,
      Map<Object, Object> attributes) {
    this.variables = new ArrayList<>(variables);
    this.objectives = new ArrayList<>(objectives);
    this.constraints = new ArrayList<>(constraints);
    this.attributes = new HashMap<>(attributes);
  }
}

Then, a SolutionFactory could provide methods to create and copy solutions:

public class SolutionFactory {
     public static Solution<Double> createDoubleSolution(
      List<Bounds<Double>> bounds, int numberOfObjectives, int numberOfConstraints) {
    var newSolution = new Solution<Double>(bounds.size(), numberOfObjectives, numberOfConstraints);
    newSolution.attributes().put(attributes.BOUNDS, bounds);
    IntStream.range(0, bounds.size())
        .forEach(
            i ->
                newSolution
                    .variables()
                    .set(
                        i,
                        SingletonRandomGenerator.getInstance()
                            .nextDouble(
                                bounds.get(i).getLowerBound(), bounds.get(i).getUpperBound())));

    return newSolution;
  }

  public static Solution<Double> createFromDoubleSolution(Solution<Double> solution) {
    return new Solution<>(solution.variables(), solution.objectives(), solution.constraints(),
        solution.attributes());
  }
 ...

This way, there would no need of DoubleSolution, IntegerSolution and so on, but I have to check that if all solution implementations could be adapted in this way.

Matthieu Vergne · Answer 26 · Fri Feb 19 2021 23:15:45 GMT+0800 (China Standard Time)

Several comments here.

Keep the interface

I strongly recommend to keep the interface as such and create a separate class for the implementation. You can create a InMemorySolution class which takes the 4 components and store them as fields. The point is to keep the interface in case we want to use a different strategy.

For example to create a proxy, which requires to store another instance rather than its own data. The class won't allow that in a clean way: you must extend it, so create the 4 fields anyway, but only use the new field of the proxy. It should be a completely new implementation, but with the same interface, so keep the interface.

Pay attention to the copy method

Copies are more easily managed at the destination rather than the source. The target class can have a constructor which takes a solution to reproduce, like all the implementations of Collection can be instantiated from another Collection. For example for ArrayList:

new ArrayList(new ArrayList())
new ArrayList(new LinkedList())
new ArrayList(new HashSet())

Copy methods like yours (and like Object.clone()) are usually hard to use properly and rarely fit the actual requirements. Among the questions hard to answer without looking at the specific context of the copy:

should you have the very same class or can you have another one? In a case where the class is used as a marker, you want to create a duplicate instance with the same class. For another context, you may prefer to use a different class, more optimized for the intended use.
how deep the copy should be? In context A you may go with reusing the same lists and save some computing. In context B you need to create new instances to modify them without impacting the original. In context C you need to do the same with the objects in these lists because you may impact them as well.

You just cannot know in advance (and manage all the cases) with a simple copy method in the original class.

For instance, you may have Solution implementations like:

InMemorySolution which stores the collections directly
FileSolution which reads its content from a file
CachedSolution which delegates the first call to another Solution and then return the same value (combines well with the previous one)
ProxySolution which delegates the call to another Solution instance, whatever it is
different decorator classes to enrich the solutions with other behaviours
etc.

In your generic algorithm, you will just know you deal with Solution instances, whatever they are implemented with.
But for your algorithm to work efficiently, you may need for instance to minimize recomputation, so you need proper snapshots.
Independently of the original class, you can decide to use a CachedSolution to provide this guarantee:

new CachedSolution(original)

You cannot just ask the original instance to copy itself: if it is a FileSolution, then it will create a new FileSolution on the same file, which still reads from the file every time you call it.

If you reaaaaally want a copy() method, at least to maintain the current code, you can have a default implementation in the interface.
Something like that:

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  List<Double> objectives() ;
  List<Double> constraints() ;
  Map<Object,Object> attributes() ;

  default Solution<T> copy() {
    return new InMemorySolution<T>(
      new ArrayList(variables()),
      new ArrayList(objectives()),
      new ArrayList(constraints()),
      new HashMap(attributes()),
    );
  }
}

This way you guarantee that the intent of the copy is to provide a "somehow optimized" "somehow deep copy" of the solution. I would deprecate it though to favour better practices.

Pay attention to multiple constructors

If at some point you need to create a new way to instantiate your class but which needs the same types of arguments, you may struggle in creating a new constructor.

Rather than multiplying the constructors, prefer to use static factory methods, so this:

public InMemorySolution(int numberOfVariables, int numberOfObjectives, int numberOfConstraints)

becomes this:

public static createZeroSolutionWithSizes(int numberOfVariables, int numberOfObjectives, int numberOfConstraints)

If at some point you need to create an instance based on 3 ints which represent something different, you can just change the name, which you cannot do with the constructor.

The general idea is to keep a single constructor focused on the basics of the class (feeding the fields) and create factory methods for the variants. These factory methods are the usual candidates to be moved to a factory class.

The drawback of that is that you cannot exploit this factory method if you extends the class. But extensions are often overused and badly used. There is a general recommendation to favour composition over inheritance (nice summary of pros & cons), and really the need of inheritance can be quite the exception in practice. The only case I see where composition becomes a problem is when you stack a lot of layers and it impacts your performance. But in this case, it is usually solved with a reduction of the stack by using a class which "snapshots" your object, for example:

new InMemorySolution<>(
  new ArrayList(original.variables()),
  new ArrayList(original.objectives()),
  new ArrayList(original.constraints()),
  new HashMap(original.attributes()),
)

Do you really need a factory class?

Not a hard point here: I give my opinion but really you can do it if you feel better that way. If it is a bad move, we can easily fix that later with simple refactoring & deprecation.

As mentioned in the copy point, you usually need to ensure some requirements that imposes you to know the kind of implementation you deal with, independently of the class we provide you as input.

As mentioned in the constructor point, for each class implementing the interface, you can define factory methods specific to this class.

The first advantage of a factory class is to not have to know the actual implementation. So it is not really the kind of thing to use in the copy case for instance, and since the original is given by the problem, which will use its own too depending on the context, I don't really see a need for a factory class.

Another use is just to centralize the various factory methods from the various classes, but it just makes it redundant, since you expose the same kind of information and do exactly the same (usually just calling the specific factory method). And since in a specific algorithm, you tend to focus on one or some ways to instantiate, it seems overkill to create a class which provides all of them.

The only reason I would see to have a factory class here would be to avoid creating factory methods in each implementation. I can understand that perspective, although I am not convinced it brings more advantages than disadvantages. Once again, instantiating the factory imports plenty of dependencies, not just the ones needed for the algorithm. It breaks the "I" in SOLID (https://en.wikipedia.org/wiki/Interface_segregation_principle).

Antonio J. Nebro · Answer 27 · Sat Feb 20 2021 18:10:04 GMT+0800 (China Standard Time)

Let's take a look to the NullCrossover class:

public class NullCrossover<S extends Solution<?>>
    implements CrossoverOperator<S> {

  /** Execute() method */
  @Override public List<S> execute(List<S> source) {
    Check.notNull(source);
    Check.that(source.size() == 2, "There must be two parents instead of " + source.size());

    List<S> list = new ArrayList<>() ;
    list.add((S) source.get(0).copy()) ;
    list.add((S) source.get(1).copy()) ;

    return list ;
  }
}

How could we reimplement it assuming that the Solution interface does not have the copy() method?

Matthieu Vergne · Answer 28 · Sat Feb 20 2021 21:53:41 GMT+0800 (China Standard Time)

Blocking point

What is blocking you here is the choice of generics:

public class NullCrossover<S extends Solution<?>>

It means that you require to get a given type of Solution for S and, wherever you use it, it must have an equivalent type (same class or extension).

This constraint is used to synchronize the input and output lists:

List<S> execute(List<S> source)

This design was relevant before because the type of variable was related to the type of solution (DoubleSolution, IntegerSolution, etc.).

Even my suggestion of default implementation for the generic copy() does not work because of that, since the copy must return the very same class. You can implement it (so it would be there to help in the progressive refactoring), but all the classes must override it anyway as long as the current design holds.

Solution

Here is your stated objective:

This way, there would no need of DoubleSolution, IntegerSolution and so on, but I have to check that if all solution implementations could be adapted in this way.

If you adapt your algorithms such that you don't rely anymore on the specificities of each class of solution, you can satisfy this hypothesis and only rely on a generic Solution interface. As soon as this is true, then the type of variable reduces to the generics only (Solution, Solution, etc.). That means that it makes no sense to ask for the type of solution itself (<S extends Solution<?>>) since you only need to know the type of variable.

At that point, you can refactor your operator by replacing the solution generics by a variable generics:

public class NullCrossover<S extends Solution<?>>
...
List<S> execute(List<S> source)

becomes:

public class NullCrossover<T>
...
List<Solution<T>> execute(List<Solution<T>> source)

The remaining problem is that the current design is related to the implementation of CrossoverOperator<S>. Indeed, replacing <S extends Solution<?>> by <T> should be done everywhere in jMetal, since the current jMetal assumes everywhere that the variable information is also provided by the class, not only the generics. This is the heavy part to do. The changes can be done iteratively, it would take many steps but you could at least always compile and run the tests to ensure nothing is breaking, or in one shot with a massive change.

Once it is done, you deal with lists of solutions which can have any impleemntation, even heterogeneous ones, as long as they have the same type of variable.

Naive Refactoring

At that point, you can adapt the code of the operator. In a naive way, this:

List<S> list = new ArrayList<>() ;
list.add((S) source.get(0).copy()) ;
list.add((S) source.get(1).copy()) ;

can become this:

List<Solution<T>> list = new ArrayList<>() ;

Solution<T> first = source.get(0);
list.add(new InMemorySolution(
  new ArrayList<>(first.variables()),
  new ArrayList<>(first.objectives()),
  new ArrayList<>(first.constraints()),
  new HashMap<>(first.attributes())
)) ;

Solution<T> second = source.get(1);
list.add(new InMemorySolution(
  new ArrayList<>(second.variables()),
  new ArrayList<>(second.objectives()),
  new ArrayList<>(second.constraints()),
  new HashMap<>(second.attributes())
)) ;

This is the equivalent of the default copy() implementation I suggest in my previous post. This can of course be factored in a factory method of InMemorySolution, let's say <T> Solution<T> createFromSolution(Solution<T> solution):

List<Solution<T>> list = new ArrayList<>() ;
list.add(InMemorySolution.createFromSolution(source.get(0))) ;
list.add(InMemorySolution.createFromSolution(source.get(1))) ;

This would be the "naive" equivalent of the current code but with the new design.

How is it Naive?

The problem here is that we are in a generic operator. As I mentionned, this is in the context of the algorithm that you know best the requirements for the solution implementation. You don't care about the type of variable, which is decided by the user of the algorithm through the generics of Solution<T>, but you care about how to optimize the storage and computation of the variables and other solution data, since the algorithm is the one creating and reading masses of solutions.

But here we are in a generic operator. It does not know about the requirements of the algorithm which will use it. So it cannot reliably know which implementation to use. Here we assume that we should create a new solution with everything stored in memory, so we use InMemorySolution, but it is not always the best choice. For example, in an algorithm where we know that we aren't going to change the values of the source solutions, we can just reuse the same solution (or create a proxy if the instance must be different). This way we don't even spend time reading and creating collections to store the values.

In short, the operator should not know about how to duplicate the solutions, but it should be the one who duplicate them.

Better Refactoring

Splitting the implementation and the execution responsibilities is the whole point of interfaces. And when we deal with a single opreation, lambdas come in handy.

Semantically, NullCrossover requires a way to duplicate the solutions, but its user (the algorithm) must tell how. Technically, NullCrossover must define an interface for duplicating, and its user (the algorithm) must provide an instance of it:

public class NullCrossover<T> implements CrossoverOperator<T> {

  interface Copier<T> {
    Solution<T> copy(Solution<T> source);
  }
  
  private final Copier<T> copier;
  
  public NullCrossover(Copier<T> copier) {
    this.copier = copier;
  }

  /** Execute() method */
  @Override public List<Solution<T>> execute(List<Solution<T>> source) {
    Check.notNull(source);
    Check.that(source.size() == 2, "There must be two parents instead of " + source.size());

    List<Solution<T>> list = new ArrayList<>() ;
    list.add(copier.copy(source.get(0))) ;
    list.add(copier.copy(source.get(1))) ;

    return list ;
  }
}

The point here is that, when the algorithm instantiates the operator, it knows which are the requirements to fulfill for duplicating the solutions. So it can provide the correct Copier depending on that:

// Just reuse it
crossoverOperator = new NullCrossover(source -> source);
// Create a new instance but based on the source
crossoverOperator = new NullCrossover(source -> new ProxySolution(source));
// Duplicate it completely in memory
crossoverOperator = new NullCrossover(source -> InMemory.createFromSolution(source));
// ...

This Copier interface is something that you may reuse heavily in many operators, so you can have it separately.

Summary

Refactoring this class thus come with the following requirements:

Refactor everywhere <S extends Solution<?>> to require only the type of variable <T>.
Make the algorithm provide a Copier<T> to the operator to let it know how to duplicate the solutions.

Nothing else should change.

Antonio J. Nebro · Answer 29 · Mon Feb 22 2021 20:34:02 GMT+0800 (China Standard Time)

I like the idea of the Copier interface (I would rename it to SolutionCopier).

The point then is that a factory should be needed to provide static methods for creating and copying different types of solutions.

We must take into account that this way of creating solutions

new InMemorySolution(
  new ArrayList<>(first.variables()),
  new ArrayList<>(first.objectives()),
  new ArrayList<>(first.constraints()),
  new HashMap<>(first.attributes())
)

works for integer and double solutions, but not for other solutions. For example, deep copy is needed in the case of the variables of binary solutions.

Matthieu Vergne · Answer 30 · Mon Feb 22 2021 23:30:57 GMT+0800 (China Standard Time)

The point is that, at some point, you should know which type you have. If something needs to be defined when you have this information, it should be defined at this level. For example, from the point of view of the operator, the algorithm must provide the right stuff. But maybe that from the point of view of the algorithm, this is something to be provided by the user. Thus the algorithm get it from its constructor, maybe adapt it, and provides it to the operator. Something like that.

Matthieu Vergne · Answer 31 · Mon Feb 22 2021 23:33:57 GMT+0800 (China Standard Time)

This problem is also related to the broad discussion about final values. Here, I would see good reasons to use final values in solutions, so you actually don't need to copy anything. But that is again another broad perspective to consider.

Antonio J. Nebro · Answer 32 · Mon Feb 22 2021 23:54:27 GMT+0800 (China Standard Time)

I'm looking at all the classes implementing solutions (DoubleSolution, BinarySolution, etc.) and most of them seem to accommodate well to be implemented in terms of Solution<anything>.

However, in the case of IntegerPermutationSolution, the direct refactoring should be as Solution<Integer>, so we would lose information. I'm exploring the idea of defining an IntegerPermutation class, to have then a Solution<IntegerPermutation>. However, this would lead to a solution whose variables are lists of integer permutations, while in the ``IntegerPermutationSolution` the variables are lists of integers.

Matthieu Vergne · Answer 33 · Tue Feb 23 2021 00:41:37 GMT+0800 (China Standard Time)

I see it is used for the TSP.
I assume each integer represents a city, so you don't want to arbitrarily generate an ID, but ensures you find all IDs exactly once.
You rely on permutation operations to keep that property.

I would see two road for that:

you keep the dependency between the integers: actually, you don't deal with integers, but with cities, so you should be able to replace the integers by custom objects and have the same result (representing them as integers is a matter of preference, not a need). I would tend to keep a Solution<City> (Solution<Integer> if we want to represent them as integers) and use operators that take in charge the permutation.
you make them independent: each integer in the list represent the city to take in the remaining ones, so for N cities, the first value can be any in [0;N-1], the second can be any in [0;N-2], etc. This way, forget the permutations, you just want to generate a value within the constraints. Each variable becomes independent of the others.

We should be able to manage both, since these are different algorithms. But I think the difference comes from the combination between the representation and the operators. So Solution<Integer> should work as well as Solution<Object> if we remain in the case of permutations.

Solution refactor

Generalyze from the many XxxSolution extensions to the unique Solution<Xxx>

Allow the use of custom solution generators

Use of a generic Solution implementation

Replace primitive arrays by abstractions

Simplify Solution uses

New major version

Keep the interface

Pay attention to the copy method

Pay attention to multiple constructors

Do you really need a factory class?

Blocking point

Solution

Naive Refactoring

How is it Naive?

Better Refactoring

Summary

Generalyze from the many `XxxSolution` extensions to the unique `Solution<Xxx>`

Use of a generic `Solution` implementation

Simplify `Solution` uses