A few missing points in the documentation

mancellin opened this issue 5 years ago

Matthieu Ancellin commented 5 years ago

Dear @MrUrq,

Thank you for this useful package.

I noticed a few small things that would be nice to add to the documentation:

randomLHC could be mentioned
LHCoptim returns a tuple. The first element seems to be the optimized samples, but I'm not sure how to interpret the second.
I'm guessing gens, popsize, ntour and ptour are the parameters of the genetic algorithm. Do you have any advice on how to choose them? The examples do not give a value for gens.

I can add the first one myself, but I would appreciate your help for the other two.

Magnus Urquhart · Answer 1 · Mon Dec 16 2019 18:36:01 GMT+0800 (China Standard Time)

Thank you @mancellin for the suggestions.

I've updated the documentation according to your suggestions. Please have a look at the dev documentation: https://mrurq.github.io/LatinHypercubeSampling.jl/dev/

randomLHC is now mentioned: https://mrurq.github.io/LatinHypercubeSampling.jl/dev/man/lhcoptim/
LHCoptim: the first element of the tuple is, as you said, the sample plan, and the second is the fitness of the plan (which is maximized). This is now reflected in the online documentation as well as in the Julia function documentation.
You are correct that these are parameters of the GA. They were tuned manually on a few problems I tested; this is now reflected in the documentation too. I haven't included an example value of gens because it is problem-specific. You can use the in-place variant, LHCoptim!, and check the fitness value as you optimize in steps.
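
The stepwise approach described above might look like the following minimal sketch. It assumes that LHCoptim! accepts an existing integer plan plus a number of generations and returns the same (plan, fitness history) tuple as LHCoptim. The helper name optimize_in_steps, the chunk size and the number of chunks are illustrative choices, not recommendations from the package.

```julia
using LatinHypercubeSampling

# Hypothetical helper: optimize a plan in chunks of `chunk` generations and
# print the fitness after each chunk, so progress can be judged along the way.
function optimize_in_steps(n, d; chunk=500, nchunks=5)
    # Start from a random (unoptimized) Latin hypercube with n points in d dimensions.
    plan = randomLHC(n, d)
    for step in 1:nchunks
        # Assumption: LHCoptim! returns (plan, fitness history), like LHCoptim.
        plan, fitness = LHCoptim!(plan, chunk)
        println("generations: ", chunk * step, "  fitness: ", fitness[end])
    end
    return plan
end

plan = optimize_in_steps(120, 2)
```

If the printed fitness barely changes between chunks, further generations are unlikely to help much for that n and d.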

Please let me know if it is clear; in that case we can close the issue and publish a release!

Magnus Urquhart · Answer 2 · Tue Dec 17 2019 16:37:43 GMT+0800 (China Standard Time)

I've also added the function scaleLHC to make it easier to use the sampling plan with existing functions. An example can be seen in the README and documentation.
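
As a rough illustration of how scaleLHC fits in, the sketch below scales an optimized integer plan to physical ranges and evaluates a function at each sample. The ranges, the number of generations and the function f are made up for the example; only the call pattern (a plan plus one (lower, upper) tuple per dimension) follows the README.

```julia
using LatinHypercubeSampling

# Optimized integer plan: 50 points in 2 dimensions (100 generations is an arbitrary choice).
plan, _ = LHCoptim(50, 2, 100)

# Map each column of the plan to a range of interest; these ranges are illustrative.
scaled = scaleLHC(plan, [(-5.0, 5.0), (0.0, 1.0)])

# Hypothetical function to sample; each row of `scaled` is one sample point.
f(x) = x[1]^2 + x[2]
results = [f(row) for row in eachrow(scaled)]
```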

Matthieu Ancellin · Answer 3 · Tue Dec 17 2019 17:29:34 GMT+0800 (China Standard Time)

Thank you for these enhancements!

It is still a bit difficult for me to judge how good the current plan is based on the fitness value, especially since it seems to change a lot with n and d.
Would it be possible to normalize the fitness value? For instance, having 0% as the worst and 100% as the best theoretical fitness for the given n and d. This way, the user can more easily understand how much room for improvement there is.

Magnus Urquhart · Answer 4 · Tue Dec 17 2019 18:10:09 GMT+0800 (China Standard Time)

Yes, I agree that it is not easy to judge fitness, and it does vary significantly. Unfortunately, knowing the theoretical best fitness requires testing all possible combinations. For small plans this is not a problem, but for larger plans it is prohibitively expensive, which is why this package uses a GA to optimize it.

I think that for most use cases it is important to eliminate significant clustering of samples. This should happen fairly early, and one can look for a knee in the fitness value. So for the example below, ~2500 iterations might be sufficient.

using Plots, LatinHypercubeSampling
LHC, fitness = LHCoptim(120,2,10000)
plot(fitness; yaxis=:log,ylabel=:Fitness,xlabel=:Iterations)
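
Besides reading the knee off the plot, one could flag it programmatically. The sketch below is one simple plateau heuristic, not something provided by the package: it returns the first iteration after which the relative gain over the next `window` iterations falls below `tol`. The function name, the default window and tolerance, and the synthetic curve are all assumptions made for illustration.

```julia
# Return the first index k at which the relative gain between fitness[k] and
# fitness[k + window] drops below `tol`, or `nothing` if that never happens.
function plateau_index(fitness::AbstractVector{<:Real}; window::Int=500, tol::Real=0.01)
    for k in 1:(length(fitness) - window)
        gain = (fitness[k + window] - fitness[k]) / abs(fitness[k])
        gain < tol && return k
    end
    return nothing
end

# Synthetic saturating curve, for illustration only (not output of LHCoptim).
curve = [1 - exp(-k / 500) for k in 1:10_000]
k = plateau_index(curve)
println("fitness flattens around iteration ", k)
```

With a real run, `curve` would be replaced by the fitness history returned by LHCoptim, and the window and tolerance would need tuning for the problem at hand.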

Matthieu Ancellin · Answer 5 · Tue Dec 17 2019 18:29:28 GMT+0800 (China Standard Time)

Is there a way to use as a reference the maximum distance between n points in a d-dimensional cube without the "latin" constraint? The optimization would never reach 100%, but at least some effects of the dimension might be factored out.

Otherwise, yes, the slope of the fitness curve might be the best way to judge convergence.

Magnus Urquhart · Answer 6 · Tue Dec 17 2019 18:39:57 GMT+0800 (China Standard Time)

For some cases this would be possible, for example if the points can easily be arranged as a regular grid in the d-dimensional cube.

julia> plan, _ = LHCoptim(9,2,1000);
julia> plan2 = [1 1; 1 5; 1 9; 5 1; 5 5; 5 9; 9 1; 9 5; 9 9];
julia> AudzeEglaisObjective(plan)
0.5156563535395958
julia> AudzeEglaisObjective(plan2)
0.8268733850129198

However, I'm not sure how to handle this when it is not possible to form a simple grid, say 10 samples in 2 dimensions. This metric would also be affected by the number of samples and dimensions.
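
To make the grid comparison above concrete, here is a plain-Julia sketch that does not call the package. It assumes the objective is the inverse of the summed inverse squared distances over all pairs of points; that assumption reproduces the grid value quoted above, but the thread itself does not spell out the formula. The function names are hypothetical.

```julia
# Assumed form of the objective: 1 / sum over all pairs of 1 / (squared distance).
# Larger values mean the points are more spread out.
function inverse_pair_energy(plan::AbstractMatrix{<:Real})
    n = size(plan, 1)
    energy = 0.0
    for i in 1:n-1, j in i+1:n
        energy += 1 / sum(abs2, plan[i, :] .- plan[j, :])
    end
    return 1 / energy
end

# A plan satisfies the "Latin" constraint when every column is a permutation of 1:n.
is_latin(plan::AbstractMatrix{<:Integer}) =
    all(sort(col) == 1:size(plan, 1) for col in eachcol(plan))

# The 3 x 3 grid from the example above.
grid = [1 1; 1 5; 1 9; 5 1; 5 5; 5 9; 9 1; 9 5; 9 9]

println(inverse_pair_energy(grid))  # about 0.8269, in line with the value reported above
println(is_latin(grid))             # false: each coordinate value appears three times
```

Dividing a plan's value by the grid's value gives a rough relative score. For the numbers above, that is 0.5157 / 0.8269, or about 0.62. Such a reference only exists when n points fit a regular grid in d dimensions.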

Magnus Urquhart · Answer 7 · Wed Dec 18 2019 00:08:07 GMT+0800 (China Standard Time)

There is a new version 1.3.0 released now containing the doc changes and the plan scaling function.

I will close this issue now but please consider making a new one if you have suggestions which improve how the fitness of a plan can be judged.

Matthieu Ancellin · Answer 8 · Wed Dec 18 2019 02:32:43 GMT+0800 (China Standard Time)

Ok, thank you for your help!