MrUrq / LatinHypercubeSampling.jl

Julia package for the creation of optimised Latin Hypercube Sampling Plans

A few missing points in the documentation

mancellin opened this issue · comments

Dear @MrUrq,

Thank you for this useful package.

I noticed a few small things that would be nice to add to the documentation:

  • randomLHC could be mentioned
  • LHCoptim returns a tuple. The first element seems to be the optimized samples, but I'm not sure how to interpret the second.
  • I'm guessing gens, popsize, ntour and ptour are the parameters of the genetic algorithm. Do you have any advice on how to choose them? The examples do not give an example value of gens.

I can add the first one myself, but I would appreciate your help for the other two.

Thank you @mancellin for the suggestions.

I've updated the documentation according to your suggestions. Please have a look at the dev documentation, https://mrurq.github.io/LatinHypercubeSampling.jl/dev/.

  • randomLHC is now mentioned: https://mrurq.github.io/LatinHypercubeSampling.jl/dev/man/lhcoptim/
  • LHCoptim: the first element of the tuple is, as you said, the sample plan, and the second is the fitness of the plan (which is maximized). This is now reflected in the online documentation as well as in the Julia function documentation.
  • You are correct that these are parameters of the GA. They were tuned manually on a few problems I tested it on; this is also reflected in the documentation now. I haven't included an example value of gens as it is problem specific. You can use the in-place variant, LHCoptim!, and check the fitness value as you optimize in steps.
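The stepwise approach in the last point could be sketched as below. This is only an illustration: `optimise_in_steps` and its keyword names are hypothetical, and the optimiser and objective are passed in as arguments so that the sketch does not depend on the package being loaded. It assumes the optimiser returns a tuple whose first element is the plan, as described above for LHCoptim.

```julia
# Hypothetical helper: run an optimiser in several short bursts and record
# the objective after each burst, so progress can be inspected along the way.
# `optimise!` is assumed to be callable as optimise!(plan, gens) and to return
# a tuple whose first element is the improved plan.
function optimise_in_steps(optimise!, objective, plan; steps::Int=5, gens_per_step::Int=200)
    scores = Float64[]
    for _ in 1:steps
        plan, _ = optimise!(plan, gens_per_step)
        push!(scores, objective(plan))
    end
    return plan, scores
end

# With the package loaded, a call might look like this (untested assumption):
#   using LatinHypercubeSampling
#   plan, scores = optimise_in_steps(LHCoptim!, AudzeEglaisObjective, randomLHC(50, 2))
```

If the recorded scores stop changing noticeably between bursts, further generations are probably not worth the time for that problem.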

Please let me know if it is clear; in that case we can close the issue and publish a release!

I've also added the function scaleLHC to make it easier to use the sampling plan with existing functions. An example can be seen in the README and documentation.

Thank you for these enhancements!

It is still a bit difficult for me to judge how good the current plan is based on the fitness value, especially since it seems to change a lot with n and d.
Would it be possible to normalize the fitness value? For instance, having 0% as the worst and 100% as the best theoretical fitness for the given n and d. This way, the user can more easily understand how much room for improvement there is.

Yes, I agree that it is not easy to judge fitness and that it varies significantly. Unfortunately, knowing the theoretical best fitness requires testing all possible combinations. For small plans this is not a problem, but for larger plans it is prohibitively expensive, which is why this package uses a GA to optimize it.
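For very small plans, the exhaustive search mentioned here can be sketched in plain Julia. The sketch below is restricted to two dimensions and uses a hand-written objective (inverse of the summed inverse squared distances), which is an assumption about what the package computes; all function names are hypothetical. The first column is fixed to 1:n because reordering the rows does not change the distances.

```julia
# Assumed objective: 1 / sum over all point pairs of 1 / squared distance.
function ae_objective(plan)
    n, d = size(plan)
    total = 0.0
    for i in 2:n, j in 1:i-1
        sqdist = 0.0
        for k in 1:d
            sqdist += (plan[i, k] - plan[j, k])^2
        end
        total += 1 / sqdist
    end
    return 1 / total
end

# All permutations of `items`, built recursively (only sensible for small n).
function permutations_of(items::Vector{Int})
    length(items) <= 1 && return [copy(items)]
    result = Vector{Vector{Int}}()
    for i in eachindex(items)
        rest = vcat(items[1:i-1], items[i+1:end])
        for tail in permutations_of(rest)
            push!(result, vcat(items[i], tail))
        end
    end
    return result
end

# Best objective over every 2-D Latin hypercube with n points.
function best_fitness_2d(n::Int)
    best = 0.0
    for perm in permutations_of(collect(1:n))
        best = max(best, ae_objective(hcat(1:n, perm)))
    end
    return best
end

n = 6
diagonal = hcat(1:n, 1:n)          # a deliberately poor plan
ratio = ae_objective(diagonal) / best_fitness_2d(n)
println("diagonal plan reaches ", round(100 * ratio; digits=1), "% of the optimum")
```

The search visits n! plans in two dimensions and (n!)^(d-1) in d dimensions, which is why it stops being practical almost immediately.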

I think that for most use cases the important thing is to eliminate significant clustering of samples. This should happen fairly early, and one can look for a knee in the fitness value. So for the example below, ~2500 iterations might be sufficient.

using Plots, LatinHypercubeSampling
LHC, fitness = LHCoptim(120,2,10000)
plot(fitness; yaxis=:log,ylabel=:Fitness,xlabel=:Iterations)

[Figure: fitness against iterations, logarithmic y-axis]
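Reading the knee off the plot can also be approximated numerically. The following is a rough sketch, not something the package provides: `find_plateau`, `window` and `tol` are hypothetical names, and the history is a synthetic saturating curve standing in for the `fitness` vector returned by LHCoptim.

```julia
# Return the first index g where the relative gain over the next `window`
# entries falls below `tol`, or `nothing` if that never happens.
function find_plateau(history::AbstractVector{<:Real}; window::Int=100, tol::Float64=0.01)
    for g in 1:length(history) - window
        gain = (history[g + window] - history[g]) / abs(history[g])
        gain < tol && return g
    end
    return nothing
end

# Synthetic, saturating history used only for illustration.
history = [1 - exp(-g / 500) for g in 1:10_000]
println(find_plateau(history))
```

The threshold and window are arbitrary choices here; a real fitness history is noisier, so both would need adjusting for the problem at hand.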

Is there a way to use as a reference the maximum distance between n points in a d-dimensional cube without the "latin" constraint? The optimization would never reach 100%, but at least some effects of the dimension might be factored out.

Otherwise, yes, the slope of the fitness curve might be the best way to see the convergence.

For some cases this would be possible, for example when the points can be arranged as a regular grid in the d-dimensional cube.

julia> plan, _ = LHCoptim(9,2,1000);
julia> plan2 = [1 1; 1 5; 1 9; 5 1; 5 5; 5 9; 9 1; 9 5; 9 9];
julia> AudzeEglaisObjective(plan)
0.5156563535395958
julia> AudzeEglaisObjective(plan2)
0.8268733850129198

However, I'm not sure how to handle this when it is not possible to form a simple grid, say 10 samples in 2 dimensions. This metric would also be affected by the number of samples and dimensions.

Version 1.3.0 is now released, containing the doc changes and the plan scaling function.
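The grid value printed above can be reproduced by hand, which also shows what the metric measures. The function below is a sketch written for this thread (the name `ae_objective` is hypothetical): it takes the inverse of the sum of inverse squared distances over all point pairs, which is an assumption about the package's objective that happens to agree with the 0.8268733850129198 shown for the 3x3 grid.

```julia
# Assumed objective: 1 / sum over all point pairs of 1 / squared distance.
# Larger values mean the points are more evenly spread.
function ae_objective(plan)
    n, d = size(plan)
    total = 0.0
    for i in 2:n, j in 1:i-1
        sqdist = 0.0
        for k in 1:d
            sqdist += (plan[i, k] - plan[j, k])^2
        end
        total += 1 / sqdist
    end
    return 1 / total
end

# The 3x3 grid from the comment above. It is not a Latin hypercube,
# because coordinate values repeat within each column.
grid = [1 1; 1 5; 1 9; 5 1; 5 5; 5 9; 9 1; 9 5; 9 9]
println(ae_objective(grid))
```

Dividing a plan's objective by the grid's value gives a rough percentage, but only for sample counts that form a regular grid.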

I will close this issue now but please consider making a new one if you have suggestions which improve how the fitness of a plan can be judged.

Ok, thank you for your help!