yunshengtian / DGEMO

[NeurIPS 2020] Diversity-Guided Efficient Multi-Objective Optimization With Batch Evaluations

Supplying initial samples to custom problem

jkstartt opened this issue

Hi Yunsheng, great code; we've been using it for a while now! When using the command-line version of autooed to run a custom problem, how do we supply initial samples that we've already calculated at the start of the optimization? Right now it looks as if the only option is to let the code generate its own set of initial samples via LHS (Latin hypercube sampling), but we would like to supply our own pre-computed set of design/performance variables if possible. Our objective functions are fairly expensive, so we would like to be able to use the samples we already have to jump-start the optimization while also avoiding duplicated calculations.

I've dug through almost every file I can find looking for examples or references on how to supply an initial dataset, but I haven't found anything. I've also gone through the pymoo documentation quite a bit without much luck, especially given the version differences between the current pymoo and the version used in autooed. Looking at your code, though, I think I could modify it myself by changing the generate_initial_samples function, but if there's a built-in solution I would rather go that route before modifying anything. Thanks in advance for any help!

Just to follow up on this: I've temporarily gotten around the problem by commenting out the generate_initial_samples() call in common.py and replacing it with a few commands that load my existing initial samples into the X_init and Y_init numpy arrays.
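A minimal sketch of that kind of workaround, assuming the pre-computed samples live in two comma-separated files with no header row and one row per sample. The helper name `load_initial_samples`, its `n_var`/`n_obj` arguments, and the file names in the example are hypothetical; the real signature of `generate_initial_samples()` in common.py is not shown in this thread, so the sketch only shows how to produce `X_init` and `Y_init` as 2-D NumPy arrays with basic shape checks.

```python
import numpy as np


def load_initial_samples(x_src, y_src, n_var, n_obj):
    """Load pre-computed design variables (X) and objective values (Y).

    Hypothetical helper meant to stand in for the generate_initial_samples()
    call in common.py. x_src / y_src may be file paths or open text files
    holding comma-separated values with no header row.
    """
    # ndmin=2 keeps a one-sample (or one-column) file as a 2-D array
    X_init = np.loadtxt(x_src, delimiter=",", ndmin=2)
    Y_init = np.loadtxt(y_src, delimiter=",", ndmin=2)

    # Every design point needs exactly one row of objective values
    if X_init.shape[0] != Y_init.shape[0]:
        raise ValueError(
            f"X has {X_init.shape[0]} rows but Y has {Y_init.shape[0]}"
        )
    # Column counts must match the problem's numbers of variables and objectives
    if X_init.shape[1] != n_var:
        raise ValueError(f"expected {n_var} design variables, got {X_init.shape[1]}")
    if Y_init.shape[1] != n_obj:
        raise ValueError(f"expected {n_obj} objectives, got {Y_init.shape[1]}")
    return X_init, Y_init


# Example with hypothetical file names:
# X_init, Y_init = load_initial_samples("X_init.csv", "Y_init.csv", n_var=3, n_obj=2)
```

The shape checks are there because a silently mismatched X/Y pair would otherwise only surface later, deep inside the surrogate fitting.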

I guess a more permanent solution would be to add a command-line argument telling common.py whether to generate new LHS samples or read them from files, with the file locations also specified through new command-line arguments.
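One way the proposed command-line switch could look, sketched with the standard `argparse` module. The flag names `--init-x-path` and `--init-y-path` and both helper functions are hypothetical; they are not options that DGEMO or AutoOED currently expose.

```python
import argparse


def build_parser():
    # Hypothetical flags; not part of the existing DGEMO / AutoOED argument set
    parser = argparse.ArgumentParser(description="Run optimization on a custom problem")
    parser.add_argument("--init-x-path", type=str, default=None,
                        help="CSV file with pre-computed design variables")
    parser.add_argument("--init-y-path", type=str, default=None,
                        help="CSV file with pre-computed performance values")
    return parser


def use_precomputed_samples(args):
    """Return True if both paths were given, False if neither (fall back to LHS)."""
    given = [args.init_x_path is not None, args.init_y_path is not None]
    if all(given):
        return True
    if any(given):
        # Only one of the two files was supplied: refuse rather than guess
        raise ValueError("--init-x-path and --init-y-path must be given together")
    return False


if __name__ == "__main__":
    args = build_parser().parse_args()
    if use_precomputed_samples(args):
        print("reading initial samples from", args.init_x_path, "and", args.init_y_path)
    else:
        print("generating initial samples with LHS")
```

In this sketch the presence of the two paths doubles as the "read from files" switch, so no separate boolean flag is needed; common.py would then call its existing sampling routine only when `use_precomputed_samples(args)` is False.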

You are right, and this is a great suggestion! But I guess that would also constrain the format of the initial datasets. I am not sure which is easier: letting people change generate_initial_samples() or having them convert their data to a fixed format. I guess a common format would be two separate .csv files for X and Y (with no header or a single header row), but in practice it may differ. What do you think?
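The two-file .csv format described above, with either no header or a single header row, can be read with a small format-tolerant loader. This is a sketch under the assumption that a header row is any first row that does not parse entirely as numbers; the function names are hypothetical and not part of the repository.

```python
import csv

import numpy as np


def _is_numeric_row(row):
    """True if every field in the row parses as a float."""
    try:
        for value in row:
            float(value)
    except ValueError:
        return False
    return True


def read_sample_csv(f):
    """Read samples from an open CSV file with no header or one header row.

    Returns a 2-D float array with one row per sample.
    """
    rows = [row for row in csv.reader(f) if row]  # csv.reader yields [] for blank lines
    # A first row that does not parse as numbers is treated as the header
    if rows and not _is_numeric_row(rows[0]):
        rows = rows[1:]
    if not rows:
        raise ValueError("no data rows found")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("rows have inconsistent numbers of columns")
    return np.array([[float(v) for v in row] for row in rows], dtype=float)


# Usage (hypothetical file names):
# with open("X.csv", newline="") as fx, open("Y.csv", newline="") as fy:
#     X_init, Y_init = read_sample_csv(fx), read_sample_csv(fy)
```

One limitation of this heuristic: a header whose labels happen to be numbers (e.g. `1,2,3`) would be read as a data row, so a documented format would still need to say what headers are allowed.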

Well, from my own experience, I'd wager that anyone capable of using the command-line version of the code can also format their data. Data formatting is relatively easy as long as the requirements are clearly explained somewhere, so I'd think that would be your simplest solution.

Where I could see complications, however, is if a user intended to use some other kind of design variable type, e.g. categorical or binary. Though, as those don't seem to be implemented at the moment, I'm not sure that's much of a concern, unless of course you intend to implement those features in the future.