Question regarding architecture
GRIGORR opened this issue
Hi. Small question regarding the architecture. Do I understand correctly that at each iteration the candidate networks are trained separately, each of them is then added to the AdaNet ensemble, a weighting of the logits is learned, and finally the best one is chosen? Thanks in advance.
I understand it like that too.
@GRIGORR That is correct: all the candidate subnetworks (and their associated ensemble) are trained in parallel in the same TensorFlow graph. At the end of each iteration, the best subnetwork is chosen based on its performance within the ensemble.
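To make the selection step concrete, here is a minimal sketch in plain Python. It is not the AdaNet API and does not use TensorFlow. It assumes each subnetwork is represented only by its fixed outputs on a small dataset, that the mixture weights are fit by gradient descent, and that plain mean squared error stands in for the actual AdaNet objective. All names (`fit_mixture_weights`, `run_iteration`, `base`, `good`, `bad`) are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def ensemble_predict(outputs: List[Vector], weights: Vector) -> Vector:
    """Weighted sum of the subnetwork outputs, one value per example."""
    n = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(n)]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error, used here as a stand-in objective (assumption)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def fit_mixture_weights(outputs: List[Vector], target: Vector,
                        steps: int = 500, lr: float = 0.05) -> Vector:
    """Learn one scalar weight per subnetwork by gradient descent on the MSE."""
    weights = [1.0 / len(outputs)] * len(outputs)
    n = len(target)
    for _ in range(steps):
        pred = ensemble_predict(outputs, weights)
        grads = [
            2.0 / n * sum((pred[i] - target[i]) * out[i] for i in range(n))
            for out in outputs
        ]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


def run_iteration(previous: Dict[str, Vector],
                  candidates: Dict[str, Vector],
                  target: Vector) -> Tuple[str, float]:
    """Build one candidate ensemble per new subnetwork and keep the best.

    Each candidate ensemble is the previous ensemble plus exactly one
    new subnetwork; the candidate is judged by the loss of that ensemble,
    not by its loss in isolation.
    """
    best_name, best_loss = "", float("inf")
    for name, output in candidates.items():
        outputs = list(previous.values()) + [output]
        weights = fit_mixture_weights(outputs, target)
        loss = mse(ensemble_predict(outputs, weights), target)
        if loss < best_loss:
            best_name, best_loss = name, loss
    return best_name, best_loss


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0]
    previous = {"base": [1.0, 1.0, 1.0, 1.0]}
    candidates = {
        "good": [0.0, 1.0, 2.0, 3.0],   # complements "base" well
        "bad": [1.0, -1.0, 1.0, -1.0],  # adds little to "base"
    }
    print(run_iteration(previous, candidates, target))
```

In this toy setup the candidate named `good` is selected, because together with `base` it can reproduce the target, while `bad` cannot. The real library trains the candidates and their ensembles together in one graph rather than in a Python loop; the loop here only illustrates how the comparison is made.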
From what I gather from the 0.8.0 docs, it sounds like one AdaNet iteration actually selects a complete Ensemble and discards the others. Could it be said that each Ensemble in the candidate ensemble set differs from all the other candidate Ensembles in the subnetwork that was added to it in the current iteration?