To check:
- https://github.com/ClaireLC/backprop_kf_vo
- https://arxiv.org/pdf/1805.11122.pdf
- https://arxiv.org/pdf/1511.05121.pdf
- https://papers.nips.cc/paper/6090-backprop-kf-learning-discriminative-deterministic-state-estimators.pdf
- https://github.com/tu-rbo/differentiable-particle-filters
Call:
- what kind of non-stationarity can it handle?
- do you think it can be made off-policy? if so, what kind of importance sampling (IS)?
- how can I learn the features in a scalable fashion? (something like a proximal operator?) (even in the nonlinear case the features are not learned, or did you mean updating the whole parameter vector?)
- where does it break in the stochastic setting?
- what kind of uncertainty is P tracking? does it incorporate uncertainty about the future, or only about the present estimate? (see the sketch below)
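For reference on the P question, a minimal sketch of linear KTD(0) for policy evaluation, assuming linear features phi, a random-walk evolution model, and scalar observation noise; the function and parameter names are illustrative, not the code we actually run:

```python
import numpy as np

def ktd0_update(theta, P, phi_s, phi_s_next, r, gamma=0.99,
                evo_noise=1e-4, obs_noise=1e-2):
    """One linear KTD(0) step: a Kalman filter over the VF parameters theta.

    State model:   theta_t = theta_{t-1} + v_t,  v_t ~ N(0, evo_noise * I)
    Observation:   r_t = (phi_s - gamma * phi_s_next)^T theta_t + n_t
    """
    n = theta.shape[0]
    # prediction step: random-walk drift on theta lets the filter track
    # non-stationarity by keeping P from collapsing to zero
    P_pred = P + evo_noise * np.eye(n)
    # observation "feature" of the TD observation model
    h = phi_s - gamma * phi_s_next
    # innovation (TD error under the observation model) and its variance
    innov = r - h @ theta
    s = h @ P_pred @ h + obs_noise
    # Kalman gain and correction; the P update is the quadratic-cost part
    K = P_pred @ h / s
    theta = theta + K * innov
    P = P_pred - np.outer(K, h) @ P_pred
    return theta, P, innov, s
```

In this sketch P is the posterior covariance of theta, i.e. uncertainty about the present estimate V(s) ≈ phi(s)^T theta rather than about the spread of future returns, and the P update is where the per-step O(n^2) cost comes from.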
Outcome:
- KTD breaks if the transitions are not L-smooth (in the Wasserstein distance)
- try a different weighting for the data before trying fancy things in policy evaluation
- do the regression DQN-style, i.e. with the bootstrapped future value in the TD target (see the sketch after this list)
- the computational cost is O(n^2) ... sorry about it
- off-policy TRPO is possible if no n-step return is used (i.e. no memory)
- use the variance of the KTD estimate to update the value function
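A hedged sketch of how the weighting, DQN-style regression, and KTD-variance points could fit together: fit the value function by weighted least squares against a bootstrapped target, with the inverse of the KTD variance phi^T P phi as the per-sample weight. It assumes linear features and a frozen target parameter vector; all names are illustrative.

```python
import numpy as np

def dqn_style_vf_regression(phi, phi_next, r, theta_target, P,
                            gamma=0.99, ridge=1e-3):
    """Fit V(s) = phi(s)^T theta by weighted ridge regression.

    Targets are DQN-style: y = r + gamma * phi_next^T theta_target,
    with theta_target a frozen copy of the parameters (no update
    through the bootstrap term).
    Weights come from the KTD covariance: w = 1 / (phi^T P phi + eps),
    i.e. down-weight states whose value estimate KTD is unsure about.
    """
    # bootstrapped regression targets using the frozen parameters
    y = r + gamma * phi_next @ theta_target
    # per-sample precision weights from the KTD posterior covariance
    var = np.einsum("bi,ij,bj->b", phi, P, phi)
    w = 1.0 / (var + 1e-8)
    # weighted ridge regression: theta = (X^T W X + ridge I)^-1 X^T W y
    Xw = phi * w[:, None]
    A = phi.T @ Xw + ridge * np.eye(phi.shape[1])
    b = Xw.T @ y
    return np.linalg.solve(A, b)
```

Here theta_target plays the role of a frozen DQN-style target, and 1/(phi^T P phi + eps) is one possible reading of "use the variance of the KTD", not the only one.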
Status:
- check the correlation between the covariance at different lags and the change process of the pole mass (lag check sketched at the end of this section)
- a different weighting works, but not as well as KTD
- instability of the target update makes KTD sad
- how to use the variance of the KTD estimate for the policy?
- IS doesn't seem to work
- soft-Bellman doesn't seem to work
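For the first status item, a minimal sketch of the lag check, assuming we log the trace of P and the pole mass at every step (the logging itself is not shown and the names are hypothetical):

```python
import numpy as np

def lagged_correlations(trace_P, pole_mass, max_lag=50):
    """Pearson correlation between tr(P_t) and |Delta mass_{t - lag}|
    over a range of lags, to see whether (and how long after a mass
    change) the KTD covariance reacts.
    """
    dmass = np.abs(np.diff(pole_mass))   # change process of the pole mass
    trace_P = np.asarray(trace_P)[1:]    # align with the differenced series
    corrs = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            a, b = trace_P, dmass
        else:
            # covariance at time t vs. mass change lag steps earlier
            a, b = trace_P[lag:], dmass[:-lag]
        corrs[lag] = np.corrcoef(a, b)[0, 1]
    return corrs
```

A peak at some positive lag would suggest the KTD covariance inflates only some steps after the mass changes.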