No details provided now.
- Always `assert` when you are not sure.
- Range check assertion: $(a,b), [a,b], [a,b), (a,b]$.
- When initializing an array: `argmin` with `np.full`, `argmax` with `np.zeros`.
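
The range-check and initialization notes above can be sketched as follows. This is a minimal illustration, not the project's code: `in_range`, its `closed` parameter, and `running_min_max` are hypothetical names. It also takes one possible reading of the initialization note, namely that a running minimum starts from `np.full(..., np.inf)` and a running maximum starts from `np.zeros(...)`, which is only valid if all values are non-negative.

```python
import numpy as np


def in_range(x, a, b, closed="both"):
    """Return True if x lies in the interval from a to b.

    closed: "neither" -> (a, b), "both" -> [a, b],
            "left" -> [a, b), "right" -> (a, b]
    (hypothetical helper, names chosen for illustration)
    """
    assert a <= b, "interval bounds must be ordered"
    assert closed in ("neither", "both", "left", "right")
    lower_ok = a <= x if closed in ("both", "left") else a < x
    upper_ok = x <= b if closed in ("both", "right") else x < b
    return lower_ok and upper_ok


def running_min_max(costs):
    """Track the row index of the column-wise minimum and maximum.

    Assumption: costs are non-negative, so zeros are a safe start
    for the running maximum.
    """
    costs = np.asarray(costs, dtype=float)
    assert costs.ndim == 2, "expected a 2-D array"
    assert (costs >= 0).all(), "zeros-initialized max needs non-negative costs"
    n_cols = costs.shape[1]
    best_min = np.full(n_cols, np.inf)  # any finite cost is below +inf
    best_max = np.zeros(n_cols)         # any positive cost is above 0
    arg_min = np.zeros(n_cols, dtype=np.int64)
    arg_max = np.zeros(n_cols, dtype=np.int64)
    for i, row in enumerate(costs):
        lower = row < best_min
        best_min[lower] = row[lower]
        arg_min[lower] = i
        higher = row > best_max
        best_max[higher] = row[higher]
        arg_max[higher] = i
    return arg_min, arg_max
```

Asserting the interval type explicitly makes an off-by-one at a boundary fail loudly instead of silently shifting an index.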
- try `fastmath` option in `@njit` (no performance difference)
- commit a `main_one_shot` function, with reduced and formatted output
  - move all related traces under same folder
  - recording somehow per-stage
  - complete `main_one_shot` function
- add static policy replacement function
- test policy replacement (with 50 submissions)
  - run on VPS and get 50 results
- semi-analytical average cost calculation
  - one-step/n-step policy improvement for any stage (n < STAGE_EVAL)
- Possible: enhance one evaluation with multi-step policy improvement's return?
- finish two simple analyses
- reinforcement learning
- optimized baseline policy (aware of start)
- touch a `plot-traces2.py` with new plot function
  - calculate the record data and display
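
For the semi-analytical average cost item in the list above, one possible approach is sketched here. It assumes the fixed policy induces an ergodic finite Markov chain with a known transition matrix, which the notes do not state. The function name `average_cost` and its arguments are hypothetical. The average cost is computed as the stationary distribution weighted by the per-state cost.

```python
import numpy as np


def average_cost(P, c):
    """Long-run average cost of a fixed policy on an ergodic finite chain.

    P: (n, n) row-stochastic transition matrix under the policy (assumed known).
    c: (n,) expected one-step cost in each state under the policy.
    Returns (average cost, stationary distribution).
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    n = P.shape[0]
    assert P.shape == (n, n) and c.shape == (n,)
    assert np.allclose(P.sum(axis=1), 1.0), "rows of P must sum to 1"
    # Stationary distribution: mu @ P = mu together with sum(mu) = 1.
    # Stack the normalization row under (P.T - I) and solve by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(mu @ c), mu
```

A value computed this way could serve as a cross-check against the average cost estimated from recorded traces, provided the chain assumption holds.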