ikostrikov / walk_in_the_park


Trying to reproduce the results but failing, unfortunately.

Shikamaru5 opened this issue · comments

I have been working for a few weeks trying to adapt your technique to an RL algorithm I've been developing. Nothing fancy — I was already testing techniques on a simple algorithm to do exactly what your paper claims: speed up training time. I've examined the paper and its code inside and out, as well as REDQ and its code, and the DroQ paper. So I believe I understand what's going on, but I must be doing something wrong, because I only see a marginal increase in performance with my algorithm.
The trouble is that I see a huge increase in update time, and it's leading me to wonder whether I've implemented this correctly. In fact, I had to push the model updates outside of the environment step to make training feasible at all. The model attempts to learn to play the old Nintendo Entertainment System (NES) game Bubble Bobble. I find it takes about half an hour to get marginally better, and based on my experiments I have no idea how long it would take to become adept at the game.
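For concreteness, here is a minimal sketch (not your code, and `DummyAgent`/`run` are hypothetical names) of what I mean by pushing updates outside of step: the high update-to-data ratio still does the same total number of gradient updates, I just accumulate them and flush every K environment steps instead of updating inside every step.

```python
class DummyAgent:
    """Stand-in for a real SAC/DroQ-style agent; counts updates instead of training."""
    def __init__(self):
        self.updates = 0

    def update(self, batch=None):
        # A real agent would take a gradient step on a replay batch here.
        self.updates += 1

def run(agent, env_steps, utd_ratio, update_every):
    """Schedule `utd_ratio` updates per env step, flushed every `update_every` steps."""
    pending = 0
    for step in range(1, env_steps + 1):
        # ... env.step(action) and replay_buffer.insert(...) would go here ...
        pending += utd_ratio
        if step % update_every == 0:
            for _ in range(pending):
                agent.update()
            pending = 0
    return agent.updates
```

Whether I flush every step or every 10 steps, the update count is identical; only the wall-clock pattern changes, which is why I suspect my per-update cost (rather than the schedule) is the real problem.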
I don't know if I should have emailed this to you instead, but I have a GitHub repository set up for this. If you'd be able to look it over and give me some pointers, I'd really appreciate it. The repo also explains in far more detail what I've done to make this work at its current level, which is the best I've been able to manage. If not, that's alright — thank you for taking the time to read this. The repo is Shikamaru5/LNDQ-bubble_bot, and I've made sure to include credit in it so that others understand your work is present, along with the other works I've drawn on.