PWhiddy / PokemonRedExperiments

Playing Pokemon Red with Reinforcement Learning

reward farm

creeperita09 opened this issue · comments

So, the rewards you used are mostly ok, but I've noticed that the AI spends a lot of time in the Pokedex: the Pokedex has a lot of lines, and they differ enough from one another to trigger the exploration reward. One way to fix this would be to look at memory and check whether the Pokedex is open; if it is, cut the exploration reward in half so the AI won't spend as much time there. This would also be useful in other situations where menus cover the whole screen and the AI gets more reward than it should.
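The suggestion above could be sketched roughly like this. Note this is only an illustration: the memory address and the `read_memory` callback are placeholders, not the actual Pokemon Red RAM address or the project's real API, which you'd have to look up in a RAM map.

```python
# Hypothetical address of a "menu open" flag -- NOT a verified
# Pokemon Red address; consult a RAM map for the real one.
MENU_OPEN_ADDR = 0xD057

def scaled_exploration_reward(base_reward, read_memory, menu_factor=0.5):
    """Return the exploration reward, halved while a full-screen
    menu (e.g. the Pokedex) is open.

    read_memory: callable taking an address and returning a byte,
    standing in for whatever emulator memory accessor the env uses.
    """
    menu_open = read_memory(MENU_OPEN_ADDR) != 0
    return base_reward * menu_factor if menu_open else base_reward
```

The same hook would cover any other full-screen menu: only the flag (or set of flags) being checked would change.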

I don't know if this was previously pointed out, but I'll do it anyway.

commented

Maybe it would be useful to combine the screen k-NN exploration reward with the coordinate-based exploration reward, so the latter doesn't trigger on menu navigation and therefore automatically disincentivizes abuse of such mechanics. Currently, the software uses an either-or boolean switch to decide which exploration reward is used. In the future, this could be made into two separate options with different reward weights. I'm working on this as part of a reorganization of the RedGymEnv class. In a few days I'll create a pull request that includes this feature.
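Replacing the either-or switch with two weighted options could look something like the sketch below. The function name and default weights are illustrative assumptions, not the actual RedGymEnv code; the point is only that menu scrolling produces screen novelty but no coordinate novelty, so weighting coordinates more heavily shrinks the payoff of menu abuse.

```python
def exploration_reward(screen_novelty, coord_novelty,
                       w_screen=1.0, w_coord=4.0):
    """Weighted sum of the two exploration signals.

    screen_novelty: k-NN distance-based novelty of the current frame.
    coord_novelty:  novelty of the player's (map, x, y) position.
    Weights are placeholders and would need tuning in practice.
    """
    return w_screen * screen_novelty + w_coord * coord_novelty
```

Setting either weight to zero recovers the current either-or behavior, so this is a strict generalization of the existing switch.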

Great! This project has such a great community!

The menu (extra buttons) was disabled because of this. Sadly, after gym three (Lt. Surge) you need to teach a Pokemon Cut, never mind Surf and Strength. Flash might not be needed for the AI, and Fly just lets you fast travel. As someone who's been training their AI with the menu enabled for over 400M steps (I think I'm close to 500M), it does get better and doesn't spam as much. It also leaves the door open for the AI to learn how to use items outside of battle. But I'm doing this with different settings from the baseline, so I can watch what happens and, when the time comes, share outputs on what we can expect.

The issue is also getting the AI to use the Pokemart without holding its hand. The point of the training is to let the AI learn how to play the game on its own, but give it a 'pat on the head' when it does something right.

Right now a few people are already working on improvements to the code so we can get past Mt. Moon more consistently and beat Misty, and this includes using saved states with each starter. Once this is done, the code is going to be worked on more and cleaned up. But with every change, we have to restart the training models from ground zero to make sure the AI can still learn/play the game. This way, results can be repeated and confirmed.

Is it possible to incorporate the player's coordinate position instead of pixel-based movement?

> Is it possible to incorporate the player's coordinate position instead of pixel-based movement?

There is an option for this: if you turn off the screen explore setting, it will then just track location data.
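The location-tracking mode described here amounts to rewarding first visits to each map tile. A minimal sketch of that idea, with an invented class name and an illustrative per-tile reward (the real env's bookkeeping and values may differ):

```python
class CoordExplore:
    """Reward only the first visit to each (map_id, x, y) tile,
    so pixel changes from menus or animations earn nothing."""

    def __init__(self, reward_per_tile=0.01):
        self.seen = set()           # tiles visited so far
        self.reward_per_tile = reward_per_tile

    def step(self, map_id, x, y):
        key = (map_id, x, y)
        if key in self.seen:
            return 0.0              # already visited: no reward
        self.seen.add(key)
        return self.reward_per_tile
```

Because the reward keys on game-state coordinates rather than the rendered screen, scrolling through the Pokedex never registers as exploration.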