Gambler's Problem: 0 Stake Allowed?
mparigi opened this issue
In the solution, it says "Your minimum bet is 1". However, the specification says "The actions are stakes, a ∈ {0, 1, . . . , min(s, 100 − s)}", implying a bet of 0 is fine. Which is correct?
A bit late to the discussion, but in this problem it's actually not advisable to use 0 as a stake, because it's an undiscounted MDP.
A stake of 0 gives a reward of 0 and leaves the capital unchanged. That might be fine in a discounted MDP, where delaying reduces the return, but not in an undiscounted MDP (gamma = 1). Because the reward is 0, the next state is the same state, and nothing is discounted, the value of staking 0 is exactly the value of the state itself, so the algorithm will end up treating 0 as a best action.
If the zero stake had a negative reward, or if the problem were discounted, it would be fine.
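Written out as a Bellman backup, the tie becomes explicit. This assumes the book's reward convention (+1 only on reaching a capital of 100, 0 otherwise) and that the terminal states 0 and 100 have value 0; the cost `c` in the last line is a hypothetical penalty for the zero stake, not part of the original problem.

```latex
q(s, a) = p_h \bigl[ r(s + a) + \gamma\, v(s + a) \bigr]
        + (1 - p_h) \bigl[ r(s - a) + \gamma\, v(s - a) \bigr],
\qquad
r(s') = \begin{cases} 1 & s' = 100 \\ 0 & \text{otherwise} \end{cases}

% With a = 0 both outcomes return to s < 100, so r(s) = 0:
q(s, 0) = p_h\, \gamma\, v(s) + (1 - p_h)\, \gamma\, v(s) = \gamma\, v(s)

% gamma < 1 and v(s) > 0:  q(s, 0) = \gamma v(s) < v(s), so stake 0 is never optimal
% gamma = 1:               q(s, 0) = v(s) = \max_a q(s, a), so stake 0 is always tied for best
% hypothetical cost c > 0: q(s, 0) = -c + \gamma v(s) < v(s)
```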
To put this in perspective, consider the following two cases for a capital of 99 (where only stakes of 0 and 1 are allowed):
- You stake 1 (you either win or end up with a capital of 98): this has some value, determined by the return, that is, by the probability of each outcome (ph and 1 - ph), the rewards, and the values of the next states. The exact number is not relevant here.
- You stake 0 and end up with a capital of 99 (the same as before), repeat a stake of 0 one million times and still end up with 99, and only then stake 1: this has the same value as the previous case, because the MDP is undiscounted and a stake of 0 gives a reward of 0 and leads back to the same state, creating a loop.
That said, you might be able to use 0 as an action if, when extracting the policy, you always pick the highest stake among the best actions (stake 0 might be a best action, but not the unique one: there will be at least one other stake that is equally good). I haven't tried doing this, though.
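A minimal sketch of that idea in Python, not checked against the book's or this repo's code: value iteration with stake 0 allowed and gamma = 1, followed by policy extraction that breaks ties toward either the smallest or the largest stake. The probability of heads (0.4), the convergence threshold, the tie tolerance, and the names `action_values`, `value_iteration` and `greedy_policy` are all my own assumptions.

```python
GOAL = 100
P_H = 0.4      # assumed probability of heads
THETA = 1e-10  # assumed convergence threshold for value iteration


def action_values(v, s, p_h=P_H):
    """q(s, a) for every stake a in 0..min(s, GOAL - s), undiscounted (gamma = 1).

    Reward is +1 only when the capital reaches GOAL. v[0] and v[GOAL] are
    terminal and stay 0, so the +1 is added explicitly.
    """
    qs = []
    for a in range(0, min(s, GOAL - s) + 1):
        reward = 1.0 if s + a == GOAL else 0.0
        qs.append(p_h * (reward + v[s + a]) + (1 - p_h) * v[s - a])
    return qs


def value_iteration(p_h=P_H, theta=THETA):
    """In-place value iteration over capitals 1..GOAL-1, with stake 0 allowed."""
    v = [0.0] * (GOAL + 1)
    while True:
        delta = 0.0
        for s in range(1, GOAL):
            best = max(action_values(v, s, p_h))
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v


def greedy_policy(v, prefer_highest, p_h=P_H, tol=1e-9):
    """Pick the smallest or largest stake whose value is within tol of the best."""
    policy = [0] * (GOAL + 1)
    for s in range(1, GOAL):
        qs = action_values(v, s, p_h)
        best = max(qs)
        ties = [a for a, q in enumerate(qs) if q >= best - tol]
        policy[s] = ties[-1] if prefer_highest else ties[0]
    return policy


v = value_iteration()
lowest = greedy_policy(v, prefer_highest=False)  # ties broken toward stake 0
highest = greedy_policy(v, prefer_highest=True)  # ties broken toward the largest stake
```

Breaking ties toward the smallest stake selects 0 in every state, which is exactly the problem described above; breaking them toward the largest stake gives a positive stake everywhere. The tolerance is a deliberate choice: with exact comparisons, floating-point rounding may decide which of several mathematically tied stakes wins.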
In the other issue on the same exercise, #172, you can see how the policy differs for the same values depending on whether the smallest or the highest best stake is chosen.