dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

Home Page: http://www.wildml.com/2016/10/learning-reinforcement-learning/

Gambler's Problem: 0 Stake Allowed?

mparigi opened this issue · comments

In the solution, it says "Your minimum bet is 1". However, the specification says "The actions are stakes, a ∈ {0, 1, . . . , min(s, 100 − s)}", implying a bet of 0 is fine. Which is correct?

A bit late to the discussion, but in this problem it's actually not advisable to use 0 as a stake, because it's an undiscounted MDP.

A stake of 0 gives a reward of 0 and leaves you in the same state. In a discounted MDP that might be acceptable, because the return shrinks with each time step, but in an undiscounted MDP (gamma = 1) the transition has no cost at all: the backed-up value of action 0 is exactly the value of the current state. Value iteration will therefore end up treating 0 as one of the best actions, even though it just loops in place.

If there were a negative reward for the action, or if the problem were discounted, it would be fine.
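A minimal sketch of the one-step lookahead makes the tie explicit (this is not the notebook's exact code; `p_h`, the goal of 100, and the +1 terminal reward are just the standard problem setup):

```python
import numpy as np

def one_step_lookahead(s, V, p_h=0.4, goal=100, gamma=1.0):
    """Backed-up value of every stake from capital s (stake 0 included)."""
    A = np.zeros(goal + 1)
    for a in range(0, min(s, goal - s) + 1):
        reward = 1.0 if s + a == goal else 0.0   # +1 only for reaching the goal
        # win with probability p_h, lose with probability 1 - p_h
        A[a] = p_h * (reward + gamma * V[s + a]) + (1 - p_h) * gamma * V[s - a]
    return A

# For a = 0 the expression collapses to p_h * V[s] + (1 - p_h) * V[s] == V[s]:
# with gamma = 1 and zero reward, "doing nothing" is exactly as good as the
# state itself, so at convergence it ties with the true best stake.
# With gamma < 1 (or a negative per-step reward) the a = 0 entry would be
# strictly worse and the problem would disappear.
```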

To put it in perspective, consider the following cases for a capital of 99 (where only stakes of 0 and 1 are allowed):

  • You stake 1 (you either win, or end up with a capital of 98): this has some value, determined by the return, which depends on the probabilities of each outcome (p_h and 1 − p_h), the rewards, and the values of the next states; the exact number is not relevant here.
  • You stake 0 and end up with a capital of 99 (the same as before), repeat the 0 stake one million times, still end up with 99, and only then stake 1: this has exactly the same value as the previous case, because it's an undiscounted MDP and a stake of 0 gives a reward of 0 while staying in the same state, creating a loop (see the small numerical check after this list).
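Here is a tiny numerical check of those two cases; `p_h` and the value assumed for a capital of 98 are made-up numbers, only there to show that the zero-stake prefix contributes nothing to the return when gamma = 1:

```python
# Made-up numbers: p_h and V_98 (value of capital 98) are assumptions,
# not values taken from the repo's solution.
p_h, V_98, gamma = 0.4, 0.2, 1.0

# Case 1: stake 1 immediately from a capital of 99.
value_stake_1 = p_h * 1.0 + (1 - p_h) * V_98

# Case 2: stake 0 for a million steps (reward 0, same state each time),
# then stake 1.
k = 1_000_000
value_stake_0_then_1 = sum(gamma ** t * 0.0 for t in range(k)) + gamma ** k * value_stake_1

# With gamma = 1 the zero-stake loop adds nothing, so both returns are equal.
assert value_stake_0_then_1 == value_stake_1
```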

That said, you might still be able to allow 0 as an action if, when extracting the policy, you always pick the highest stake among the ties (a stake of 0 might be a best action, but it is never the unique best action: there will always be at least one other stake with the same value). I haven't tried doing this, though.
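One way to do that tie-breaking (just a sketch of the idea, not code from the repo) is to pick the largest index among the actions that tie for the maximum, instead of `np.argmax`, which returns the first and therefore the smallest stake:

```python
import numpy as np

def best_stake(action_values, tol=1e-9):
    """Highest stake among all stakes whose value ties with the maximum."""
    best_value = np.max(action_values)
    tied = np.flatnonzero(action_values >= best_value - tol)
    return tied[-1]          # picks 0 only if 0 is the only action within tol of the max

# Hypothetical action values for some state, stakes 0..2: stake 0 ties with stake 1.
vals = np.array([0.5, 0.5, 0.4])
print(best_stake(vals))      # -> 1, whereas np.argmax(vals) would return 0
```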

You can see how the policies differ for the same values, depending on whether the smallest or the highest possible stake is taken, in the other issue about the same exercise: #172