This package provides a core interface for working with Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). For examples, please see POMDPExamples, QuickPOMDPs, and the Gallery.
Our goal is to provide a common programming vocabulary for:
- Expressing problems as MDPs and POMDPs.
- Writing solver software.
- Running simulations efficiently.
There are several ways to define (PO)MDPs:
- Transition and observation distributions and rewards can be defined separately, using either explicit or implicitly sampled distributions.
- All of the dynamics can be defined in a single generative model function: (s', o, r) = G(s, a) (see the sketch after this list).
- Problems may be defined with probability tables.
- The QuickPOMDPs interfaces make defining simple problems easier.
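To illustrate the generative-model option above, here is a minimal sketch (not taken from the package documentation) that defines a hypothetical one-dimensional problem by implementing `POMDPs.gen`; the `NoisyWalk` type and its dynamics are invented purely for illustration.

```julia
using POMDPs, POMDPModelTools
using Random: AbstractRNG

# Hypothetical problem type: state, action, and observation are all Float64.
struct NoisyWalk <: POMDP{Float64, Float64, Float64} end

# Single generative model function: returns (s', o, r) sampled from G(s, a).
function POMDPs.gen(m::NoisyWalk, s, a, rng::AbstractRNG)
    sp = s + a + randn(rng)   # noisy transition
    o = sp + randn(rng)       # noisy observation of the next state
    r = -abs(sp)              # reward for staying near the origin
    return (sp=sp, o=o, r=r)
end

POMDPs.discount(m::NoisyWalk) = 0.95
POMDPs.initialstate(m::NoisyWalk) = Deterministic(0.0)
```

Simulation-based solvers and simulators can interact with a definition like this purely by sampling, without any explicit distributions being written down.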
POMDPs.jl integrates with other ecosystems:
- The POMDPModelTools package provides two-way integration with CommonRLInterface and therefore with the JuliaReinforcementLearning packages.
- Python can be used to define and solve MDPs and POMDPs via the QuickPOMDPs or tabular interfaces and pyjulia (Example: tiger.py).
For help, please post in the GitHub Discussions tab. We welcome contributions from anyone! See CONTRIBUTING.md for information about contributing.
POMDPs.jl and associated solver packages can be installed using Julia's package manager. For example, to install POMDPs.jl and the QMDP solver package, type the following in the Julia REPL:
```julia
using Pkg; Pkg.add("POMDPs"); Pkg.add("QMDP")
```
Some auxiliary packages and older versions of solvers may be found in the JuliaPOMDP registry. To install this registry, see the installation instructions.
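For reference, adding a registry from the Julia REPL typically looks like the sketch below; the registry URL shown here is an assumption, and the linked installation instructions remain the authoritative source.

```julia
# Assumed registry URL; consult the JuliaPOMDP installation instructions
# for the authoritative command.
using Pkg
Pkg.Registry.add(Pkg.RegistrySpec(url="https://github.com/JuliaPOMDP/Registry"))
```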
To run a simple simulation of the classic Tiger POMDP using a policy created by the QMDP solver, you can use the following code (note that POMDPs.jl is not limited to discrete problems with explicitly-defined distributions like this):
```julia
using POMDPs, QuickPOMDPs, POMDPModelTools, POMDPSimulators, QMDP

m = QuickPOMDP(
    states = ["left", "right"],
    actions = ["left", "right", "listen"],
    observations = ["left", "right"],
    initialstate = Uniform(["left", "right"]),
    discount = 0.95,

    transition = function (s, a)
        if a == "listen"
            return Deterministic(s) # tiger stays behind the same door
        else # a door is opened
            return Uniform(["left", "right"]) # reset
        end
    end,

    observation = function (s, a, sp)
        if a == "listen"
            if sp == "left"
                return SparseCat(["left", "right"], [0.85, 0.15]) # sparse categorical distribution
            else
                return SparseCat(["right", "left"], [0.85, 0.15])
            end
        else
            return Uniform(["left", "right"])
        end
    end,

    reward = function (s, a)
        if a == "listen"
            return -1.0
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end
)

solver = QMDPSolver()
policy = solve(solver, m)

rsum = 0.0
for (s,b,a,o,r) in stepthrough(m, policy, "s,b,a,o,r", max_steps=10)
    println("s: $s, b: $([pdf(b,s) for s in states(m)]), a: $a, o: $o")
    global rsum += r
end
println("Undiscounted reward was $rsum.")
```
For more examples with visualization, see POMDPGallery.jl.
Several tutorials are hosted in the POMDPExamples repository.
Detailed documentation can be found here.
Many packages use the POMDPs.jl interface, including MDP and POMDP solvers, support tools, and extensions to the POMDPs.jl interface. POMDPs.jl and all packages in the JuliaPOMDP project are fully supported on Linux and OS X. Windows is supported for all native solvers*, and most non-native solvers should work, but may require additional configuration.
POMDPs.jl itself contains only the interface for communicating about problem definitions. Most of the functionality for interacting with problems is actually contained in several support tools packages:
- POMDPModelTools
- BeliefUpdaters
- POMDPPolicies
- POMDPSimulators
- POMDPModels
- POMDPTesting
- ParticleFilters
MDP solvers:

Package | Online/Offline | Continuous States-Actions | Rating³
---|---|---|---
Value Iteration | Offline | N-N | ★★★★★
Local Approximation Value Iteration | Offline | Y-N | ★★
Global Approximation Value Iteration | Offline | Y-N | ★★
Monte Carlo Tree Search | Online | Y (DPW)-Y (DPW) | ★★★★
POMDP solvers:

Package | Online/Offline | Continuous States-Actions-Observations | Rating³
---|---|---|---
QMDP (suboptimal) | Offline | N-N-N | ★★★★★
FIB (suboptimal) | Offline | N-N-N | ★★
BeliefGridValueIteration | Offline | N-N-N | ★★
SARSOP* | Offline | N-N-N | ★★★★
BasicPOMCP | Online | Y-N-N¹ | ★★★★
ARDESPOT | Online | Y-N-N¹ | ★★★★
MCVI | Offline | Y-N-Y | ★★
POMDPSolve* | Offline | N-N-N | ★★
IncrementalPruning | Offline | N-N-N | ★★★
POMCPOW | Online | Y-Y²-Y | ★★★
AEMS | Online | N-N-N | ★★
PointBasedValueIteration | Offline | N-N-N | ★★
¹: Will run, but will not converge to optimal solution
²: Will run, but convergence to optimal solution is not proven, and it will likely not work well on multidimensional action spaces
Reinforcement learning solvers:

Package | Continuous States | Continuous Actions | Rating³
---|---|---|---
TabularTDLearning | N | N | ★★
DeepQLearning | Y¹ | N | ★★★
¹: For POMDPs, the observation is used instead of the state as input to the policy.
³: Subjective rating; file an issue if you believe one should be changed.
- ★★★★★: Reliably computes a solution for every problem.
- ★★★★: Works well for most problems; may require some configuration or may not support every edge of the interface.
- ★★★: May work well, but could require difficult or significant configuration.
- ★★: Not recently used (condition unknown); may not conform to the interface exactly or may have package compatibility issues.
- ★: Not known to run.
These packages were written for POMDPs.jl in Julia 0.6 and have not been updated to 1.0 yet:

- DESPOT
*: These packages require non-Julia dependencies.
If POMDPs is useful in your research and you would like to acknowledge it, please cite this paper:
```bibtex
@article{egorov2017pomdps,
  author  = {Maxim Egorov and Zachary N. Sunberg and Edward Balaban and Tim A. Wheeler and Jayesh K. Gupta and Mykel J. Kochenderfer},
  title   = {{POMDP}s.jl: A Framework for Sequential Decision Making under Uncertainty},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {26},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-300.html}
}
```