Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue
2019

Abstract

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety-critical applications. Given the increasing interest in deploying learning-based methods, there has been a flurry of recent proposals for OPE methods, leading to a need for standardized empirical analyses. Our work places a strong focus on diversity of experimental design to enable stress testing of OPE methods. We provide a comprehensive benchmarking suite to study the interplay of different attributes on method performance. We distill the results into a summarized set of guidelines for OPE in practice. Our software package, the Caltech OPE Benchmarking Suite (COBS), is open-sourced, and we invite interested researchers to further contribute to the benchmark.
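As a concrete illustration of the estimation problem the benchmark studies (this sketch is not part of the paper's COBS code), below is a minimal implementation of two classical OPE estimators, ordinary and weighted trajectory-wise importance sampling. It assumes hypothetical logged data in which each step records the evaluation-policy and behavior-policy probabilities of the logged action along with the reward.

```python
import numpy as np

def importance_sampling_ope(trajectories, gamma=0.99, weighted=False):
    """Trajectory-wise importance sampling OPE (illustrative sketch).

    `trajectories`: list of trajectories, each a list of
    (pi_e_prob, pi_b_prob, reward) tuples giving the evaluation-policy
    and behavior-policy probabilities of the logged action and the reward.
    """
    weights, returns = [], []
    for traj in trajectories:
        rho = 1.0   # cumulative importance weight of the trajectory
        ret = 0.0   # discounted return of the trajectory
        for t, (pi_e, pi_b, r) in enumerate(traj):
            rho *= pi_e / pi_b
            ret += (gamma ** t) * r
        weights.append(rho)
        returns.append(ret)
    weights, returns = np.array(weights), np.array(returns)
    if weighted:
        # Weighted IS: normalize by the total weight (lower variance, biased).
        return float(np.sum(weights * returns) / np.sum(weights))
    # Ordinary IS: unbiased, but variance can explode for long horizons.
    return float(np.mean(weights * returns))

# Usage with two hypothetical logged trajectories of length 2:
data = [[(0.9, 0.5, 1.0), (0.8, 0.5, 0.0)],
        [(0.1, 0.5, 1.0), (0.2, 0.5, 1.0)]]
print(importance_sampling_ope(data), importance_sampling_ope(data, weighted=True))
```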


Code References

ray-project/ray (5 files)

rllib/offline/estimators/direct_method.py: "For more information refer to https://arxiv.org/pdf/1911.06854.pdf"
rllib/offline/estimators/doubly_robust.py: "For more information refer to https://arxiv.org/pdf/1911.06854.pdf"
rllib/offline/estimators/fqe_torch_model.py: https://arxiv.org/abs/1911.06854
rllib/offline/estimators/importance_sampling.py: "For more information refer to https://arxiv.org/pdf/1911.06854.pdf"
rllib/offline/estimators/weighted_importance_sampling.py: "For more information refer to https://arxiv.org/pdf/1911.06854.pdf"