Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

February 2021

Abstract

In offline reinforcement learning (RL), an optimal policy is learnt solely from a priori collected observational data. However, in observational data, actions are often confounded by unobserved variables. Instrumental variables (IVs), in the context of RL, are variables whose influence on the state variables is entirely mediated through the action. When a valid instrument is present, we can recover the confounded transition dynamics from observational data. We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction (CMR) through which we can identify the transition dynamics based on observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the CMR. To the best of our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
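To make the role of the instrument concrete, the following is a minimal sketch in a scalar, linear toy setting. It is not the IVVI algorithm: the abstract concerns nonlinear dynamics and a primal-dual reformulation, whereas this example only illustrates why a moment restriction built on an instrument can recover dynamics that a naive regression gets wrong. The data-generating process, the variable names (`z`, `u`, `a`, `s_next`) and the coefficient `true_effect` are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, rng, true_effect=2.0):
    """Hypothetical confounded one-step transition: s_next = true_effect * a + noise."""
    z = rng.normal(size=n)  # instrument: affects s_next only through the action
    u = rng.normal(size=n)  # unobserved confounder
    a = z + u + 0.1 * rng.normal(size=n)  # action depends on instrument and confounder
    # Additive noise contains u, so it is correlated with the action.
    s_next = true_effect * a + u + 0.1 * rng.normal(size=n)
    return z, a, s_next


def ols_slope(x, y):
    """Naive least-squares slope (no intercept); biased when x is confounded."""
    return float(x @ y / (x @ x))


def iv_slope(z, x, y):
    """Slope solving the empirical moment condition mean(z * (y - theta * x)) = 0."""
    return float(z @ y / (z @ x))


rng = np.random.default_rng(0)
z, a, s_next = simulate(100_000, rng)

naive = ols_slope(a, s_next)   # absorbs the confounder's effect, overshoots
iv = iv_slope(z, a, s_next)    # uses only variation in a that is driven by z

print(f"naive estimate: {naive:.3f}")
print(f"IV estimate:    {iv:.3f}")
```

In this toy setup the naive estimate is pulled away from the assumed coefficient of 2.0 because the action and the noise share the confounder `u`, while the instrument-based estimate stays close to it. The condition solved in `iv_slope` is an unconditional, linear analogue of a conditional moment restriction of the kind the abstract describes.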

Type

Preprint

Publication

Technical report

stat.ML cs.LG

Luofeng Liao

MS Student (2020-2021)

Mladen Kolar

Associate Professor of Econometrics and Statistics
