My notes from the “Learning to Learn” talk by Stanford’s Benjamin Van Roy
Below are my notes from a talk entitled “Learning to Learn” by Benjamin Van Roy. I am reading some of the references and will expand this document soon so that it is readable for others.
Abstract: I will discuss the importance of learning to learn and how this is a distinctive element of reinforcement learning relative to other areas of statistical learning. I will then survey some relevant research and discuss recent work with Zheng Wen on an algorithm that efficiently learns to learn (and learns) in dynamic systems with arbitrarily large state spaces by combining optimistic exploration and value function generalization.
Bio: Benjamin Van Roy is broadly interested in the formulation and analysis of mathematical models that address problems in information technology, business, and public policy. He is a Professor of Management Science and Engineering and Electrical Engineering, and, by courtesy, Computer Science, at Stanford University. He has held visiting positions as the Wolfgang and Helga Gaul Visiting Professor at the University of Karlsruhe and as the Chin Sophonpanich Foundation Professor of Banking and Finance at Chulalongkorn University. He has served on the editorial boards of Discrete Event Dynamic Systems, Machine Learning, Mathematics of Operations Research, and Operations Research, for which he is currently the Financial Engineering Area Editor. He has served as a researcher, advisor, founder, or director for several technology companies. He received the SB (1993) in Computer Science and Engineering and the SM (1995) and PhD (1998) in Electrical Engineering and Computer Science, all from the Massachusetts Institute of Technology.
Reinforcement Learning Models in the Literature
- Myopic Learning
- Reinforcement Learning
What is this “multi-armed bandit” I keep hearing about at every talk on online ads? I should learn it. Watch this video lecture later.
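The multi-armed bandit is the simplest setting where the gap between myopic learning and reinforcement learning shows up. Below is a minimal sketch, not taken from the talk, of a Bernoulli bandit played two ways: a greedy rule (one simple instance of myopic behavior) that always pulls the arm with the best sample mean, and UCB1, a standard optimism-based rule that adds an exploration bonus shrinking with the number of pulls. The function names `pull` and `run_bandit` and the arm probabilities are hypothetical choices for illustration.

```python
import math
import random


def pull(rng, p):
    """Sample a Bernoulli reward with success probability p."""
    return 1.0 if rng.random() < p else 0.0


def run_bandit(probs, horizon, policy, seed=0):
    """Play a Bernoulli multi-armed bandit and return the total reward collected.

    probs  -- hidden success probability of each arm (hypothetical values)
    policy -- "greedy" (myopic) or "ucb1" (optimistic exploration)
    """
    if policy not in ("greedy", "ucb1"):
        raise ValueError("policy must be 'greedy' or 'ucb1'")
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each sample mean is defined
        elif policy == "greedy":
            # Myopic: exploit the best sample mean; max() keeps the first arm on ties.
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        else:
            # UCB1: sample mean plus a bonus that favors rarely pulled arms.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(rng, probs[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

With arms `[0.0, 0.5]`, the greedy rule gets stuck forever on the useless first arm whenever the second arm's single initial pull returns 0, because both sample means then tie at zero and the tie goes to the first arm. UCB1's bonus keeps sending pulls to the under-sampled arm, so it cannot get stuck this way; the price is a few exploratory pulls of the bad arm even when greedy happened to be lucky.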
Literature on efficient reinforcement learning:
- Kearns-Singh 2002
  - Devise a plan to learn soon, if possible
  - Otherwise, plan to exploit
- Brafman-Tennenholtz 2002
  - Optimistic exploration
- Kearns-Koller 1999
- Abbasi-Yadkori-Szepesvári 2011
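The optimistic exploration entry in the list above (Brafman and Tennenholtz's R-max) rests on optimism in the face of uncertainty. The sketch below is a simplified R-max-style agent, assuming a small deterministic chain environment (a hypothetical example, not from the talk) so that a single visit to a state-action pair reveals its outcome; R-max proper handles stochastic transitions by requiring several visits before trusting an estimate. Unvisited pairs are valued at the best possible return `r_max / (1 - gamma)`, which mirrors R-max's trick of routing unknown pairs to a fictitious maximal-reward state, and the agent replans with value iteration at every step. Setting `optimistic=False` values unvisited pairs at 0, giving a myopic-style baseline. The names `chain_step`, `plan`, and `run_agent` are hypothetical.

```python
def chain_step(state, action, n_states):
    """Hypothetical deterministic chain MDP.

    Action 0 moves left; taking it at state 0 pays a small reward of 0.1.
    Action 1 moves right; taking it at the last state pays 1.0.
    """
    if action == 0:
        return max(state - 1, 0), (0.1 if state == 0 else 0.0)
    return min(state + 1, n_states - 1), (1.0 if state == n_states - 1 else 0.0)


def plan(model, n_states, gamma, unknown_value, iters=200):
    """Value iteration on the learned model; unvisited pairs get unknown_value."""
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(iters):
        v = [max(row) for row in q]
        for s in range(n_states):
            for a in (0, 1):
                if (s, a) in model:
                    s_next, r = model[(s, a)]
                    q[s][a] = r + gamma * v[s_next]
                else:
                    q[s][a] = unknown_value
    return q


def run_agent(n_states=6, steps=100, gamma=0.9, r_max=1.0, optimistic=True):
    """Act greedily w.r.t. the planned Q-values while learning the model.

    Returns (total reward, learned model).
    """
    # Optimism: an unvisited pair is assumed to pay r_max forever.
    unknown_value = r_max / (1 - gamma) if optimistic else 0.0
    model = {}  # (state, action) -> (next_state, reward); one visit suffices here
    state, total = 0, 0.0
    for _ in range(steps):
        q = plan(model, n_states, gamma, unknown_value)
        action = 0 if q[state][0] >= q[state][1] else 1  # ties go to action 0
        next_state, reward = chain_step(state, action, n_states)
        model[(state, action)] = (next_state, reward)
        total += reward
        state = next_state
    return total, model
```

On a 6-state chain over 100 steps, the optimistic agent spends about its first 16 steps walking right and trying each untried action, finds the reward-1 action at the far end, and stays there. The baseline collects the 0.1 reward at state 0 on its first step and, since every untried action looks worthless to it, never leaves.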