InfoQ Podcast: Phil Winder on the History, Practical Application, and Ethics of Reinforcement Learning


Charles Humble, friend and editor of InfoQ, was kind enough to ask me for an interview to talk more about my new book, in podcast format. From the blurb:

In this episode of the InfoQ podcast Dr Phil Winder, CEO of Winder Research, sits down with InfoQ podcast co-host Charles Humble. They discuss: the history of Reinforcement Learning (RL); the application of RL in fields such as robotics and content discovery; scaling RL models and running them in production; and ethical considerations for RL.

Key Takeaways

  • Reinforcement Learning came out of experiments on how animals learn. Most intelligent animals learn through reinforcement: positive or negative feedback on actions they performed in the past. More intelligent animals are able to learn more complex sets of actions that lead to higher-order behavior.
  • Likewise, unlike typical Machine Learning approaches, which tend to be myopic, Reinforcement Learning aims to solve problems by optimizing sequences of decisions over time.
  • A fundamental mathematical concept for Reinforcement Learning is the Markov Decision Process developed by Richard Bellman in the 1950s. It comprises an agent, an environment and a reward. Rewards are ideally simple, easy to understand, and mapped directly to the problem you’re trying to solve.
  • One area that has seen relatively wide industry adoption of RL is robotics. There has also been success with recommendation systems for content discovery. However, it remains challenging to operate Reinforcement Learning models at scale, at least in part because RL models are inherently mutable.
  • The ethical considerations for RL are similar to those for other Machine Learning approaches, but inherently more complex given the nature of the model. Observability/auditability is key. There are also approaches such as safe RL, where the algorithms are trained to stay within a constrained and confined set of states.
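
To make the agent, environment, and reward framing from the takeaways a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from the podcast: a hypothetical five-state corridor, a reward of 1 for reaching the last state, and a discount factor of 0.9. It uses value iteration, a dynamic programming method that plans with a known model of the environment, rather than an algorithm that learns from experience, so treat it as a picture of the Markov Decision Process idea and not as a full RL implementation.

```python
# Hypothetical corridor environment: states 0..4, state 4 is the goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)   # step left, step right
GAMMA = 0.9          # discount factor (an assumed value)


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # simple reward mapped directly to the goal
    return next_state, reward, done


def action_value(state, action, values):
    """Immediate reward plus the discounted value of the state that follows."""
    next_state, reward, done = step(state, action)
    return reward + (0.0 if done else GAMMA * values[next_state])


def value_iteration(sweeps=50):
    """Repeatedly back up state values until they settle."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for state in range(GOAL):  # the goal is terminal, its value stays 0
            values[state] = max(action_value(state, a, values) for a in ACTIONS)
    return values


def run_episode(values, start=0, max_steps=20):
    """Agent: act greedily with respect to the computed values."""
    state, total_reward, path = start, 0.0, [start]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: action_value(state, a, values))
        state, reward, done = step(state, action)
        total_reward += reward
        path.append(state)
        if done:
            break
    return path, total_reward


if __name__ == "__main__":
    values = value_iteration()
    print(values)
    print(run_episode(values))
```

The reward only arrives at the final step, yet the computed values propagate it back to the earlier states, which is why the agent walks right from the start. This is the "sequences of decisions over time" point in miniature: the earlier actions are worth taking only because of what they lead to later.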