A group of researchers at Google DeepMind has published a paper today outlining how they successfully trained an artificial intelligence to play "infamously hard exploration games" using YouTube videos of human playthroughs.
Even if you (like me) aren't well-versed in the art of AI, the paper has some interesting takeaways, starting with its core premise: deep reinforcement learning algorithms apparently struggle to improve at tasks where, in the words of the paper, "environment rewards are particularly sparse."
"One epitomizing example is Atari’s Montezuma's Revenge, which requires a human-like avatar to navigate a series of platforms and obstacles (the nature of which change substantially room-to-room) to collect point-scoring items," the paper reads. "For example, reaching the first environment reward in Montezuma's Revenge takes approximately 100 environment steps, equivalent to 100^18 possible action sequences."
While researchers can teach an AI the correct way to play a game by feeding it demonstrations recorded in the game environment itself, this paper suggests a more interesting approach: the agent first learns from a collection of unaligned YouTube videos of people playing the game (videos that don't match each other in resolution, framing, or play style), with a single video then serving as a guide for what the AI should do to receive a reward.
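To make that concrete, here is a minimal, hypothetical sketch of the guide-video idea: checkpoints are taken at regular intervals from one playthrough video, and the agent earns a small reward each time its current observation becomes similar enough to the next checkpoint in sequence. The `embed` function, the checkpoint spacing, and the threshold and bonus values are all illustrative placeholders, not the paper's actual implementation (which learns the embedding from the videos themselves).

```python
import numpy as np

def embed(frame):
    """Placeholder for a learned cross-video embedding.

    Here it just flattens and L2-normalizes the frame; the real system
    would map visually different videos into a shared feature space.
    """
    v = np.asarray(frame, dtype=np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def make_checkpoints(video_frames, every_n=16):
    """Embed every n-th frame of the single guide video."""
    return [embed(f) for f in video_frames[::every_n]]

class ImitationReward:
    """Reward the agent for reaching the guide video's checkpoints in order.

    Illustrative values: a small bonus is paid once per checkpoint, the
    first time the agent's embedded observation is similar enough to it.
    """

    def __init__(self, checkpoints, threshold=0.5, bonus=0.5):
        self.checkpoints = checkpoints
        self.threshold = threshold
        self.bonus = bonus
        self.next_idx = 0  # checkpoints must be hit sequentially

    def __call__(self, observation):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # guide video fully matched
        sim = float(embed(observation) @ self.checkpoints[self.next_idx])
        if sim > self.threshold:
            self.next_idx += 1
            return self.bonus
        return 0.0
```

This imitation reward can then be added to (or substituted for) the game's own sparse score signal when training a standard RL agent, which is what lets learning progress long before the first real in-game reward is ever reached.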
"Specifically, providing a standard RL agent with an imitation reward learnt from a single YouTube video, we are the first to convincingly exceed human-level performance on three of Atari’s hardest exploration games: Montezuma's Revenge, Pitfall! and Private Eye," the paper continues. "Despite the challenges of designing reward functions or learning them using inverse reinforcement learning, we also achieve human-level performance even in the absence of an environment reward signal."
It's fascinating research, and you can read the details in full in the aptly titled paper "Playing hard exploration games by watching YouTube" on arXiv. Some demonstration videos (one of which is embedded above) are also available to watch on researcher Tobias Pfaff's YouTube channel.