Computer scientists have demonstrated that positive reinforcement – an approach often used during dog training – could have applications in training algorithms to acquire new skills.
The Johns Hopkins University researchers used positive reinforcement to teach a robot called Spot new skills, including stacking blocks. The robot was able to ‘learn’ in just days what would take a month using conventional approaches, suggesting that positive reinforcement could be a feasible approach for training robots for real-world tasks.
“The question here was how do we get the robot to learn a skill?” said Andrew Hundt, the PhD candidate who led the study. “I’ve had dogs so I know rewards work and that was the inspiration for how I designed the learning algorithm.”
While humans and other animals can learn from trial and error, there is no perfect way to make a machine-learning model adjust efficiently based on its errors.
In this case, Hundt and his colleagues devised a reward system that mirrors giving dogs treats for performing tasks correctly during training; for the algorithm, the rewards were numerical points.
The researchers used this system, called SPOT, to teach a robot to stack blocks. As the robot experimented with the blocks, it quickly learned that correct stacking behaviours earned points while incorrect ones earned none; the most points could be earned by placing the final block on top of the stack. The algorithm was taught several other tasks using the same method, such as playing a navigation game, clearing toys, and lining up blocks.
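The points scheme described above can be illustrated with a minimal sketch. This is not the researchers' code: the function names, the point values (1 point for progress, 10 for completing the stack) and the use of stack height as the measure of progress are all assumptions made for illustration only.

```python
# Hypothetical illustration of a points-based reward for block stacking.
# All names and point values are assumptions, not taken from the SPOT system.

PROGRESS_POINTS = 1.0     # assumed reward for an action that raises the stack
COMPLETION_POINTS = 10.0  # assumed (largest) reward for placing the final block


def stacking_reward(height_before: int, height_after: int, target_height: int) -> float:
    """Return the points earned by a single action."""
    if height_after <= height_before:
        # The action made no progress or toppled the stack: no points.
        return 0.0
    if height_after >= target_height:
        # The final block was placed on top: the most points.
        return COMPLETION_POINTS
    # The stack grew but is not yet complete.
    return PROGRESS_POINTS


def episode_score(heights: list[int], target_height: int) -> float:
    """Sum the points over one trial, given the stack height after each step."""
    total = 0.0
    for before, after in zip(heights, heights[1:]):
        total += stacking_reward(before, after, target_height)
    return total


if __name__ == "__main__":
    # One trial: the stack is knocked down once (2 -> 1) before being completed.
    print(episode_score([1, 2, 1, 2, 3, 4], target_height=4))
```

In this sketch a trial with fewer wasted actions reaches the completion reward sooner, which is the behaviour a learning algorithm would be nudged towards.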
The SPOT system took just days to teach what previously would have taken weeks, a period that included time wasted exploring dead ends. The team sped up the process further by running a simulation before running tests with the robot.
Efficiency with respect to actions per trial typically improves by 30 per cent or more, while training takes just 1,000 to 20,000 actions, depending on the task.
“The robot wants the higher score,” said Hundt. “It quickly learns the right behaviour to get the best reward. In fact, it used to take a month of practice for the robot to achieve 100 per cent accuracy. We were able to do it in two days.”
The researchers hope that a positive reinforcement approach could help train robots to perform tasks in real-world settings, such as by training household robots to do laundry and wash dishes, or improving the performance of autonomous driving systems.
“Our goal is to eventually develop robots that can do complex tasks in the real world – like product assembly, caring for the elderly and surgery,” said Professor Gregory Hager, another author of the study. “We don’t currently know how to program tasks like that – the world is too complex. But work like this shows us that there is promise to the idea that robots can learn how to accomplish such real-world tasks in a safe and efficient way.”