A human can make intuitive choices about what actions to take in order to achieve a goal. Robots have a far more difficult time choosing from a universe of possible actions. Researchers at Brown University are developing an algorithm that can learn that skill from a video game environment.
Researchers from Brown University are developing a new algorithm to help robots better plan their actions in complex environments. It’s designed to help robots be more useful in the real world, but it’s being developed with the help of a virtual world — that of the video game Minecraft.
Basic action planning, while easy for humans, is a frontier of robotics. Part of the problem is that robots don’t intuitively ignore objects and actions that are irrelevant to the task at hand. For example, if someone asked you to empty the trashcan in the kitchen, you would know there’s no need to turn on the oven or open the refrigerator. You’d go right to the trashcan.
Robots, however, lack that intuition. Most approaches to planning consider the entire set of possible objects and actions before deciding which course to pursue. In other words, a robot might actually consider turning on the oven as part of its planning process for taking out the trash. In complex environments, this leads to what computer scientists refer to as the “state-space explosion” — an array of choices so large that it boggles the robot mind.
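To get a rough feel for the scale of that explosion, the short sketch below counts how many distinct action sequences a planner could face. The numbers (20 available actions, 4 relevant ones, plans 10 steps long) are hypothetical and chosen only for illustration; they do not come from the Brown research.

```python
def count_action_sequences(num_actions: int, horizon: int) -> int:
    """Number of distinct action sequences of length `horizon`
    when `num_actions` choices are available at every step."""
    return num_actions ** horizon


# Hypothetical numbers, for illustration only.
ALL_ACTIONS = 20       # everything the robot is able to do
RELEVANT_ACTIONS = 4   # actions that plausibly help with the task
HORIZON = 10           # plan length in steps

full = count_action_sequences(ALL_ACTIONS, HORIZON)
pruned = count_action_sequences(RELEVANT_ACTIONS, HORIZON)

print(f"all actions:      {full:,} sequences")
print(f"relevant actions: {pruned:,} sequences")
print(f"reduction factor: {full // pruned:,}x")
```

Under these assumptions, ignoring the irrelevant actions shrinks the number of candidate sequences by a factor of millions, which is why narrowing the search matters so much.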
“It’s a really tough problem,” said Stefanie Tellex, assistant professor of computer science at Brown. “We want robots that have capabilities to do all kinds of different things, but then the space of possible actions becomes enormous. We don’t want to limit the robot’s capabilities, so we have to find ways to shrink the search space.”
The algorithm that Tellex and her students are developing does just that. David Abel, a graduate student in Tellex’s lab, led the work and will present it this week at the International Conference on Automated Planning and Scheduling.
Discovering the likely path
The algorithm augments standard robot planning algorithms using “goal-based action priors” — sets of objects and actions in a given space that are most likely to help an agent achieve a given goal. The priors for a given task can be supplied by an expert operator, but they can also be learned by the algorithm itself through trial and error.
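The idea of pruning with a prior can be sketched in a few lines. The example below is not the researchers' algorithm or the BURLAP API; it is a minimal toy that assumes a made-up kitchen corridor, a plain breadth-first planner, a hand-written prior table (standing in for one supplied by an expert), and a simple cutoff threshold. All names and numbers are hypothetical.

```python
from collections import deque

# Hypothetical toy domain: state = (position, oven_on, fridge_open).
ACTIONS = ("move_left", "move_right", "toggle_oven", "toggle_fridge")

# Assumed expert-supplied prior for the goal "reach the trashcan":
# how likely each action is to be useful. Values are invented.
PRIOR = {"move_left": 0.45, "move_right": 0.50,
         "toggle_oven": 0.03, "toggle_fridge": 0.02}


def step(state, action, size):
    """Apply one action and return the resulting state."""
    pos, oven, fridge = state
    if action == "move_left":
        pos = max(0, pos - 1)
    elif action == "move_right":
        pos = min(size - 1, pos + 1)
    elif action == "toggle_oven":
        oven = not oven
    elif action == "toggle_fridge":
        fridge = not fridge
    return (pos, oven, fridge)


def prune(actions, prior, threshold=0.1):
    """Keep only the actions whose prior meets the threshold."""
    return [a for a in actions if prior.get(a, 0.0) >= threshold]


def plan(start, goal_pos, size, actions):
    """Breadth-first search. Returns (plan, number of states expanded)."""
    frontier = deque([(start, [])])
    visited = {start}
    expanded = 0
    while frontier:
        state, path = frontier.popleft()
        expanded += 1
        if state[0] == goal_pos:
            return path, expanded
        for action in actions:
            nxt = step(state, action, size)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None, expanded


start = (0, False, False)
full_plan, full_expanded = plan(start, 5, 6, ACTIONS)
pruned_plan, pruned_expanded = plan(start, 5, 6, prune(ACTIONS, PRIOR))

print("states expanded without prior:", full_expanded)
print("states expanded with prior:   ", pruned_expanded)
```

Both searches find the same five-step plan, but the pruned search never spends time on oven or refrigerator states. A hard threshold is only one way to use a prior; it was chosen here to keep the sketch short.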
The game Minecraft, as it turns out, provided an ideal world to test how well the algorithm learned action priors and implemented them in the planning process. For the uninitiated, Minecraft is an open-ended game, where players gather resources and build all manner of structures by destroying or stacking 3-D blocks in a virtual world. With more than 100 million registered users, it’s among the most popular video games of all time.
“Minecraft is a really good model of a lot of these robot problems,” Tellex said. “There’s a huge space of possible actions somebody playing this game can do, and it’s really cheap and easy to collect a ton of training data. It’s much harder to do that in the real world.”
Tellex and her colleagues started by constructing small domains, each just a few blocks square, in a model of Minecraft that the researchers developed. Then they plunked a character into the domain and gave it a task to solve — perhaps mining some buried gold or building a bridge to cross a chasm. The agent, powered by the algorithm, then had to try different options in order to learn the task’s goal-based priors — the best actions to get the job done.
“It’s able to learn that if you’re standing next to a trench and you’re trying to walk across, you can place blocks in the trench. Otherwise don’t place blocks,” Tellex said. “If you’re trying to mine some gold under some blocks, destroy the blocks. Otherwise don’t destroy blocks.”
After the algorithm ran through a number of trials of a given task to learn the appropriate priors, the researchers moved to a new domain that it had never seen before to see if it could apply what it learned. Indeed, the researchers showed that, armed with priors, their Minecraft agents could solve problems in unfamiliar domains much faster than agents powered by standard planning algorithms.
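One simple way to picture learning priors from trials is to count which actions showed up in successful attempts, grouped by goal and by what the agent saw around it. The sketch below does exactly that with smoothed frequency counts. It is an assumption-laden illustration, not the method from the paper: the trial records, feature names, and smoothing constant are all invented.

```python
from collections import defaultdict

ACTIONS = ("move", "place_block", "destroy_block")

# Hypothetical training log: (goal, observed feature, action used in a
# successful plan). In a real system these would come from many trials.
TRIALS = (
    [("cross_trench", "next_to_trench", "place_block")] * 8
    + [("cross_trench", "next_to_trench", "move")] * 2
    + [("mine_gold", "gold_under_blocks", "destroy_block")] * 9
    + [("mine_gold", "gold_under_blocks", "move")] * 1
)


def learn_priors(trials, actions, smoothing=1.0):
    """Estimate P(action is useful | goal, feature) from counts,
    with additive smoothing so unseen actions keep a small probability."""
    counts = defaultdict(lambda: defaultdict(float))
    for goal, feature, action in trials:
        counts[(goal, feature)][action] += 1.0

    priors = {}
    for context, action_counts in counts.items():
        total = sum(action_counts.values()) + smoothing * len(actions)
        priors[context] = {
            a: (action_counts.get(a, 0.0) + smoothing) / total
            for a in actions
        }
    return priors


priors = learn_priors(TRIALS, ACTIONS)

# Apply the learned priors in a new domain that shares the same features.
trench = priors[("cross_trench", "next_to_trench")]
gold = priors[("mine_gold", "gold_under_blocks")]
best_for_trench = max(trench, key=trench.get)
best_for_gold = max(gold, key=gold.get)

print("crossing a trench ->", best_for_trench)
print("mining buried gold ->", best_for_gold)
```

Because the priors are keyed on the goal and on local features rather than on a specific map, they can be reused in a domain the agent has never seen, which is the kind of transfer the researchers tested.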
Having honed the algorithm in virtual worlds, the researchers then tried it out on a real robot. They used the algorithm to have a robot help a person bake brownies. The algorithm was supplied with several action priors for the task. For example, one action prior let the robot know that eggs often need to be beaten with a whisk. So when a carton of eggs appeared in the robot’s workspace, it was able to anticipate the cook’s need for a whisk and hand one over.
In light of the results, Tellex says she sees goal-based action priors as a viable strategy to help robots cope with the complexities of unstructured environments — something that will be important as robots continue to move out of controlled settings and into our homes.
The work also shows the potential of virtual spaces like Minecraft in developing solutions for real-world robots and other artificial agents. “I think it’s going to provide a way for very rapid iteration for algorithms that we can then run in our robots and have some confidence they’re going to work,” Tellex said.
For this particular paper, the team used very small domains in a Minecraft mock-up in order to help the algorithm learn more quickly. But the team has now developed a modification — a mod — enabling the algorithm to run in the real game, freely available online. The mod was developed using BURLAP, a code library curated at Brown for learning and planning algorithms. The team hopes other researchers can use it to solve new problems, or that regular Minecraft players might find it useful.
In the real game, the researchers hope to use their algorithm to perform tasks in larger and larger Minecraft domains — and ultimately perhaps all of Minecraft. That would be a huge leap for artificial intelligence.
“The whole of Minecraft is what we refer to as ‘A.I. complete,’” Tellex said. “If you can do all of Minecraft you could solve anything. That’s pretty far off, but there are lots of interesting research objectives along the way.”
Source: Brown University