OpenAI researchers trained a neural network to play Minecraft at a level on par with human players.
The neural network was trained on 70,000 hours of varied in-game footage, supplemented by a small database of videos in which contractors performed specific in-game tasks, with their keyboard and mouse inputs also recorded.
After fine-tuning, OpenAI found that the model was capable of performing all sorts of complex skills, from swimming to hunting animals and eating their meat. It also picked up the “pillar jump”, a move in which the player places a block of material beneath them mid-jump in order to gain altitude.
Perhaps most impressively, the AI was able to craft diamond tools (which requires a long sequence of actions to be performed in order), a feat OpenAI described as an “unprecedented” achievement for a computer agent.
A breakthrough in AI?
The significance of the Minecraft project is that it demonstrates the effectiveness of a new technique OpenAI is deploying in the training of AI models – called Video PreTraining (VPT) – which the company says could accelerate the development of “general computer-using agents”.
Historically, the difficulty of using raw video as a source for training AI models has been that the footage shows *what* happened, but not necessarily *how*. The AI model would absorb the desired results, but not learn the combinations of inputs needed to achieve them.
With VPT, however, OpenAI combines a large set of video data pulled from public web sources with a pool of carefully curated footage labeled with relevant keyboard and mouse movements to establish the baseline model.
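The pipeline described above can be sketched at toy scale. This is a hypothetical, heavily simplified illustration (not OpenAI's actual code): a small labeled set trains an "inverse dynamics model" that pseudo-labels a larger unlabeled corpus, and a policy is then cloned from the pseudo-labeled data. Frames are stand-in scalar features, and nearest-neighbour lookups stand in for neural networks.

```python
# Toy sketch of the VPT idea (hypothetical, simplified):
# 1) fit an inverse dynamics model (IDM) on a small labeled set,
# 2) use it to pseudo-label a large unlabeled video corpus,
# 3) behavior-clone a policy from the pseudo-labeled data.

def train_idm(labeled):
    """labeled: list of (frame_feature, action) pairs from contractors."""
    def idm(frame):
        # Predict the action whose labeled example frame is closest.
        return min(labeled, key=lambda pair: abs(pair[0] - frame))[1]
    return idm

def pseudo_label(idm, unlabeled_frames):
    # Attach an inferred action to every unlabeled frame.
    return [(frame, idm(frame)) for frame in unlabeled_frames]

def behavior_clone(pseudo_labeled):
    def policy(frame):
        return min(pseudo_labeled, key=lambda pair: abs(pair[0] - frame))[1]
    return policy

# Small contractor-labeled set: frame feature -> recorded input
labeled = [(0.1, "mine"), (0.9, "jump")]
idm = train_idm(labeled)

# Large unlabeled web corpus gets pseudo-labeled by the IDM,
# and the policy is trained on the result.
web_frames = [0.15, 0.2, 0.85, 0.95]
policy = behavior_clone(pseudo_label(idm, web_frames))

print(policy(0.88))  # frames near 0.9 map to "jump"
```

The key point the sketch captures is that the expensive labeled data is only needed to train the labeler; the policy itself learns from the much larger, automatically labeled corpus.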
To refine the base model, the team then incorporated smaller datasets designed to teach specific tasks. In this case, OpenAI used footage of players performing early-game actions such as chopping down trees and building crafting tables, which the company says produced a “massive improvement” in how reliably the model performed these tasks.
Another technique is to “reward” the AI model for completing each step of a sequence of tasks, a practice known as reinforcement learning. It was this process that allowed the neural network to collect all the ingredients for a diamond pickaxe with a human-level success rate.
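The step-by-step reward idea can be illustrated with a short sketch. The subtask names and the flat one-point reward below are illustrative assumptions, not OpenAI's actual reward definition:

```python
# Hypothetical sketch of rewarding each completed step of a task
# sequence (reward shaping in reinforcement learning).

# Illustrative subtask sequence leading to a diamond pickaxe.
DIAMOND_PICKAXE_STEPS = [
    "log", "planks", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond", "diamond_pickaxe",
]

def step_reward(inventory, progress):
    """Return (reward, new_progress). The agent is rewarded once,
    when the next subtask's item first appears in its inventory."""
    if progress < len(DIAMOND_PICKAXE_STEPS) and \
            DIAMOND_PICKAXE_STEPS[progress] in inventory:
        return 1.0, progress + 1
    return 0.0, progress

# Example: the agent collects a log, then crafts planks.
reward, progress = step_reward({"log"}, 0)                    # 1.0, step 1
reward, progress = step_reward({"log", "planks"}, progress)   # 1.0, step 2
```

Rewarding each intermediate step, rather than only the final pickaxe, gives the agent a learning signal throughout the long sequence instead of a single sparse reward at the end.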
“VPT paves the way for agents to learn to act by watching vast numbers of videos on the internet. Compared to generative video modeling or contrastive methods, which would only yield representational priors, VPT offers the exciting possibility of directly learning large-scale behavioral priors in more domains than just language,” OpenAI explained in a blog post.
“Although we’re only experimenting in Minecraft, the game is very open and the native human interface (mouse and keyboard) is very generic, so we think our results bode well for other similar areas, e.g. using a computer.”
To encourage further experimentation in this space, OpenAI has partnered with the MineRL NeurIPS competition, donating its contractor data and model code to entrants attempting to use AI to solve complex Minecraft tasks. The grand prize: $100,000.