MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
MineDojo is a new AI research framework for building open-ended, generally capable embodied agents. MineDojo features a massive simulation suite built on Minecraft with thousands of diverse tasks, and provides open access to an internet-scale knowledge base of 730K YouTube videos, 7K Wiki pages, and 340K Reddit posts.
Using MineDojo, AI agents can freely explore a procedurally generated 3D world with diverse terrains to roam 🌏 , materials to mine 💎, tools to craft 🔧, structures to build 🏰, and wonders to discover ✨. Instead of training in isolation, your agent will be able to learn from the collective wisdom of millions of human players around the world!
🥳 NEWS:
- MineDojo won the Outstanding Paper award at NeurIPS!
- MineCLIP reward model and agent code are released!
- We have open-sourced the creative task labeling UI, so researchers can curate more tasks from YouTube themselves. This tool can also be used beyond Minecraft for other agent domains.
MineDojo requires Python ≥ 3.9. We have tested it on Ubuntu 20.04 and Mac OS X. Please follow this guide to install the prerequisites first, such as JDK 8 for running the Minecraft backend. We highly recommend creating a new Conda virtual environment to isolate dependencies. Alternatively, we provide a pre-built Docker image for easier installation.
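Before installing, a quick script can confirm the two prerequisites named above. The sketch below is illustrative and not part of MineDojo: `check_prerequisites` is a hypothetical helper that only checks the Python version and whether some `java` executable is on `PATH`. It does not confirm that the JDK is version 8.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, java_lookup=shutil.which):
    """Report whether basic MineDojo prerequisites appear to be present.

    Hypothetical helper: checks Python >= 3.9 and that a `java` executable
    is on PATH. It does NOT verify that the JDK version is 8.
    """
    python_ok = tuple(version_info[:2]) >= (3, 9)
    java_path = java_lookup("java")  # None if no `java` on PATH
    return {
        "python_ok": python_ok,
        "java_on_path": java_path is not None,
        "java_path": java_path,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

The version and lookup function are parameters so the check can be exercised without touching the real interpreter or `PATH`.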
Installing the MineDojo stable version is as simple as:
```bash
pip install minedojo
```
To install the cutting-edge version from the main branch of this repo, run:
```bash
git clone https://github.com/MineDojo/MineDojo && cd MineDojo
pip install -e .
```
You can run the script below to verify the installation. The first run takes a while because the Java code needs to be compiled. After that, a Minecraft window should pop up with the same game interface that human players see. If everything goes well, you should see the message `[INFO] Installation Success`.
```bash
python minedojo/scripts/validate_install.py
```
Note that if you are on a headless machine, don't forget to prepend either `xvfb-run` or `MINEDOJO_HEADLESS=1`:
```bash
xvfb-run python minedojo/scripts/validate_install.py
# --- OR ---
MINEDOJO_HEADLESS=1 python minedojo/scripts/validate_install.py
```
MineDojo provides a Gym-style interface for developing embodied agents that interact with the simulator in a loop. Here is a very simple code snippet of a hardcoded agent that runs forward and jumps every 10 steps in the "Harvest Wool" task:
```python
import minedojo

env = minedojo.make(
    task_id="harvest_wool_with_shears_and_sheep",
    image_size=(160, 256),
)
obs = env.reset()
for i in range(50):
    act = env.action_space.no_op()
    act[0] = 1  # forward/backward: 1 = move forward
    if i % 10 == 0:
        act[2] = 1  # jump
    obs, reward, done, info = env.step(act)
env.close()
```
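Because the hardcoded agent's behavior depends only on the step index, it can be factored into a pure function and tested without launching Minecraft. The sketch assumes only what the snippet above shows (setting index 0 to 1 moves forward, setting index 2 to 1 jumps). `scripted_action` is a hypothetical helper, and the all-zero vector in the demo is a stand-in, not MineDojo's actual no-op action.

```python
import copy


def scripted_action(step, no_op, jump_every=10):
    """Hypothetical helper: always move forward, jump every `jump_every` steps.

    `no_op` is expected to be the vector from env.action_space.no_op().
    It is copied, so the caller's vector is never mutated.
    """
    act = copy.copy(no_op)  # works for both lists and NumPy arrays
    act[0] = 1  # index 0: forward/backward, 1 = forward (as in the snippet above)
    if step % jump_every == 0:
        act[2] = 1  # index 2: jump (as in the snippet above)
    return act


if __name__ == "__main__":
    stand_in_no_op = [0] * 8  # illustration only, NOT the real no-op values
    for i in range(3):
        print(i, scripted_action(i, stand_in_no_op))
```

Inside the loop above, `act = scripted_action(i, env.action_space.no_op())` would replace the three lines that build the action. Using `copy.copy` means the helper returns the same type it receives.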
Please refer to this tutorial for a detailed walkthrough of your first agent. MineDojo features a multimodal observation space (RGB, compass, voxels, etc.) and a compound action space (movement, camera, attack, craft, etc.); see this doc to learn more. We recommend consulting the full observation and action space specifications.
MineDojo can be extensively customized to fit your research needs. Please check out the customization guides on tasks, simulation, and privileged observations.
The MineCLIP reward model and agent code are open-sourced. Please refer to the paper for more algorithmic details.
MineDojo features a massively multitask benchmark with 3142 tasks in the current release.
We design a unified top-level function, `minedojo.make()`, similar to `gym.make`, that creates all the tasks and environments in our benchmarking suite. We categorize the tasks into Programmatic, Creative, and Playthrough.
Task Category | Count | Description |
---|---|---|
Programmatic | 1581 | Can be automatically scored based on ground-truth simulator states |
Creative | 1560 | Do not have well-defined or easily-automated success criteria |
Playthrough | 1 | Special achievement: defeat the Ender dragon, "beat the game" |
We pair all tasks with natural language descriptions of task goals (i.e., "prompts"), such as "obtain 8 bone in swampland" and "make a football stadium". Many tasks also have step-by-step guidance generated by GPT-3. Users can access a comprehensive listing of prompts and guidance for all tasks as follows:
```python
import minedojo

# list of string IDs
all_ids = minedojo.tasks.ALL_TASK_IDS
# dict: {task_id: (prompt, guidance)}
all_instructions = minedojo.tasks.ALL_TASK_INSTRUCTIONS
```
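Given the documented shape of `ALL_TASK_INSTRUCTIONS` (`{task_id: (prompt, guidance)}`), a keyword search over prompts takes only a few lines. `find_tasks` is a hypothetical helper, and the toy dictionary in the demo reuses two prompts from this README instead of loading MineDojo.

```python
from typing import Dict, List, Tuple

# Shape documented above: {task_id: (prompt, guidance)}
Instructions = Dict[str, Tuple[str, str]]


def find_tasks(instructions: Instructions, keyword: str) -> List[str]:
    """Hypothetical helper: sorted task IDs whose prompt contains `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return sorted(
        task_id
        for task_id, (prompt, _guidance) in instructions.items()
        if kw in prompt.lower()
    )


if __name__ == "__main__":
    toy = {
        "harvest_milk": ("obtain milk from a cow", "1. Find a cow."),
        "creative:255": ("Build a replica of the Great Pyramid of Giza", "1. Find a desert biome."),
    }
    print(find_tasks(toy, "pyramid"))
```

With MineDojo installed, the equivalent call would be `find_tasks(minedojo.tasks.ALL_TASK_INSTRUCTIONS, "milk")`. Only the prompt is searched, since not every task has guidance.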
The 1581 Programmatic tasks can be further divided into four categories: (1) Survival: surviving for a designated number of days; (2) Harvest: finding, obtaining, cultivating, or manufacturing hundreds of materials and objects; (3) Tech Tree: mastering the skills of crafting and using a hierarchy of tools; and (4) Combat: fighting various monsters and creatures to test the agent's reflexes and martial skills. Refer to this doc for more information.
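If task IDs begin with their sub-category name, the Programmatic tasks can be grouped by prefix. Only the `harvest_` prefix is confirmed by examples in this README (`harvest_milk`, `harvest_wool_with_shears_and_sheep`); the `survival`, `techtree`, and `combat` prefixes below are assumptions to verify against the actual ID list. `group_by_subcategory` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# ASSUMPTION: Programmatic task IDs start with their sub-category name.
# Only "harvest" is confirmed by the examples in this README.
SUBCATEGORY_PREFIXES = ("survival", "harvest", "techtree", "combat")


def group_by_subcategory(task_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group IDs by their first underscore-separated token; unknown prefixes go to "other"."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for task_id in task_ids:
        head = task_id.split("_", 1)[0]
        key = head if head in SUBCATEGORY_PREFIXES else "other"
        groups[key].append(task_id)
    return dict(groups)
```

Anything that does not match (for example Creative IDs or `playthrough`) lands in `"other"`, which makes a wrong prefix assumption easy to spot.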
The following code creates a Programmatic task with ID `harvest_milk` at 160x256 resolution:
```python
env = minedojo.make(task_id="harvest_milk", image_size=(160, 256))
```
You can access task-related attributes such as `task_prompt` and `task_guidance`:
```python
>>> env.task_prompt
obtain milk from a cow
>>> env.task_guidance
1. Find a cow.
2. Right-click the cow with an empty bucket.
```
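If `task_guidance` is a single newline-separated string numbered as displayed above (an assumption; the format may differ for some tasks), it can be split into individual steps, for example to track progress one step at a time. `guidance_steps` is a hypothetical helper.

```python
import re
from typing import List

# Matches a leading step number such as "1. " or "12. "
_STEP_PREFIX = re.compile(r"^\s*\d+\.\s*")


def guidance_steps(guidance: str) -> List[str]:
    """Hypothetical helper: split numbered guidance text into step strings without the numbers."""
    steps = []
    for line in guidance.splitlines():
        if line.strip():  # skip blank lines
            steps.append(_STEP_PREFIX.sub("", line).strip())
    return steps


if __name__ == "__main__":
    text = "1. Find a cow.\n2. Right-click the cow with an empty bucket."
    for i, step in enumerate(guidance_steps(text), start=1):
        print(i, step)
```

Lines without a leading number are kept as-is rather than dropped.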
Here we show a few examples from each category:
Similar to Programmatic tasks, Creative tasks can be instantiated by `minedojo.make()`. The only difference is that `task_id` no longer has any semantic meaning. Instead, the format becomes `creative:{task_index}`. You can query all Creative task IDs from `minedojo.tasks.ALL_CREATIVE_TASK_IDS`.
The following code instantiates the 256th task from our Creative suite:
```python
env = minedojo.make(task_id="creative:255", image_size=(160, 256))
```
Let's see what the task prompt and guidance are:
```python
>>> env.task_prompt
Build a replica of the Great Pyramid of Giza
>>> env.task_guidance
1. Find a desert biome.
2. Find a spot that is 64 blocks wide and 64 blocks long.
3. Make a foundation that is 4 blocks high.
4. Make the first layer of the pyramid using blocks that are 4 blocks wide and 4 blocks long.
5. Make the second layer of the pyramid using blocks that are 3 blocks wide and 3 blocks long.
6. Make the third layer of the pyramid using blocks that are 2 blocks wide and 2 blocks long.
7. Make the fourth layer of the pyramid using blocks that are 1 block wide and 1 block long.
8. Make the capstone of the pyramid using a block that is 1 block wide and 1 block long.
```
Please refer to this doc for more details on Creative tasks.
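Since Creative IDs follow the `creative:{task_index}` format and the 256th task is `creative:255`, the index is zero-based. The two hypothetical helpers below convert between an index and an ID; they are a convenience sketch, not part of the MineDojo API.

```python
CREATIVE_PREFIX = "creative:"


def creative_task_id(task_index: int) -> str:
    """Hypothetical helper: zero-based index -> Creative task ID (255 -> 'creative:255')."""
    if task_index < 0:
        raise ValueError("task_index must be non-negative")
    return f"{CREATIVE_PREFIX}{task_index}"


def creative_task_index(task_id: str) -> int:
    """Hypothetical helper: inverse of creative_task_id ('creative:255' -> 255)."""
    if not task_id.startswith(CREATIVE_PREFIX):
        raise ValueError(f"not a Creative task ID: {task_id!r}")
    return int(task_id[len(CREATIVE_PREFIX):])


if __name__ == "__main__":
    nth = 256  # "the 256th task", counting from 1
    print(creative_task_id(nth - 1))
```

Raising on non-Creative IDs keeps a Programmatic ID such as `harvest_milk` from being silently misparsed.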
The Playthrough task's instruction is to "Defeat the Ender Dragon and obtain the trophy dragon egg". This task holds a unique position because killing the dragon means "beating the game" in the traditional sense of the phrase, and is considered the most significant achievement for a new player. The mission requires extensive preparation, exploration, agility, and trial and error, which makes it a grand challenge for AI:
```python
env = minedojo.make(task_id="playthrough", image_size=(160, 256))
```
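The three task categories can be told apart from the task ID: Creative IDs use the `creative:{task_index}` format and the Playthrough task uses `playthrough`. The sketch below additionally assumes that every other ID is Programmatic; `task_category` and `count_categories` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable


def task_category(task_id: str) -> str:
    """Hypothetical helper: map a task ID to Programmatic, Creative, or Playthrough."""
    if task_id == "playthrough":
        return "Playthrough"
    if task_id.startswith("creative:"):
        return "Creative"
    return "Programmatic"  # ASSUMPTION: every remaining ID is Programmatic


def count_categories(task_ids: Iterable[str]) -> Counter:
    """Tally how many IDs fall into each category."""
    return Counter(task_category(t) for t in task_ids)


if __name__ == "__main__":
    sample = ["harvest_milk", "creative:255", "creative:0", "playthrough"]
    print(count_categories(sample))
```

If the assumption holds and `minedojo.tasks.ALL_TASK_IDS` lists every task, `count_categories(minedojo.tasks.ALL_TASK_IDS)` should reproduce the 1581 / 1560 / 1 split in the task category table.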
Minecraft has more than 100M active players, who have collectively generated an enormous wealth of data. MineDojo features a massive database collected automatically from the internet. AI agents can learn from this treasure trove of knowledge to harvest actionable insights, acquire diverse skills, develop complex strategies, and discover interesting objectives to pursue. All our databases are open-access and available to download today!
Minecraft is among the most streamed games on YouTube. Human players have demonstrated a stunning range of creative activities and sophisticated missions that take hours to complete. We collect 730K+ narrated Minecraft videos, which add up to ~300K hours and 2.2B words in English transcripts. The time-aligned transcripts enable the agent to ground free-form natural language in video pixels and learn the semantics of diverse activities without laborious human labeling. Please refer to the doc page for how to load our YouTube database.
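Time-aligned transcripts make it straightforward to pair each spoken phrase with the video frames it overlaps. The sketch below assumes a hypothetical segment format of `(start, end, text)` in seconds and a constant frame rate; it is not the schema of the released YouTube database, whose loading instructions are on the doc page. `Segment`, `segment_frames`, and `align` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One transcript segment (hypothetical format); times are in seconds."""
    start: float
    end: float
    text: str


def segment_frames(seg: Segment, fps: float) -> range:
    """Indices i of frames whose timestamps i / fps fall in [start, end)."""
    first = math.ceil(seg.start * fps)
    stop = math.ceil(seg.end * fps)  # exclusive upper bound
    return range(first, stop)


def align(segments: List[Segment], fps: float) -> List[Tuple[str, range]]:
    """Pair each segment's text with the frame indices it covers."""
    return [(seg.text, segment_frames(seg, fps)) for seg in segments]


if __name__ == "__main__":
    demo = [Segment(0.0, 1.5, "let's find some trees"), Segment(1.5, 3.0, "now punch the log")]
    for text, frames in align(demo, fps=10):
        print(f"{text!r}: frames {frames.start}-{frames.stop - 1}")
```

Using half-open intervals means adjacent segments that share a boundary never claim the same frame.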
The Wiki pages cover almost every aspect of the game mechanics, and supply a rich source of unstructured knowledge in multimodal tables, recipes, illustrations, and step-by-step tutorials. We scrape ~7K pages that interleave text, images, tables, and diagrams. To preserve the layout information, we also save the screenshots of entire pages and extract bounding boxes of the visual elements. Please refer to the doc page for how to load our Wiki database.
We collect 340K+ Reddit posts along with 6.6M comments under the “r/Minecraft” subreddit. These posts ask questions on how to solve certain tasks, showcase cool architectures and achievements in image/video snippets, and discuss general tips and tricks for players of all expertise levels. Large language models can be finetuned on our Reddit corpus to internalize Minecraft-specific concepts and develop sophisticated strategies. Please refer to the doc page for how to load our Reddit database.
Our paper is available on arXiv. If you find our code or databases useful, please consider citing us!
```bibtex
@inproceedings{fan2022minedojo,
  title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
  author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
  booktitle = {Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year = {2022},
  url = {https://openreview.net/forum?id=rc8o_j8I8PX}
}
```
Component | License |
---|---|
Codebase (this repo) | MIT License |
YouTube Database | Creative Commons Attribution 4.0 International (CC BY 4.0) |
Wiki Database | Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
Reddit Database | Creative Commons Attribution 4.0 International (CC BY 4.0) |