🌴 PALM-E: A Foundational Multi-Modal AI Model

This is the open-source implementation of the SOTA multi-modality foundation model "PaLM-E: An Embodied Multimodal Language Model" from Google. PaLM-E is a single large embodied multimodal model that can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments. It also exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains.


Appreciation

  • All the creators in Agora. Join Agora, the community of AI engineers changing the world with their creations.
  • LucidRains, for inspiring me to devote myself to open-source AI.

πŸš€ Quick Start

Installation πŸ“¦

pip install palme
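
To confirm the install, the model class should import cleanly (this uses the same module path as the usage example below):

from palme.model import PalmE  # raises ImportError if the install failed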

Usage 🎨

import torch
from palme.model import PalmE

# Random placeholder inputs: an image tensor of shape (batch, channels, height, width)
# and a batch of token ids of shape (batch, sequence_length) from a 20,000-token vocabulary.
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

model = PalmE()
output = model(text, img)
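
This README does not document the shape of output. A common convention for decoder-style models is next-token logits of shape (batch, sequence_length, vocab_size); assuming that holds here (an assumption, not confirmed by the repo), a greedy next-token pick would look like:

import torch

# Stand-in tensor with the ASSUMED logits shape (batch, seq_len, vocab_size);
# swap in the real `output` from the snippet above once its shape is confirmed.
logits = torch.randn(1, 1024, 20000)

# Greedy decoding step: take the highest-scoring token at the last position.
next_token = logits[:, -1, :].argmax(dim=-1)
print(next_token.shape)  # torch.Size([1]): one predicted token id per batch item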

Contribute || Be Part of the PALM-E Adventure 🀝

Your brilliance is needed! Join us, and together, let's make PALM-E even more awe-inspiring:

  1. Get Your Copy: Fork the PALM-E repo.
  2. Make It Local: Clone your fork.
  3. Prep Your Tools: Install the necessities.
  4. Discover & Innovate: Dive into the code.
  5. Craft Your Magic: Branch and code away.
  6. Show & Tell: Push your changes and craft a pull request.

🐞 Fixes, 🎨 enhancements, πŸ“ docs, or πŸ’‘ ideas – all are welcome! Let's shape the future of AI, hand in hand.

Roadmap

  • πŸ•΅οΈ Verify decoder configurations.
  • πŸš‚ Recreate the training strategy detailed in the paper.
  • 🌐 Train on the datasets used in the paper.
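
As a starting point for the training item above, here is a minimal sketch of a single language-modeling step. It assumes (unverified) that PalmE's forward pass returns next-token logits of shape (batch, sequence_length, vocab_size) and uses plain cross-entropy on shifted tokens; the paper's actual recipe, with joint training across language, vision, and visual-language data, is considerably more involved.

import torch
import torch.nn.functional as F
from palme.model import PalmE

model = PalmE()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Placeholder batch, mirroring the shapes in the usage example above.
img = torch.randn(1, 3, 256, 256)
text = torch.randint(0, 20000, (1, 1024))

# ASSUMPTION: the forward pass returns logits of shape (batch, seq_len, vocab_size).
logits = model(text, img)

# Standard next-token objective: predict token t+1 from all positions up to t.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, logits.size(-1)),
    text[:, 1:].reshape(-1),
)

optimizer.zero_grad()
loss.backward()
optimizer.step()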

Citation

@article{driess2023palme,
  title={PaLM-E: An Embodied Multimodal Language Model},
  author={Driess, Danny and Xia, Fei and Sajjadi, Mehdi S. M. and Lynch, Corey and Chowdhery, Aakanksha and Ichter, Brian and Wahid, Ayzaan and Tompson, Jonathan and Vuong, Quan and Yu, Tianhe and Huang, Wenlong and Chebotar, Yevgen and Sermanet, Pierre and Duckworth, Daniel and Levine, Sergey and Vanhoucke, Vincent and Hausman, Karol and Toussaint, Marc and Greff, Klaus and Zeng, Andy and Mordatch, Igor and Florence, Pete},
  journal={arXiv preprint arXiv:2303.03378},
  year={2023},
  url={https://doi.org/10.48550/arXiv.2303.03378}
}


License

Apache License 2.0

