xetdata / seamless_monorepo

Seamless Communication Monorepo

XetHub-hosted fork of Meta's Seamless Communication models in a monorepo design:

  • models/ folder contains all of the model files themselves
  • code/ folder is a Git submodule pointing to Meta's repo, which contains the code and documentation

Large files such as the model files are hosted by XetHub through our GitHub app, while source code is still hosted on GitHub. This ML monorepo design bakes in reproducibility with no workflow changes and simplifies versioning, since source code and large files can live in the same logical folder.

🔂 Clone the entire repo

This repo contains 41 GB of model files, so double-check the available storage on your target machine.
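A minimal sketch for checking free space before a full clone, assuming a POSIX shell and a `df` that accepts `-P` and `-k` (GNU and BSD/macOS versions both do). The helper name `has_free_space` is hypothetical, and it treats 1 GB as 1024^3 bytes, which leaves slight headroom over the 41 GB figure.

```shell
# Hypothetical helper: succeed only if the filesystem holding DIR has at least
# MIN_GB gigabytes available (1 GB treated as 1024^3 bytes here).
# Usage: has_free_space DIR MIN_GB
has_free_space() {
  dir="$1"
  min_gb="$2"
  # `df -Pk` prints one POSIX-format line per filesystem in 1024-byte blocks;
  # field 4 of the data row (line 2) is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  need_kb=$((min_gb * 1024 * 1024))
  [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, run `has_free_space . 41 || echo "Not enough space for a full clone"` from the directory you plan to clone into. Since 41 GB covers the model files only, leaving some extra margin is a sensible precaution.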

  1. Install our tiny git-xet extension for your operating system. It lets you pull and push large files in Xet-managed GitHub repos.
  2. Then clone the repo locally:
git clone git@github.com:xetdata/seamless_monorepo.git
  3. The download may take a while. You will see output from the git-xet client resembling the following:
git-xet 0.12.5 filter started
Updating files: 100% (39/39), done.
Xet: Retrieving data blocks: 15.34 GiB / 110 MiB/s
Filtering content: 45% (11/24), 4.72 GiB / 70 MiB/s
  4. The code/ folder is a Git submodule that links to Meta's original repo. From the root directory of this monorepo, download it with:
git submodule update --init --recursive
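Once the submodule step has run, you can confirm that `code/` was actually checked out. This sketch relies on the documented format of `git submodule status`, where a line starting with `-` means the submodule is not initialized and a line starting with `U` means it has merge conflicts. The function name `submodules_ready` is hypothetical; it reads the status output on standard input.

```shell
# Hypothetical helper: read `git submodule status` output on stdin and
# succeed only if no submodule is uninitialized ('-') or conflicted ('U').
submodules_ready() {
  rc=0
  while IFS= read -r line; do
    case "$line" in
      -*) echo "not initialized: ${line#-}"; rc=1 ;;
      U*) echo "merge conflict: ${line#U}"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Usage, from the root of the monorepo: `git submodule status | submodules_ready && echo "code/ is ready"`.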

Bonus tip: save your SSH passphrase in your keychain so you don't have to enter it four times every time you run git clone or git push.
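One way to follow this tip, sketched as a shell function. It assumes an SSH agent is already running (on Linux you may need `eval "$(ssh-agent -s)"` first) and that your key lives at `~/.ssh/id_ed25519`; both that default path and the name `add_key_to_agent` are assumptions. On macOS 12 and later, `ssh-add --apple-use-keychain` stores the passphrase in the system keychain; elsewhere, plain `ssh-add` only caches it in the running agent for the current session.

```shell
# Hypothetical helper: add an SSH key to the running agent so its passphrase
# is not requested on every git clone / git push.
# Assumes an agent is running; the default key path is an assumption.
add_key_to_agent() {
  key="${1:-$HOME/.ssh/id_ed25519}"
  if [ "$(uname -s)" = "Darwin" ]; then
    # macOS 12+: also store the passphrase in the macOS keychain
    # (older macOS releases used `ssh-add -K` instead).
    ssh-add --apple-use-keychain "$key"
  else
    # Other systems: the agent caches the passphrase for this session only.
    ssh-add "$key"
  fi
}
```

On macOS, adding `UseKeychain yes` and `AddKeysToAgent yes` to `~/.ssh/config` lets new sessions pick the key up from the keychain automatically.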

๐Ÿ›‹๏ธ Quickly lazy clone just the code

If you have limited storage space or don't want to wait for the full download of all the model files, you can use the lazy clone feature baked into our git-xet extension:

git xet clone --lazy git@github.com:xetdata/seamless_monorepo.git

This command downloads all files managed by GitHub directly (like source code and markdown files) and only downloads pointers to larger binary files managed by XetHub.

Use the following command to materialize specific files:

git xet materialize models/seamless-streaming/seamless_streaming_unity.pt
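When you need more than one model file from a lazy clone, a small loop saves retyping the command. This sketch calls `git xet materialize` once per path, exactly as in the command above, and stops at the first failure. Whether your git-xet version accepts several paths in a single call is not covered here, so one call per path is the conservative choice. The helper name `materialize_all` is hypothetical.

```shell
# Hypothetical helper: materialize several lazily cloned files, one
# `git xet materialize` call per path, stopping at the first failure.
# Run it from the root of the lazy clone.
materialize_all() {
  for file in "$@"; do
    echo "materializing $file"
    git xet materialize "$file" || return 1
  done
}
```

For example, from the repository root: `materialize_all models/seamless-streaming/seamless_streaming_unity.pt`, then `git xet lazy show` to confirm what is now materialized.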

View a full list of currently materialized files using:

git xet lazy show

🗻 Mount the entire repo locally

You can also mount the entire model repo in just a few seconds. Files are fetched behind the scenes as you access them.

git xet mount git@github.com:xetdata/seamless_monorepo.git

🌃 Join our community

Join our Slack community here.

๐Ÿ“ Legal Disclosures

Seamless Expressive models

Meta requires that you register your email with them to use the Seamless Expressive models. You can fill out the form here.

Licenses

Meta's Seamless models have multiple licenses that you need to comply with.

The following non-generative components are MIT licensed as found in MIT_LICENSE:

The following models are CC-BY-NC 4.0 licensed as found in LICENSE:

  • SeamlessM4T models (v1 and v2).
  • SeamlessStreaming models.

The following models are Seamless licensed as found in SEAMLESS_LICENSE:

  • Seamless models.
  • SeamlessExpressive models.