SciPhi-AI / R2R

The all-in-one solution for RAG. Build, scale, and deploy state-of-the-art Retrieval-Augmented Generation applications.

Home Page: https://r2r-docs.sciphi.ai/

Embed media like images, audio, 3D, video, etc.?

fire opened this issue

Hi,

I was wondering: is embedding media in scope?

That's definitely in scope. The best way to approach this would be to introduce the necessary embedding providers and to modify or create a new pipeline that shows an example of this in action.
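
To make "introduce the necessary embedding providers" concrete, here is a minimal sketch of a per-modality provider interface. All class and method names below are hypothetical, not R2R's actual provider API:

```python
from abc import ABC, abstractmethod
from typing import List


class MediaEmbeddingProvider(ABC):
    """Hypothetical interface; R2R's real provider classes may differ."""

    @abstractmethod
    def embed(self, data: bytes) -> List[float]:
        """Return one embedding vector for a raw media payload."""


class StubImageProvider(MediaEmbeddingProvider):
    def embed(self, data: bytes) -> List[float]:
        # A real implementation would decode the bytes and run a
        # vision encoder; this stub just returns a fixed-size vector.
        return [0.0] * 512
```

The existing text providers could sit behind the same contract, so a pipeline only needs to know which provider handles which document type.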

I'm happy to team up on this.

I have two primary use cases (plus a someday-maybe third):

  1. The basic use case is taking an image and turning it into an embedding for retrieval, like Stable Diffusion or the various combined vision-text models do. There are a few models that can also do video. (See the sketch after this list.)
  2. My pet emerging-technologies use case is to take a 3D mesh from https://github.com/lucidrains/meshgpt-pytorch and have it auto-complete vertices, or search a database of other embedded meshes using the mesh token embedding.
  3. Someday maybe: audio, speech. I am not familiar at all with this.
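
For use case 1, a combined vision-text encoder is probably the easiest entry point, since text queries and image documents land in the same vector space. A minimal sketch using CLIP through Hugging Face transformers (the model choice here is just an example, not something R2R ships today):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed an image into the shared vision-text space.
image = Image.open("example.jpg")
image_inputs = processor(images=image, return_tensors="pt")
image_vec = model.get_image_features(**image_inputs)  # shape: (1, 512)

# Text queries go through the matching text tower, so they can
# retrieve images from the same index.
text_inputs = processor(text=["a photo of a cat"], return_tensors="pt", padding=True)
text_vec = model.get_text_features(**text_inputs)  # shape: (1, 512)
```

The mesh case is different: MeshGPT's token embeddings live in their own space, so mesh-to-mesh search would need its own index rather than sharing CLIP's.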

For image embedding, do you think we can fit it into the pipeline here [https://github.com/SciPhi-AI/R2R/blob/main/r2r/pipelines/basic/ingestion.py] with a specific embedding provider, or do you think we need to fundamentally rework the structure of the codebase in some way?
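
I haven't dug into ingestion.py deeply yet, so this is naive, but my hope is that no fundamental rework is needed: the pipeline could just dispatch on document type to a modality-specific provider. All names below are hypothetical, not the actual code in ingestion.py:

```python
from typing import Callable, Dict, List

EmbedFn = Callable[[bytes], List[float]]


def stub_text_embed(data: bytes) -> List[float]:
    return [0.0, 1.0]  # stand-in for the existing text provider


def stub_image_embed(data: bytes) -> List[float]:
    return [1.0, 0.0]  # stand-in for e.g. the CLIP sketch above


PROVIDERS: Dict[str, EmbedFn] = {
    "text": stub_text_embed,
    "image": stub_image_embed,
}


def ingest(doc_type: str, payload: bytes) -> dict:
    vector = PROVIDERS[doc_type](payload)
    # Record the modality so search knows which embedding space
    # the vector belongs to.
    return {"vector": vector, "metadata": {"modality": doc_type}}


print(ingest("image", b"...raw image bytes..."))
```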

I think multi-modal is an important use case and I am very interested in figuring out how to best support this.

I don't think I can drive multi-modal too much, but I'll see what spare time I can gather.

The obvious question is what happens when we have two different embedding models (e.g., continuous text vectors versus mesh token integers): how do we sync them?
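
One pattern that could answer this (a suggestion, not anything R2R does today): never compare vectors across spaces. Keep one collection per embedding model, tag every vector with the model that produced it, and link modalities through shared document IDs in metadata rather than through vector math. A toy sketch, with hypothetical names:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class MultiSpaceStore:
    """Toy store keeping one vector collection per embedding model."""

    def __init__(self) -> None:
        self._spaces: Dict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def add(self, model: str, doc_id: str, vector: List[float]) -> None:
        self._spaces[model].append((doc_id, vector))

    def search(self, model: str, query: List[float], k: int = 5):
        # Dot-product scoring within a single space only; vectors from
        # different models are never mixed.
        scored = [
            (sum(a * b for a, b in zip(query, vec)), doc_id)
            for doc_id, vec in self._spaces[model]
        ]
        return sorted(scored, reverse=True)[:k]


store = MultiSpaceStore()
store.add("clip-vit-b32", "doc-1", [0.1, 0.9])
store.add("meshgpt-tokens", "doc-1", [0.7, 0.2, 0.4])  # same doc, different space
print(store.search("clip-vit-b32", [0.2, 0.8]))
```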