CLIPEncoder wraps the image and text embedding functionality of the CLIP model from Hugging Face Transformers. It embeds documents using either the CLIP text encoder or the CLIP vision encoder, depending on each document's content type.
For more information on GPU usage and volume mounting, please refer to the documentation.
For more information on the CLIP model, check out the blog post, the paper, and the Hugging Face documentation.
Via Docker image:

```python
from jina import Flow

f = Flow().add(uses='jinahub+docker://CLIPEncoder')
```
Via source code:

```python
from jina import Flow

f = Flow().add(uses='jinahub://CLIPEncoder')
```
- To override `__init__` args & kwargs, use `.add(..., uses_with={'key': 'value'})`
- To override class metas, use `.add(..., uses_metas={'key': 'value'})`