SimonCqk / towhee

Open source platform for generating embedding vectors.

Home Page: https://towhee.io

x2vec, Towhee is all you need!

What is Towhee?

Towhee is a flexible, application-oriented framework for computing embedding vectors over unstructured data. It aims to democratize anything2vec, allowing everyone, from beginner developers to large organizations, to train and deploy complex machine learning pipelines with just a few lines of code.

Towhee has pre-built pipelines for a variety of tasks, including audio/music embeddings, image embeddings, celebrity recognition, and more. For a full list of pipelines, feel free to visit our Towhee hub.

Key features

  • Easy embedding for everyone: Transform your data into vectors with fewer than five lines of code.

  • Rich operators and pipelines: No more reinventing the wheel! Collaborate and share pipelines with the open source community.

  • Automatic versioning: Our versioning mechanism for pipelines and operators ensures that you never run into dependency hell.

  • Support for fine-tuning models*: Feed your dataset into our Trainer and get a new model in just a few easy steps.

  • Deploy to cloud*: Ready-made pipelines can be deployed to the cloud with minimal effort.

Features marked with a star (*) are on our roadmap and have not yet been implemented. Help is always appreciated, so come join our Slack or check out our docs for more information.

Getting started

Towhee requires Python 3.6+ and PyTorch 1.4.0+. Support for TensorFlow and scikit-learn models is coming soon. Towhee can be installed via pip:

% pip install -U pip  # if you run into installation issues, try updating pip
% pip install towhee

Towhee provides a variety of pre-built embedding pipelines. For example, generating an image embedding takes fewer than five lines of code:

>>> from towhee import pipeline

# Use our built-in embedding pipeline
>>> img_path = 'towhee_logo.png'
>>> embedding_pipeline = pipeline('image-embedding')
>>> embedding = embedding_pipeline(img_path)

Your image embedding is now stored in embedding. It's that simple.
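
Once you have two embeddings, a common next step is to compare them. The snippet below is a minimal sketch, not part of Towhee: it assumes each embedding can be treated as a flat sequence of floats (the actual return type of a pipeline may differ), and cosine_similarity is a hypothetical helper name chosen for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; it is not a Towhee function.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate nearly orthogonal vectors.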

Dive deeper

If you find that one of our default embedding pipelines does not suit you, you can also specify a custom pipeline from the hub as follows:

>>> embedding_pipeline = pipeline('towhee/image-embedding-resnet101')

For a full list of supported pipelines, visit our docs page.

Custom machine learning pipelines can be defined in a YAML file or via a Spark-like high-level programming interface (coming soon). The first time you instantiate and use a pipeline, all Python functions, configuration files, and model weights are automatically downloaded from the Towhee hub. To ease the development process, pipelines that already exist in the local Towhee cache ($HOME/.towhee/pipelines) will be loaded automatically:

# This will load the pipeline defined at $HOME/.towhee/pipelines/fzliu/my-embedding-pipeline.yaml
>>> embedding_pipeline = pipeline('fzliu/my-embedding-pipeline')
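
To check whether a pipeline is already in the local cache before instantiating it, you can look for its YAML file yourself. The sketch below assumes the layout shown in the comment above ($HOME/.towhee/pipelines/<name>.yaml); local_pipeline_path and is_cached are hypothetical helpers written for illustration, not Towhee functions, and they do not describe how Towhee resolves pipelines internally.

```python
from pathlib import Path
from typing import Optional


def local_pipeline_path(name: str, cache_root: Optional[Path] = None) -> Path:
    """Map a pipeline name such as 'author/pipeline' to its cached YAML path.

    Assumes the cache layout <cache_root>/<name>.yaml (hypothetical helper).
    """
    if cache_root is None:
        cache_root = Path.home() / ".towhee" / "pipelines"
    parts = name.split("/")
    parts[-1] += ".yaml"  # the last component is the YAML file itself
    return cache_root.joinpath(*parts)


def is_cached(name: str, cache_root: Optional[Path] = None) -> bool:
    """Return True if a YAML definition for `name` exists in the cache."""
    return local_pipeline_path(name, cache_root).is_file()
```

For example, is_cached('fzliu/my-embedding-pipeline') returns True only if the corresponding YAML file exists under your home directory.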

Architecture overview

Towhee is composed of three main building blocks - Pipelines, Operators, and a singleton Engine.

  • Pipeline: A Pipeline is a single embedding generation task that is composed of several operators. Operators are connected together within the pipeline via a directed acyclic graph.

  • Operator: An Operator is a single node within a pipeline. An operator can be a machine learning model, a complex algorithm, or a Python function. All files needed to run the operator (e.g. code, configs, and models) are contained within a directory.

  • Engine: The Engine sits at Towhee's core. Given a Pipeline, the Engine will drive dataflow between individual operators, schedule tasks, and monitor compute resource (CPU, GPU, etc.) usage. We provide a basic Engine within Towhee to run pipelines on a single-instance machine. K8s and other more complex Engine implementations are coming soon.
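
To make the relationship between these three pieces concrete, here is a toy sketch of operators connected in a directed acyclic graph and a tiny "engine" that runs them in dependency order. It is an illustration only: ToyPipeline, add_operator, and run are hypothetical names, plain Python callables stand in for operators, and none of this reflects Towhee's actual classes or scheduling.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Sequence


class ToyPipeline:
    """A toy DAG of named operators (hypothetical, not Towhee's Pipeline)."""

    def __init__(self) -> None:
        self.operators: Dict[str, Callable[..., Any]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_operator(self, name: str, func: Callable[..., Any],
                     inputs: Sequence[str] = ()) -> None:
        # `inputs` names the upstream operators whose outputs feed `func`
        self.operators[name] = func
        self.inputs[name] = list(inputs)


def run(pipeline: ToyPipeline, source: Any) -> Dict[str, Any]:
    """Toy engine: execute operators in topological order (Kahn's algorithm)."""
    pending = {name: len(deps) for name, deps in pipeline.inputs.items()}
    consumers: Dict[str, List[str]] = {name: [] for name in pipeline.operators}
    for name, deps in pipeline.inputs.items():
        for dep in deps:
            if dep not in pipeline.operators:
                raise KeyError("unknown upstream operator: " + dep)
            consumers[dep].append(name)

    # Operators with no upstream dependencies receive the pipeline input
    ready = deque(name for name, count in pending.items() if count == 0)
    results: Dict[str, Any] = {}
    while ready:
        name = ready.popleft()
        deps = pipeline.inputs[name]
        args = [results[dep] for dep in deps] if deps else [source]
        results[name] = pipeline.operators[name](*args)
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)

    if len(results) != len(pipeline.operators):
        raise ValueError("pipeline graph contains a cycle")
    return results


# Usage: two independent operators feed a third one
toy = ToyPipeline()
toy.add_operator("scale", lambda x: x * 2)
toy.add_operator("shift", lambda x: x + 1)
toy.add_operator("combine", lambda a, b: a + b, inputs=["scale", "shift"])
outputs = run(toy, 3)
```

A real engine would also schedule tasks across workers and track resource usage; the sketch only covers the dataflow ordering.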

For a deeper dive into Towhee and its architecture, check out the Towhee docs.

Contributing

Remember that writing code is not the only way to contribute! Submitting issues, answering questions, and improving documentation are some of the many ways you can join our growing community. Check out our contributing page for more information.

Special thanks go to everyone who has contributed to Towhee, whether on GitHub, our Towhee Hub, or elsewhere.

Looking for a database to store and index your embedding vectors? Check out Milvus.

About

License: Apache License 2.0


Languages

Python 96.6%, Jupyter Notebook 3.4%