AkshaykrishnaM / jina

Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data

Home Page: https://docs.jina.ai


Cloud-Native Neural Search Framework for Any Kind of Data

Supported Python versions: 3.7, 3.8, 3.9

Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.

โฑ๏ธ Save time - The design pattern of neural search systems. Native support on PyTorch/Keras/ONNX/Paddle, solution building in just minutes.

๐ŸŒŒ All data types - Processing, indexing, querying, understanding of video, image, long/short text, music, source code, PDF, etc.

๐ŸŒฉ๏ธ Local & cloud friendly - Distributed architecture, scalable & cloud-native from day one. Same developer experience on both local and cloud.

๐Ÿฑ Own your stack - Keep end-to-end stack ownership of your solution. Avoid integration pitfalls you get with fragmented, multi-vendor, generic legacy tools.

Install

pip install -U jina

More install options, including Conda, Docker, and Windows, can be found in the docs.

Get Started


We promise that you can build a scalable ResNet-powered image search service from scratch in 20 minutes or less. If not, you can forget about Jina.

Basic Concepts

Document, Executor, and Flow are three fundamental concepts in Jina.

  • Document is the basic data type in Jina;
  • Executor is how Jina processes Documents;
  • Flow is how Jina streamlines and distributes Executors.

Leveraging these three components, let's build an app that finds similar images using ResNet50.

ResNet50 Image Search in 20 Lines

💡 Preliminaries: download dataset, install PyTorch & Torchvision

from jina import DocumentArray, Document

def preproc(d: Document):
    return (d.load_uri_to_image_blob()  # load
             .set_image_blob_normalization()  # normalize color 
             .set_image_blob_channel_axis(-1, 0))  # switch color axis
docs = DocumentArray.from_files('img/*.jpg').apply(preproc)

import torchvision
model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cuda')  # embed via GPU to speedup

q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))
q.embed(model)  # embed
q.match(docs)  # find top-20 nearest neighbours, done!

Done! Now print q.matches and you will see the URIs of the most similar images.
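To make the matching step less of a black box, here is a minimal pure-Python sketch of a top-k nearest-neighbour search. It assumes the ranking is by cosine distance, which the `cosine` score key read by the Python client later in this README suggests. The helper names `cosine_distance` and `top_k_matches` and the toy 2-D embeddings are hypothetical and only for illustration; this is not Jina's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_matches(query, index, k=20):
    """Rank (uri, embedding) pairs by cosine distance to the query embedding."""
    scored = [(cosine_distance(query, emb), uri) for uri, emb in index]
    scored.sort()  # smallest distance first
    return [(uri, dist) for dist, uri in scored[:k]]


# Toy 2-D embeddings standing in for ResNet50 feature vectors (hypothetical data).
index = [
    ('img/a.jpg', [1.0, 0.0]),
    ('img/b.jpg', [0.0, 1.0]),
    ('img/c.jpg', [1.0, 1.0]),
]
matches = top_k_matches([1.0, 0.1], index, k=2)
for uri, dist in matches:
    print(f'{dist:.3f} {uri}')
```

A brute-force scan like this is linear in the number of indexed images, which is fine for a sketch but is the part a real search backend would optimize.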


Add 3 lines of code to visualize them:

for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()
q.matches.plot_image_sprites()


Sweet! FYI, you can also use Keras, ONNX, or PaddlePaddle for the embedding model. Jina supports them well.

As-a-Service in 10 Extra Lines

With a trivial refactoring and 10 extra lines of code, you can turn the local script into a ready-to-serve service:

  1. Import what we need.

    from jina import Document, DocumentArray, Executor, Flow, requests
  2. Copy-paste the preprocessing step and wrap it via Executor:

    class PreprocImg(Executor):
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            for d in docs:
                (d.load_uri_to_image_blob()  # load
                 .set_image_blob_normalization()  # normalize color
                 .set_image_blob_channel_axis(-1, 0))  # switch color axis
  3. Copy-paste the embedding step and wrap it via Executor:

    class EmbedImg(Executor):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            import torchvision
            self.model = torchvision.models.resnet50(pretrained=True)        
    
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            docs.embed(self.model)
  4. Wrap the matching step into an Executor:

    class MatchImg(Executor):
        _da = DocumentArray()
    
        @requests(on='/index')
        def index(self, docs: DocumentArray, **kwargs):
            self._da.extend(docs)
    
        @requests(on='/search')
        def foo(self, docs: DocumentArray, **kwargs):
            docs.match(self._da)
            for d in docs.traverse_flat('r,m'):  # only required for visualization
                d.convert_uri_to_datauri()  # convert to datauri
                d.pop('embedding', 'blob')  # remove unnecessary fields to save bandwidth
  5. Connect all Executors in a Flow and scale the embedding Executor to 3 replicas:

    f = Flow(port_expose=12345, protocol='http').add(uses=PreprocImg).add(uses=EmbedImg, replicas=3).add(uses=MatchImg)

    Plot it via f.plot('flow.svg') to visualize the Flow.

  6. Index the image data and serve REST queries publicly:

    with f:
        f.post('/index', DocumentArray.from_files('img/*.jpg'), show_progress=True, request_size=8)
        f.block()
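The way a Flow passes Documents through its Executors, and the way `@requests(on=...)` binds a method to one endpoint, can be pictured with a toy sketch in plain Python. Everything here (`ToyExecutor`, `ToyFlow`, this stand-in `requests` decorator, and Documents modelled as dicts) is hypothetical and only illustrates the idea; it is not how Jina is implemented and it ignores replicas and networking.

```python
def requests(on='*'):
    """Toy decorator: tag a method with the endpoint it handles ('*' = any)."""
    def tag(fn):
        fn._endpoint = on
        return fn
    return tag


class ToyExecutor:
    def handle(self, endpoint, docs):
        """Call the method bound to `endpoint`, falling back to the '*' handler."""
        handlers = {}
        for name in dir(self):
            fn = getattr(self, name)
            if callable(fn) and hasattr(fn, '_endpoint'):
                handlers[fn._endpoint] = fn
        fn = handlers.get(endpoint) or handlers.get('*')
        if fn is not None:
            fn(docs)  # handlers modify the documents in place
        return docs


class Preproc(ToyExecutor):
    @requests()  # no endpoint given: runs for every request
    def run(self, docs):
        for d in docs:
            d['preprocessed'] = True


class Matcher(ToyExecutor):
    def __init__(self):
        self.index = []

    @requests(on='/index')
    def add(self, docs):
        self.index.extend(d['uri'] for d in docs)

    @requests(on='/search')
    def find(self, docs):
        for d in docs:
            d['matches'] = list(self.index)


class ToyFlow:
    def __init__(self, *executors):
        self.executors = executors

    def post(self, endpoint, docs):
        """Send docs through every executor in order."""
        for ex in self.executors:
            docs = ex.handle(endpoint, docs)
        return docs


flow = ToyFlow(Preproc(), Matcher())
flow.post('/index', [{'uri': 'img/a.jpg'}, {'uri': 'img/b.jpg'}])
result = flow.post('/search', [{'uri': 'img/q.jpg'}])
print(result)
```

The point of the sketch is the split of responsibilities: each Executor only knows its own step, while the Flow decides the order and forwards the endpoint name.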

Done! Now query it via curl to get the most similar images.
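For readers who prefer Python to curl, the same kind of REST query can be sketched with the standard library only. The JSON payload shape (a `data` list of Documents, each with a `uri`) and the host `127.0.0.1` are assumptions, as are the helper names `build_search_request` and `search`; the `/docs` page of the running service shows the authoritative request schema.

```python
import json
import urllib.request


def build_search_request(image_uri, host='127.0.0.1', port=12345):
    """Build (but do not send) a POST request for the /search endpoint."""
    # Assumed payload shape: a "data" list of Documents, each with a "uri".
    payload = json.dumps({'data': [{'uri': image_uri}]}).encode('utf-8')
    return urllib.request.Request(
        url=f'http://{host}:{port}/search',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def search(image_uri):
    """Send the request to a running service and return the decoded JSON."""
    with urllib.request.urlopen(build_search_request(image_uri)) as resp:
        return json.load(resp)


# Building the request needs no server; call search(...) once the Flow is up.
req = build_search_request('img/00021.jpg')
print(req.get_method(), req.full_url)
```

Calling `search('img/00021.jpg')` only works while the Flow from step 6 is running and serving on port 12345.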


Or go to http://0.0.0.0:12345/docs and test requests via Swagger UI.

Or use a Python client to access the service:

from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches):  # print every match of the first query
        print(f'[{idx}]{d.scores["cosine"].value:.2f}: "{d.uri}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)

At this point, you have probably spent about 15 minutes, and here we are: an image search service with rich features:

✅ Solution as microservices
✅ Scale in/out any component
✅ Query via HTTP/WebSocket/gRPC/Client
✅ Distribute/Dockerize components
✅ Async/non-blocking I/O
✅ Extendable REST interface

Deploy to Kubernetes in 7 Minutes

Have another 7 minutes? We can show you how to bring your service to the next level by deploying it to Kubernetes.

  1. Create a Kubernetes cluster and get credentials (example in GCP, more K8s providers here):
    gcloud container clusters create test --machine-type e2-highmem-2  --num-nodes 1 --zone europe-west3-a
    gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase
  2. Move each Executor class to a separate folder with one Python file:
    • PreprocImg -> 📁 preproc_img/exec.py
    • EmbedImg -> 📁 embed_img/exec.py
    • MatchImg -> 📁 match_img/exec.py
  3. Push all Executors to Jina Hub:
    jina hub push preproc_img
    jina hub push embed_img
    jina hub push match_img
    You will get three Hub Executors that can be used as Docker containers.
  4. Adjust the Flow a bit and open it:
    f = Flow(name='readme-flow', port_expose=12345, infrastructure='k8s').add(uses='jinahub+docker://PreprocImg').add(uses='jinahub+docker://EmbedImg', replicas=3).add(uses='jinahub+docker://MatchImg')
    with f:
        f.block()

Intrigued? Then find out more about Jina in our docs.

Run Quick Demo

Support

Join Us

Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

All Contributors
