BestJex/jina

Cloud-Native Neural Search Framework for Any Kind of Data

Jina allows you to build search-as-a-service powered by deep learning in just minutes.

🌌 All data types - Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.

🌩️ Fast & cloud-native - Distributed architecture from day one, scalable & cloud-native by design: containerization, streaming, parallelization, sharding, async scheduling, and HTTP/gRPC/WebSocket protocols.

⏱️ Save time - A design pattern for neural search systems that takes you from zero to a production-ready system in minutes.

🍱 Own your stack - Keep end-to-end stack ownership of your solution and avoid the integration pitfalls that come with fragmented, multi-vendor, generic legacy tools.

Run Quick Demo

👗 Fashion image search: jina hello fashion
🤖 QA chatbot: pip install "jina[chatbot]" && jina hello chatbot
📰 Multimodal search: pip install "jina[multimodal]" && jina hello multimodal
🍴 Fork the source of a demo to your folder: jina hello fork fashion ../my-proj/

Install

via PyPI: pip install -U "jina[standard]"
via Docker: docker run jinaai/jina:latest

More installation options

x86/64, arm64, v6, v7	Linux/macOS & Python 3.7/3.8/3.9	Docker Users
Minimum (no HTTP, WebSocket, Docker support)	pip install jina	docker run jinaai/jina:latest
Daemon	pip install "jina[daemon]"	docker run --network=host jinaai/jina:latest-daemon
With Extras	pip install "jina[devel]"	docker run jinaai/jina:latest-devel

Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.

Get Started

Document, Executor, and Flow are the three fundamental concepts in Jina.

📄 Document is the basic data type in Jina;
⚙️ Executor is how Jina processes Documents;
🔀 Flow is how Jina streamlines and distributes Executors.

1️⃣ Copy-paste the minimum example below and run it:

💡 Preliminaries: character embedding, pooling, Euclidean distance

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable char
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, scores={'euclid': d}) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.scores['euclid'].value)  # sort matches by their values

f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed, though that is unnecessary here
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of _this_ file
    f.block()  # block for listening request
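The three preliminaries used above (one-hot character embedding, mean pooling, and pairwise Euclidean distance) can be tried without Jina. The following is a minimal NumPy-only sketch of the same arithmetic; the names `embed`, `pairwise_euclidean`, and `rank` are illustrative helpers and not part of the Jina API.

```python
import numpy as np

OFFSET = 32             # ASCII code of the space character
DIM = 127 - OFFSET + 1  # 96 slots; the last one also serves as the "unknown" bucket
ONE_HOT = np.eye(DIM)   # row i is the one-hot vector for character index i


def embed(text: str) -> np.ndarray:
    """Mean-pool the one-hot vectors of all characters in `text`."""
    idx = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return ONE_HOT[idx, :].mean(axis=0)


def pairwise_euclidean(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Return a (num_queries, num_docs) matrix of Euclidean distances."""
    # q[:, None, :] has shape (Q, 1, DIM); d[None, :, :] has shape (1, N, DIM).
    # Broadcasting the subtraction gives (Q, N, DIM); the norm collapses DIM.
    return np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)


def rank(query: str, corpus: list, top_k: int = 3) -> list:
    """Return the `top_k` corpus entries closest to `query`, nearest first."""
    q = np.stack([embed(query)])
    d = np.stack([embed(t) for t in corpus])
    dist = pairwise_euclidean(q, d)[0]
    order = np.argsort(dist)[:top_k]
    return [(corpus[i], float(dist[i])) for i in order]


if __name__ == "__main__":
    lines = ["@requests(on='/index')", "import numpy as np", "with f:"]
    for text, dist in rank("@requests(on=something)", lines):
        print(f"{dist:.6f}: {text}")
```

Because each embedding is just a normalized character histogram, two strings are close when they use the same characters in similar proportions, regardless of order.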

2️⃣ Open http://localhost:12345/docs (an extended Swagger UI) in your browser, click the /search tab, and enter:

{"data": [{"text": "@requests(on=something)"}]}

This means we want to find the lines from the code snippet above that are most similar to @requests(on=something). Now click the Execute button!

3️⃣ Not a GUI person? Let's do it in Python then! Keep the above server running and start a simple client:

from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclid"].value:2f}: "{d.text}"')


c = Client(protocol='http', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

This prints the following results:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"
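Since the Flow is served over HTTP, the same query could in principle be sent with nothing but the Python standard library. The sketch below only builds the request; it assumes the gateway accepts a POST at `/search` with the JSON body shown in step 2 (as the Swagger UI tab suggests), and `build_search_request` is a hypothetical helper, not part of Jina.

```python
import json
import urllib.request


def build_search_request(
    text: str, host: str = "localhost", port: int = 12345
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one text query."""
    # Same payload shape as the one entered in the Swagger UI in step 2.
    body = json.dumps({"data": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs the Flow from step 1 running)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server from step 1 still running, `send(build_search_request("@requests(on=something)"))` would return the raw JSON response; its exact schema is not covered here.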

😔 Doesn't work? Our bad! Please report it here.

Read Tutorials

🧠 What is "Neural Search"?
📄 Document & DocumentArray: the basic data type in Jina.
⚙️ Executor: how Jina processes Documents.
🔀 Flow: how Jina streamlines and distributes Executors.
- Minimum Working Example
- Flow API
🤹 Serving Jina
📓 Developer Reference
🧼 Clean & Efficient Coding in Jina
😎 3 Reasons to Use Jina 2.0

Support

Join our Slack community to chat to our engineers about your use cases, questions, and support queries.
Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel