tattle-made / feluda

A configurable engine for analysing multi-lingual and multi-modal content.

Home Page:https://tattle.co.in/products/feluda/

Add basic authentication to Feluda's /search endpoint.

dennyabrain opened this issue · comments

All API requests from clients go via the Kosh API server, and this has worked well for us in the past. When it comes to searching for images or videos, however, going to Feluda via the Kosh server adds latency that makes our search feel very slow. Our image search engine actually returns results in milliseconds, but adding these intermediary nodes between the client and the search server causes a latency of 3,5 seconds. In our current code for search operations, clients talk directly to Feluda and the experience is great, but this can't be deployed as it would be insecure.

commented

Based on the documentation, there are a couple of action items here:

  1. Why would accessing the Feluda API via Kosh add latency? They are on the same AWS network, so this should not be happening. If the image search engine's response is fast, we could start by looking at the Kosh code that interacts with the Search API. This could belong in a separate issue.
  2. Adding authentication to the search endpoint.

I came across a library, Flask-HTTPAuth, that is under active development and would let us add different types of authentication to this endpoint. I haven't explored Flask's native support for adding authentication to an endpoint; that could be even more straightforward.
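
For comparison, here is a minimal sketch of what the plain-Flask route could look like, without any extra library. It is only an illustration under assumptions: the `require_token` decorator, the single `API_TOKEN` value, and the placeholder `search` handler are hypothetical and are not taken from Feluda's code.

```python
import hmac
from functools import wraps

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical: a real deployment would load this from the environment or a
# secrets store, and would more likely check per-user tokens than one value.
API_TOKEN = "change-me"


def require_token(view):
    """Reject requests that lack a valid 'Authorization: Bearer <token>' header."""

    @wraps(view)
    def wrapper(*args, **kwargs):
        header = request.headers.get("Authorization", "")
        scheme, _, token = header.partition(" ")
        # compare_digest avoids leaking the token through timing differences.
        valid = hmac.compare_digest(token.encode(), API_TOKEN.encode())
        if scheme != "Bearer" or not valid:
            return jsonify({"error": "unauthorized"}), 401
        return view(*args, **kwargs)

    return wrapper


@app.route("/search", methods=["POST"])
@require_token
def search():
    # Placeholder body: the real handler would run the operators and query the store.
    return jsonify({"matches": []})
```

A library such as Flask-HTTPAuth packages the same pattern (parse the header, verify, return 401 otherwise) behind its own decorators, so the choice is mostly about how many authentication schemes we expect to support.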

So while the rest of Kosh's interface will use Kosh's API, the search endpoint would point to the Feluda API. In that situation, we would want authentication to be common across both Kosh and Feluda. Adding another, separate login for search would break the UX.

Regarding 1: I should have clarified this.
Take a look at this part of the code; it will give you more clarity:

elif "multipart/form-data" in request.content_type:
    data = json.load(request.files["data"])
    if data["query_type"] == "image":
        file = request.files["media"]
        print(file, type(file))
        image_obj = media_factory[MediaType.IMAGE].make_from_file_in_memory(
            file
        )
        image_vec = self.feluda.operators.active_operators[
            "image_vec_rep_resnet"
        ].run(image_obj)
        results = self.feluda.store.find("image", image_vec)
        return {"matches": results}
    elif data["query_type"] == "video":
        file = request.files["media"]
        print(file, type(file))
        vid_obj = media_factory[MediaType.VIDEO].make_from_file_in_memory(
            file
        )
        vid_vec = self.feluda.operators.active_operators[
            "vid_vec_rep_resnet"
        ].run(vid_obj)
        average_vector = next(vid_vec)
        results = self.feluda.store.find("image", average_vector)
        return {"matches": []}

You correctly said that the /search endpoint should not add latency, since both Kosh and Feluda are on the same AWS network.
The latency occurs when we run /search on image and video files, in which case you send the media file along with some metadata.

In an earlier version, the image and video search flow from client to Feluda involved:

  1. Client uploads the file to S3
  2. Client makes an API call to Kosh
  3. Kosh makes an API call to /search on Feluda with some metadata (including the file's S3 URL)
  4. Feluda downloads the file
  5. Feluda processes the file

We realized that the bulk of the latency in our search operation was coming from (4).
So when I changed the search flow to:

  1. Client uploads the file AND some metadata to Feluda
  2. Feluda processes the file

I noticed a drastic improvement in latency (it went from seconds to milliseconds).
But of course this involved exposing the Feluda server to the public internet, which needs to be fixed.
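
To make the direct-upload flow concrete, here is a rough client-side sketch using only the Python standard library. The field names `data` and `media` and the `query_type` key mirror what the handler above reads from `request.files`; the helper names, the file names, and any URL passed to `post_search` are hypothetical.

```python
import io
import json
import urllib.request
import uuid


def build_multipart(metadata: dict, media_name: str, media_bytes: bytes):
    """Encode the JSON metadata and the media file as one multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = [
        # The handler parses the "data" part as JSON and reads the file from "media".
        ("data", "data.json", "application/json", json.dumps(metadata).encode()),
        ("media", media_name, "application/octet-stream", media_bytes),
    ]
    body = io.BytesIO()
    for field, filename, content_type, content in parts:
        body.write(f"--{boundary}\r\n".encode())
        body.write(
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        body.write(f"Content-Type: {content_type}\r\n\r\n".encode())
        body.write(content)
        body.write(b"\r\n")
    body.write(f"--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


def post_search(url: str, media_path: str, query_type: str = "image") -> dict:
    """Send the file and its metadata to a /search endpoint in a single request."""
    with open(media_path, "rb") as f:
        body, content_type = build_multipart(
            {"query_type": query_type}, media_path, f.read()
        )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the file travels in the same request as the metadata, the server never has to download it from S3, which is the step that dominated latency in the earlier flow.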

Yeah, I was also hoping some library like the one you mentioned would help us secure this.
Similar to what you pointed out about not breaking the UX, I was also unsure how to reconcile having one client use two servers that potentially have different authentication.
For instance, right now clients interact with Kosh via an accessToken that they receive after logging in. We'll need to figure out whether this same token can be reused for Feluda authentication (and whether that is secure to do).