milvus-io / bootcamp

Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.

Home Page:https://milvus.io

[FEATURE]: Image search service has very low QPS

yylstudy opened this issue · comments

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

towhee:0.9.0
milvus:2.0.0
img-search-server:towhee
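I set up the image search service following the official example and found that the QPS is only about 1.
https://github.com/milvus-io/bootcamp/tree/master/solutions/image/reverse_image_search/quick_deploy
I looked at the source code of the img-search-server service and removed the async-related code. The relevant code is as follows:
[image]
[image]

In the end I found that the resnet50_extract_feat method is single-threaded. Is there any way to improve the performance of converting images to vectors? I hope to get an answer. Thank you very much!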

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Anything else?

No response

Hi @yylstudy, you can try to update the Resnet50 in encode.py with the following code:

Note that you need to install towhee with pip install towhee==1.0.0rc1. Version 1.0.0 will be released later.

from towhee import pipe, ops


class Resnet50:
    def __init__(self):
        self.image_embedding_pipe = (
            pipe.input('path')
                .map('path', 'img', ops.image_decode.cv2_rgb())
                .map('img', 'embedding', ops.image_embedding.timm(model_name='resnet50'))
                .map('embedding', 'embedding', ops.towhee.np_normalize())
                .output('embedding')
            )

    def resnet50_extract_feat(self, img_path):
        feat = self.image_embedding_pipe(img_path)
        return feat.get()[0]

The new towhee code loads the pipeline only once.
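To illustrate why building the pipeline once in __init__ matters, here is a minimal sketch that uses no towhee code at all. SlowModel, extract_reloading, and Encoder are hypothetical stand-ins, and the sketch assumes the earlier slowdown came from rebuilding the pipeline on every call, which the thread suggests but does not show.

```python
import time


class SlowModel:
    """Hypothetical stand-in for an embedding pipeline that is costly to build."""

    load_count = 0  # counts how many times the "model" has been constructed

    def __init__(self):
        SlowModel.load_count += 1
        time.sleep(0.01)  # simulated load cost (assumption, not a measured value)

    def __call__(self, path):
        # Dummy "embedding": only the path length, purely for illustration.
        return [float(len(path))]


def extract_reloading(path):
    """Anti-pattern: constructs the model again on every call."""
    return SlowModel()(path)


class Encoder:
    """Pattern used in the answer above: construct once, reuse for each call."""

    def __init__(self):
        self.model = SlowModel()

    def extract(self, path):
        return self.model(path)


paths = ['a.jpg', 'bb.jpg', 'ccc.jpg']

SlowModel.load_count = 0
for p in paths:
    extract_reloading(p)
reload_count = SlowModel.load_count  # one construction per request

SlowModel.load_count = 0
encoder = Encoder()
cached = [encoder.extract(p) for p in paths]
cached_count = SlowModel.load_count  # a single construction in total
```

With a real model the construction cost is usually far larger than the simulated delay here, so keeping one long-lived instance per server process is the part of this sketch that carries over.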

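@shiyu22 Hello, one more question, please. Performance is fine now, but after testing with the code above I found that the distance values become very large. Previously they were all between 0.1 and 1.x. Why is that? Thanks.
[image]

The Milvus version is now v2.0.2. Does the magnitude of the distance values depend on the Milvus version?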

No, it's because there is no normalization of the vector. I will add np_normalize() for it.
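The effect of normalization on distances can be sketched with plain Python. This example assumes a Euclidean (L2) metric, which the thread does not state explicitly, and the helper names l2_normalize and l2_distance are made up for illustration. It is not what np_normalize() does internally, only the same idea.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two toy "embeddings" with very different magnitudes (made-up values).
a = [3.0, 4.0, 0.0]
b = [30.0, 0.0, 40.0]

raw = l2_distance(a, b)                               # grows with vector magnitude
unit = l2_distance(l2_normalize(a), l2_normalize(b))  # bounded for unit vectors

print(f"raw distance:        {raw:.3f}")
print(f"normalized distance: {unit:.3f}")
```

For unit-length vectors the Euclidean distance can never exceed 2, which is consistent with the small distances reported before the change. Without normalization the distance scales with the magnitude of the raw model output.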

@yylstudy can you try it again?

If it fixes your issue, could you submit a PR to update encode.py and become a contributor?

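@shiyu22 Thank you very much, this solved my problem. After removing the async code, QPS went from about 1 to 30+. I have submitted a PR based on this issue. Sorry, I have never submitted a PR before, so could you please check whether it meets the guidelines? Thanks again!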

It's my pleasure:) And thanks for your contribution.

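How well does this image-to-image search work for you? When I used it, I found that if I crop part of an image that has already been imported and search with the crop, the original full image is not returned.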

Cropping an image can significantly change its semantics, so it's not surprising that the original full picture is not returned.