mazzzystar / Queryable

Run OpenAI's CLIP model on iOS to search photos.

Home Page: https://queryable.app

Multiple languages support

mazzzystar opened this issue

The CLIP model is language-dependent: if we used something like Multilingual-CLIP (which supports 40+ languages), the Queryable app would exceed 1 GB. And since Queryable must run fully offline, a translation API is not an option, which is why it currently supports only English.

I have added a script to export PyTorch models to Core ML. Therefore, if you're interested, you can train a CLIP model in your own language and integrate it into Queryable. New model additions are welcome : )
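For reference, the export path looks roughly like this. It is a minimal sketch rather than the exact script in the repo; the TextEncoder wrapper, input name, and output file name below are illustrative.

import clip
import coremltools as ct
import numpy as np
import torch

# Thin wrapper so the text branch can be traced on its own
# (OpenAI CLIP exposes encode_text as a method, not a submodule).
class TextEncoder(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, text):
        return self.model.encode_text(text)

model, _ = clip.load("ViT-B/32", device="cpu")
text_encoder = TextEncoder(model).eval()

# OpenAI CLIP pads prompts to a context length of 77 tokens.
example_input = clip.tokenize("a diagram")
traced = torch.jit.trace(text_encoder, example_input)

# Convert the traced graph to Core ML; token ids are passed as int32.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="prompt", shape=example_input.shape, dtype=np.int32)],
)
mlmodel.save("TextEncoder.mlpackage")

The image encoder can be exported the same way by tracing model.encode_image on a dummy image tensor of shape [1, 3, 224, 224].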

Hello, @mazzzystar.

How should I modify the script to support Chinese-CLIP?

I tried to convert https://github.com/OFA-Sys/Chinese-CLIP to Core ML with the script, but it failed:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[30], line 6
      3 text_encoder.eval()
      5 example_input = clip.tokenize("a diagram").to(device)
----> 6 traced_model = torch.jit.trace(text_encoder, example_input)
      7 out = traced_model(example_input)

File ~/anaconda3/envs/py3817/lib/python3.8/site-packages/torch/jit/_trace.py:794, in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit, example_kwarg_inputs, _store_inputs)
    792         else:
    793             raise RuntimeError("example_kwarg_inputs should be a dict")
--> 794     return trace_module(
    795         func,
    796         {"forward": example_inputs},
    797         None,
    798         check_trace,
    799         wrap_check_inputs(check_inputs),
    800         check_tolerance,
    801         strict,
    802         _force_outplace,
    803         _module_class,
    804         example_inputs_is_kwarg=isinstance(example_kwarg_inputs, dict),
    805         _store_inputs=_store_inputs
    806     )
    807 if (
    808     hasattr(func, "__self__")
    809     and isinstance(func.__self__, torch.nn.Module)
    810     and func.__name__ == "forward"
    811 ):
    812     if example_inputs is None:

File ~/anaconda3/envs/py3817/lib/python3.8/site-packages/torch/jit/_trace.py:1056, in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit, example_inputs_is_kwarg, _store_inputs)
   1054 else:
   1055     example_inputs = make_tuple(example_inputs)
-> 1056     module._c._create_method_from_trace(
   1057         method_name,
   1058         func,
   1059         example_inputs,
   1060         var_lookup_fn,
   1061         strict,
   1062         _force_outplace,
   1063         argument_names,
   1064         _store_inputs
   1065     )
   1067 check_trace_method = module._c._get_method(method_name)
   1069 # Check the trace against new traces created from user-specified inputs

File ~/anaconda3/envs/py3817/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/py3817/lib/python3.8/site-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
   1486         recording_scopes = False
   1487 try:
-> 1488     result = self.forward(*input, **kwargs)
   1489 finally:
   1490     if recording_scopes:

Cell In[23], line 69, in TextEncoder.forward(self, text)
     65 def forward(self, text):
     66     # print(f'text: {text}')
     67     x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]
---> 69     x = x + self.positional_embedding.type(self.dtype)
     70     x = x.permute(1, 0, 2)  # NLD -> LND
     71     x = self.transformer(x)

RuntimeError: The size of tensor a (52) must match the size of tensor b (77) at non-singleton dimension 1

The two model architectures are different, which is why you get this error. You may need to modify the network definition to match Chinese-CLIP.
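One possible starting point, sketched below under the assumption that you trace Chinese-CLIP's own text tower with its own tokenizer: that way the sequence length matches its positional embeddings (context length 52 for Chinese-CLIP instead of OpenAI CLIP's 77). The wrapper class and names are illustrative, and tracing the BERT-based text encoder may still need further adjustments.

import cn_clip.clip as clip
from cn_clip.clip import load_from_name
import coremltools as ct
import numpy as np
import torch

# Illustrative wrapper around Chinese-CLIP's text branch.
class ChineseTextEncoder(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, text):
        return self.model.encode_text(text)

model, _ = load_from_name("ViT-B-16", device="cpu")
text_encoder = ChineseTextEncoder(model).eval()

# Chinese-CLIP's tokenizer pads to its own context length (52 by default),
# so the 52-vs-77 positional-embedding mismatch from the traceback goes away.
example_input = clip.tokenize(["一张图表"])
traced = torch.jit.trace(text_encoder, example_input)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="prompt", shape=example_input.shape, dtype=np.int32)],
)
mlmodel.save("ChineseTextEncoder.mlpackage")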

This is a fantastic project, and I'm very interested in it. I'm currently studying it and will attempt to add multi-language support; if I succeed, I will post an update here. Thank you to the author for generously sharing this work.

@Enternalcode That's very kind of you, thank you so much!