microsoft / BlingFire

A lightning fast Finite State machine and REgular expression manipulation library.

A list of feature requests for BlingFire

GeorgeS2019 opened this issue

A recent evaluation of the feasibility of using BlingFire to tokenize GPT2 for .NET suggests there is a practical need for interoperability between BlingFire and tensor text manipulation through a .NET library.

This issue aims to gather feedback. There are potential new .NET users here who are interested in deep NLP and could consider using dotnet/TorchSharp for interoperability with BlingFire, in the same spirit as the use cases in PyTorch.

For these .NET users, one tentative idea is to go through the NLP features provided by PyTorch/Text and evaluate whether many of them are already provided by BlingFire, perhaps with better performance.

We need feedback: the plan is to look through the functionality provided by PyTorch/Text and make these PyTorch NLP features available in TorchSharp (through BlingFire).

==> Likewise, the NLP features found in PyTorch/Text that are still unmet in .NET could provide ideas and inspiration for what else to develop to improve BlingFire.

Requests

Could BlingFire address all the tokenization needs listed here by Onnxruntime.Extension?

I think BlingFire can solve most of the tokenization ops needs. Whatever is missing, please let me know and I can add it. It would be great to see BlingFire integrated into ONNX Extension Ops.

@SergeiAlonichau After your feedback, I dug into the code: => BlingFire is already integrated into ONNX Extension Ops.

I wonder what else (from BlingFire) can be integrated. Has BlingFire been, or should it be, integrated into and made available from the distributed ortcustomops.dll?

Exported functions

  • RegisterCustomOps
  • AddExternalCustomOp

Still exploring => can BlingFire learn from the implemented tokenizer to improve the .NET tokenizer experience?