aphp / edsnlp

Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.

Home Page: https://aphp.github.io/edsnlp/

edsnlp.data.read_standoff(): the tokenizer argument fails with EDSTokenizer

Aremaki opened this issue

Description

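Pydantic validation of the tokenizer argument expects a spacy.tokenizer.Tokenizer instance and rejects EDSTokenizer. The cause is that EDSTokenizer inherits from spacy.util.DummyTokenizer rather than from spacy.tokenizer.Tokenizer, so it does not pass the type check.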
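
The mechanism can be illustrated without installing anything. The sketch below is a minimal, hypothetical reproduction: the three classes are stand-ins that only mimic the inheritance shape described above (they are not the real spaCy or EDS-NLP classes), and validate_tokenizer is an assumed, simplified substitute for a strict class-typed validator, not Pydantic's actual implementation.

```python
# Stand-in classes only: they mimic the inheritance shape described in the
# issue and are NOT the real spaCy or EDS-NLP classes.


class Tokenizer:
    """Stand-in for spaCy's concrete Tokenizer class."""


class DummyTokenizer:
    """Stand-in for spaCy's DummyTokenizer base class (unrelated to Tokenizer)."""


class EDSTokenizer(DummyTokenizer):
    """Stand-in for the EDS-NLP tokenizer, which subclasses DummyTokenizer."""


def validate_tokenizer(value, expected=Tokenizer):
    """Hypothetical strict check: accept only instances of `expected`."""
    if not isinstance(value, expected):
        raise TypeError(
            f"expected an instance of {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


def accepts(value):
    """Return True if the value passes the strict check, False otherwise."""
    try:
        validate_tokenizer(value)
        return True
    except TypeError:
        return False


print(accepts(Tokenizer()))     # instance of the expected class
print(accepts(EDSTokenizer()))  # sibling hierarchy, so the check rejects it
```

Because the stand-in EDSTokenizer shares no ancestor with the stand-in Tokenizer other than object, any check that requires a Tokenizer instance rejects it, even though both objects can tokenize text.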

How to reproduce the bug

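import edsnlp

# Blank French clinical pipeline; its tokenizer is an EDSTokenizer
nlp = edsnlp.blank("eds")
path = "path to BRAT file"

# Passing the EDSTokenizer triggers the validation error
docs = list(
    edsnlp.data.read_standoff(
        path,
        tokenizer=nlp.tokenizer,
    )
)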

Your Environment

  • Operating System: Windows
  • Python Version Used: 3.10.13
  • spaCy Version Used: 3.7.2
  • EDS-NLP Version Used: 0.10.5

Thank you for the issue! This was just fixed in #260; hopefully this works for you 🤞