AstraZeneca / KAZU

Fast, world class biomedical NER

Home Page: https://AstraZeneca.github.io/KAZU/

Explain running Kazu in a Notebook in Quickstart

raylite opened this issue · comments

I am trying to run the Kazu example from the documentation in a Notebook, but it raises an error. My setup:

  • Kazu installed with pip
  • model pack unzipped and its path set using os.environ["KAZU_MODEL_PACK"]
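
For anyone reproducing this setup, a quick sanity check before running anything else can rule out a wrong model pack path. The sketch below is only an illustration: the helper name check_model_pack is hypothetical and not part of Kazu, and it assumes the model pack keeps its Hydra config in a "conf" subdirectory, as the code later in this thread does.

```python
import os
from pathlib import Path


def check_model_pack(env_var: str = "KAZU_MODEL_PACK") -> Path:
    """Return the model pack's config directory, or raise if the setup looks wrong.

    Hypothetical helper for illustration; it is not part of the Kazu API.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set in this kernel's environment")
    # Assumption: the Hydra config lives in a "conf" directory inside the model pack.
    conf_dir = Path(value) / "conf"
    if not conf_dir.is_dir():
        raise FileNotFoundError(f"no 'conf' directory found under {value}")
    return conf_dir
```

In a notebook, set os.environ["KAZU_MODEL_PACK"] in an earlier cell and then call check_model_pack(); an exception here points to a path problem rather than a problem with Kazu itself.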

I got this error: ipykernel_launcher.py: error: unrecognized arguments: -f

I'm not sure where the argument is being passed from or what to do about it. See the attached images for the error dump.

Screenshot 2024-02-01 192156
Screenshot 2024-02-01 192254

Hi,

This appears to be the Jupyter Notebook interacting poorly with Hydra's behaviour. I assume you're using the code in the Quickstart docs?
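
To illustrate the likely mechanism: Jupyter starts the kernel process with a -f <connection file> argument, so any code that parses sys.argv as if it were its own command line sees an option it does not know. The sketch below uses a plain argparse parser as a stand-in; it is not Hydra's actual parser, and the function name and connection file path are made up for the example.

```python
import argparse


def kernel_args_accepted(argv: list[str]) -> bool:
    """Return True if a simple CLI parser accepts argv, False if it rejects it.

    The parser is a stand-in for a command-line entry point that only
    expects positional overrides; it is not Hydra's real parser.
    """
    parser = argparse.ArgumentParser(prog="ipykernel_launcher.py")
    parser.add_argument("overrides", nargs="*", help="key=value style overrides")
    try:
        parser.parse_args(argv)
    except SystemExit:
        # argparse reports unknown options on stderr and then exits
        return False
    return True


# Roughly what a notebook kernel's arguments look like (the path is hypothetical)
notebook_argv = ["-f", "/tmp/kernel-0000.json"]
print(kernel_args_accepted(notebook_argv))
print(kernel_args_accepted(["key=value"]))
```

Composing the config explicitly, rather than going through a command-line style entry point, avoids reading the kernel's arguments at all.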

If so, this is the appropriate code for a notebook. I've just tested it myself and got it working:

First cell:

from hydra import compose, initialize_config_dir
from hydra.utils import instantiate

from kazu.data.data import Document
from kazu.pipeline import Pipeline
from kazu.utils.constants import HYDRA_VERSION_BASE
from pathlib import Path
import os

# the hydra config is kept in the model pack
cdir = Path(os.environ["KAZU_MODEL_PACK"]).joinpath("conf")


def kazu_test():
    # compose the config explicitly rather than parsing command-line arguments
    with initialize_config_dir(version_base=HYDRA_VERSION_BASE, config_dir=str(cdir)):
        cfg = compose(config_name="config")
    # build the pipeline from the composed config
    pipeline: Pipeline = instantiate(cfg.Pipeline)
    text = "EGFR mutations are often implicated in lung cancer"
    doc = Document.create_simple_document(text)
    # run the pipeline on the document
    pipeline([doc])
    print(f"{doc.sections[0].text}")

Second cell:

kazu_test()

Let me know if that works for you or if you have further issues.

I'll think about how best we can add to the quickstart documentation so that notebook users are better supported. Thanks for opening the issue so we had the opportunity to improve this! And thanks for giving kazu a try 😃

Yes, thank you for your response. Later yesterday I tried the example using hydra compose (the same approach you provided), and I can confirm it worked.

Great! In that case, let's leave the ticket open until I've added documentation for others. I'll rename the ticket to reflect that, if that's OK?

Just to keep you updated, I've got a PR on our 'internal' version of the repo that resolves this; it should be made public in the next release.