asyml / ForteHealth

The project is in the incubation stage and still under development. ForteHealth is a flexible and powerful ML workflow builder for biomedical and clinical scenarios. This is part of the CASL project: http://casl-project.ai/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

mimic-iii pipeline

tanyuqian opened this issue · comments

Is your feature request related to a problem? Please describe.
Adapting the clinical pipeline from the forte example into this pacakge.

Describe the solution you'd like
place the components (ontology, dataset reader, and processor, etc.) into corresponding folders in forte_medical/, and provide the pipeline example into examples/mimic-iii/.

Describe alternatives you've considered
Given it's the first example in this package, so feel free to change the code framework if necessary.

@hunterhector Are we planning to have independent ontology, dataset readers and processors for this repo or will we be reusing most of these components from forte itself?

I don't think the main Forte repo itself provides enough components. It should be a case-by-case decided by @tanyuqian

After reviewing the code of PR #2 detailedly, I realize some points to polish in our code:

  • removing redundant files, e.g., __pycache__ , .DS_Store etc.
  • building ontologies in the root folder of this repo instead of in the ontologie_specs/. (the folder can be named as "ftm/" and work similarly as ft/) in forte.

@hunterhector Are we planning to have independent ontology, dataset readers and processors for this repo or will we be reusing most of these components from forte itself?

  • we don't expect the users of this package to learn the use of forte/fortex, so yes, we want to have an independent ontology (except concepts in the base_ontology of forte), dataset readers and processors. Some of them may just be trivial wrappers of existing components in forte/fortex.

Thanks to Bowen for the review. To help clarify one of Bowen's points, we will need to put the ontology at the root folder with a good ontology prefix. "demo.clinical" is not really a permanent choice. Also, since the generated content is python source, so we don't want to put them in another directory like "ontologies_specs", we simply put them at the top level, so that the package prefix actually matches the package path.

Also note that we do not want to mix generated sources with your own code (this didn't happen, just a heads up in advance).

Btw, there might be reasons that we want to stick to "ftx". One reason is that annotators in forte-wrappers are already using the classes under "ftx". Another reason is that we keep inventing new prefixes for every project we will have a lot by the end of the day :p. So, to do this, we should use name space packageing. This will ensure even we have "ftx" package in different places they can still be imported.

We are developing the new namespace packaging feature (asyml/forte#577) in the generator, but we can also do it manually by removing an __init__ file.

So the directory structure would look like this:

- forte_medical (source code our main content)
- ftx (generated source code)
  |_ med
      |_ __init__.py (note that __init__ is only at this level, not in the ftx folder)
      |_ clinical
      |_ biological

Note that we can even modify the "forte-wrapper" repo to make both sides to be consistent if we want.

Thank you for the review.
I have created a PR for the immediate changes.

Also, creating issues in forte and forte-wrappers, to make the ftx structure consistent since its a small task.
@hunterhector I wasn't sure if 'biological' would be the right tag for our existing medical ontology classes, i.e. UMLSConceptLink and MedicalEntityMention. Just wanted to double check this once.

Also, creating issues in forte and forte-wrappers, to make the ftx structure consistent since its a small task. @hunterhector I wasn't sure if 'biological' would be the right tag for our existing medical ontology classes, i.e. UMLSConceptLink and MedicalEntityMention. Just wanted to double check this once.

You are right, I used biological just as an example so you know there can be multiple files there, we probably will stick with clinical for a while.