jbdel / vilmedic

ViLMedic (Vision-and-Language medical research) is a modular framework for vision and language multimodal research in the medical field.

News

Papers
Toward Expanding the Scope of Radiology Report Summarization to Multiple Anatomies and Modalities [Dataset]
Overview of the RadSum23 Shared Task on Multi-modal and Multi-anatomical Radiology Report Summarization [Challenge]
Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards [Replicate]

ViLMedic: a framework for research at the intersection of vision and language in medical AI

ViLMedic has a dedicated website at: https://vilmedic.app/






Quickstart and documentation

Rendez-vous at: https://vilmedic.app/installation/
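
A minimal quickstart sketch, assuming the PyPI package name vilmedic and the model-zoo AutoModel entry point described in the documentation; the model identifier below is illustrative, not an exact zoo name:

# Installation (see https://vilmedic.app/installation/ for the full steps):
#   pip install vilmedic

from vilmedic import AutoModel

# Download a pretrained radiology report generation solution from the
# model zoo. "rrg/baseline-mimic" is an illustrative identifier; the
# available names are listed on https://vilmedic.app/.
model, processor = AutoModel.from_pretrained("rrg/baseline-mimic")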

Implemented solutions

ViLMedic replicates solutions from the multimodal medical literature; a usage sketch follows the list below.

Medical Visual Question Answering
SYSU-HCP at VQA-Med 2021
Radiology report generation
Generating Radiology Reports via Memory-driven Transformer
Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports
Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation
Radiology report summarization
Multimodal Radiology Report Summarization
Multimodal self-supervised learning
Contrastive Learning of Medical Visual Representations from Paired Images and Text
DALLE: Zero-Shot Text-to-Image Generation
CLIP: Learning Transferable Visual Models From Natural Language Supervision
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition
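
As a rough illustration of how one of these replicated solutions can be exercised once downloaded from the model zoo. This is a sketch: the AutoModel entry point and the processor.inference helper follow the pattern shown in the project documentation, but the identifier and keyword names are assumptions and should be verified on the website:

import torch
from vilmedic import AutoModel

# Illustrative model-zoo identifier for a ConVirt-style contrastive
# checkpoint; the actual names are listed on https://vilmedic.app/.
model, processor = AutoModel.from_pretrained("selfsup/convirt-mimic")

# Build an inference batch from a report sentence and a chest X-ray;
# the keyword names below are assumptions.
batch = processor.inference(
    seq=["no acute cardiopulmonary process"],
    image=["chest_xray.jpg"],
)

with torch.no_grad():
    out = model(**batch)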

Blocks

Natural Language Processing
HuggingFace transformer encoder and decoder
HuggingFace transformer beam-search and model ensembling 🔥
NLG metrics (BLEU, ROUGE, METEOR, MAUVE) and radiology report generation metrics (F1-CheXbert)
RadGraph
Vision
All PyTorch VisualEncoder architectures
Vision Transformer
TorchXRayVision
Losses
All PyTorch losses
ConVirt loss
GLoRIA loss
InfoNCE loss
SuperLoss
Reinforcement Learning
Self-critical Sequence Training (HuggingFace compliant) 🔥 (a minimal sketch follows this list)
PPO optimization (HuggingFace compliant)
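
To make the reinforcement-learning blocks concrete, here is a minimal, framework-agnostic sketch of the self-critical sequence training idea: the reward of a sampled sequence minus the reward of the greedy decode weights the sample's log-probability. It is not ViLMedic's actual SCST block, which wraps HuggingFace generation; rewards here would come from a metric such as F1-CheXbert:

import torch

def scst_loss(sample_logprobs: torch.Tensor,
              sample_reward: torch.Tensor,
              greedy_reward: torch.Tensor) -> torch.Tensor:
    # sample_logprobs: (batch,) summed log-probs of the sampled sequences.
    # sample_reward:   (batch,) metric score of the sampled sequences.
    # greedy_reward:   (batch,) score of the greedy decodes (the baseline).
    # Advantage over the greedy baseline; detached so gradients flow only
    # through the log-probabilities (REINFORCE with a self-critical baseline).
    advantage = (sample_reward - greedy_reward).detach()
    return -(advantage * sample_logprobs).mean()

# Toy usage with made-up numbers:
loss = scst_loss(torch.tensor([-12.3, -9.8]),
                 torch.tensor([0.41, 0.55]),
                 torch.tensor([0.38, 0.60]))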

Citation

If you use ViLMedic in your work or use any models published in ViLMedic, please cite:
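
@inproceedings{delbrouck-etal-2022-vilmedic,
    title = "{V}i{LM}edic: a framework for research at the intersection of vision and language in medical {AI}",
    author = "Delbrouck, Jean-benoit  and
      Saab, Khaled  and
      Varma, Maya  and
      Eyuboglu, Sabri  and
      Chambon, Pierre  and
      Dunnmon, Jared  and
      Zambrano, Juan  and
      Chaudhari, Akshay  and
      Langlotz, Curtis",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-demo.3",
    pages = "23--34",
}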

License

ViLMedic is MIT-licensed. The license applies to the pre-trained models as well.
