aasha01 / ClinicalNER

Project Name

-- Project Status: [In-Progress]

Project Intro/Objective

The purpose of this project is to perform Named Entity Recognition (NER) on clinical reports. I used the Unified Medical Language System (UMLS) database as the baseline for disease and symptom names. The data used in this project is the "n2c2 NLP Research Data Sets". Both UMLS and the n2c2 dataset require a proper license from their respective sites.
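As an illustration of what a dictionary baseline of this kind might look like, here is a minimal sketch of greedy longest-match lookup that emits BIO tags. The `LEXICON` contents, the `SYMPTOM`/`DISEASE` labels, and the `dictionary_tag` function are hypothetical; a real lexicon would be built from licensed UMLS data, and the project's actual baseline may work differently.

```python
# Hypothetical toy lexicon; a real one would be derived from licensed UMLS data.
LEXICON = {
    ("chest", "pain"): "SYMPTOM",
    ("fever",): "SYMPTOM",
    ("type", "2", "diabetes"): "DISEASE",
    ("pneumonia",): "DISEASE",
}
MAX_LEN = max(len(term) for term in LEXICON)


def dictionary_tag(tokens):
    """Tag tokens with BIO labels using greedy longest-match lexicon lookup."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len, label = 0, None
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(lowered[i:i + n])
            if candidate in LEXICON:
                match_len, label = n, LEXICON[candidate]
                break
        if label is None:
            i += 1
            continue
        tags[i] = "B-" + label
        for j in range(i + 1, i + match_len):
            tags[j] = "I-" + label
        i += match_len
    return tags


tokens = "Patient denies chest pain but has type 2 diabetes .".split()
for token, tag in zip(tokens, dictionary_tag(tokens)):
    print(token, tag)
```

A lookup like this ignores context (for example, "denies chest pain" is still tagged as a symptom), which is one reason a learned sequence model is a natural next step.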

Notebooks are not committed to this repo because of the PHI in the n2c2 dataset and its usage policies.

Symptoms, Disease Names Extraction

Collaborators

Name | GitHub Page | Personal Website
Asha Ponraj | asha | www.devskrol.com

Methods Used

  • Conditional Random Fields from sklearn_crfsuite
  • Machine Learning techniques
  • NLP tasks
  • Predictive Modeling
  • Manual Data Annotation
  • Feature Engineering and Extraction
  • etc.
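To make the CRF and feature engineering items above more concrete, the following is a minimal sketch of per-token feature extraction in the list-of-dicts format that `sklearn_crfsuite.CRF` accepts. The specific features, the function names `token_features` and `sentence_features`, and the example sentence are assumptions for illustration, not the features used in this project.

```python
def token_features(sentence, i):
    """Build a feature dict for the token at position i of a tokenized sentence."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Context features from the neighbouring tokens, with boundary markers.
    if i > 0:
        features["prev.lower"] = sentence[i - 1].lower()
    else:
        features["BOS"] = True
    if i < len(sentence) - 1:
        features["next.lower"] = sentence[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(sentence):
    """Convert one tokenized sentence into a list of feature dicts."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Made-up example sentence with BIO labels (not taken from n2c2).
sentence = ["Patient", "reports", "severe", "headache"]
X = [sentence_features(sentence)]
y = [["O", "O", "O", "B-SYMPTOM"]]

# With sklearn_crfsuite installed, training would look roughly like this:
#   import sklearn_crfsuite
#   crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
#   crf.fit(X, y)
#   predicted = crf.predict(X)
```

The hyperparameter values in the commented training call are placeholders; in practice they would be tuned on held-out annotated data.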

Technologies

  • Python
  • sklearn_crfsuite
  • MySQL
  • pandas, Jupyter
  • HTML
  • etc.

Future Enhancements

  1. Increase accuracy
  2. CRF implementation
  3. Negation detection
  4. Differentiate between symptoms, disease names, and exam names.
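For the negation detection item, one possible starting point is a simple trigger-word rule in the spirit of NegEx. The sketch below is an assumption about how such an enhancement could begin, not something implemented in this project; the trigger list, scope terminators, window size, and the `negated_entities` function are all hypothetical.

```python
# Hypothetical trigger and terminator lists; a real system would use a curated set.
NEGATION_TRIGGERS = {"no", "not", "denies", "denied", "without"}
SCOPE_TERMINATORS = {"but", "however", ".", ";"}
WINDOW = 5  # how many tokens before the entity to search for a trigger


def negated_entities(tokens, entity_spans):
    """Return entity spans (start, end) preceded by a negation trigger.

    The search walks left from each entity, up to WINDOW tokens, and stops
    early at a scope terminator such as "but".
    """
    negated = []
    for start, end in entity_spans:
        j = start - 1
        while j >= 0 and start - j <= WINDOW:
            token = tokens[j].lower()
            if token in SCOPE_TERMINATORS:
                break
            if token in NEGATION_TRIGGERS:
                negated.append((start, end))
                break
            j -= 1
    return negated


tokens = "Patient denies chest pain but has fever .".split()
spans = [(2, 4), (6, 7)]  # "chest pain" and "fever", end index exclusive
print(negated_entities(tokens, spans))
```

Here "chest pain" is flagged as negated because "denies" precedes it, while "fever" is not, since "but" closes the negation scope.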

This file structure is based on the DSSG machine learning pipeline.
