Project Name

-- Project Status: [In-Progress]

Project Intro/Objective

The purpose of this project is to perform Named Entity Recognition (NER) on clinical reports, that is, to identify entities such as disease and symptom names. I used the Unified Medical Language System (UMLS) database as the baseline source of disease and symptom names. The data used in this project comes from the n2c2 NLP Research Data Sets. Both UMLS and the n2c2 datasets require a license from their respective providers.
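To illustrate what a dictionary-based baseline can look like, the following minimal sketch tags tokens with BIO labels using a greedy longest-match lookup in a term list. It assumes a tiny hypothetical LEXICON that stands in for disease and symptom names drawn from UMLS (the real UMLS data requires a license). The function name tag_tokens is also hypothetical and is not taken from this project's code.

```python
# Minimal sketch of a dictionary-based NER baseline.
# ASSUMPTION: LEXICON is a tiny hypothetical stand-in for disease and
# symptom names extracted from UMLS; it is not the project's term list.
from typing import Dict, List, Tuple

LEXICON: Dict[Tuple[str, ...], str] = {
    ("chest", "pain"): "SYMPTOM",
    ("shortness", "of", "breath"): "SYMPTOM",
    ("fever",): "SYMPTOM",
    ("type", "2", "diabetes"): "DISEASE",
    ("pneumonia",): "DISEASE",
}
MAX_LEN = max(len(term) for term in LEXICON)


def tag_tokens(tokens: List[str]) -> List[str]:
    """Assign BIO labels using greedy longest-match lookup in LEXICON."""
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            entity_type = LEXICON.get(tuple(lowered[i:i + n]))
            if entity_type:
                labels[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{entity_type}"
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return labels


tokens = "Patient reports chest pain and fever , no pneumonia".split()
print(list(zip(tokens, tag_tokens(tokens))))
```

A plain lexicon lookup like this also tags "pneumonia" in "no pneumonia", which is one reason negation detection matters for clinical text.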

The notebooks are not committed to this repository because the n2c2 dataset contains protected health information (PHI) and is subject to data-use policies.

Collaborators

Name	GitHub Page	Personal Website
Asha Ponraj	asha	www.devskrol.com

Methods Used

Conditional Random Fields from sklearn_crfsuite
Machine Learning techniques
NLP tasks
Predictive Modeling
Manual Data Annotation
Feature Engineering and Extraction
etc.
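sklearn_crfsuite trains on sequences of per-token feature dictionaries: one dict per token and one list of dicts per sentence. A minimal sketch of that feature extraction step follows. The specific features (lowercased word, suffix, casing, neighbouring words) and the helper names word2features and sent2features are illustrative assumptions, not the project's actual feature set.

```python
# Sketch of token-level feature extraction in the input format that
# sklearn_crfsuite expects. ASSUMPTION: the feature set is illustrative.
from typing import Dict, List, Union

Features = Dict[str, Union[str, bool, float]]


def word2features(sent: List[str], i: int) -> Features:
    """Build the feature dict for token i of a tokenized sentence."""
    word = sent[i]
    features: Features = {
        "bias": 1.0,
        "word.lower()": word.lower(),
        "word[-3:]": word[-3:],          # suffix, e.g. "-itis", "-emia"
        "word.isupper()": word.isupper(),
        "word.istitle()": word.istitle(),
        "word.isdigit()": word.isdigit(),
    }
    if i > 0:
        prev = sent[i - 1]
        features["-1:word.lower()"] = prev.lower()
        features["-1:word.istitle()"] = prev.istitle()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        nxt = sent[i + 1]
        features["+1:word.lower()"] = nxt.lower()
        features["+1:word.istitle()"] = nxt.istitle()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sent2features(sent: List[str]) -> List[Features]:
    """Convert one tokenized sentence into a CRF training sequence."""
    return [word2features(sent, i) for i in range(len(sent))]


print(sent2features(["Denies", "fever"]))
```

Each sentence then becomes one training sequence: X_train = [sent2features(s) for s in sentences], with the matching lists of BIO labels as y_train. These can be passed to crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100) followed by crf.fit(X_train, y_train). The regularisation values shown are placeholders, not tuned settings.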

Technologies

Python
sklearn_crfsuite
MySQL
pandas, Jupyter
HTML
etc.

Future Enhancements

Increase accuracy
CRF implementation
Negation detection
Differentiate symptoms, disease names, and exam names
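For the planned negation detection, one common starting point is a rule-based check loosely inspired by NegEx: look a few tokens before an entity for a negation trigger, stopping at words that end the negation scope. The sketch below is a heavily simplified version of that idea, not the full NegEx algorithm. The trigger and terminator lists, the 5-token window, and the name is_negated are all hypothetical choices for illustration.

```python
# Simplified, NegEx-inspired negation check (not the full NegEx algorithm).
# ASSUMPTION: trigger/terminator lists and the window size are illustrative.
from typing import List

NEGATION_TRIGGERS = {"no", "not", "denies", "denied", "without"}
SCOPE_TERMINATORS = {"but", "however", "."}


def is_negated(tokens: List[str], entity_start: int, window: int = 5) -> bool:
    """Return True if a negation trigger occurs within `window` tokens
    before the entity starting at index `entity_start`."""
    # Walk backwards from the token just before the entity.
    for i in range(entity_start - 1, max(-1, entity_start - window - 1), -1):
        word = tokens[i].lower()
        if word in SCOPE_TERMINATORS:
            return False  # negation scope does not cross these words
        if word in NEGATION_TRIGGERS:
            return True
    return False


tokens = "Patient denies chest pain but reports fever".split()
print(is_negated(tokens, 2), is_negated(tokens, 6))
```

Here "chest pain" (starting at index 2) is treated as negated by "denies", while "fever" (index 6) is not, because the backward scan stops at "but".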

This file structure is based on the DSSG machine learning pipeline.
