Trial PathFinder

Python library for Trial Pathfinder, an AI framework to systematically evaluate clinical trial eligibility criteria. Functions provided by TrialPathfinder: encoding eligibility criteria, emulating existing trials under combinations of eligibility rules, evaluating individual eligibility rule with Shapley value and suggesting data-driven criteria.

Reference paper: Evaluating Oncology Trial Eligibility Criteria using Real-World Data and AI.

Installation

The package TrialPathfinder is available on PyPI

pip install TrialPathfinder

We also provide the option for manual installation: download this Github repository and run

cd TrialPathfinder/
python setup.py install --user

Quick Guidance

Here we give a quick guidance of using TrialPathfinder. More details see tutorial/tutorial.ipynb.

import TrialPathfinder as tp

###### Encode Eligibility Criteria #####

# Create cohort selection object
cohort = tp.cohort_selection(patientids, name_PatientID='PatientID')
# Add the data tables needed in the eligibility criterion
cohort.add_table(name_table1, table1)
# Add individual eligibility criterion
cohort.add_rule(rule1)

###### Emulate Existing Trials and Survival Analysis ######

# Given a combination of eligibility rules names_rules (an empty list name_rules=[] indicates fully-relaxed criteria)).
HR, CI, data_cox = tp.emulate_trials(cohort, features, drug_treatment, drug_control, name_rules)

###### Evaluate Individual Criterion ######

# Return the Shapley values for each rule in names_rules
shapley_values = tp.shapley_computation(cohort, features, drug_treatment, drug_control, names_rules)

###### Criteria Relaxation - Data-driven Criteria ######

# Select all the rules with Shapley value less than 0.
names_rules_relax = names_rules[shapley_values<0.]
# Survival analysis on the data-driven criteria
HR, CI, data_cox = tp.emulate_trials(cohort, features, drug_treatment, drug_control, name_rules_relax)

Documentation

We highly recommend reading the tutorial/tutorial.ipynb.

1. Data Requirements

TrialPathfinder reads tables in Pandas dataframe structure (pd.dataframe) as default. The date information should be read as datetime (use function pd.to_datetime to convert if not).

1. Features:

Patient ID
Treatment Information
- Drug name.
- Start date.
- Date of outcome. For example, if overall survival (OS) is used as metric, the date of outcome is the date of death. If progression-free survival (PFS) is used as metric, the date of outcome is the date of progression.
- Date of last visit. The patient's last record date of visit, used for censoring.
Covariates (optional): adjusted to emulate the blind assignment, used by Inverse probability of treatment weighting (IPTW) or propensity score matching (PSM). Some examples: age, gender, composite race/ethnicity, histology, smoking status, staging, ECOG, and biomarkers status.

2. Tables used by eligibility criteria.

Use the same Patient ID as the features table.

2. Standards of encoding eligibility criteria

We built a computational workflow to encode the description of eligibility criteria in the protocols into standardized instructions which can be parsed by Trial Pathfinder for cohort selection use.

1. Basic logic.

Name of the criteria is written in the first row.
A new statement starts with “#inclusion” or “#exclusion” to indicate the criterion’s type. Whether to include patients who have missing entries in the criteria: “(missing include)” or “(missing exclude)”. The default choice is including patients with missing entries.
Data name format: “Table[‘featurename’]”. For example, “demographics[‘birthdate’]” denotes column date of birth in table demographics.
Equation: ==, !=, <, <=, >, >=.
Logic: AND, OR.
Other operations: MIN, MAX, ABS.
Time is encoded as “DAYS(80)”: 80 days; “MONTHS(4)”: 4 months; “YEARS(3)”: 3 years.

Example: criteria "Age" - include patients more than 18 years old when they received the treatment.

Age
#Inclusion
features['StartDate'] >= demographics['BirthDate'] + @YEARS(18>

2. Complex rule with hierarchy.

Each row is operated in sequential order
- The tables are prepared before the last row.
- The patients are selected at the last row.

Example: criteria "Platelets" - include patients whose platelet count ≥ 100 x 10^3/μL.
To encode this criterion, we follow the procedure:

Prepare the lab table:
1. Pick the lab tests for platelet count
2. The lab test date should be within a -28 to +0 window around the treatment start date
3. Use the record closest to the treatment start date to do selection.
Select patients: lab value larger than 100 x 10^3/μL.

Platelets
#Inclusion
(lab['LabName'] == 'Platelet count')
(lab['TestDate'] >= features['StartDate'] - @DAYS(28) ) AND (lab['TestDate'] <= features['StartDate'])
MIN(ABS(lab['TestDate'] - features['StartDate']))
lab['LabValue'] >= 100

zonghui0228 / TrialPathfinder