Comparison of effects of four anti-cancer drugs on three cancer metrics in mice

Overview

This study assessed the effects of four anti-cancer drug therapies on three measures of cancer progression in mice, using a repeated-measures design. The study focused on data visualisation rather than hypothesis testing; hence, the code does not fit linear mixed models or run post-hoc tests to compare means.

Getting Started

The Python libraries pandas, NumPy, and Matplotlib were used for data analysis and visualisation.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

To standardise the size and colour scheme of every graph, the plot style was set to ggplot and the figure dimensions were set before any analyses were run.

# Choose ggplot as style for plots
plt.style.use('ggplot')

# Size of plots (width, height in inches)
plt.rcParams["figure.figsize"] = [12, 8]

The data came from two .csv files: mouse_drug_data.csv and clinicaltrial_data.csv. These files were merged into a dataframe mt_df. Four of the ten treatments were used for the analyses; hence, the dataframe was further filtered so that only the four drug therapies of concern were included.

# Load csv
mouse_drug_data_to_load = "mouse_drug_data.csv"
clinical_trial_data_to_load = "clinicaltrial_data.csv"

# Read the Mouse and Drug Data and the Clinical Trial Data
mouse_df = pd.read_csv(mouse_drug_data_to_load)
trial_df = pd.read_csv(clinical_trial_data_to_load)

# Combine the data into a single dataset (mt = mouse trial)
mt_df = pd.merge(mouse_df, trial_df, on = "Mouse ID")

# Display the data table for preview
mt_df.head()

# select the four drugs for comparison
list_of_drugs = ["Capomulin", "Infubinol", "Ketapril", "Placebo"]
mt_df = mt_df[mt_df["Drug"].isin(list_of_drugs)]
mt_df.head()
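Because each mouse should receive exactly one drug, it may be worth checking the merge before filtering. The sketch below uses small hypothetical frames in place of the two CSV files; the `validate` argument of `pd.merge` raises a `MergeError` if a `Mouse ID` is duplicated in the left frame, a problem that would otherwise silently duplicate trial rows.

```python
import pandas as pd

# Hypothetical toy data standing in for mouse_drug_data.csv and clinicaltrial_data.csv
mouse_df = pd.DataFrame({"Mouse ID": ["a1", "b2", "c3"],
                         "Drug": ["Capomulin", "Placebo", "Ceftamin"]})
trial_df = pd.DataFrame({"Mouse ID": ["a1", "a1", "b2", "b2", "c3"],
                         "Timepoint": [0, 5, 0, 5, 0],
                         "Tumor Volume (mm3)": [45.0, 43.1, 45.0, 47.2, 45.0],
                         "Metastatic Sites": [0, 0, 0, 1, 0]})

# validate="one_to_many" raises MergeError if a Mouse ID appears more than once
# in mouse_df, i.e. if a mouse were listed under two drugs
mt_df = pd.merge(mouse_df, trial_df, on="Mouse ID", validate="one_to_many")

# Keep only the drugs of interest
list_of_drugs = ["Capomulin", "Infubinol", "Ketapril", "Placebo"]
mt_df = mt_df[mt_df["Drug"].isin(list_of_drugs)]
print(mt_df)
```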

Data Analyses

Experiment Overview

The experiment followed a repeated-measures design with two factors: drug (each mouse received one drug) and time (each mouse was measured repeatedly). Mice were randomly assigned to the drug treatments, and two response variables, tumour volume and number of metastatic sites, were recorded at each timepoint. A third response variable, the number of surviving mice, was derived from the number of observations at each timepoint. A table was generated to summarise the experimental set-up.

# list of subjects and treatments
mouse = mt_df["Mouse ID"].unique()
drugs = mt_df["Drug"].unique()
time = mt_df["Timepoint"].unique()

# counts of subjects and treatments
mouse_popn = len(mouse)
no_drugs = len(drugs)
no_measurements = len(time)
no_combinations = no_drugs * no_measurements  # number of Drug-Timepoint combinations

# summarise in a dataframe
overview = pd.DataFrame({"Number of Mice": [mouse_popn],
                         "Number of Drug Treatments": [no_drugs],
                         "Number of Time Measurements": [no_measurements],
                         "Number of Drug-Timepoint Combinations": [no_combinations]})
overview
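The overview counts mice across all four groups combined. A per-drug breakdown can be sketched with `groupby` and `nunique`, which counts each mouse once despite its repeated measurements. The toy frame below is hypothetical and only mirrors the column names of mt_df.

```python
import pandas as pd

# Hypothetical long-format data with the same column names as mt_df
mt_df = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2", "b2", "c3"],
    "Drug": ["Capomulin", "Capomulin", "Placebo", "Placebo", "Placebo"],
    "Timepoint": [0, 5, 0, 5, 0],
})

# Distinct mice per drug (nunique ignores repeated measurements of the same mouse)
mice_per_drug = mt_df.groupby("Drug")["Mouse ID"].nunique()
print(mice_per_drug)
```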

Tumour Analyses: Data Preparation

A new dataframe, mt_df2, was generated by dropping the column containing the data on metastatic sites. The new dataframe was then grouped by Drug and Timepoint.

# Group the data by drug and timepoint
mt_df2 = mt_df.drop("Metastatic Sites", axis = 1)
mt_grped = mt_df2.groupby(["Drug","Timepoint"])

The means and standard errors of the mean (SEMs) of tumour volume were then calculated and stored in two dataframes, tumour_means and tumour_sem.

# Get the mean and SEM of tumour volume
tumour_means = pd.DataFrame(mt_grped["Tumor Volume (mm3)"].mean()) # mean for each Drug-Timepoint combination
tumour_sem = pd.DataFrame(mt_grped["Tumor Volume (mm3)"].sem()) # SEM for each Drug-Timepoint combination
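For reference, pandas' `sem()` is the sample standard deviation (with `ddof=1`) divided by the square root of the group size. The short check below uses hypothetical volumes for a single Drug-Timepoint group to show the two calculations agree.

```python
import numpy as np
import pandas as pd

# Hypothetical tumour volumes (mm3) for one Drug-Timepoint group
volumes = pd.Series([45.0, 47.5, 44.0, 48.5])

# SEM = sample standard deviation (ddof=1) / sqrt(number of observations)
manual_sem = volumes.std(ddof=1) / np.sqrt(volumes.count())
print(volumes.sem(), manual_sem)
```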

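Both dataframes were in long ("stacked") format, indexed by Drug and Timepoint, so they were unstacked into wide format with one row per timepoint and one column per drug.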

# reshape the data
tumour_means = tumour_means.unstack(0)
tumour_sem = tumour_sem.unstack(0)

tumour_means = tumour_means["Tumor Volume (mm3)"]
tumour_sem = tumour_sem["Tumor Volume (mm3)"]
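The reshaping step can be illustrated on a tiny hypothetical table. `unstack(0)` moves index level 0 (Drug) into the columns, and selecting the top column level then leaves a plain table with Timepoint as rows and drug names as columns.

```python
import pandas as pd

# Hypothetical group means in long ("stacked") form, indexed by (Drug, Timepoint)
index = pd.MultiIndex.from_product([["Capomulin", "Placebo"], [0, 5]],
                                   names=["Drug", "Timepoint"])
means_long = pd.DataFrame({"Tumor Volume (mm3)": [45.0, 44.3, 45.0, 47.1]}, index=index)

# unstack(0) moves the Drug level into the columns; Timepoint stays as the index
means_wide = means_long.unstack(0)["Tumor Volume (mm3)"]
print(means_wide)
```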

The x-axis values (timepoints at 5-day intervals) and a positional index for each drug series were then defined.

# values for plotting
x_axis = np.arange(0, time.max() + 5, 5) # time
no_series = np.arange(0,no_drugs)

Tumour Size Changes

To determine the overall change in tumour volume, the mean tumour volumes at the start (t = 0) and end (t = 45) of the study were extracted from tumour_means, and the percentage change relative to the starting volume was calculated.

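# % tumour size change between time 0 and time 45
start = tumour_means.loc[0]   # mean volume per drug at t = 0
end = tumour_means.loc[45]    # mean volume per drug at t = 45
pct_tumour_change = round((end - start) / start * 100, 2)
pct_tumour_change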
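As a worked example with hypothetical numbers: a mean volume that falls from 45.0 to 36.0 mm3 is a change of (36 - 45) / 45 = -20%, and one that rises from 45.0 to 67.5 mm3 is +50%. The sketch below applies the same start-relative formula to a toy version of tumour_means.

```python
import pandas as pd

# Hypothetical wide table of mean tumour volumes (rows: Timepoint, columns: Drug)
tumour_means = pd.DataFrame({"Capomulin": [45.0, 36.0], "Placebo": [45.0, 67.5]},
                            index=pd.Index([0, 45], name="Timepoint"))

start = tumour_means.loc[0]   # volume at the first timepoint
end = tumour_means.loc[45]    # volume at the last timepoint

# Percentage change relative to the starting volume
pct_tumour_change = round((end - start) / start * 100, 2)
print(pct_tumour_change)  # Capomulin -20.0, Placebo 50.0
```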

These values were plotted as a bar graph in which increases in tumour volume appear as positive (green) bars and decreases as negative (red) bars.

# graph the tumour changes
plt.bar(no_series,
        pct_tumour_change,
        color = ["green" if change > 0 else "red" for change in pct_tumour_change])
plt.xticks(no_series, pct_tumour_change.index)  # drug names as tick labels
plt.title("Tumour Volume Change Across Drug Therapies")
plt.xlabel("Drug")
plt.ylabel("Change in tumour volume (%)")
plt.axhline(y = 0, color = "black")

The bar graph was further annotated with the percentage values (white font).

# Add label inside the bar graph (%)
for count, change in enumerate(pct_tumour_change):
    y = -4 if change < 0 else 3   # place the text inside the bar
    plt.text(count, y, str(round(change, 1)) + '%', ha = 'center', color = 'white')

Tumour Response Over Time

A line chart of mean tumour volume against time was generated, with the SEM for each data point shown as an error bar.

# Plot means and SE

for i in no_series:
    std_error = tumour_sem[drugs[i]]
    plt.errorbar(x_axis, 
                 tumour_means[drugs[i]], 
                 yerr = std_error, 
                 marker = "o", capsize = 3,
                 label = drugs[i])  # label each series for the legend

plt.title("Comparison of Tumour Response to Each Drug Therapy During the Treatment")
plt.xlabel("Day Number")
plt.ylabel("Tumour Volume (mm3)")
plt.xlim(-5, max(time) + 5)
plt.ylim(30, 75)
plt.legend()

The rate of change in tumour volume was also of interest. Hence, the percentage change from each timepoint to the next was calculated

# Data transformation: % change in mean tumour volume relative to the previous timepoint
pct_tumour_change = (tumour_means.diff() / tumour_means.shift()) * 100

and then plotted.

# Prepare plot of tumour change vs time
plt.plot(x_axis,
         pct_tumour_change,
         marker = "o")

plt.xlabel("Time (days)")
plt.ylabel("Change in tumour volume (%)")
plt.legend(tumour_means.keys())
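A step-to-step percentage change divides each difference by the value at the previous timepoint. The hypothetical series below grows by 5 mm3 per step, so the change is 11.1% (5/45) and then 10% (5/50). pandas' built-in `pct_change()` computes the same ratio and is shown for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical mean tumour volumes for one drug at consecutive timepoints
volumes = pd.Series([45.0, 50.0, 55.0], index=[0, 5, 10])

# Change relative to the previous timepoint: (V_t - V_prev) / V_prev * 100
step_change = volumes.diff() / volumes.shift() * 100

# pct_change() gives the same ratio (the first value is NaN in both)
builtin = volumes.pct_change() * 100
print(np.allclose(step_change.iloc[1:], builtin.iloc[1:]))
```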

Metastasis Analysis: Data Preparation

Another dataframe, mt_df3, was created by dropping tumour volume data from the mt_df dataframe. The data was then grouped by Drug and Timepoint.

mt_df3 = mt_df.drop("Tumor Volume (mm3)", axis = 1)
mt_grped2 = mt_df3.groupby(["Drug","Timepoint"])

The means and SEMs of the number of metastatic sites were calculated, stored in two new dataframes, meta_means and meta_sem, and unstacked in the same way.

# Get the mean and SEM of the number of metastatic sites
meta_means = pd.DataFrame(mt_grped2["Metastatic Sites"].mean()) # mean for each Drug-Timepoint combination
meta_sem = pd.DataFrame(mt_grped2["Metastatic Sites"].sem()) # SEM for each Drug-Timepoint combination

# reshape the data
meta_means = meta_means.unstack(0)
meta_sem = meta_sem.unstack(0)

meta_means = meta_means["Metastatic Sites"]
meta_sem = meta_sem["Metastatic Sites"]
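As an alternative to building the two dataframes separately, `agg(["mean", "sem"])` computes both statistics in one pass over the groups. The sketch below uses hypothetical metastasis counts with the same column names as mt_df3; it is an equivalent formulation, not the code used in this study.

```python
import pandas as pd

# Hypothetical long-format metastasis data with the same column names as mt_df3
mt_df3 = pd.DataFrame({
    "Drug": ["Capomulin"] * 4 + ["Placebo"] * 4,
    "Timepoint": [0, 0, 5, 5] * 2,
    "Metastatic Sites": [0, 0, 0, 1, 0, 0, 1, 2],
})

# One pass over the groups yields both statistics as columns "mean" and "sem"
stats = mt_df3.groupby(["Drug", "Timepoint"])["Metastatic Sites"].agg(["mean", "sem"])

# Reshape each statistic to wide form (rows: Timepoint, columns: Drug)
meta_means = stats["mean"].unstack(0)
meta_sem = stats["sem"].unstack(0)
print(meta_means)
```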

Metastatic Response to Treatment

A plot of the means and the SEMs of the number of metastatic sites was generated.

# Plot means and SE

x_axis = np.arange(0, time.max() + 5, 5) # time
no_series = np.arange(0, no_drugs)

for i in no_series:
    std_error = meta_sem[drugs[i]]
    plt.errorbar(x_axis, 
                 meta_means[drugs[i]], 
                 yerr = std_error, 
                 marker = "o", capsize = 3,
                 label = drugs[i])  # label each series for the legend

plt.title("Metastatic Response to the Drugs Across the Treatment")
plt.xlabel("Day Number")
plt.ylabel("Number of Metastatic Sites")
plt.xlim(-5, max(time) + 5)
plt.ylim(0, 5)
plt.legend()

Survival Rates

There are two approaches to determining survival rates. The first is to count the surviving population at each timepoint. Here, the number of surviving mice per timepoint and per drug was taken to be the number of observations, and the counts were stored in a new dataframe, no_mice.

# Count the mice per Drug and Timepoint (any column without missing values gives the same count)
no_mice = mt_grped["Mouse ID"].count()
no_mice = pd.DataFrame(no_mice.unstack(0))
no_mice

The results were visualised as a line plot.

# Plot number of surviving mice per unit time

x_axis = np.arange(0, time.max() + 5, 5) # time
no_series = np.arange(0, no_drugs)

for i in no_series:
    plt.plot(x_axis, 
             no_mice[drugs[i]], 
             marker = "o",
             label = drugs[i])  # label each series for the legend

plt.title("Number of Mice Surviving at Each Timepoint")
plt.xlabel("Day Number")
plt.ylabel("Number of Mice")
plt.xlim(-5, max(time) + 5)
plt.ylim(0, 30)
plt.legend()

The second approach is based on the mortality rate. According to Nohrmann (1953), this approach is particularly useful for repeated-measures studies because the population at the start of one interval is the population at the end of the previous interval. The mortality rate for an interval is:

$$Q = \frac{d}{d + l}$$

where Q is the mortality rate, d is the number of deceased, and l is the number of survivors.
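As a worked example with hypothetical numbers: if 25 mice start an interval and 2 die, then d = 2, l = 23, and Q = 2 / (2 + 23) = 0.08. The function below is a hypothetical helper that simply encodes the formula.

```python
def mortality_rate(deceased: int, survivors: int) -> float:
    """Interval mortality rate Q = d / (d + l) (hypothetical helper)."""
    return deceased / (deceased + survivors)


print(mortality_rate(2, 23))  # 0.08
```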

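For this study, d + l is the number of mice alive at the previous timepoint, so Q was calculated as the drop in the count divided by the previous count:

# Mouse mortality values: Q = d / (d + l) for each interval
mortality = abs(no_mice.diff()) / no_mice.shift()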
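The same calculation on a hypothetical sequence of survivor counts for one drug shows what each term corresponds to: the absolute difference between consecutive counts is d, and the previous count is d + l. The first timepoint has no preceding interval, so its Q is NaN.

```python
import pandas as pd

# Hypothetical counts of surviving mice for one drug at consecutive timepoints
counts = pd.Series([25, 25, 23, 20], index=[0, 5, 10, 15])

deceased = counts.diff().abs()   # d: mice lost during each interval
at_risk = counts.shift()         # d + l: mice alive at the start of each interval
mortality = deceased / at_risk   # Q for each interval (NaN at the first timepoint)
print(mortality)
```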

The survival rate (SR) is therefore the running product of (1 - Q) over successive intervals. For instance:

$$SR_{t_1} = 100\,(1 - Q_1)$$
$$SR_{t_2} = 100\,(1 - Q_1)(1 - Q_2)$$
$$SR_{t_n} = 100\,(1 - Q_1)(1 - Q_2)\cdots(1 - Q_n)$$
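Assuming mice leave the study only by dying (no censoring), the number at risk in interval i, d_i + l_i, equals the survivors at the previous timepoint, l_{i-1}, with l_0 the starting population. The product then telescopes, which gives a convenient cross-check on the SR values:

$$1 - Q_i = \frac{l_i}{d_i + l_i} = \frac{l_i}{l_{i-1}}$$

$$SR_{t_n} = 100\,(1 - Q_1)(1 - Q_2)\cdots(1 - Q_n) = 100\,\frac{l_1}{l_0}\cdot\frac{l_2}{l_1}\cdots\frac{l_n}{l_{n-1}} = 100\,\frac{l_n}{l_0}$$

Under this assumption, SR at any timepoint is simply the surviving count as a percentage of the starting count.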

The implementation of Nohrmann's SR equation for this study was as follows:

# survival = 1 - mouse mortality
def survive(x): # where x is the index of time (range: time[1] = 5 to time[9] = 45)
    return 1 - mortality.iloc[x,0:4]

surv_t05 = survive(1)
surv_t10 = survive(2)
surv_t15 = survive(3)
surv_t20 = survive(4)
surv_t25 = survive(5)
surv_t30 = survive(6)
surv_t35 = survive(7)
surv_t40 = survive(8)
surv_t45 = survive(9)

# cumulative survival rate (%) at each timepoint
survival_rate_t05 = 100 * surv_t05
survival_rate_t10 = 100 * surv_t05 * surv_t10
survival_rate_t15 = 100 * surv_t05 * surv_t10 * surv_t15
survival_rate_t20 = 100 * surv_t05 * surv_t10 * surv_t15 * surv_t20
survival_rate_t25 = 100 * surv_t05 * surv_t10 * surv_t15 * surv_t20 \
                        * surv_t25
survival_rate_t30 = 100 * surv_t05 * surv_t10 * surv_t15 * surv_t20 \
                        * surv_t25 * surv_t30
survival_rate_t35 = 100 * surv_t05 * surv_t10 * surv_t15 * surv_t20 \
                        * surv_t25 * surv_t30 * surv_t35
survival_rate_t40 = 100 * surv_t05 * surv_t10 * surv_t15 * surv_t20 \
                        * surv_t25 * surv_t30 * surv_t35 * surv_t40
survival_rate_t45 = 100 * surv_t05 * surv_t10 * surv_t15 * surv_t20 \
                        * surv_t25 * surv_t30 * surv_t35 * surv_t40 \
                        * surv_t45
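The nine survive() and survival_rate_tXX assignments spell out the running product term by term. As a more compact alternative (a sketch, not the code used in this study), `DataFrame.cumprod` produces every SR at once. The counts below are hypothetical; Q is computed as d / (d + l), with d + l taken as the previous count. The result has Timepoint as rows and drugs as columns, the same orientation as sr1.

```python
import numpy as np
import pandas as pd

# Hypothetical survivor counts (rows: Timepoint, columns: Drug), shaped like no_mice
no_mice = pd.DataFrame({"Capomulin": [25, 25, 24, 24],
                        "Placebo": [25, 24, 20, 18]},
                       index=pd.Index([0, 5, 10, 15], name="Timepoint"))

# Interval mortality Q = d / (d + l), where d + l is the previous count
mortality = no_mice.diff().abs() / no_mice.shift()

# Running product of (1 - Q) from the first interval onward gives SR (%) per timepoint
survival_rates = 100 * (1 - mortality.iloc[1:]).cumprod()
print(survival_rates)

# With no censoring this equals 100 * (current count / initial count)
direct = 100 * no_mice.iloc[1:] / no_mice.iloc[0]
print(np.allclose(survival_rates, direct))
```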

The SR data was then placed in another dataframe, survival_rates.

survival_rates = pd.DataFrame(dict(survival_rate_t05 = survival_rate_t05,
                                   survival_rate_t10 = survival_rate_t10,
                                   survival_rate_t15 = survival_rate_t15,
                                   survival_rate_t20 = survival_rate_t20,
                                   survival_rate_t25 = survival_rate_t25,
                                   survival_rate_t30 = survival_rate_t30,
                                   survival_rate_t35 = survival_rate_t35,
                                   survival_rate_t40 = survival_rate_t40,
                                   survival_rate_t45 = survival_rate_t45))
survival_rates = survival_rates.rename(columns = {"survival_rate_t05": "t05",
                                                  "survival_rate_t10": "t10",
                                                  "survival_rate_t15": "t15",
                                                  "survival_rate_t20": "t20",
                                                  "survival_rate_t25": "t25",
                                                  "survival_rate_t30": "t30",
                                                  "survival_rate_t35": "t35",
                                                  "survival_rate_t40": "t40",
                                                  "survival_rate_t45": "t45"})
sr1 = survival_rates.transpose() # use the drug names as the keys in the dataframe

The results were also plotted into a line graph.

# Plot survival rates

plt.plot(np.delete(time, 0), 
         sr1, 
         marker = "o")
plt.title("Survival Rates for Each Anti-Cancer Drug")
plt.xlabel("Day Number")
plt.ylabel("Survival Rate (%)")
plt.xlim(0, max(time) + 5)
plt.ylim(0, 120)
plt.legend(sr1.keys())

Resources

Nohrmann, B. A. 1953. Survival rate calculation. Acta Radiologica. 39(1): 78–82.

rochiecuevas / Pymaceuticals