dpeerlab / Palantir

Single cell trajectory detection

Home Page: https://palantir.readthedocs.io

plot_gene_trend

yitengfei120011 opened this issue · comments

Why is the same gene expressed differently in different terminal states? And how can I align the expression scale (color bar) across the heatmaps of the different terminal states?

For example, in https://nbviewer.org/github/dpeerlab/Palantir/blob/master/notebooks/Palantir_sample_notebook.ipynb:

palantir.plot.plot_gene_trend_heatmaps(ad, genes)
plt.show()

As shown in the figure, the 'CD34' gene is expressed differently in different terminal states.

Hi @yitengfei120011,

Thank you for your inquiry. I am not sure I fully understand the question. Why would you expect all terminal cell states to have the same gene expression? Since different terminal states refer to different cell states that are only defined through their gene expression, their gene expression should differ by definition.

For all visual customization, such as the color range and the positioning of the color bar, you can use the following arguments of the function:

cbkwargs : dict
    Additional keyword arguments for matplotlib.pyplot.colorbar.
**kwargs : dict
    Additional keyword arguments for matplotlib.pyplot.matshow.

See the matplotlib documentation for colorbar and matshow. For example,

palantir.plot.plot_gene_trend_heatmaps(ad, genes, vmin=-1, vmax=1, cbkwargs=dict(shrink=.5))
plt.show()

fixes the color gradient to cover a value range between -1 and 1, and reduces the size of the color bar.

Please let me know if this resolves the issue.

My question is: why is the same gene expressed differently in different lineages?
For example, see 'CD34' in the picture.
image

Hi @yitengfei120011,

As the left-hand side of this plot relates to the same stem cell state, it is indeed expected to show approximately the same expression values for the same gene. And indeed, the color bar to the right of these plots reveals that the colors of CD34 correspond to the same expression value of approximately 1 in both lineages. The reason for the different scaling in the Ery lineage is the high expression reached by MPO towards the terminal state. If you want to fix the scale to the same values (e.g., -3 to 2), you can use the vmin and vmax parameters like so:

palantir.plot.plot_gene_trend_heatmaps(ad, genes, vmin=-3, vmax=2)
plt.show()
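
To see why the same value can receive a different color in each panel, here is a minimal sketch that is independent of Palantir. It assumes that, when no vmin/vmax is given, each heatmap stretches its colors linearly between its own smallest and largest value (matplotlib's default normalization). The numbers, the panel names, and the helpers color_position and limits are made up for illustration.

```python
def color_position(value, vmin, vmax):
    """Map a value linearly onto the 0..1 color scale, clipping at both ends."""
    frac = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, frac))


def limits(panel):
    """Smallest and largest value found anywhere in one heatmap panel."""
    values = [v for row in panel.values() for v in row]
    return min(values), max(values)


# Hypothetical scaled trends (one row per gene, one column per pseudotime bin).
panel_a = {"CD34": [1.0, 0.2, -1.0], "MPO": [-1.0, 0.5, 2.0]}
panel_b = {"CD34": [1.0, 0.0, -1.0], "MPO": [-1.0, -0.5, 1.0]}

# Per-panel limits: the same CD34 start value lands on different colors,
# because MPO stretches the range of panel_a further than that of panel_b.
pos_a = color_position(panel_a["CD34"][0], *limits(panel_a))
pos_b = color_position(panel_b["CD34"][0], *limits(panel_b))

# Shared limits (the effect of vmin=-3, vmax=2): equal values, equal colors.
shared_a = color_position(panel_a["CD34"][0], -3, 2)
shared_b = color_position(panel_b["CD34"][0], -3, 2)

print(pos_a, pos_b)
print(shared_a, shared_b)
```

In this toy case the CD34 start value sits at about two thirds of the color scale in one panel and at the very top in the other, although the underlying value is 1.0 in both. With shared limits both panels place it at the same position.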

Does that resolve the issue?

Hi, my Palantir version is 1.2, and
palantir.plot.plot_gene_trend_heatmaps(ad, genes, vmin=-3, vmax=2) does not work.
Can I get this to work without switching to another version?

Hi @yitengfei120011,

Unfortunately, this is not possible with Palantir v1.2. I would recommend updating, e.g., with

python -m pip install --no-deps --upgrade palantir

or you can copy the new function from here:

Palantir/src/palantir/plot.py

Lines 1583 to 1663 in 906bddf

def plot_gene_trend_heatmaps(
    data: Union[sc.AnnData, Dict],
    genes: Optional[List[str]] = None,
    gene_trend_key: str = "gene_trends",
    branch_names: Union[str, List[str]] = "branch_masks",
    scaling: Optional[Literal["none", "z-score", "quantile", "percent"]] = "z-score",
    basefigsize: Tuple[int, int] = (7, 0.7),
    cbkwargs: Dict = dict(),
    **kwargs,
) -> plt.Figure:
    """
    Plot the gene trends on heatmaps: a heatmap is generated for each branch.

    Parameters
    ----------
    data : Union[sc.AnnData, Dict]
        AnnData object or dictionary of gene trends.
    genes : Optional[List[str]], optional
        List of genes to include in the plot. If None, all genes are included.
        Default is None.
    gene_trend_key : str, optional
        Key to access gene trends in the AnnData object's varm. Default is 'gene_trends'.
    branch_names : Union[str, List[str]], optional
        Key to access branch names from AnnData object or list of branch names. If a string is provided,
        it is assumed to be a key in AnnData.uns. Default is 'branch_masks_columns'.
    scaling : Optional[Literal["none", "z-score", "quantile", "percent"]], optional
        Scaling method to apply on the gene trends. Options are:
        - "none" : no scaling is applied.
        - "z-score" : standardizes the data to have 0 mean and 1 variance.
        - "quantile" : scales the data to have values between 0 and 1.
        - "percent" : scales the data to represent percentages of the max value in the row.
        Default is 'z-score'.
    basefigsize : Tuple[int, int], optional
        Base width and height in inches of the figure. The actual height of the figure is calculated
        based on the number of genes and branches. Default base size is (7, 0.7).
    cbkwargs : dict
        Additional keyword arguments for matplotlib.pyplot.colorbar.
    **kwargs : dict
        Additional keyword arguments for matplotlib.pyplot.matshow.

    Returns
    -------
    fig : matplotlib.figure.Figure
        Matplotlib figure object of the plot.
    """
    gene_trends = _validate_gene_trend_input(data, gene_trend_key, branch_names)

    default_kwargs = {
        "cmap": matplotlib.rcParams["image.cmap"],
        "aspect": 50,
    }
    default_kwargs.update(kwargs)
    default_cbkwargs = {
        "shrink": 0.9,
        "drawedges": False,
    }
    default_cbkwargs.update(cbkwargs)

    # Get the branch names
    branches = list(gene_trends.keys())
    if genes is None:
        genes = gene_trends[branches[0]]["trends"].index

    height = basefigsize[1] * len(genes) * len(branches)
    figsize = (basefigsize[0], height)
    fig = plt.figure(figsize=figsize)

    for i, branch in enumerate(branches):
        ax = fig.add_subplot(len(branches), 1, i + 1)
        mat = gene_trends[branch]["trends"].loc[genes, :]
        mat = _scale(mat, scaling)
        cbd = ax.matshow(mat, **default_kwargs)
        ax.set_xticks([])
        ax.set_yticks(range(len(genes)), genes)
        ax.set_frame_on(False)
        ax.set_title(branch, fontsize=12)
        cb = plt.colorbar(cbd, ax=ax, **default_cbkwargs)
        cb.outline.set_visible(False)

    return fig
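
If you are unsure whether your installed version already forwards extra keyword arguments such as vmin and vmax, one way to check is to inspect the function signature for a **kwargs catch-all, as the function above has. The following is only a sketch using Python's standard inspect module; old_plot and new_plot are hypothetical stand-ins, and the assumption that the older function lacks **kwargs is inferred from the failure reported with v1.2, not verified.

```python
import inspect


def accepts_extra_kwargs(func):
    """Return True if func declares a **kwargs catch-all parameter."""
    params = inspect.signature(func).parameters.values()
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)


# Hypothetical stand-ins for an older and a newer plotting function.
def old_plot(data, genes=None):
    pass


def new_plot(data, genes=None, cbkwargs=None, **kwargs):
    pass


print(accepts_extra_kwargs(old_plot))
print(accepts_extra_kwargs(new_plot))
```

With Palantir installed, accepts_extra_kwargs(palantir.plot.plot_gene_trend_heatmaps) would report whether vmin and vmax can be passed through to the heatmap.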

Also note that z-score scaling is applied by default. This means the gene expression values are rescaled to use the whole color spectrum, a common practice for heatmaps that preserves the contrast of each row. If you do not want any scaling and prefer to leave the gene expression values as they are, you can pass scaling="none" to the function, like so:

palantir.plot.plot_gene_trend_heatmaps(ad, genes, scaling="none")
plt.show()
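
As a rough illustration of what z-score scaling does to a single row, here is a minimal sketch. It assumes the scaling is applied per gene (per row) within each branch and uses the population standard deviation; the exact behavior of Palantir's internal _scale helper is not shown in this thread, so treat both as assumptions. The trend values are made up.

```python
from statistics import mean, pstdev


def zscore_row(row):
    """Standardize one gene's trend to mean 0 and variance 1."""
    mu, sigma = mean(row), pstdev(row)
    return [(x - mu) / sigma for x in row]


# Hypothetical unscaled trends of one gene along two branches.
# Both start at 2.0, but the gene goes down in one branch and up in the other.
branch_a = [2.0, 1.5, 1.0, 0.5, 0.0]
branch_b = [2.0, 2.5, 3.0, 3.5, 4.0]

start_a = zscore_row(branch_a)[0]
start_b = zscore_row(branch_b)[0]

# The identical start value ends up above the row mean in branch_a
# and below the row mean in branch_b after scaling.
print(round(start_a, 3), round(start_b, 3))
```

Under these assumptions, a scaled heatmap shows where a value sits relative to the rest of its own row, not the absolute expression level, which is why scaling="none" is the better choice for comparing absolute values across branches.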

Hi, when I unify the scale, the start point of a gene such as 'Tgfbr1' shows different expression levels in the different branches. How can this be explained?
image

Hi @yitengfei120011,

Thank you for your question! Was this plot made with scaling="none", as in:

palantir.plot.plot_gene_trend_heatmaps(ad, genes, scaling="none")
plt.show()

The default scaling ("z-score") adjusts the levels such that the whole color spectrum is used, which is a common practice for heatmaps; see, e.g., the z_score parameter of seaborn.clustermap.

Without scaling, the initial value on the heatmap is influenced by the selection of cells for each branch. This can cause discrepancies if high-expressing cells near the start point are unevenly distributed between the two branches. To better understand this, I recommend examining the branch selection using palantir.plot.plot_branch_selection(ad) and comparing gene expression across the UMAP plot. This approach helps identify if certain cells are disproportionately influencing the trend.
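
The effect of uneven cell selection can be illustrated with a toy calculation. This sketch replaces Palantir's smoothed trend with a plain average over early cells, which is a deliberate simplification for intuition only; the cells, the branch labels "A" and "B", the cutoff, and the helper early_mean are all hypothetical.

```python
# Hypothetical cells: (pseudotime, expression of one gene, branches the cell is selected for).
cells = [
    (0.00, 1.0, {"A", "B"}),
    (0.01, 1.2, {"A", "B"}),
    (0.02, 0.9, {"A", "B"}),
    (0.03, 6.0, {"A"}),  # high-expressing early cell selected only for branch A
    (0.50, 2.0, {"A"}),
    (0.60, 0.5, {"B"}),
]


def early_mean(cells, branch, cutoff=0.05):
    """Average expression of a branch's cells with pseudotime below the cutoff."""
    values = [expr for t, expr, branches in cells if branch in branches and t < cutoff]
    return sum(values) / len(values)


start_a = early_mean(cells, "A")
start_b = early_mean(cells, "B")
print(start_a, start_b)
```

A single high-expressing early cell that belongs to only one branch is enough to more than double the estimated start value of that branch in this example.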

For a more granular analysis, inspecting the raw vs. smoothed gene expression trends can be enlightening. Here’s how you can visualize this for specified branches:

palantir.plot.plot_trend(ad, "GCAGGCTAGAGGTCAC-1-FDEHP", "Tgfbr1", color="celltype")
plt.show()
palantir.plot.plot_trend(ad, "GTAACCACAGTTGTTG-1-FDEHP", "Tgfbr1", color="celltype")
plt.show()

If your data doesn’t include celltype annotations yet, simply omit or modify the color="celltype" argument accordingly.

I hope this clarifies your query. Please don’t hesitate to reach out if you have more questions or need further assistance. I'm here to help!

Hi, before plotting with palantir.plot.plot_trend(), the gene trends I calculated are as follows.
image
image
According to the figures above, the expression level at the start point differs, and these values should be unscaled.
Does the column '0.000000' in the figure represent the expression of the start cell itself, or the expression level of the set of cells near the start cell?

Hi @yitengfei120011,

Indeed, the observed gene trend discrepancies are influenced by the expression levels of all cells within similar pseudotime ranges in their respective branches. Generally, branches at pseudotime 0 select a comparable set of cells. The marked difference between the two branches in your case suggests a potential issue with cell selection, possibly due to an outlier cell present in one branch but not the other. I recommend revisiting the cell selection process. Utilizing palantir.plot.plot_branch_selection(ad) and overlaying gene expression on the UMAP plot could provide valuable insights.
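
One simple way to look for such outlier candidates is to list the early cells that were selected for one branch but not the other. The sketch below works on plain Python sets and a dictionary; the cell names, the cutoff, and the helper early_cells_unique_to are hypothetical, and how you extract the per-branch cell sets and pseudotime from your AnnData object depends on your Palantir version and is not shown here.

```python
def early_cells_unique_to(branch_cells, other_cells, pseudotime, cutoff=0.05):
    """Cells below the pseudotime cutoff that are in branch_cells but not in other_cells."""
    return sorted(
        cell for cell in branch_cells - other_cells if pseudotime[cell] < cutoff
    )


# Hypothetical pseudotime values and branch selections.
pseudotime = {"c1": 0.00, "c2": 0.01, "c3": 0.03, "c4": 0.50, "c5": 0.60}
branch_a = {"c1", "c2", "c3", "c4"}
branch_b = {"c1", "c2", "c5"}

only_a = early_cells_unique_to(branch_a, branch_b, pseudotime)
only_b = early_cells_unique_to(branch_b, branch_a, pseudotime)
print(only_a, only_b)
```

Any cell reported this way is a candidate to inspect on the UMAP and in the branch selection plot, since it contributes to the start of one trend but not the other.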

To further investigate the source of these differences, examining the specific gene expression trends together with the cell-wise gene-expression levels could be helpful. This can be achieved with the following commands, which plot both and may pinpoint the contributions to the observed gene trend:

palantir.plot.plot_trend(ad, "GCAGGCTAGAGGTCAC-1-FDEHP", "Tgfbr1", color="celltype")
plt.show()
palantir.plot.plot_trend(ad, "GTAACCACAGTTGTTG-1-FDEHP", "Tgfbr1", color="celltype")
plt.show()

These plots could identify the cells driving the trend computation. If your dataset lacks celltype annotations, adjust or omit the color="celltype" parameter as necessary.

I'm keen to assist further, so please don't hesitate to share these plots or ask additional questions.