AllenInstitute / cell_type_mapper

Repository for storing prototype functionality implementations for the BKP

MapMyCells hierarchical algorithm. No marker genes listed

JRS92 opened this issue

Hi,

After running MapMyCells, I looked over the JSON file in the output and found that some subclasses do not have any marker genes listed. However, there were cells assigned to these subclasses and they have an avg_correlation listed. How is the correlation calculated if there are no marker genes?

Thanks!

Hi @JRS92,

When the marker gene listing says

{"subclassA": ["gene1", "gene2", "gene3"]...}

it means that gene1, gene2, and gene3 are the marker genes used after the cell has been assigned to subclassA and the mapper needs to select which child of subclassA the cell belongs to.

So:

  1. The markers listed for subclassA are not the markers used when calculating the correlation with subclassA. For that correlation, the markers to look at are the ones listed for subclassA's parent class.

  2. If subclassA has exactly one child cluster, no markers will be listed for it, since there is no choice to make between children of subclassA (there is one child, so membership in subclassA necessarily implies membership in that child cluster).
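To make the lookup concrete, here is a minimal sketch in plain Python. The taxonomy, the gene names, and the helper functions `parent_of` and `markers_for_correlation` are all hypothetical and made up for illustration; this is not the cell_type_mapper API or its exact JSON layout, only the rule described above (markers are keyed by the parent whose children they discriminate between).

```python
# Hypothetical toy taxonomy; node and gene names are illustrative only.
children = {
    "classX": ["subclassA", "subclassB"],
    "subclassA": ["cluster1", "cluster2"],
    "subclassB": ["cluster3"],  # exactly one child, so no choice to make
}

# Markers are keyed by the PARENT node: they are the genes used to choose
# among that parent's children.
markers_by_parent = {
    "classX": ["gene7", "gene8"],
    "subclassA": ["gene1", "gene2", "gene3"],
    "subclassB": [],  # single child, so nothing is listed
}


def parent_of(node, children):
    """Return the parent of `node` in the toy taxonomy, or None if absent."""
    for parent, kids in children.items():
        if node in kids:
            return parent
    return None


def markers_for_correlation(node, children, markers_by_parent):
    """Return the markers used when correlating a cell against `node`.

    These are the markers listed under the node's parent, not under the
    node itself. Nodes with no parent in this toy dictionary return an
    empty list (a simplification of this sketch).
    """
    parent = parent_of(node, children)
    if parent is None:
        return []
    return markers_by_parent.get(parent, [])


used_for_subclassA = markers_for_correlation("subclassA", children, markers_by_parent)
used_for_cluster3 = markers_for_correlation("cluster3", children, markers_by_parent)
```

In this toy example, `used_for_subclassA` holds the genes listed under `classX`, while `used_for_cluster3` is empty because `subclassB` has only one child.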

Does this make sense given what you are seeing in your results?

Thanks, that helps a lot! Just as you said, one of the subclasses I see this in has only one child cluster.

Is there a way to see the average expression profile of a given subclass for the marker genes of its parent class? I suppose one could figure this out by looking at cells that were assigned the subclass in the reference data, but that seems a bit roundabout.

It depends on how you are running MapMyCells.

If you are running the code yourself, then the data you need is in the precomputed_stats file as described here.

However, if you are running MapMyCells through the web app, then the answer is "not easily", as we haven't exposed the precomputed_stats file to the public yet. You could download the reference Whole Mouse Brain dataset as described here and then find the average gene expression of each cell type across all of the cells assigned to that cell type, but that is a lot of data (several hundred GB).
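If you do go the route of averaging the reference data yourself, the computation is roughly the following sketch. It assumes you have already loaded a cells-by-genes expression matrix and a per-cell label for the taxonomy level you care about; the function name `mean_profile_by_type` and the tiny arrays are hypothetical, and the sketch does not read the actual Whole Mouse Brain files or apply any particular normalization. For the full dataset you would need to process the cells in chunks rather than hold everything in memory.

```python
import numpy as np


def mean_profile_by_type(expr, cell_labels, gene_names, genes_of_interest):
    """Average expression per cell type, restricted to selected genes.

    expr: 2D array, rows are cells and columns are genes.
    cell_labels: one cell type label per row of `expr`.
    gene_names: one gene name per column of `expr`.
    genes_of_interest: e.g. the marker genes listed for the parent class.
    """
    gene_idx = [gene_names.index(g) for g in genes_of_interest]
    labels = np.asarray(cell_labels)
    profiles = {}
    for cell_type in np.unique(labels):
        rows = expr[labels == cell_type]          # cells assigned to this type
        profiles[str(cell_type)] = rows[:, gene_idx].mean(axis=0)
    return profiles


# Tiny made-up example: three cells, three genes.
expr = np.array([
    [1.0, 0.0, 4.0],
    [3.0, 2.0, 0.0],
    [5.0, 5.0, 5.0],
])
cell_labels = ["subclassA", "subclassA", "subclassB"]
gene_names = ["gene7", "gene8", "gene9"]

profiles = mean_profile_by_type(expr, cell_labels, gene_names, ["gene7", "gene8"])
```

Here `profiles["subclassA"]` is the mean over the two subclassA cells for `gene7` and `gene8` only.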

Sorry I don't have a quick and easy answer for you.