Exploratory differential expression (DE) analysis can be performed in Single Cell Portal to look for genes that drive differences between distinctly annotated clusters of cells in a study. For example, you could use DE to compare T cells to all other cell types. Exploratory DE analysis for cell type annotations is generated for studies where the study owner has made raw count files available.
In addition to this written help, a video demo is also available at the bottom of this article.
Differential expression is calculated in the context of the viewed cluster plot and annotation.
When the differential expression option is selected, you’ll see a list of all the cell groups that are eligible for DE analysis, within the selected annotation.
Upon selecting an annotation group, the top 15 most differentially expressed genes (relative to all other cells in the cluster plot) are shown in a table that displays the log (base 2) fold change and a p-value adjusted with Benjamini–Hochberg FDR correction. Selecting a gene in the DE table displays gene expression data for that gene side by side with the cluster plot. You can select a gene either by clicking its name in the DE table or, for quicker scanning, by pressing the up and down arrow keys.
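To make the adjusted p-value column more concrete, here is a minimal sketch of the Benjamini–Hochberg procedure in plain Python. It is an illustration only, not the code Single Cell Portal runs; the function name and the example p-values are made up for this sketch.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw_pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_pvals = benjamini_hochberg(raw_pvals)
for raw, adj in zip(raw_pvals, adjusted_pvals):
    print(f"raw={raw:.3f}  adjusted={adj:.5f}")
```

Each adjusted value is at least as large as its raw p-value, which is why some genes that look significant before correction may not pass a threshold afterwards.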
Single Cell Portal calculates differential expression results using a Wilcoxon rank-sum test (also known as the Mann-Whitney U test), implemented with Scanpy’s rank_genes_groups function. For each gene in a comparison, this method tests whether that gene’s expression values are consistently higher – or lower – in one group of cells than in another. The output of this test is a z-score, which is included for download in the extended results table (the "scores" column). The adjusted p-value for each gene indicates whether this z-score is statistically significant – that is, whether that gene’s expression differentiates the cell groups better than would be expected by chance.
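The idea behind the rank-sum z-score can be sketched in a few lines of standard-library Python. This is a simplified illustration using the normal approximation without tie correction. It is not Scanpy's implementation, and the helper names and expression values are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_sum_z(group, rest):
    """Return (z-score, two-sided p-value) for one gene, group vs. rest."""
    n, m = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    rank_sum = sum(ranks[:n])                    # ranks held by the group
    expected = n * (n + m + 1) / 2               # rank sum if no difference
    std = math.sqrt(n * m * (n + m + 1) / 12)    # spread under the null
    z = (rank_sum - expected) / std
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical expression values of one gene in two cell groups.
z_up, p_up = rank_sum_z([5, 6, 7, 8], [1, 2, 3, 4])
z_none, p_none = rank_sum_z([1, 2], [1, 2])
print(f"consistently higher: z={z_up:.3f}, p={p_up:.4f}")
print(f"no difference:       z={z_none:.3f}, p={p_none:.4f}")
```

A positive z-score means the gene tends to rank higher in the selected group, and a negative one means it tends to rank lower. Because only ranks are used, the size of the expression difference does not affect the score.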
This is a non-parametric differential expression test. We chose it because it is less likely than a parametric test to be affected by outliers or skewed distributions of expression values. Note that this test measures whether a given gene is consistently expressed more or less in one cell group versus another; it does not measure how different the expression values are between the two groups. Therefore, we also report a Log2 Fold Change (Log2FC) value for each gene. Log2FC measures the ratio of mean expression values between the two cell groups being compared, on a log (base 2) scale. However, Log2FC does not capture the variability around these means. To fully understand how much a gene’s expression differs across cell groups, it is therefore useful to visually compare the distribution of each gene's expression values across cell groups – for example, through violin plots. To see a violin plot for a given gene, click on the circle next to the gene's name in the DE results table, then click on the "Distribution" tab (these steps are highlighted in the figure below).
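The following sketch shows why Log2FC alone can hide variability. It assumes a simple definition, the base-2 log of the ratio of group means with a small pseudocount to avoid division by zero. The pseudocount, function name, and expression values are assumptions for illustration and may not match how Single Cell Portal or Scanpy compute the reported value.

```python
import math
from statistics import mean, stdev


def log2_fold_change(group, rest, pseudocount=1e-9):
    """Base-2 log of the ratio of mean expression (group over rest)."""
    return math.log2((mean(group) + pseudocount) / (mean(rest) + pseudocount))


# Hypothetical expression values for two genes against the same background.
rest = [1.0, 1.0, 1.0, 1.0]
tight = [4.0, 4.0, 4.0, 4.0]    # every cell expresses the gene at a similar level
spiky = [0.0, 0.0, 0.0, 16.0]   # a single cell drives the mean

lfc_tight = log2_fold_change(tight, rest)
lfc_spiky = log2_fold_change(spiky, rest)
print(f"tight: log2FC={lfc_tight:.3f}, stdev={stdev(tight):.1f}")
print(f"spiky: log2FC={lfc_spiky:.3f}, stdev={stdev(spiky):.1f}")
```

Both genes have the same group mean and therefore the same Log2FC, yet their distributions are very different. A violin plot makes that difference visible at a glance.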
Filtering Differential Expression results
You can narrow down the list of genes by using filters to threshold your results based on p-value and log2FC.
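If you download the results table, the same kind of thresholding can be reproduced offline. The sketch below assumes the table has been loaded as a list of rows with Scanpy-style column names ("names", "logfoldchanges", "pvals_adj"). Those column names, the gene names, and the threshold values are assumptions for illustration, so check them against your downloaded file.

```python
def filter_de(rows, max_padj=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both thresholds, most significant first."""
    kept = [
        row for row in rows
        if row["pvals_adj"] <= max_padj
        and abs(row["logfoldchanges"]) >= min_abs_log2fc
    ]
    return sorted(kept, key=lambda row: row["pvals_adj"])


# Hypothetical DE results for four placeholder genes.
de_rows = [
    {"names": "GENE_A", "logfoldchanges": 3.1, "pvals_adj": 1e-12},
    {"names": "GENE_B", "logfoldchanges": 0.4, "pvals_adj": 1e-8},   # small effect
    {"names": "GENE_C", "logfoldchanges": -2.2, "pvals_adj": 0.003},
    {"names": "GENE_D", "logfoldchanges": 2.5, "pvals_adj": 0.30},   # not significant
]

refined = filter_de(de_rows)
print([row["names"] for row in refined])
```

Taking the absolute value of the fold change keeps both up-regulated and down-regulated genes. Drop the `abs` call if you only want genes that are higher in the selected group.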
Click the Dot plot button at the top right of the DE results table to generate a dot plot showing the relative expression of your refined gene list.
Video demo
Further details
For details about our implementation, see the open-source code for SCP differential analysis. This analysis is intended only for data exploration: this simple approach does not account for technical covariates or batch effects and may produce false positives. We believe the risk of false positives is lowest when comparing cells based on cell type (as opposed to tissue or disease state, for example). Explore this differential expression analysis for yourself in SCP1671!