Single Cell Portal generates visualizations using information from three key study files: processed matrix, metadata and cluster files.
This article covers:
- NAME
- Metadata file
- Cluster/spatial files
- Processed Matrix files
- Generating unique NAMEs for studies with multiple matrix files
NAME
Metadata and cluster files require a "NAME" column. In scatter plots of dimensionality reduction or projection data, the provided NAME identifies each plotted cell. (For spatial transcriptomics data, the NAME identifies each plotted spatial point.) For dense matrix files, column headers correspond to NAME. For sparse matrix data, the barcode file provides NAME identifiers. NAME is used to extract key data from each study file type needed for data visualization:
- coordinates of the plotted point in the cluster file
- annotations available for visualization of the plotted point from the metadata file
- gene expression values for the plotted point from the processed matrix file
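As a rough illustration of how NAME ties the three file types together, the sketch below looks up one point's coordinates, annotations and expression values by its NAME. The file contents, the `cell_type` column, the `GENE` header label and the helper names are all hypothetical, and the files are simplified to a plain tab-delimited table with a NAME column; the sketch ignores any other SCP format requirements.

```python
import csv
import io

# Hypothetical, simplified tab-delimited file contents (assumptions for illustration).
cluster_tsv = "NAME\tX\tY\ncell_1\t1.5\t-0.3\ncell_2\t0.2\t2.1\n"
metadata_tsv = "NAME\tcell_type\ncell_1\tT cell\ncell_2\tB cell\n"
# Dense matrix: the column headers after the gene column are the NAMEs.
matrix_tsv = "GENE\tcell_1\tcell_2\nCD3E\t4.2\t0.0\nCD19\t0.0\t3.7\n"


def index_by_name(text):
    """Map each NAME to the remaining columns of its row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["NAME"]: {key: value for key, value in row.items() if key != "NAME"}
        for row in reader
    }


def expression_by_name(text):
    """Map each NAME (a dense matrix column header) to {gene: value}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    names = next(reader)[1:]  # skip the gene column label
    expression = {name: {} for name in names}
    for row in reader:
        gene = row[0]
        for name, value in zip(names, row[1:]):
            expression[name][gene] = float(value)
    return expression


coords = index_by_name(cluster_tsv)
annotations = index_by_name(metadata_tsv)
expression = expression_by_name(matrix_tsv)


def describe_point(name):
    """Collect everything known about one plotted point, keyed by its NAME."""
    return {
        "coordinates": coords[name],
        "annotations": annotations[name],
        "expression": expression[name],
    }


point = describe_point("cell_1")
print(point)
```

If a NAME is missing from any one of the three lookups, `describe_point` raises a `KeyError`, which mirrors why the NAMEs need to agree across files.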
Metadata file
Each SCP study has one metadata file. Every point in the study (identified by its NAME) should be listed in the metadata file, and every entry in the NAME column must be unique. SCP also requires specific metadata in this file so that studies can be indexed for Metadata-powered Advanced Search. Please visit the Metadata File section for details about the expected format and content of SCP metadata files.
Cluster (and spatial transcriptomics) files
Cluster/spatial files provide the coordinates that represent plotted points for point-based visualizations (such as scatter plots and gene expression plots). NAMEs in the cluster/spatial file MUST match the NAMEs in the metadata file in order to plot data for visualization.
NAMEs in any single cluster/spatial file must be unique but NAMEs across cluster/spatial files can be repeated. In other words, a cell or spatial point can be represented in multiple plots.
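Before uploading, the two rules above can be checked locally. The following is a minimal sketch, not SCP's own validation: it assumes the NAME columns have already been read into Python lists, and the function names and example NAMEs are hypothetical. It only checks that each cluster NAME is unique within its file and appears in the metadata file.

```python
from collections import Counter


def duplicated(names):
    """Return the NAMEs that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def check_cluster_names(cluster_names, metadata_names):
    """Report NAME problems for a single cluster/spatial file."""
    return {
        "duplicates": duplicated(cluster_names),
        "missing_from_metadata": sorted(set(cluster_names) - set(metadata_names)),
    }


# Hypothetical NAME columns from one metadata file and two cluster files.
metadata_names = ["cell_1", "cell_2", "cell_3"]
umap_names = ["cell_1", "cell_2", "cell_3"]
tsne_names = ["cell_1", "cell_2", "cell_2", "cell_9"]

umap_report = check_cluster_names(umap_names, metadata_names)
tsne_report = check_cluster_names(tsne_names, metadata_names)
print(umap_report)
print(tsne_report)
```

Note that `cell_1` appearing in both cluster files is not reported: repeating a NAME across cluster/spatial files is allowed, and only repeats within one file are flagged.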
Processed Matrix files
Processed expression matrices provide the basis for gene expression plots. NAMEs in Matrix files (cell headers for dense matrices; barcode file for sparse matrix data) MUST also match those in metadata and cluster files to enable gene search.
Matrix files are expected to be species-specific. "Barnyard" or other comparative studies should separate their data into separate matrices per species.
Generating unique NAMEs for studies with multiple matrix files
When a study has multiple matrices (for example, from multiple species or multiple timepoint experiments), the same barcode (aka NAME) may be found in more than one matrix. Duplicate NAMEs will cause data ingest errors.
We recommend the following process to generate unique NAMEs when you have multiple matrices:
- For each additional matrix file, prepend an identifier to each barcode (aka NAME) so that the NAME is unique across all matrices.
- When preparing the metadata file, use the new barcode with the prepended identifier in the NAME column.
- When preparing cluster files, use the new barcode with the prepended identifier in the NAME column.
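The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the barcodes, the `mouse` identifier, the `_` separator and the `prefix_barcodes` helper are all hypothetical, and any identifier that makes the NAMEs unique would serve the same purpose.

```python
def prefix_barcodes(barcodes, identifier, sep="_"):
    """Prepend an identifier to each barcode, e.g. 'mouse_AAACCTG-1'."""
    return [f"{identifier}{sep}{barcode}" for barcode in barcodes]


# Hypothetical barcodes from two matrices; 'AAACCTG-1' occurs in both.
first_matrix = ["AAACCTG-1", "AAACGGG-1"]
second_matrix = ["AAACCTG-1", "TTTGTCA-1"]

# Prepend an identifier to the barcodes of the additional matrix.
second_matrix_renamed = prefix_barcodes(second_matrix, "mouse")

# Keep an old-to-new mapping so the same new NAMEs can be written
# into the NAME column of the metadata and cluster files.
rename_map = dict(zip(second_matrix, second_matrix_renamed))

all_names = first_matrix + second_matrix_renamed
is_unique = len(set(all_names)) == len(all_names)
print(all_names, is_unique)
```

Applying `rename_map` when writing the metadata and cluster files keeps the NAMEs for the second matrix consistent across all three file types.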
If using Seurat v3, see this post in Single Cell Portal Community for an example.