Single-cell experiments can produce large analysis files (expression matrices, AnnData/Seurat files, etc.). Primary data, such as FASTQ files, may also be large in size or in number. If downloading the files individually offers no benefit, you can reduce a high file count by bundling the files into an archive. The resulting archive and/or large analysis files can often be uploaded more easily with the Google Cloud command-line interface (CLI), a tool that uploads files directly into your study in the cloud. The gcloud CLI requires some use of the command line, but we are happy to help you get acclimated to the process. Let's get started!
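As a rough illustration of the archiving suggestion above, the sketch below bundles a directory of files into a single tar archive. The directory name fastq_dir, the sample file names, and the archive name fastq_archive.tar are hypothetical placeholders, and the files are empty stand-ins created in a scratch directory.

```shell
# Work in a scratch directory so this sketch does not touch real data.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical input: a directory of many small files (empty placeholders here).
mkdir fastq_dir
: > fastq_dir/sample_1.fastq.gz
: > fastq_dir/sample_2.fastq.gz

# Bundle the directory into one archive. No -z flag is used because
# .fastq.gz files are already gzip-compressed.
tar -cf fastq_archive.tar fastq_dir

# List the archive contents to confirm every file was included.
tar -tf fastq_archive.tar
```

With real data, you would skip the placeholder steps and run only the two tar commands against your own directory.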
Step 1
Download Google Cloud CLI and follow the official install instructions for your operating system.
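One way to confirm the installation worked is to check that the gcloud command is on your PATH and prints its version. This is only a minimal sketch; the variable name gcloud_status is an illustrative assumption and not part of the CLI.

```shell
# Check whether the gcloud executable can be found on the PATH.
if command -v gcloud >/dev/null 2>&1; then
  # Print the installed component versions.
  gcloud --version
  gcloud_status="installed"
else
  echo "gcloud not found on PATH; revisit the install instructions." >&2
  gcloud_status="missing"
fi

echo "gcloud status: $gcloud_status"
```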
Step 2
Have your data (e.g. fastq.gz files) in a directory you can access with a command-line terminal (e.g. Mac Terminal or Windows Terminal).
Step 3
Get your bucket ID from the Single Cell Portal (if you have already created a study):
- Log in and navigate to your study.
- Click the "Settings" tab and then click the "Study details" Management page link.
- At the top of the Study Details page, your bucket ID is listed in the "General Info & Defaults" panel.
When you get your first bucket, you will also get an email about registering for Terra, the platform SCP uses for underlying cloud services. IMPORTANT: Please register with Terra to ensure successful authentication and access to your study's cloud-based resources. You are welcome to access your studies and data through Terra; many advanced services for expert users can be found in Terra. Read more here.
Step 4
Authenticate to obtain access to the study's Google resources.
The following steps only need to be executed once. Afterward, as long as you are logged in to the same computer with the same username, you will already be configured.
- In a command-line terminal (e.g. Mac Terminal or Windows Terminal), run the following command:
gcloud auth login
- On a machine with access to a web browser, a web page will be opened in your browser for authentication.
- If you are on a server without web browser access, a lengthy URL will be provided. Paste the entire URL into a web browser (using a separate computer, if necessary).
- Log in with the Google account you used to create your SCP study.
- NOTE: You will most likely be asked to pick a project to use with your session. Since you will not be able to see the Single Cell Portal GCP project, please select an existing project you have, or create a new one. This will not impact your ability to upload files to your bucket.
- For URL-based access, a verification code will appear after account selection. Copy the code from the web browser and paste it into your terminal at the "Enter authorization code" prompt.
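After logging in, you may want to confirm which account gcloud is using. The sketch below assumes the gcloud CLI from Step 1 is installed. It prints (rather than runs) the interactive login variant for servers without a browser, then lists the credentialed accounts. The variable name headless_login is illustrative only.

```shell
# Interactive login variant for a server without a web browser. It is stored
# as a string and printed, not executed, so this sketch never waits for input.
headless_login="gcloud auth login --no-launch-browser"
echo "Headless login command: $headless_login"

# Non-interactive check: list the accounts gcloud holds credentials for.
if command -v gcloud >/dev/null 2>&1; then
  gcloud auth list || echo "gcloud auth list failed" >&2
else
  echo "gcloud not found on PATH; complete Step 1 first." >&2
fi
```

The account marked as active in the listing should be the Google account you used to create your SCP study.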
Step 5
Transfer files to the study bucket.
Run the following commands each time you upload data.
- From the terminal, navigate to your data.
cd /path/to/the/directory/holding/your/data
- Example copy commands:
- copy a single local file to the bucket:
gcloud storage cp metadata.txt gs://bucket_id
- copy all local compressed fastq files to the bucket.
gcloud storage cp *.fastq.gz gs://bucket_id
- copy a single file from an existing Google bucket to the study bucket.
gcloud storage cp gs://my-personal-bucket/my-file.txt gs://bucket_id
- copy a single file from another cloud storage vendor to the study bucket.
gcloud storage cp s3://my-personal-bucket/my-file.txt gs://bucket_id
- If your file is large and/or compressible (e.g. txt, mtx, csv, tsv), please use
gcloud storage cp -Z <file_protocol>://<path_to_file> gs://bucket_id
to enable in-cloud compression and reduce storage costs. The file will automatically be decompressed upon download, so the file name should not include ".gzip" or ".gz".
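The copy commands above can be wrapped in a small helper that adds -Z only for the compressible text formats mentioned in this step. This is a sketch under stated assumptions, not an official SCP script: the function name upload_to_bucket, the DRY_RUN switch, and the sample file names are hypothetical, and bucket_id is a placeholder for the bucket ID from Step 3. With DRY_RUN=1 the helper only prints each command, so you can review it before anything is transferred.

```shell
BUCKET_ID="bucket_id"  # placeholder: replace with your bucket ID from Step 3
DRY_RUN=1              # 1 = print the command only, 0 = actually run it

# Build an upload command for one file, adding -Z for plain-text formats.
upload_to_bucket() {
  file="$1"
  case "$file" in
    *.txt|*.mtx|*.csv|*.tsv) flag="-Z" ;;  # compressible text formats
    *) flag="" ;;                          # e.g. already-compressed .gz files
  esac
  if [ "$DRY_RUN" = "1" ]; then
    # $flag is intentionally unquoted so an empty flag adds no argument.
    echo gcloud storage cp $flag "$file" "gs://$BUCKET_ID"
  else
    gcloud storage cp $flag "$file" "gs://$BUCKET_ID"
  fi
}

upload_to_bucket matrix.mtx
upload_to_bucket sample_1.fastq.gz
```

Once the printed commands look right, setting DRY_RUN=0 runs them. Afterwards, gcloud storage ls gs://bucket_id lists the bucket contents so you can check that the files arrived.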
Step 6
Once your files are transferred to the study bucket, log back into the portal and navigate to your study. Configure your files for data ingest, using one of two options:
- Choose a file from your bucket for data ingest using the "Use bucket path" option in the upload wizard (Click the "Settings" tab and then choose the "Upload/Edit data" Management page link).
- "Synchronize" the data in the bucket to the study (Click the "Settings" tab and then choose the "Sync workspace" Management page link).