R version: R version 4.1.1 (2021-08-10)
Bioconductor version: 3.13
Package: 1.1.4

Introduction

The GeoMx Digital Spatial Profiler (DSP) is a platform for capturing spatially resolved, high-plex gene (or protein) expression data from tissue. Formalin-fixed paraffin-embedded (FFPE) or fresh-frozen (FF) tissue sections are stained with barcoded in-situ hybridization probes that bind to endogenous mRNA transcripts. The user then selects regions of interest (ROIs) to profile; if desired, each ROI can be further subdivided into areas of illumination (AOIs) based on tissue morphology. The GeoMx instrument then photo-cleaves and collects the expression barcodes for each AOI segment separately for downstream sequencing and data processing.

The final result is a spatially resolved expression dataset covering every protein-coding gene (>18,000 genes) for each individual segment profiled from the tissue.

Steps & Scope

We start with raw gene expression count files. Using open-source R packages, we evaluate samples and expression targets and prepare gene-level count data for downstream analysis. To understand our spatial data, we then perform unsupervised clustering, dimension reduction, and differential gene expression analyses, and explore the results visually.

Our specific objectives:

  • Load GeoMx raw count files and metadata (DCC, PKC, and annotation files)
  • Perform quality control (QC), filtering, and normalization to prepare the data
  • Perform downstream visualizations and statistical analyses including:
    • Dimension reduction with UMAP or t-SNE
    • Heatmaps and other visualizations of gene expression
    • Differential expression analyses with linear mixed-effect models

Getting started

Loading Data

In this analysis, we use a dataset created with the human whole transcriptome atlas (WTA) assay. The dataset includes 8x3 PCR-positive COVID (Group1), 3x3 PCR-negative COVID (Group2), and 4x3 PCR-negative control (Group3) samples. Regions of interest (ROIs) were spatially profiled to focus on two different structures: lumen and internal.

The key data files are:

  • DCC files - expression count data and sequencing quality metadata
  • PKC file(s) - probe assay metadata describing the gene targets present in the data
  • Annotation file - tissue information for each profiled segment (for example, sample group and structure)

We then load the data to create a data object using the readNanoStringGeoMxSet function.

All of the expression, annotation, and probe information are now linked and stored together into a single data object.
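The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for this dataset: the sub-directory names (dccs, pkcs, annotation), the sheet name "Template", the column name "Sample_ID", and both helper functions are assumptions, so adjust them to match your own files.

```r
# Hypothetical helper: collect the three kinds of input files under `datadir`.
# The sub-directory names below are assumptions, not fixed by GeomxTools.
find_geomx_inputs <- function(datadir) {
  list(
    dcc = dir(file.path(datadir, "dccs"), pattern = "\\.dcc$",
              full.names = TRUE, recursive = TRUE),
    pkc = dir(file.path(datadir, "pkcs"), pattern = "\\.pkc$",
              full.names = TRUE, recursive = TRUE),
    annotation = dir(file.path(datadir, "annotation"), pattern = "\\.xlsx$",
                     full.names = TRUE, recursive = TRUE)
  )
}

# Hypothetical wrapper: build the data object from the collected files.
# Requires the GeomxTools package; the sheet and column names are assumptions.
load_geomx <- function(inputs, sheet = "Template", dcc_col = "Sample_ID") {
  GeomxTools::readNanoStringGeoMxSet(
    dccFiles = inputs$dcc,
    pkcFiles = inputs$pkc,
    phenoDataFile = inputs$annotation,
    phenoDataSheet = sheet,
    phenoDataDccColName = dcc_col
  )
}
```

With real files in place, a call such as load_geomx(find_geomx_inputs("path/to/data")) would return the combined data object; the path shown is a placeholder.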

Study Design

Modules Used

First, let's check the PKC files to confirm that the expected PKCs were loaded for this study. This dataset uses the file Hs_R_NGS_WTA_v1.0.pkc.
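One way to make this check explicit is sketched below. It assumes the PKC file names are available as a character vector; with GeomxTools objects they can usually be retrieved by calling annotation() on the data object, but that retrieval step is an assumption here and is not shown. The helper check_modules is hypothetical.

```r
# Hypothetical helper: derive module names from PKC file names and
# compare them with the modules expected for the study.
check_modules <- function(pkcs, expected = "Hs_R_NGS_WTA_v1.0") {
  # Drop any directory part and the ".pkc" extension.
  modules <- sub("\\.pkc$", "", basename(pkcs))
  list(
    modules = modules,
    missing = setdiff(expected, modules),     # expected but not loaded
    unexpected = setdiff(modules, expected)   # loaded but not expected
  )
}

# The file name below is the one named in the text above.
result <- check_modules("Hs_R_NGS_WTA_v1.0.pkc")
print(result$modules)
```

If both missing and unexpected come back empty, the loaded PKCs match what the study expects.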

Sample Overview

Now that we have loaded the data, we can visually summarize the experimental design for our dataset to look at the different types of samples and ROI/AOI segments that have been profiled. We present this information in a Sankey diagram.
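A Sankey diagram is drawn from counts of segments for each combination of annotation values, so the data preparation can be sketched as below. The column names (group, structure), the helper count_flows, and the toy annotation table are all placeholders for illustration; the values are not the study's real annotations.

```r
# Hypothetical helper: count segments for each observed combination of
# the chosen annotation columns (one row per flow in the diagram).
count_flows <- function(annot, cols) {
  flows <- as.data.frame(table(annot[, cols, drop = FALSE]),
                         responseName = "n", stringsAsFactors = FALSE)
  # Keep only combinations that actually occur.
  flows[flows$n > 0, , drop = FALSE]
}

# Toy annotation table with made-up rows, used only to show the shape
# of the input and output.
annot <- data.frame(
  group = c("Group1", "Group1", "Group2", "Group3", "Group3"),
  structure = c("lumen", "internal", "lumen", "lumen", "internal")
)

flows <- count_flows(annot, c("group", "structure"))
print(flows)
```

The resulting table can then be passed to a plotting package that supports parallel-set layouts, for example geom_parallel_sets() in ggforce; the choice of plotting package is an assumption here.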