R version: R version 4.1.1 (2021-08-10)
Bioconductor version: 3.13
Package: 1.1.4 <>

Introduction

The GeoMx Digital Spatial Profiler (DSP) is a platform for capturing spatially resolved high-plex gene (or protein) expression data from tissue. In particular, formalin-fixed paraffin-embedded (FFPE) or fresh-frozen (FF) tissue sections are stained with barcoded in-situ hybridization probes that bind to endogenous mRNA transcripts. The user then selects Locations of the interest (ROI) to profile; if desired, each ROI segment can be further sub-divided into areas of illumination (AOI) based on tissue morphology. The GeoMx then photo-cleaves and collects expression barcodes for each AOI segment separately for downstream sequencing and data processing.

The final results are spatially resolved unique expression datasets for every protein-coding gene (>18,000 genes) from every individual segments profiled from tissue.

Steps & Scope

We start with raw gene expression count files. Using open source R packages, we evaluate samples and expression targets and prepare gene-level count data for downstream analysis. To understand our spatial data, we perform unsupervised clustering, dimension reduction, and differential gene expression analyses and visually explore the results.

Our specific objectives:

Load GeoMx raw count files and metadata (DCC, PKC, and annotation file)
Perform quality control (QC), filtering, and normalization to prepare the data
Perform downstream visualizations and statistical analyses including:
- Dimension reduction with UMAP or t-SNE
- Heatmaps and other visualizations of gene expression
- Differential expression analyses with linear mixed effect models

Getting started

Loading Data

In this analysis, we will analyze a dataset created with the human whole transcriptome atlas (WTA) assay. The dataset includes 8x3 PCR positive Covid (Group1), 3x3 PCR negative Covid (Group2), and 4x3 PCR negative control (Group3) samples. Regions of interest (ROI) were spatially profiled to focus on two different structures: lumen or internal.

The key data files are:

DCCs files - expression count data and sequencing quality metadata
PKCs file(s) - probe assay metadata describing the gene targets present in the data
Annotation file - useful tissue information.

We then load the data to create a data object using the readNanoStringGeoMxSet function.

All of the expression, annotation, and probe information are now linked and stored together into a single data object.

Study Design

Modules Used

First let's access the PKC files, to ensure that the expected PKCs have been loaded for this study. For the data we are using the file Hs_R_NGS_WTA_v1.0.pkc.

Sample Overview

Now that we have loaded the data, we can visually summarize the experimental design for our dataset to look at the different types of samples and ROI/AOI segments that have been profiled. We present this information in a Sankey diagram.

QC & Pre-processing

The steps above encompass the standard pre-processing workflow for GeoMx data. In short, they represent the selection of ROI/AOI segments and genes based on quality control (QC) or limit of quantification (LOQ) metrics and data normalization.

Before we begin, we will shift any expression counts with a value of 0 to 1 to enable in downstream transformations.

Segment QC

We first assess sequencing quality and adequate tissue sampling for every ROI/AOI segment.

Every ROI/AOI segment will be tested for:

Raw sequencing reads: segments with <1000 raw reads are removed.
% Aligned,% Trimmed, or % Stitched sequencing reads: segments below ~80% for one or more of these QC parameters are removed.
% Sequencing saturation ([1-deduplicated reads/aligned reads]%): segments below ~50% require additional sequencing to capture full sample diversity and are not typically analyzed until improved.
Negative Count: this is the geometric mean of the several unique negative probes in the GeoMx panel that do not target mRNA and establish the background count level per segment; segments with low negative counts (<5-10) are not necessarily removed but may be studied closer for low endogenous gene signal and/or insufficient tissue sampling.
Nuclei: >100 nuclei per segment is generally recommended; however, this cutoff is highly study/tissue dependent and may need to be reduced; what is most important is consistency in the nuclei distribution for segments within the study.
Area: generally correlates with nuclei; a strict cutoff is not generally applied based on area.

Select Segment QC

First, we select the QC parameter cutoffs, against which our ROI/AOI segments will be tested and flagged appropriately. We have selected the appropriate study-specific parameters for this study.

# Default QC cutoffs are commented in () adjacent to the respective parameters
# study-specific values were selected after visualizing the QC results in more
# detail below
QC_params <-
  list(minSegmentReads = 1000, # Minimum number of reads (1000)
       percentTrimmed = 80,    # Minimum % of reads trimmed (80%)
       percentStitched = 80,   # Minimum % of reads stitched (80%)
       percentAligned = 75,    # Minimum % of reads aligned to known targets (80%)
       percentSaturation = 50, # Minimum sequencing saturation (50%)
       minNegativeCount = 1,   # Minimum negative control counts (10)
       minNuclei = 20,         # Minimum # of cells observed in a segment (100)
       minArea = 1000)         # Minimum segment area (5000)
m666Data <-
  setSegmentQCFlags(m666Data, qcCutoffs = QC_params)        

# Collate QC Results
QCResults <- protocolData(m666Data)[["QCFlags"]]
flag_columns <- colnames(QCResults)
QC_Summary <- data.frame(Pass = colSums(!QCResults[, flag_columns]),
                         Warning = colSums(QCResults[, flag_columns]))
QCResults$QCStatus <- apply(QCResults, 1L, function(x) {
  ifelse(sum(x) == 0L, "PASS", "WARNING")
})
QC_Summary["TOTAL FLAGS", ] <-
  c(sum(QCResults[, "QCStatus"] == "PASS"),
    sum(QCResults[, "QCStatus"] == "WARNING"))

We plot all of the QC Summary information in a table.

QC Summary Table for each Segment
	Pass	Warning
LowReads	45	0
LowTrimmed	45	0
LowStitched	45	0
LowAligned	45	0
LowSaturation	45	0
LowNegatives	45	0
TOTAL FLAGS	45	0

Remove flagged segments

As the final step in Segment QC, we don't need to remove any flagged segments since all segments pass QC.

#> Features  Samples 
#>    18815       45

Probe QC

Before we summarize our data into gene-level count data, we will remove low-performing probes. In short, this QC is an outlier removal process, whereby probes are either removed entirely from the study (global) or from specific segments (local). The QC applies to gene targets for which there are multiple distinct probes representing the count for a gene per segment. In WTA data, only one probe exists per target gene; thus, Probe QC does not apply to the endogenous genes in the panel. Rather, it is performed on the negative control probes; there are multiple probes representing our negative controls, which do not target any sequence in the genome. These probes enable calculation of the background per segment and will be important for determining gene detection downstream.

After Probe QC, there will always remain at least one probe representing every gene target. In other words, Probe QC never removes genes from your data.

Set Probe QC Flags

A probe is removed globally from the dataset if either of the following is true:

the geometric mean of that probe's counts from all segments divided by the geometric mean of all probe counts representing the target from all segments is less than 0.1
the probe is an outlier according to the Grubb's test in at least 20% of the segments

A probe is removed locally (from a given segment) if the probe is an outlier according to the Grubb's test in that segment.

We report the number of global and local outlier probes.

Probes flagged or passed as outliers
Passed	Global	Local
18813	1	1

Exclude Outlier Probes

#Subset object to exclude all that did not pass Ratio & Global testing
ProbeQCPassed <- 
  subset(m666Data, 
         fData(m666Data)[["QCFlags"]][,c("LowProbeRatio")] == FALSE &
           fData(m666Data)[["QCFlags"]][,c("GlobalGrubbsOutlier")] == FALSE)
dim(ProbeQCPassed)
#> Features  Samples 
#>    18814       45
m666Data <- ProbeQCPassed

Create Gene-level Count Data

With our Probe QC steps complete, we will generate a gene-level count matrix. The count for any gene with multiple probes per segment is calculated as the geometric mean of those probes (see gene_level_counts.xlsx).

Limit of Quantification

In addition to Segment and Probe QC, we also determine the limit of quantification (LOQ) per segment. The LOQ is calculated based on the distribution of negative control probes and is intended to approximate the quantifiable limit of gene expression per segment. The formula for calculating the LOQ in a \(i^{th}\) segment is:

\[LOQ_{i} = geomean(NegProbe_{i}) * geoSD(NegProbe_{i})^{n}\]

We typically use 2 geometric standard deviations (\(n = 2\)) above the geometric mean as the LOQ, which is reasonable for most studies. We keep the segments if their LOQ > 2.

Filtering

After determining the limit of quantification (LOQ) per segment, we filter out either segments and/or genes with abnormally low signal. Filtering is an important step to focus on the true biological data of interest.

We determine the number of genes detected in each segment across the dataset.

Segment Gene Detection

We first filter out segments with exceptionally low signal. These segments will have a small fraction of panel genes detected above the LOQ relative to the other segments in the study. Let's visualize the distribution of segments with respect to their % genes detected:

We can also create a table to review what Group (1, 1a, 2 vs 3) is going to be impacted by each threshold:

	1a	1b	2	3
<1%	1	3	1	1
1-5%	5	15	8	11
5-10%	0	0	0	0
10-15%	0	0	0	0
>15%	0	0	0	0

In this example, we choose to remove segments with less than 0.1% of the genes detected.

target_m666Data <-
  target_m666Data[, pData(target_m666Data)$GeneDetectionRate >= .001]

dim(target_m666Data)
#> Features  Samples 
#>    18677       45

Gene Detection Rate

Next, we determine the detection rate for genes across the study. To illustrate this idea, we create a small gene list including all genes in which DetectionRate>=0.97 to review.

Detection rate for Genes of Interest
Gene	Detection, # Segments	Detection Rate, % of Segments
NFKB2	44	97.8%
CKS1B	44	97.8%
CUX1	44	97.8%
RNPEP	44	97.8%
GDAP1L1	44	97.8%
FNDC11	44	97.8%
ZFHX4	44	97.8%
WLS	44	97.8%
CHODL	44	97.8%
MINDY3	44	97.8%
MPIG6B	44	97.8%
TMEM30A	45	100.0%
CRABP2	45	100.0%
CPS1	45	100.0%
CSNK2A2	45	100.0%
DR1	45	100.0%
DSC3	45	100.0%
MAZ	45	100.0%
TMA7	45	100.0%
TMEM106C	45	100.0%
SNIP1	45	100.0%
MZT2B	45	100.0%
IFT74	45	100.0%
PTBP1	45	100.0%
ADAM15	45	100.0%
DHRS1	45	100.0%

We can see that individual genes are detected to varying degrees in the segments, which leads us to the next QC we will perform across the dataset.

Gene Filtering

We will graph the total number of genes detected in different percentages of segments. Based on the visualization below, we can better understand global gene detection in our study and select how many low detected genes to filter out of the dataset. Gene filtering increases performance of downstream statistical tests and improves interpretation of true biological signal.

We typically set a % Segment cutoff ranging from 5-20% based on the biological diversity of our dataset. For this study, we selected 5% as our cutoff. In other words, we focused on the genes detected in at least 5% of our segments; we filtered out the remainder of the targets.

	x
Features	674
Samples	45

Normalization

We will now normalize the GeoMx data for downstream visualizations and differential expression. Q3 normalization is typically the preferred normalization strategy for most DSP-NGS RNA studies.

Before normalization, we will explore the relationship between the upper quartile (Q3) of the counts in each segment with the geometric mean of the negative control probes in the data. Ideally, there should be a separation between these two values to ensure we have stable measure of Q3 signal.

As expected, we see separation of the Q3 and negative probe counts at both the distribution (A) and per segment (B) levels. Next, we normalize our data.

We use the normalize function from NanoStringNCTools to create normalization factors reflecting each data type. Upper quartile (Q3) normalization is performed using norm_method = "quant" setting the desiredQuantile flag to 0.75. Other quantiles could be specified by changing that value. We save the normalized data to a specific slot using toELT = "q_norm".

# Q3 norm (75th percentile) for WTA/CTA  with or without custom spike-ins
target_m666Data <- normalize(target_m666Data , data_type = "RNA",
                             norm_method = "quant", 
                             desiredQuantile = .75,
                             toElt = "q_norm")

To demonstrate the effects of normalization, we graph a representative boxplots of the data for individual segments before and after normalization.

Unsupervised Analysis

UMAP & t-SNE

One common approach to understanding high-plex data is dimension reduction. Two common methods are UMAP and tSNE, which are non-orthogonally constrained projections that cluster samples based on overall gene expression. In this study, we see by either UMAP (from the umap package) or tSNE (from the Rtsne package), clusters of segments related to structure (glomeruli or internals) and disease status (normal or diabetic kidney disease).

Clustering high CV Genes

Another approach to explore the data is to calculate the coefficient of variation (CV) for each gene (\(g\)) using the formula \(CV_g = SD_g/mean_g\). We then identify for genes with high CVs that should have large differences across the various profiled segments. This unbiased approach can reveal highly variable genes across the study (see high_CV_Genes.xlsx).

We plot the results in a heatmap.

#>     IGHG1     IGHG4     IGHG2      IGKC     IGHG3 
#> 0.2565193 0.2427206 0.2280666 0.2238702 0.1992845

Differential Expression

A central method for exploring differences between groups of segments or samples is to perform differential gene expression analysis. A common statistical approach is to use a linear mixed-effect model (LMM).

Analysis Between Group Analysis: 1 vs 2

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_2.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
USP17L11	1 - 2	-0.518	0.000	0.010
USP17L3	1 - 2	-0.532	0.001	0.136
PITX1	1 - 2	1.562	0.001	0.157
ANK1	1 - 2	1.123	0.006	0.312
MGARP	1 - 2	1.079	0.011	0.499
ATP5MC2	1 - 2	0.529	0.022	0.565
AREG	1 - 2	0.773	0.026	0.565
PPIG	1 - 2	0.918	0.024	0.565
SUPT7L	1 - 2	0.711	0.027	0.565
DHRS1	1 - 2	-0.528	0.022	0.565

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'ATP5MC2' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
ATP5MC2	Tissue	1 - 2	0.5293242	0.0215988	0.5645755	P < 0.05

Let's look at the distribution for ATP5MC2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1 vs 3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
TPM2	1 - 3	-1.059	0.000	0.001
CSRP1	1 - 3	-0.658	0.000	0.001
MYH11	1 - 3	-1.144	0.000	0.001
CNN1	1 - 3	-0.723	0.000	0.001
MFAP4	1 - 3	-0.723	0.000	0.001
TAGLN	1 - 3	-1.261	0.000	0.001
ACTA2	1 - 3	-1.246	0.000	0.001
MYL9	1 - 3	-1.181	0.000	0.001
CCN3	1 - 3	-0.730	0.000	0.002
FLNA	1 - 3	-1.270	0.000	0.003
ACTN1	1 - 3	-0.577	0.000	0.003
RGS5	1 - 3	-0.819	0.000	0.003
ITGA8	1 - 3	-0.541	0.000	0.004
SPARC	1 - 3	-0.652	0.000	0.005
C4orf46	1 - 3	0.529	0.000	0.005
MYH9	1 - 3	-0.743	0.000	0.006
CALD1	1 - 3	-0.555	0.000	0.010
DUSP1	1 - 3	-0.618	0.000	0.010
S100A6	1 - 3	-0.725	0.000	0.010
MYL6	1 - 3	-0.523	0.001	0.012
NR4A1	1 - 3	-0.710	0.001	0.012
RHOB	1 - 3	-0.848	0.001	0.015
DSTN	1 - 3	-0.507	0.001	0.015
RPL37A	1 - 3	-0.600	0.001	0.018
KANK3	1 - 3	0.532	0.001	0.018
SIRT6	1 - 3	0.565	0.001	0.018
OGN	1 - 3	-0.573	0.001	0.020
C11orf96	1 - 3	-0.651	0.002	0.022
HLA-B	1 - 3	0.645	0.002	0.028
COL6A2	1 - 3	-0.546	0.003	0.032
RPLP1	1 - 3	-0.571	0.003	0.033
HSPB1	1 - 3	-0.734	0.003	0.033
VIM	1 - 3	-0.807	0.004	0.043
DHRS1	1 - 3	0.885	0.006	0.047
BGN	1 - 3	-0.910	0.006	0.048
PTMA	1 - 3	-0.544	0.007	0.053
SPARCL1	1 - 3	-0.670	0.008	0.053
LYRM2	1 - 3	0.531	0.008	0.056
IGFBP7	1 - 3	-0.623	0.016	0.085
FCF1	1 - 3	0.530	0.021	0.096
MZT2B	1 - 3	0.825	0.034	0.130
PTBP1	1 - 3	0.712	0.037	0.138
IGHG4	1 - 3	1.233	0.038	0.140

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'TPM2' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
TPM2	Tissue	1 - 3	-1.0594	1.2e-06	0.0005436	FDR < 0.001

Let's look at the distribution for TPM2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1 vs 2+3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_2+3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
MYH11	1 - 2+3	-0.822	0.000	0.016
CNN1	1 - 2+3	-0.553	0.000	0.016
TPM2	1 - 2+3	-0.768	0.000	0.016
MYL9	1 - 2+3	-0.776	0.001	0.046
RGS5	1 - 2+3	-0.525	0.002	0.056
MYH9	1 - 2+3	-0.537	0.002	0.068
PITX1	1 - 2+3	1.135	0.002	0.068
RHOB	1 - 2+3	-0.600	0.003	0.068
TAGLN	1 - 2+3	-0.779	0.002	0.068
FLNA	1 - 2+3	-0.871	0.003	0.072
ACTA2	1 - 2+3	-0.748	0.005	0.088
HLA-B	1 - 2+3	0.510	0.005	0.090
ANK1	1 - 2+3	0.832	0.006	0.095
SPARCL1	1 - 2+3	-0.527	0.008	0.100
HSPB1	1 - 2+3	-0.614	0.011	0.110
MGARP	1 - 2+3	0.767	0.015	0.127
PTBP1	1 - 2+3	0.652	0.015	0.127
IGHG4	1 - 2+3	1.098	0.015	0.127
AREG	1 - 2+3	0.584	0.016	0.136
IGHG2	1 - 2+3	0.920	0.026	0.182
IGHG3	1 - 2+3	0.827	0.027	0.182
PPIG	1 - 2+3	0.644	0.030	0.193
TRIM36	1 - 2+3	0.518	0.034	0.208
VIM	1 - 2+3	-0.527	0.038	0.215
IGHG1	1 - 2+3	0.910	0.043	0.231

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'CNN1' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
CNN1	Tissue	1 - 2+3	-0.5525309	0.0001523	0.0159073	FDR < 0.05

Let's look at the distribution for CNN1.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1a vs 2

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_2.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
NKX1-2	1a - 2	-0.727	0.000	0.089
HBA1	1a - 2	0.701	0.014	0.379
GYPC	1a - 2	-0.620	0.018	0.416
CNN3	1a - 2	-0.731	0.022	0.426
HBA2	1a - 2	0.822	0.025	0.437
FBLN5	1a - 2	-0.547	0.030	0.437
USP17L3	1a - 2	-0.509	0.025	0.437
BTG2	1a - 2	0.519	0.038	0.459

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'KDM4B' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
KDM4B	Tissue	1a - 2	0.4334072	0.0277851	0.4365136	NS or FC < 0.5

Let's look at the distribution for KDM4B.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1a vs 3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
MYH11	1a - 3	-1.447	0.000	0.015
TPM2	1a - 3	-1.205	0.000	0.015
TPM1	1a - 3	-0.595	0.000	0.022
TAGLN	1a - 3	-1.622	0.000	0.022
SMIM17	1a - 3	-0.588	0.000	0.023
FLNA	1a - 3	-1.673	0.000	0.023
ANTXR1	1a - 3	-0.605	0.000	0.023
MYL9	1a - 3	-1.591	0.000	0.023
NKX1-2	1a - 3	-0.818	0.000	0.023
CSRP1	1a - 3	-0.720	0.000	0.023
CNN1	1a - 3	-0.753	0.001	0.029
ACTN1	1a - 3	-0.693	0.001	0.040
ACTA2	1a - 3	-1.362	0.001	0.040
ITGA8	1a - 3	-0.710	0.001	0.040
CCN3	1a - 3	-0.889	0.001	0.041
HLA-B	1a - 3	1.027	0.001	0.045
LTBP1	1a - 3	-0.763	0.001	0.049
S100A6	1a - 3	-0.971	0.002	0.057
DUSP1	1a - 3	-0.796	0.002	0.057
IGHA1	1a - 3	0.683	0.002	0.057
MFAP4	1a - 3	-0.817	0.002	0.058
MYH9	1a - 3	-0.936	0.002	0.058
COL6A2	1a - 3	-0.882	0.002	0.059
ELOVL5	1a - 3	0.543	0.003	0.078
TUBA1C	1a - 3	-0.522	0.003	0.078
ID2	1a - 3	-0.523	0.004	0.081
ITGB1	1a - 3	-0.763	0.004	0.084
SPARC	1a - 3	-0.707	0.004	0.084
S100A4	1a - 3	-0.540	0.004	0.085
RPS21	1a - 3	-0.621	0.006	0.101
RHOB	1a - 3	-1.306	0.006	0.101
VIM	1a - 3	-1.200	0.006	0.103
FBLN5	1a - 3	-0.680	0.007	0.103
HBA1	1a - 3	0.726	0.006	0.103
CCL2	1a - 3	0.511	0.008	0.124
COL4A2	1a - 3	-0.604	0.008	0.124
CALD1	1a - 3	-0.753	0.009	0.126
GYPC	1a - 3	-0.627	0.010	0.135
RGS5	1a - 3	-0.969	0.010	0.136
GSN	1a - 3	-0.678	0.012	0.136
PFN1	1a - 3	-0.585	0.013	0.136
OGN	1a - 3	-0.772	0.012	0.136
MYL6	1a - 3	-0.670	0.011	0.136
PRMT8	1a - 3	0.585	0.014	0.137
BTG2	1a - 3	0.533	0.016	0.148
MFGE8	1a - 3	-0.593	0.018	0.160
H2BC5	1a - 3	0.593	0.020	0.167
CNN3	1a - 3	-0.607	0.024	0.185
RPLP1	1a - 3	-0.622	0.025	0.186
HSPB1	1a - 3	-0.945	0.025	0.186
DSTN	1a - 3	-0.540	0.024	0.186
RPL22	1a - 3	-0.550	0.026	0.188
FHL1	1a - 3	-0.548	0.026	0.189
PTMA	1a - 3	-0.819	0.026	0.189
C11orf96	1a - 3	-0.814	0.027	0.189
SIRT6	1a - 3	0.717	0.028	0.191
IGHG2	1a - 3	2.111	0.028	0.192
CCN1	1a - 3	-0.500	0.031	0.199
IGHG4	1a - 3	2.259	0.031	0.199
HBA2	1a - 3	0.793	0.033	0.207
TIMP2	1a - 3	-0.615	0.035	0.212
EFEMP1	1a - 3	-0.592	0.035	0.212
RPL9	1a - 3	-0.570	0.038	0.226
A2M	1a - 3	-0.548	0.045	0.253

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PDLIM3' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
PDLIM3	Tissue	1a - 3	-0.4703381	0.0003127	0.0228538	NS or FC < 0.5

Let's look at the distribution for PDLIM3.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1a vs 2+3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_2+3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
NKX1-2	1a - 2+3	-0.787	0.000	0.005
SMIM17	1a - 2+3	-0.533	0.000	0.085
FBLN5	1a - 2+3	-0.623	0.001	0.094
GYPC	1a - 2+3	-0.656	0.001	0.094
HBA1	1a - 2+3	0.715	0.001	0.094
MYH11	1a - 2+3	-1.101	0.002	0.096
CNN3	1a - 2+3	-0.660	0.002	0.096
HBA2	1a - 2+3	0.805	0.004	0.099
LTBP1	1a - 2+3	-0.616	0.002	0.099
ACTN1	1a - 2+3	-0.593	0.004	0.099
COL6A2	1a - 2+3	-0.735	0.004	0.099
RHOB	1a - 2+3	-1.058	0.004	0.099
MYL9	1a - 2+3	-1.185	0.004	0.099
BTG2	1a - 2+3	0.527	0.003	0.099
HLA-B	1a - 2+3	0.871	0.003	0.099
H2BC5	1a - 2+3	0.553	0.004	0.099
IGHG2	1a - 2+3	1.987	0.006	0.111
IGHG4	1a - 2+3	2.124	0.006	0.121
TPM2	1a - 2+3	-0.873	0.007	0.124
PFN1	1a - 2+3	-0.515	0.007	0.124
ITGB1	1a - 2+3	-0.572	0.009	0.138
CCN3	1a - 2+3	-0.642	0.009	0.138
TAGLN	1a - 2+3	-1.140	0.009	0.138
CSRP1	1a - 2+3	-0.528	0.012	0.158
IGHG3	1a - 2+3	1.763	0.012	0.158
FLNA	1a - 2+3	-1.233	0.013	0.163
MYH9	1a - 2+3	-0.730	0.014	0.165
ITGA8	1a - 2+3	-0.535	0.013	0.165
GSN	1a - 2+3	-0.652	0.015	0.168
CNN1	1a - 2+3	-0.552	0.016	0.168
S100A6	1a - 2+3	-0.695	0.016	0.168
MFAP4	1a - 2+3	-0.574	0.017	0.171
CALD1	1a - 2+3	-0.544	0.020	0.185
IGKC	1a - 2+3	1.700	0.023	0.205
IGHG1	1a - 2+3	1.801	0.027	0.219
COL4A2	1a - 2+3	-0.502	0.029	0.224
RGS5	1a - 2+3	-0.674	0.029	0.225
VIM	1a - 2+3	-0.905	0.036	0.260
PTMA	1a - 2+3	-0.620	0.037	0.261
TIMP2	1a - 2+3	-0.535	0.040	0.270
SPARC	1a - 2+3	-0.505	0.044	0.278
IGFBP5	1a - 2+3	-0.560	0.044	0.278
SPARCL1	1a - 2+3	-0.679	0.047	0.291

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'NKX1-2' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
NKX1-2	Tissue	1a - 2+3	-0.7874966	7.8e-06	0.0052511	FDR < 0.05

Let's look at the distribution for NKX1-2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1b vs 2

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_2.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
PITX1	1b - 2	1.549	0.000	0.001
USP17L11	1b - 2	-0.525	0.000	0.008
ANK1	1b - 2	1.084	0.000	0.008
MGARP	1b - 2	1.026	0.000	0.017
USP17L3	1b - 2	-0.531	0.000	0.040
PPIG	1b - 2	0.879	0.001	0.050
AREG	1b - 2	0.681	0.001	0.064
TRIM36	1b - 2	0.577	0.004	0.185
ATP5MC2	1b - 2	0.548	0.010	0.298
DHRS1	1b - 2	-0.532	0.011	0.298
SUPT7L	1b - 2	0.764	0.015	0.359

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PITX1' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
PITX1	Tissue	1b - 2	1.54882	2e-06	0.001315	FDR < 0.05

Let's look at the distribution for PITX1.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1b vs 3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
MYH11	1b - 3	-1.083	0.000	0.004
TPM2	1b - 3	-1.051	0.000	0.004
CNN1	1b - 3	-0.736	0.000	0.005
CSRP1	1b - 3	-0.638	0.000	0.005
MFAP4	1b - 3	-0.692	0.000	0.007
TAGLN	1b - 3	-1.141	0.000	0.007
ACTA2	1b - 3	-1.231	0.000	0.008
MYL9	1b - 3	-1.044	0.000	0.008
C4orf46	1b - 3	0.554	0.000	0.009
RGS5	1b - 3	-0.769	0.000	0.011
FLNA	1b - 3	-1.178	0.000	0.013
CCN3	1b - 3	-0.677	0.000	0.013
ACTN1	1b - 3	-0.556	0.000	0.014
KANK3	1b - 3	0.569	0.001	0.017
MYH9	1b - 3	-0.678	0.001	0.018
ITGA8	1b - 3	-0.505	0.001	0.018
RHOB	1b - 3	-0.695	0.001	0.018
DUSP1	1b - 3	-0.614	0.001	0.018
S100A6	1b - 3	-0.643	0.001	0.019
SPARC	1b - 3	-0.634	0.001	0.019
NR4A1	1b - 3	-0.718	0.001	0.026
RPL37A	1b - 3	-0.612	0.002	0.032
OGN	1b - 3	-0.507	0.003	0.039
CALD1	1b - 3	-0.505	0.003	0.041
C11orf96	1b - 3	-0.610	0.003	0.043
HLA-B	1b - 3	0.566	0.003	0.043
RPLP1	1b - 3	-0.554	0.005	0.049
SIRT6	1b - 3	0.514	0.005	0.053
LYRM2	1b - 3	0.551	0.005	0.053
FCF1	1b - 3	0.575	0.007	0.058
IGHG3	1b - 3	0.563	0.007	0.058
HSPB1	1b - 3	-0.693	0.008	0.064
DHRS1	1b - 3	0.881	0.009	0.067
SPARCL1	1b - 3	-0.625	0.013	0.080
IGHG4	1b - 3	0.889	0.013	0.080
BGN	1b - 3	-0.888	0.014	0.082
VIM	1b - 3	-0.695	0.016	0.087
IGFBP7	1b - 3	-0.615	0.021	0.103
IGHG1	1b - 3	0.705	0.026	0.114
TMEM106C	1b - 3	-0.672	0.030	0.118
PITX1	1b - 3	0.801	0.033	0.124
IGHG2	1b - 3	0.688	0.033	0.124
PTBP1	1b - 3	0.732	0.040	0.139
IGKC	1b - 3	0.641	0.044	0.146
MZT2B	1b - 3	0.803	0.046	0.150
ANK1	1b - 3	0.575	0.050	0.156

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'TPM2' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
TPM2	Tissue	1b - 3	-1.050616	1.69e-05	0.0038044	FDR < 0.05

Let's look at the distribution for TPM2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1b vs 2+3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_2+3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene	Contrast	Estimate	Pr(>\|t\|)	FDR
MYH11	1b - 2+3	-0.754	0.001	0.061
CNN1	1b - 2+3	-0.557	0.001	0.061
PITX1	1b - 2+3	1.122	0.000	0.061
TPM2	1b - 2+3	-0.748	0.001	0.061
ANK1	1b - 2+3	0.793	0.001	0.061
IGHG3	1b - 2+3	0.515	0.002	0.061
MGARP	1b - 2+3	0.714	0.004	0.109
IGHG4	1b - 2+3	0.757	0.006	0.156
PPIG	1b - 2+3	0.604	0.008	0.169
MYL9	1b - 2+3	-0.639	0.011	0.169
IGHG1	1b - 2+3	0.612	0.011	0.169
FLNA	1b - 2+3	-0.757	0.018	0.175
ACTA2	1b - 2+3	-0.718	0.017	0.175
PTBP1	1b - 2+3	0.672	0.016	0.175
TAGLN	1b - 2+3	-0.658	0.017	0.175
IGHG2	1b - 2+3	0.565	0.021	0.185
IGKC	1b - 2+3	0.529	0.034	0.237
HSPB1	1b - 2+3	-0.558	0.035	0.238

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PITX1' as example.

First let's review the model results for these targets:

Gene	Subset	Contrast	Estimate	Pr(>\|t\|)	FDR	Color
PITX1	Tissue	1b - 2+3	1.121588	0.0004347	0.0609458	P < 0.05

Let's look at the distribution for PITX1.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analyzing Spatial Transcriptomics Data M666_UKE_Hamburg

21 Oktober, 2021

Introduction

Steps & Scope

Getting started

Loading Data

Study Design

Modules Used

Sample Overview

QC & Pre-processing

Segment QC

Select Segment QC

Remove flagged segments

Probe QC

Set Probe QC Flags

Exclude Outlier Probes

Create Gene-level Count Data

Limit of Quantification

Filtering

Segment Gene Detection

Gene Detection Rate

Gene Filtering

Normalization

Unsupervised Analysis

UMAP & t-SNE

Clustering high CV Genes

Differential Expression

Analysis Between Group Analysis: 1 vs 2

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1 vs 3

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1 vs 2+3

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1a vs 2

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1a vs 3

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1a vs 2+3

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1b vs 2

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1b vs 3

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes

Analysis Between Group Analysis: 1b vs 2+3

Volcano Plots

Plotting Genes of Interest

Heatmap of Significant Genes