R version: R version 4.1.1 (2021-08-10)
Bioconductor version: 3.13
Package: 1.1.4 <>

Introduction

The GeoMx Digital Spatial Profiler (DSP) is a platform for capturing spatially resolved high-plex gene (or protein) expression data from tissue. In particular, formalin-fixed paraffin-embedded (FFPE) or fresh-frozen (FF) tissue sections are stained with barcoded in-situ hybridization probes that bind to endogenous mRNA transcripts. The user then selects Locations of the interest (ROI) to profile; if desired, each ROI segment can be further sub-divided into areas of illumination (AOI) based on tissue morphology. The GeoMx then photo-cleaves and collects expression barcodes for each AOI segment separately for downstream sequencing and data processing.

The final results are spatially resolved unique expression datasets for every protein-coding gene (>18,000 genes) from every individual segments profiled from tissue.

Steps & Scope

We start with raw gene expression count files. Using open source R packages, we evaluate samples and expression targets and prepare gene-level count data for downstream analysis. To understand our spatial data, we perform unsupervised clustering, dimension reduction, and differential gene expression analyses and visually explore the results.

Our specific objectives:

  • Load GeoMx raw count files and metadata (DCC, PKC, and annotation file)
  • Perform quality control (QC), filtering, and normalization to prepare the data
  • Perform downstream visualizations and statistical analyses including:
    • Dimension reduction with UMAP or t-SNE
    • Heatmaps and other visualizations of gene expression
    • Differential expression analyses with linear mixed effect models

Getting started

Loading Data

In this analysis, we will analyze a dataset created with the human whole transcriptome atlas (WTA) assay. The dataset includes 8x3 PCR positive Covid (Group1), 3x3 PCR negative Covid (Group2), and 4x3 PCR negative control (Group3) samples. Regions of interest (ROI) were spatially profiled to focus on two different structures: lumen or internal.

The key data files are:

  • DCCs files - expression count data and sequencing quality metadata
  • PKCs file(s) - probe assay metadata describing the gene targets present in the data
  • Annotation file - useful tissue information.

We then load the data to create a data object using the readNanoStringGeoMxSet function.

All of the expression, annotation, and probe information are now linked and stored together into a single data object.

Study Design

Modules Used

First let's access the PKC files, to ensure that the expected PKCs have been loaded for this study. For the data we are using the file Hs_R_NGS_WTA_v1.0.pkc.

Sample Overview

Now that we have loaded the data, we can visually summarize the experimental design for our dataset to look at the different types of samples and ROI/AOI segments that have been profiled. We present this information in a Sankey diagram.

QC & Pre-processing

The steps above encompass the standard pre-processing workflow for GeoMx data. In short, they represent the selection of ROI/AOI segments and genes based on quality control (QC) or limit of quantification (LOQ) metrics and data normalization.

Before we begin, we will shift any expression counts with a value of 0 to 1 to enable in downstream transformations.

Segment QC

We first assess sequencing quality and adequate tissue sampling for every ROI/AOI segment.

Every ROI/AOI segment will be tested for:

  • Raw sequencing reads: segments with <1000 raw reads are removed.
  • % Aligned,% Trimmed, or % Stitched sequencing reads: segments below ~80% for one or more of these QC parameters are removed.
  • % Sequencing saturation ([1-deduplicated reads/aligned reads]%): segments below ~50% require additional sequencing to capture full sample diversity and are not typically analyzed until improved.
  • Negative Count: this is the geometric mean of the several unique negative probes in the GeoMx panel that do not target mRNA and establish the background count level per segment; segments with low negative counts (<5-10) are not necessarily removed but may be studied closer for low endogenous gene signal and/or insufficient tissue sampling.
  • Nuclei: >100 nuclei per segment is generally recommended; however, this cutoff is highly study/tissue dependent and may need to be reduced; what is most important is consistency in the nuclei distribution for segments within the study.
  • Area: generally correlates with nuclei; a strict cutoff is not generally applied based on area.

Select Segment QC

First, we select the QC parameter cutoffs, against which our ROI/AOI segments will be tested and flagged appropriately. We have selected the appropriate study-specific parameters for this study.

# Default QC cutoffs are commented in () adjacent to the respective parameters
# study-specific values were selected after visualizing the QC results in more
# detail below
QC_params <-
  list(minSegmentReads = 1000, # Minimum number of reads (1000)
       percentTrimmed = 80,    # Minimum % of reads trimmed (80%)
       percentStitched = 80,   # Minimum % of reads stitched (80%)
       percentAligned = 75,    # Minimum % of reads aligned to known targets (80%)
       percentSaturation = 50, # Minimum sequencing saturation (50%)
       minNegativeCount = 1,   # Minimum negative control counts (10)
       minNuclei = 20,         # Minimum # of cells observed in a segment (100)
       minArea = 1000)         # Minimum segment area (5000)
m666Data <-
  setSegmentQCFlags(m666Data, qcCutoffs = QC_params)        

# Collate QC Results
QCResults <- protocolData(m666Data)[["QCFlags"]]
flag_columns <- colnames(QCResults)
QC_Summary <- data.frame(Pass = colSums(!QCResults[, flag_columns]),
                         Warning = colSums(QCResults[, flag_columns]))
QCResults$QCStatus <- apply(QCResults, 1L, function(x) {
  ifelse(sum(x) == 0L, "PASS", "WARNING")
})
QC_Summary["TOTAL FLAGS", ] <-
  c(sum(QCResults[, "QCStatus"] == "PASS"),
    sum(QCResults[, "QCStatus"] == "WARNING"))

We plot all of the QC Summary information in a table.

QC Summary Table for each Segment
Pass Warning
LowReads 45 0
LowTrimmed 45 0
LowStitched 45 0
LowAligned 45 0
LowSaturation 45 0
LowNegatives 45 0
TOTAL FLAGS 45 0

Remove flagged segments

As the final step in Segment QC, we don't need to remove any flagged segments since all segments pass QC.

#> Features  Samples 
#>    18815       45

Probe QC

Before we summarize our data into gene-level count data, we will remove low-performing probes. In short, this QC is an outlier removal process, whereby probes are either removed entirely from the study (global) or from specific segments (local). The QC applies to gene targets for which there are multiple distinct probes representing the count for a gene per segment. In WTA data, only one probe exists per target gene; thus, Probe QC does not apply to the endogenous genes in the panel. Rather, it is performed on the negative control probes; there are multiple probes representing our negative controls, which do not target any sequence in the genome. These probes enable calculation of the background per segment and will be important for determining gene detection downstream.

After Probe QC, there will always remain at least one probe representing every gene target. In other words, Probe QC never removes genes from your data.

Set Probe QC Flags

A probe is removed globally from the dataset if either of the following is true:

  • the geometric mean of that probe's counts from all segments divided by the geometric mean of all probe counts representing the target from all segments is less than 0.1
  • the probe is an outlier according to the Grubb's test in at least 20% of the segments

A probe is removed locally (from a given segment) if the probe is an outlier according to the Grubb's test in that segment.

We report the number of global and local outlier probes.

Probes flagged or passed as outliers
Passed Global Local
18813 1 1

Exclude Outlier Probes

#Subset object to exclude all that did not pass Ratio & Global testing
ProbeQCPassed <- 
  subset(m666Data, 
         fData(m666Data)[["QCFlags"]][,c("LowProbeRatio")] == FALSE &
           fData(m666Data)[["QCFlags"]][,c("GlobalGrubbsOutlier")] == FALSE)
dim(ProbeQCPassed)
#> Features  Samples 
#>    18814       45
m666Data <- ProbeQCPassed 

Create Gene-level Count Data

With our Probe QC steps complete, we will generate a gene-level count matrix. The count for any gene with multiple probes per segment is calculated as the geometric mean of those probes (see gene_level_counts.xlsx).

Limit of Quantification

In addition to Segment and Probe QC, we also determine the limit of quantification (LOQ) per segment. The LOQ is calculated based on the distribution of negative control probes and is intended to approximate the quantifiable limit of gene expression per segment. The formula for calculating the LOQ in a \(i^{th}\) segment is:

\[LOQ_{i} = geomean(NegProbe_{i}) * geoSD(NegProbe_{i})^{n}\]

We typically use 2 geometric standard deviations (\(n = 2\)) above the geometric mean as the LOQ, which is reasonable for most studies. We keep the segments if their LOQ > 2.

Filtering

After determining the limit of quantification (LOQ) per segment, we filter out either segments and/or genes with abnormally low signal. Filtering is an important step to focus on the true biological data of interest.

We determine the number of genes detected in each segment across the dataset.

Segment Gene Detection

We first filter out segments with exceptionally low signal. These segments will have a small fraction of panel genes detected above the LOQ relative to the other segments in the study. Let's visualize the distribution of segments with respect to their % genes detected:

We can also create a table to review what Group (1, 1a, 2 vs 3) is going to be impacted by each threshold:

1a 1b 2 3
<1% 1 3 1 1
1-5% 5 15 8 11
5-10% 0 0 0 0
10-15% 0 0 0 0
>15% 0 0 0 0

In this example, we choose to remove segments with less than 0.1% of the genes detected.

target_m666Data <-
  target_m666Data[, pData(target_m666Data)$GeneDetectionRate >= .001]

dim(target_m666Data)
#> Features  Samples 
#>    18677       45

Gene Detection Rate

Next, we determine the detection rate for genes across the study. To illustrate this idea, we create a small gene list including all genes in which DetectionRate>=0.97 to review.

Detection rate for Genes of Interest
Gene Detection, # Segments Detection Rate, % of Segments
NFKB2 44 97.8%
CKS1B 44 97.8%
CUX1 44 97.8%
RNPEP 44 97.8%
GDAP1L1 44 97.8%
FNDC11 44 97.8%
ZFHX4 44 97.8%
WLS 44 97.8%
CHODL 44 97.8%
MINDY3 44 97.8%
MPIG6B 44 97.8%
TMEM30A 45 100.0%
CRABP2 45 100.0%
CPS1 45 100.0%
CSNK2A2 45 100.0%
DR1 45 100.0%
DSC3 45 100.0%
MAZ 45 100.0%
TMA7 45 100.0%
TMEM106C 45 100.0%
SNIP1 45 100.0%
MZT2B 45 100.0%
IFT74 45 100.0%
PTBP1 45 100.0%
ADAM15 45 100.0%
DHRS1 45 100.0%

We can see that individual genes are detected to varying degrees in the segments, which leads us to the next QC we will perform across the dataset.

Gene Filtering

We will graph the total number of genes detected in different percentages of segments. Based on the visualization below, we can better understand global gene detection in our study and select how many low detected genes to filter out of the dataset. Gene filtering increases performance of downstream statistical tests and improves interpretation of true biological signal.

We typically set a % Segment cutoff ranging from 5-20% based on the biological diversity of our dataset. For this study, we selected 5% as our cutoff. In other words, we focused on the genes detected in at least 5% of our segments; we filtered out the remainder of the targets.

x
Features 674
Samples 45

Normalization

We will now normalize the GeoMx data for downstream visualizations and differential expression. Q3 normalization is typically the preferred normalization strategy for most DSP-NGS RNA studies.

Before normalization, we will explore the relationship between the upper quartile (Q3) of the counts in each segment with the geometric mean of the negative control probes in the data. Ideally, there should be a separation between these two values to ensure we have stable measure of Q3 signal.

As expected, we see separation of the Q3 and negative probe counts at both the distribution (A) and per segment (B) levels. Next, we normalize our data.

We use the normalize function from NanoStringNCTools to create normalization factors reflecting each data type. Upper quartile (Q3) normalization is performed using norm_method = "quant" setting the desiredQuantile flag to 0.75. Other quantiles could be specified by changing that value. We save the normalized data to a specific slot using toELT = "q_norm".

# Q3 norm (75th percentile) for WTA/CTA  with or without custom spike-ins
target_m666Data <- normalize(target_m666Data , data_type = "RNA",
                             norm_method = "quant", 
                             desiredQuantile = .75,
                             toElt = "q_norm")

To demonstrate the effects of normalization, we graph a representative boxplots of the data for individual segments before and after normalization.

Unsupervised Analysis

UMAP & t-SNE

One common approach to understanding high-plex data is dimension reduction. Two common methods are UMAP and tSNE, which are non-orthogonally constrained projections that cluster samples based on overall gene expression. In this study, we see by either UMAP (from the umap package) or tSNE (from the Rtsne package), clusters of segments related to structure (glomeruli or internals) and disease status (normal or diabetic kidney disease).

Clustering high CV Genes

Another approach to explore the data is to calculate the coefficient of variation (CV) for each gene (\(g\)) using the formula \(CV_g = SD_g/mean_g\). We then identify for genes with high CVs that should have large differences across the various profiled segments. This unbiased approach can reveal highly variable genes across the study (see high_CV_Genes.xlsx).

We plot the results in a heatmap.

#>     IGHG1     IGHG4     IGHG2      IGKC     IGHG3 
#> 0.2565193 0.2427206 0.2280666 0.2238702 0.1992845

Differential Expression

A central method for exploring differences between groups of segments or samples is to perform differential gene expression analysis. A common statistical approach is to use a linear mixed-effect model (LMM).

Analysis Between Group Analysis: 1 vs 2

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_2.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
USP17L11 1 - 2 -0.518 0.000 0.010
USP17L3 1 - 2 -0.532 0.001 0.136
PITX1 1 - 2 1.562 0.001 0.157
ANK1 1 - 2 1.123 0.006 0.312
MGARP 1 - 2 1.079 0.011 0.499
ATP5MC2 1 - 2 0.529 0.022 0.565
AREG 1 - 2 0.773 0.026 0.565
PPIG 1 - 2 0.918 0.024 0.565
SUPT7L 1 - 2 0.711 0.027 0.565
DHRS1 1 - 2 -0.528 0.022 0.565

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'ATP5MC2' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
ATP5MC2 Tissue 1 - 2 0.5293242 0.0215988 0.5645755 P < 0.05

Let's look at the distribution for ATP5MC2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1 vs 3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
TPM2 1 - 3 -1.059 0.000 0.001
CSRP1 1 - 3 -0.658 0.000 0.001
MYH11 1 - 3 -1.144 0.000 0.001
CNN1 1 - 3 -0.723 0.000 0.001
MFAP4 1 - 3 -0.723 0.000 0.001
TAGLN 1 - 3 -1.261 0.000 0.001
ACTA2 1 - 3 -1.246 0.000 0.001
MYL9 1 - 3 -1.181 0.000 0.001
CCN3 1 - 3 -0.730 0.000 0.002
FLNA 1 - 3 -1.270 0.000 0.003
ACTN1 1 - 3 -0.577 0.000 0.003
RGS5 1 - 3 -0.819 0.000 0.003
ITGA8 1 - 3 -0.541 0.000 0.004
SPARC 1 - 3 -0.652 0.000 0.005
C4orf46 1 - 3 0.529 0.000 0.005
MYH9 1 - 3 -0.743 0.000 0.006
CALD1 1 - 3 -0.555 0.000 0.010
DUSP1 1 - 3 -0.618 0.000 0.010
S100A6 1 - 3 -0.725 0.000 0.010
MYL6 1 - 3 -0.523 0.001 0.012
NR4A1 1 - 3 -0.710 0.001 0.012
RHOB 1 - 3 -0.848 0.001 0.015
DSTN 1 - 3 -0.507 0.001 0.015
RPL37A 1 - 3 -0.600 0.001 0.018
KANK3 1 - 3 0.532 0.001 0.018
SIRT6 1 - 3 0.565 0.001 0.018
OGN 1 - 3 -0.573 0.001 0.020
C11orf96 1 - 3 -0.651 0.002 0.022
HLA-B 1 - 3 0.645 0.002 0.028
COL6A2 1 - 3 -0.546 0.003 0.032
RPLP1 1 - 3 -0.571 0.003 0.033
HSPB1 1 - 3 -0.734 0.003 0.033
VIM 1 - 3 -0.807 0.004 0.043
DHRS1 1 - 3 0.885 0.006 0.047
BGN 1 - 3 -0.910 0.006 0.048
PTMA 1 - 3 -0.544 0.007 0.053
SPARCL1 1 - 3 -0.670 0.008 0.053
LYRM2 1 - 3 0.531 0.008 0.056
IGFBP7 1 - 3 -0.623 0.016 0.085
FCF1 1 - 3 0.530 0.021 0.096
MZT2B 1 - 3 0.825 0.034 0.130
PTBP1 1 - 3 0.712 0.037 0.138
IGHG4 1 - 3 1.233 0.038 0.140

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'TPM2' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
TPM2 Tissue 1 - 3 -1.0594 1.2e-06 0.0005436 FDR < 0.001

Let's look at the distribution for TPM2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1 vs 2+3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_2+3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
MYH11 1 - 2+3 -0.822 0.000 0.016
CNN1 1 - 2+3 -0.553 0.000 0.016
TPM2 1 - 2+3 -0.768 0.000 0.016
MYL9 1 - 2+3 -0.776 0.001 0.046
RGS5 1 - 2+3 -0.525 0.002 0.056
MYH9 1 - 2+3 -0.537 0.002 0.068
PITX1 1 - 2+3 1.135 0.002 0.068
RHOB 1 - 2+3 -0.600 0.003 0.068
TAGLN 1 - 2+3 -0.779 0.002 0.068
FLNA 1 - 2+3 -0.871 0.003 0.072
ACTA2 1 - 2+3 -0.748 0.005 0.088
HLA-B 1 - 2+3 0.510 0.005 0.090
ANK1 1 - 2+3 0.832 0.006 0.095
SPARCL1 1 - 2+3 -0.527 0.008 0.100
HSPB1 1 - 2+3 -0.614 0.011 0.110
MGARP 1 - 2+3 0.767 0.015 0.127
PTBP1 1 - 2+3 0.652 0.015 0.127
IGHG4 1 - 2+3 1.098 0.015 0.127
AREG 1 - 2+3 0.584 0.016 0.136
IGHG2 1 - 2+3 0.920 0.026 0.182
IGHG3 1 - 2+3 0.827 0.027 0.182
PPIG 1 - 2+3 0.644 0.030 0.193
TRIM36 1 - 2+3 0.518 0.034 0.208
VIM 1 - 2+3 -0.527 0.038 0.215
IGHG1 1 - 2+3 0.910 0.043 0.231

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'CNN1' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
CNN1 Tissue 1 - 2+3 -0.5525309 0.0001523 0.0159073 FDR < 0.05

Let's look at the distribution for CNN1.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1a vs 2

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_2.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
NKX1-2 1a - 2 -0.727 0.000 0.089
HBA1 1a - 2 0.701 0.014 0.379
GYPC 1a - 2 -0.620 0.018 0.416
CNN3 1a - 2 -0.731 0.022 0.426
HBA2 1a - 2 0.822 0.025 0.437
FBLN5 1a - 2 -0.547 0.030 0.437
USP17L3 1a - 2 -0.509 0.025 0.437
BTG2 1a - 2 0.519 0.038 0.459

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'KDM4B' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
KDM4B Tissue 1a - 2 0.4334072 0.0277851 0.4365136 NS or FC < 0.5

Let's look at the distribution for KDM4B.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1a vs 3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
MYH11 1a - 3 -1.447 0.000 0.015
TPM2 1a - 3 -1.205 0.000 0.015
TPM1 1a - 3 -0.595 0.000 0.022
TAGLN 1a - 3 -1.622 0.000 0.022
SMIM17 1a - 3 -0.588 0.000 0.023
FLNA 1a - 3 -1.673 0.000 0.023
ANTXR1 1a - 3 -0.605 0.000 0.023
MYL9 1a - 3 -1.591 0.000 0.023
NKX1-2 1a - 3 -0.818 0.000 0.023
CSRP1 1a - 3 -0.720 0.000 0.023
CNN1 1a - 3 -0.753 0.001 0.029
ACTN1 1a - 3 -0.693 0.001 0.040
ACTA2 1a - 3 -1.362 0.001 0.040
ITGA8 1a - 3 -0.710 0.001 0.040
CCN3 1a - 3 -0.889 0.001 0.041
HLA-B 1a - 3 1.027 0.001 0.045
LTBP1 1a - 3 -0.763 0.001 0.049
S100A6 1a - 3 -0.971 0.002 0.057
DUSP1 1a - 3 -0.796 0.002 0.057
IGHA1 1a - 3 0.683 0.002 0.057
MFAP4 1a - 3 -0.817 0.002 0.058
MYH9 1a - 3 -0.936 0.002 0.058
COL6A2 1a - 3 -0.882 0.002 0.059
ELOVL5 1a - 3 0.543 0.003 0.078
TUBA1C 1a - 3 -0.522 0.003 0.078
ID2 1a - 3 -0.523 0.004 0.081
ITGB1 1a - 3 -0.763 0.004 0.084
SPARC 1a - 3 -0.707 0.004 0.084
S100A4 1a - 3 -0.540 0.004 0.085
RPS21 1a - 3 -0.621 0.006 0.101
RHOB 1a - 3 -1.306 0.006 0.101
VIM 1a - 3 -1.200 0.006 0.103
FBLN5 1a - 3 -0.680 0.007 0.103
HBA1 1a - 3 0.726 0.006 0.103
CCL2 1a - 3 0.511 0.008 0.124
COL4A2 1a - 3 -0.604 0.008 0.124
CALD1 1a - 3 -0.753 0.009 0.126
GYPC 1a - 3 -0.627 0.010 0.135
RGS5 1a - 3 -0.969 0.010 0.136
GSN 1a - 3 -0.678 0.012 0.136
PFN1 1a - 3 -0.585 0.013 0.136
OGN 1a - 3 -0.772 0.012 0.136
MYL6 1a - 3 -0.670 0.011 0.136
PRMT8 1a - 3 0.585 0.014 0.137
BTG2 1a - 3 0.533 0.016 0.148
MFGE8 1a - 3 -0.593 0.018 0.160
H2BC5 1a - 3 0.593 0.020 0.167
CNN3 1a - 3 -0.607 0.024 0.185
RPLP1 1a - 3 -0.622 0.025 0.186
HSPB1 1a - 3 -0.945 0.025 0.186
DSTN 1a - 3 -0.540 0.024 0.186
RPL22 1a - 3 -0.550 0.026 0.188
FHL1 1a - 3 -0.548 0.026 0.189
PTMA 1a - 3 -0.819 0.026 0.189
C11orf96 1a - 3 -0.814 0.027 0.189
SIRT6 1a - 3 0.717 0.028 0.191
IGHG2 1a - 3 2.111 0.028 0.192
CCN1 1a - 3 -0.500 0.031 0.199
IGHG4 1a - 3 2.259 0.031 0.199
HBA2 1a - 3 0.793 0.033 0.207
TIMP2 1a - 3 -0.615 0.035 0.212
EFEMP1 1a - 3 -0.592 0.035 0.212
RPL9 1a - 3 -0.570 0.038 0.226
A2M 1a - 3 -0.548 0.045 0.253

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PDLIM3' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
PDLIM3 Tissue 1a - 3 -0.4703381 0.0003127 0.0228538 NS or FC < 0.5

Let's look at the distribution for PDLIM3.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1a vs 2+3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_2+3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
NKX1-2 1a - 2+3 -0.787 0.000 0.005
SMIM17 1a - 2+3 -0.533 0.000 0.085
FBLN5 1a - 2+3 -0.623 0.001 0.094
GYPC 1a - 2+3 -0.656 0.001 0.094
HBA1 1a - 2+3 0.715 0.001 0.094
MYH11 1a - 2+3 -1.101 0.002 0.096
CNN3 1a - 2+3 -0.660 0.002 0.096
HBA2 1a - 2+3 0.805 0.004 0.099
LTBP1 1a - 2+3 -0.616 0.002 0.099
ACTN1 1a - 2+3 -0.593 0.004 0.099
COL6A2 1a - 2+3 -0.735 0.004 0.099
RHOB 1a - 2+3 -1.058 0.004 0.099
MYL9 1a - 2+3 -1.185 0.004 0.099
BTG2 1a - 2+3 0.527 0.003 0.099
HLA-B 1a - 2+3 0.871 0.003 0.099
H2BC5 1a - 2+3 0.553 0.004 0.099
IGHG2 1a - 2+3 1.987 0.006 0.111
IGHG4 1a - 2+3 2.124 0.006 0.121
TPM2 1a - 2+3 -0.873 0.007 0.124
PFN1 1a - 2+3 -0.515 0.007 0.124
ITGB1 1a - 2+3 -0.572 0.009 0.138
CCN3 1a - 2+3 -0.642 0.009 0.138
TAGLN 1a - 2+3 -1.140 0.009 0.138
CSRP1 1a - 2+3 -0.528 0.012 0.158
IGHG3 1a - 2+3 1.763 0.012 0.158
FLNA 1a - 2+3 -1.233 0.013 0.163
MYH9 1a - 2+3 -0.730 0.014 0.165
ITGA8 1a - 2+3 -0.535 0.013 0.165
GSN 1a - 2+3 -0.652 0.015 0.168
CNN1 1a - 2+3 -0.552 0.016 0.168
S100A6 1a - 2+3 -0.695 0.016 0.168
MFAP4 1a - 2+3 -0.574 0.017 0.171
CALD1 1a - 2+3 -0.544 0.020 0.185
IGKC 1a - 2+3 1.700 0.023 0.205
IGHG1 1a - 2+3 1.801 0.027 0.219
COL4A2 1a - 2+3 -0.502 0.029 0.224
RGS5 1a - 2+3 -0.674 0.029 0.225
VIM 1a - 2+3 -0.905 0.036 0.260
PTMA 1a - 2+3 -0.620 0.037 0.261
TIMP2 1a - 2+3 -0.535 0.040 0.270
SPARC 1a - 2+3 -0.505 0.044 0.278
IGFBP5 1a - 2+3 -0.560 0.044 0.278
SPARCL1 1a - 2+3 -0.679 0.047 0.291

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'NKX1-2' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
NKX1-2 Tissue 1a - 2+3 -0.7874966 7.8e-06 0.0052511 FDR < 0.05

Let's look at the distribution for NKX1-2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1b vs 2

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_2.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
PITX1 1b - 2 1.549 0.000 0.001
USP17L11 1b - 2 -0.525 0.000 0.008
ANK1 1b - 2 1.084 0.000 0.008
MGARP 1b - 2 1.026 0.000 0.017
USP17L3 1b - 2 -0.531 0.000 0.040
PPIG 1b - 2 0.879 0.001 0.050
AREG 1b - 2 0.681 0.001 0.064
TRIM36 1b - 2 0.577 0.004 0.185
ATP5MC2 1b - 2 0.548 0.010 0.298
DHRS1 1b - 2 -0.532 0.011 0.298
SUPT7L 1b - 2 0.764 0.015 0.359

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PITX1' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
PITX1 Tissue 1b - 2 1.54882 2e-06 0.001315 FDR < 0.05

Let's look at the distribution for PITX1.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1b vs 3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
MYH11 1b - 3 -1.083 0.000 0.004
TPM2 1b - 3 -1.051 0.000 0.004
CNN1 1b - 3 -0.736 0.000 0.005
CSRP1 1b - 3 -0.638 0.000 0.005
MFAP4 1b - 3 -0.692 0.000 0.007
TAGLN 1b - 3 -1.141 0.000 0.007
ACTA2 1b - 3 -1.231 0.000 0.008
MYL9 1b - 3 -1.044 0.000 0.008
C4orf46 1b - 3 0.554 0.000 0.009
RGS5 1b - 3 -0.769 0.000 0.011
FLNA 1b - 3 -1.178 0.000 0.013
CCN3 1b - 3 -0.677 0.000 0.013
ACTN1 1b - 3 -0.556 0.000 0.014
KANK3 1b - 3 0.569 0.001 0.017
MYH9 1b - 3 -0.678 0.001 0.018
ITGA8 1b - 3 -0.505 0.001 0.018
RHOB 1b - 3 -0.695 0.001 0.018
DUSP1 1b - 3 -0.614 0.001 0.018
S100A6 1b - 3 -0.643 0.001 0.019
SPARC 1b - 3 -0.634 0.001 0.019
NR4A1 1b - 3 -0.718 0.001 0.026
RPL37A 1b - 3 -0.612 0.002 0.032
OGN 1b - 3 -0.507 0.003 0.039
CALD1 1b - 3 -0.505 0.003 0.041
C11orf96 1b - 3 -0.610 0.003 0.043
HLA-B 1b - 3 0.566 0.003 0.043
RPLP1 1b - 3 -0.554 0.005 0.049
SIRT6 1b - 3 0.514 0.005 0.053
LYRM2 1b - 3 0.551 0.005 0.053
FCF1 1b - 3 0.575 0.007 0.058
IGHG3 1b - 3 0.563 0.007 0.058
HSPB1 1b - 3 -0.693 0.008 0.064
DHRS1 1b - 3 0.881 0.009 0.067
SPARCL1 1b - 3 -0.625 0.013 0.080
IGHG4 1b - 3 0.889 0.013 0.080
BGN 1b - 3 -0.888 0.014 0.082
VIM 1b - 3 -0.695 0.016 0.087
IGFBP7 1b - 3 -0.615 0.021 0.103
IGHG1 1b - 3 0.705 0.026 0.114
TMEM106C 1b - 3 -0.672 0.030 0.118
PITX1 1b - 3 0.801 0.033 0.124
IGHG2 1b - 3 0.688 0.033 0.124
PTBP1 1b - 3 0.732 0.040 0.139
IGKC 1b - 3 0.641 0.044 0.146
MZT2B 1b - 3 0.803 0.046 0.150
ANK1 1b - 3 0.575 0.050 0.156

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'TPM2' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
TPM2 Tissue 1b - 3 -1.050616 1.69e-05 0.0038044 FDR < 0.05

Let's look at the distribution for TPM2.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.

Analysis Between Group Analysis: 1b vs 2+3

We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_2+3.

DE results for top differentially expressed genes (`P<0.05 & abs(Estimate)>=0.5)
Gene Contrast Estimate Pr(>|t|) FDR
MYH11 1b - 2+3 -0.754 0.001 0.061
CNN1 1b - 2+3 -0.557 0.001 0.061
PITX1 1b - 2+3 1.122 0.000 0.061
TPM2 1b - 2+3 -0.748 0.001 0.061
ANK1 1b - 2+3 0.793 0.001 0.061
IGHG3 1b - 2+3 0.515 0.002 0.061
MGARP 1b - 2+3 0.714 0.004 0.109
IGHG4 1b - 2+3 0.757 0.006 0.156
PPIG 1b - 2+3 0.604 0.008 0.169
MYL9 1b - 2+3 -0.639 0.011 0.169
IGHG1 1b - 2+3 0.612 0.011 0.169
FLNA 1b - 2+3 -0.757 0.018 0.175
ACTA2 1b - 2+3 -0.718 0.017 0.175
PTBP1 1b - 2+3 0.672 0.016 0.175
TAGLN 1b - 2+3 -0.658 0.017 0.175
IGHG2 1b - 2+3 0.565 0.021 0.185
IGKC 1b - 2+3 0.529 0.034 0.237
HSPB1 1b - 2+3 -0.558 0.035 0.238

Volcano Plots

A canonical visualization for interpreting differential gene expression results is the volcano plot.

Plotting Genes of Interest

A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PITX1' as example.

First let's review the model results for these targets:

Gene Subset Contrast Estimate Pr(>|t|) FDR Color
PITX1 Tissue 1b - 2+3 1.121588 0.0004347 0.0609458 P < 0.05

Let's look at the distribution for PITX1.

Heatmap of Significant Genes

In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.