R version: R version 4.1.1 (2021-08-10)
Bioconductor version: 3.13
Package: 1.1.4 <>
The GeoMx Digital Spatial Profiler (DSP) is a platform for capturing spatially resolved high-plex gene (or protein) expression data from tissue. In particular, formalin-fixed paraffin-embedded (FFPE) or fresh-frozen (FF) tissue sections are stained with barcoded in-situ hybridization probes that bind to endogenous mRNA transcripts. The user then selects Locations of the interest (ROI) to profile; if desired, each ROI segment can be further sub-divided into areas of illumination (AOI) based on tissue morphology. The GeoMx then photo-cleaves and collects expression barcodes for each AOI segment separately for downstream sequencing and data processing.
The final results are spatially resolved unique expression datasets for every protein-coding gene (>18,000 genes) from every individual segments profiled from tissue.
We start with raw gene expression count files. Using open source R packages, we evaluate samples and expression targets and prepare gene-level count data for downstream analysis. To understand our spatial data, we perform unsupervised clustering, dimension reduction, and differential gene expression analyses and visually explore the results.
Our specific objectives:
In this analysis, we will analyze a dataset created with the human whole transcriptome atlas (WTA) assay. The dataset includes 8x3 PCR positive Covid (Group1), 3x3 PCR negative Covid (Group2), and 4x3 PCR negative control (Group3) samples. Regions of interest (ROI) were spatially profiled to focus on two different structures: lumen or internal.
The key data files are:
We then load the data to create a data object using the readNanoStringGeoMxSet
function.
All of the expression, annotation, and probe information are now linked and stored together into a single data object.
First let's access the PKC files, to ensure that the expected PKCs have been loaded for this study. For the data we are using the file Hs_R_NGS_WTA_v1.0.pkc
.
Now that we have loaded the data, we can visually summarize the experimental design for our dataset to look at the different types of samples and ROI/AOI segments that have been profiled. We present this information in a Sankey diagram.
The steps above encompass the standard pre-processing workflow for GeoMx data. In short, they represent the selection of ROI/AOI segments and genes based on quality control (QC) or limit of quantification (LOQ) metrics and data normalization.
Before we begin, we will shift any expression counts with a value of 0 to 1 to enable in downstream transformations.
We first assess sequencing quality and adequate tissue sampling for every ROI/AOI segment.
Every ROI/AOI segment will be tested for:
First, we select the QC parameter cutoffs, against which our ROI/AOI segments will be tested and flagged appropriately. We have selected the appropriate study-specific parameters for this study.
# Default QC cutoffs are commented in () adjacent to the respective parameters
# study-specific values were selected after visualizing the QC results in more
# detail below
QC_params <-
list(minSegmentReads = 1000, # Minimum number of reads (1000)
percentTrimmed = 80, # Minimum % of reads trimmed (80%)
percentStitched = 80, # Minimum % of reads stitched (80%)
percentAligned = 75, # Minimum % of reads aligned to known targets (80%)
percentSaturation = 50, # Minimum sequencing saturation (50%)
minNegativeCount = 1, # Minimum negative control counts (10)
minNuclei = 20, # Minimum # of cells observed in a segment (100)
minArea = 1000) # Minimum segment area (5000)
m666Data <-
setSegmentQCFlags(m666Data, qcCutoffs = QC_params)
# Collate QC Results
QCResults <- protocolData(m666Data)[["QCFlags"]]
flag_columns <- colnames(QCResults)
QC_Summary <- data.frame(Pass = colSums(!QCResults[, flag_columns]),
Warning = colSums(QCResults[, flag_columns]))
QCResults$QCStatus <- apply(QCResults, 1L, function(x) {
ifelse(sum(x) == 0L, "PASS", "WARNING")
})
QC_Summary["TOTAL FLAGS", ] <-
c(sum(QCResults[, "QCStatus"] == "PASS"),
sum(QCResults[, "QCStatus"] == "WARNING"))
We plot all of the QC Summary information in a table.
Pass | Warning | |
---|---|---|
LowReads | 45 | 0 |
LowTrimmed | 45 | 0 |
LowStitched | 45 | 0 |
LowAligned | 45 | 0 |
LowSaturation | 45 | 0 |
LowNegatives | 45 | 0 |
TOTAL FLAGS | 45 | 0 |
As the final step in Segment QC, we don't need to remove any flagged segments since all segments pass QC.
#> Features Samples
#> 18815 45
Before we summarize our data into gene-level count data, we will remove low-performing probes. In short, this QC is an outlier removal process, whereby probes are either removed entirely from the study (global) or from specific segments (local). The QC applies to gene targets for which there are multiple distinct probes representing the count for a gene per segment. In WTA data, only one probe exists per target gene; thus, Probe QC does not apply to the endogenous genes in the panel. Rather, it is performed on the negative control probes; there are multiple probes representing our negative controls, which do not target any sequence in the genome. These probes enable calculation of the background per segment and will be important for determining gene detection downstream.
After Probe QC, there will always remain at least one probe representing every gene target. In other words, Probe QC never removes genes from your data.
A probe is removed globally from the dataset if either of the following is true:
A probe is removed locally (from a given segment) if the probe is an outlier according to the Grubb's test in that segment.
We report the number of global and local outlier probes.
Passed | Global | Local |
---|---|---|
18813 | 1 | 1 |
#Subset object to exclude all that did not pass Ratio & Global testing
ProbeQCPassed <-
subset(m666Data,
fData(m666Data)[["QCFlags"]][,c("LowProbeRatio")] == FALSE &
fData(m666Data)[["QCFlags"]][,c("GlobalGrubbsOutlier")] == FALSE)
dim(ProbeQCPassed)
#> Features Samples
#> 18814 45
m666Data <- ProbeQCPassed
With our Probe QC steps complete, we will generate a gene-level count matrix. The count for any gene with multiple probes per segment is calculated as the geometric mean of those probes (see gene_level_counts.xlsx).
In addition to Segment and Probe QC, we also determine the limit of quantification (LOQ) per segment. The LOQ is calculated based on the distribution of negative control probes and is intended to approximate the quantifiable limit of gene expression per segment. The formula for calculating the LOQ in a \(i^{th}\) segment is:
\[LOQ_{i} = geomean(NegProbe_{i}) * geoSD(NegProbe_{i})^{n}\]
We typically use 2 geometric standard deviations (\(n = 2\)) above the geometric mean as the LOQ, which is reasonable for most studies. We keep the segments if their LOQ > 2.
After determining the limit of quantification (LOQ) per segment, we filter out either segments and/or genes with abnormally low signal. Filtering is an important step to focus on the true biological data of interest.
We determine the number of genes detected in each segment across the dataset.
We first filter out segments with exceptionally low signal. These segments will have a small fraction of panel genes detected above the LOQ relative to the other segments in the study. Let's visualize the distribution of segments with respect to their % genes detected:
We can also create a table to review what Group (1, 1a, 2 vs 3) is going to be impacted by each threshold:
1a | 1b | 2 | 3 | |
---|---|---|---|---|
<1% | 1 | 3 | 1 | 1 |
1-5% | 5 | 15 | 8 | 11 |
5-10% | 0 | 0 | 0 | 0 |
10-15% | 0 | 0 | 0 | 0 |
>15% | 0 | 0 | 0 | 0 |
In this example, we choose to remove segments with less than 0.1% of the genes detected.
target_m666Data <-
target_m666Data[, pData(target_m666Data)$GeneDetectionRate >= .001]
dim(target_m666Data)
#> Features Samples
#> 18677 45
Next, we determine the detection rate for genes across the study. To illustrate this idea, we create a small gene list including all genes in which DetectionRate>=0.97
to review.
Gene | Detection, # Segments | Detection Rate, % of Segments |
---|---|---|
NFKB2 | 44 | 97.8% |
CKS1B | 44 | 97.8% |
CUX1 | 44 | 97.8% |
RNPEP | 44 | 97.8% |
GDAP1L1 | 44 | 97.8% |
FNDC11 | 44 | 97.8% |
ZFHX4 | 44 | 97.8% |
WLS | 44 | 97.8% |
CHODL | 44 | 97.8% |
MINDY3 | 44 | 97.8% |
MPIG6B | 44 | 97.8% |
TMEM30A | 45 | 100.0% |
CRABP2 | 45 | 100.0% |
CPS1 | 45 | 100.0% |
CSNK2A2 | 45 | 100.0% |
DR1 | 45 | 100.0% |
DSC3 | 45 | 100.0% |
MAZ | 45 | 100.0% |
TMA7 | 45 | 100.0% |
TMEM106C | 45 | 100.0% |
SNIP1 | 45 | 100.0% |
MZT2B | 45 | 100.0% |
IFT74 | 45 | 100.0% |
PTBP1 | 45 | 100.0% |
ADAM15 | 45 | 100.0% |
DHRS1 | 45 | 100.0% |
We can see that individual genes are detected to varying degrees in the segments, which leads us to the next QC we will perform across the dataset.
We will graph the total number of genes detected in different percentages of segments. Based on the visualization below, we can better understand global gene detection in our study and select how many low detected genes to filter out of the dataset. Gene filtering increases performance of downstream statistical tests and improves interpretation of true biological signal.
We typically set a % Segment cutoff ranging from 5-20% based on the biological diversity of our dataset. For this study, we selected 5% as our cutoff. In other words, we focused on the genes detected in at least 5% of our segments; we filtered out the remainder of the targets.
x | |
---|---|
Features | 674 |
Samples | 45 |
We will now normalize the GeoMx data for downstream visualizations and differential expression. Q3 normalization is typically the preferred normalization strategy for most DSP-NGS RNA studies.
Before normalization, we will explore the relationship between the upper quartile (Q3) of the counts in each segment with the geometric mean of the negative control probes in the data. Ideally, there should be a separation between these two values to ensure we have stable measure of Q3 signal.
As expected, we see separation of the Q3 and negative probe counts at both the distribution (A) and per segment (B) levels. Next, we normalize our data.
We use the normalize
function from NanoStringNCTools
to create normalization factors reflecting each data type. Upper quartile (Q3) normalization is performed using norm_method = "quant"
setting the desiredQuantile
flag to 0.75. Other quantiles could be specified by changing that value. We save the normalized data to a specific slot using toELT = "q_norm"
.
# Q3 norm (75th percentile) for WTA/CTA with or without custom spike-ins
target_m666Data <- normalize(target_m666Data , data_type = "RNA",
norm_method = "quant",
desiredQuantile = .75,
toElt = "q_norm")
To demonstrate the effects of normalization, we graph a representative boxplots of the data for individual segments before and after normalization.
One common approach to understanding high-plex data is dimension reduction. Two common methods are UMAP and tSNE, which are non-orthogonally constrained projections that cluster samples based on overall gene expression. In this study, we see by either UMAP (from the umap
package) or tSNE (from the Rtsne
package), clusters of segments related to structure (glomeruli or internals) and disease status (normal or diabetic kidney disease).
Another approach to explore the data is to calculate the coefficient of variation (CV) for each gene (\(g\)) using the formula \(CV_g = SD_g/mean_g\). We then identify for genes with high CVs that should have large differences across the various profiled segments. This unbiased approach can reveal highly variable genes across the study (see high_CV_Genes.xlsx).
We plot the results in a heatmap.
#> IGHG1 IGHG4 IGHG2 IGKC IGHG3
#> 0.2565193 0.2427206 0.2280666 0.2238702 0.1992845
A central method for exploring differences between groups of segments or samples is to perform differential gene expression analysis. A common statistical approach is to use a linear mixed-effect model (LMM).
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_2.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
USP17L11 | 1 - 2 | -0.518 | 0.000 | 0.010 |
USP17L3 | 1 - 2 | -0.532 | 0.001 | 0.136 |
PITX1 | 1 - 2 | 1.562 | 0.001 | 0.157 |
ANK1 | 1 - 2 | 1.123 | 0.006 | 0.312 |
MGARP | 1 - 2 | 1.079 | 0.011 | 0.499 |
ATP5MC2 | 1 - 2 | 0.529 | 0.022 | 0.565 |
AREG | 1 - 2 | 0.773 | 0.026 | 0.565 |
PPIG | 1 - 2 | 0.918 | 0.024 | 0.565 |
SUPT7L | 1 - 2 | 0.711 | 0.027 | 0.565 |
DHRS1 | 1 - 2 | -0.528 | 0.022 | 0.565 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'ATP5MC2' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
ATP5MC2 | Tissue | 1 - 2 | 0.5293242 | 0.0215988 | 0.5645755 | P < 0.05 |
Let's look at the distribution for ATP5MC2.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_3.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
TPM2 | 1 - 3 | -1.059 | 0.000 | 0.001 |
CSRP1 | 1 - 3 | -0.658 | 0.000 | 0.001 |
MYH11 | 1 - 3 | -1.144 | 0.000 | 0.001 |
CNN1 | 1 - 3 | -0.723 | 0.000 | 0.001 |
MFAP4 | 1 - 3 | -0.723 | 0.000 | 0.001 |
TAGLN | 1 - 3 | -1.261 | 0.000 | 0.001 |
ACTA2 | 1 - 3 | -1.246 | 0.000 | 0.001 |
MYL9 | 1 - 3 | -1.181 | 0.000 | 0.001 |
CCN3 | 1 - 3 | -0.730 | 0.000 | 0.002 |
FLNA | 1 - 3 | -1.270 | 0.000 | 0.003 |
ACTN1 | 1 - 3 | -0.577 | 0.000 | 0.003 |
RGS5 | 1 - 3 | -0.819 | 0.000 | 0.003 |
ITGA8 | 1 - 3 | -0.541 | 0.000 | 0.004 |
SPARC | 1 - 3 | -0.652 | 0.000 | 0.005 |
C4orf46 | 1 - 3 | 0.529 | 0.000 | 0.005 |
MYH9 | 1 - 3 | -0.743 | 0.000 | 0.006 |
CALD1 | 1 - 3 | -0.555 | 0.000 | 0.010 |
DUSP1 | 1 - 3 | -0.618 | 0.000 | 0.010 |
S100A6 | 1 - 3 | -0.725 | 0.000 | 0.010 |
MYL6 | 1 - 3 | -0.523 | 0.001 | 0.012 |
NR4A1 | 1 - 3 | -0.710 | 0.001 | 0.012 |
RHOB | 1 - 3 | -0.848 | 0.001 | 0.015 |
DSTN | 1 - 3 | -0.507 | 0.001 | 0.015 |
RPL37A | 1 - 3 | -0.600 | 0.001 | 0.018 |
KANK3 | 1 - 3 | 0.532 | 0.001 | 0.018 |
SIRT6 | 1 - 3 | 0.565 | 0.001 | 0.018 |
OGN | 1 - 3 | -0.573 | 0.001 | 0.020 |
C11orf96 | 1 - 3 | -0.651 | 0.002 | 0.022 |
HLA-B | 1 - 3 | 0.645 | 0.002 | 0.028 |
COL6A2 | 1 - 3 | -0.546 | 0.003 | 0.032 |
RPLP1 | 1 - 3 | -0.571 | 0.003 | 0.033 |
HSPB1 | 1 - 3 | -0.734 | 0.003 | 0.033 |
VIM | 1 - 3 | -0.807 | 0.004 | 0.043 |
DHRS1 | 1 - 3 | 0.885 | 0.006 | 0.047 |
BGN | 1 - 3 | -0.910 | 0.006 | 0.048 |
PTMA | 1 - 3 | -0.544 | 0.007 | 0.053 |
SPARCL1 | 1 - 3 | -0.670 | 0.008 | 0.053 |
LYRM2 | 1 - 3 | 0.531 | 0.008 | 0.056 |
IGFBP7 | 1 - 3 | -0.623 | 0.016 | 0.085 |
FCF1 | 1 - 3 | 0.530 | 0.021 | 0.096 |
MZT2B | 1 - 3 | 0.825 | 0.034 | 0.130 |
PTBP1 | 1 - 3 | 0.712 | 0.037 | 0.138 |
IGHG4 | 1 - 3 | 1.233 | 0.038 | 0.140 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'TPM2' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
TPM2 | Tissue | 1 - 3 | -1.0594 | 1.2e-06 | 0.0005436 | FDR < 0.001 |
Let's look at the distribution for TPM2.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1_vs_2+3.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
MYH11 | 1 - 2+3 | -0.822 | 0.000 | 0.016 |
CNN1 | 1 - 2+3 | -0.553 | 0.000 | 0.016 |
TPM2 | 1 - 2+3 | -0.768 | 0.000 | 0.016 |
MYL9 | 1 - 2+3 | -0.776 | 0.001 | 0.046 |
RGS5 | 1 - 2+3 | -0.525 | 0.002 | 0.056 |
MYH9 | 1 - 2+3 | -0.537 | 0.002 | 0.068 |
PITX1 | 1 - 2+3 | 1.135 | 0.002 | 0.068 |
RHOB | 1 - 2+3 | -0.600 | 0.003 | 0.068 |
TAGLN | 1 - 2+3 | -0.779 | 0.002 | 0.068 |
FLNA | 1 - 2+3 | -0.871 | 0.003 | 0.072 |
ACTA2 | 1 - 2+3 | -0.748 | 0.005 | 0.088 |
HLA-B | 1 - 2+3 | 0.510 | 0.005 | 0.090 |
ANK1 | 1 - 2+3 | 0.832 | 0.006 | 0.095 |
SPARCL1 | 1 - 2+3 | -0.527 | 0.008 | 0.100 |
HSPB1 | 1 - 2+3 | -0.614 | 0.011 | 0.110 |
MGARP | 1 - 2+3 | 0.767 | 0.015 | 0.127 |
PTBP1 | 1 - 2+3 | 0.652 | 0.015 | 0.127 |
IGHG4 | 1 - 2+3 | 1.098 | 0.015 | 0.127 |
AREG | 1 - 2+3 | 0.584 | 0.016 | 0.136 |
IGHG2 | 1 - 2+3 | 0.920 | 0.026 | 0.182 |
IGHG3 | 1 - 2+3 | 0.827 | 0.027 | 0.182 |
PPIG | 1 - 2+3 | 0.644 | 0.030 | 0.193 |
TRIM36 | 1 - 2+3 | 0.518 | 0.034 | 0.208 |
VIM | 1 - 2+3 | -0.527 | 0.038 | 0.215 |
IGHG1 | 1 - 2+3 | 0.910 | 0.043 | 0.231 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'CNN1' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
CNN1 | Tissue | 1 - 2+3 | -0.5525309 | 0.0001523 | 0.0159073 | FDR < 0.05 |
Let's look at the distribution for CNN1.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_2.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
NKX1-2 | 1a - 2 | -0.727 | 0.000 | 0.089 |
HBA1 | 1a - 2 | 0.701 | 0.014 | 0.379 |
GYPC | 1a - 2 | -0.620 | 0.018 | 0.416 |
CNN3 | 1a - 2 | -0.731 | 0.022 | 0.426 |
HBA2 | 1a - 2 | 0.822 | 0.025 | 0.437 |
FBLN5 | 1a - 2 | -0.547 | 0.030 | 0.437 |
USP17L3 | 1a - 2 | -0.509 | 0.025 | 0.437 |
BTG2 | 1a - 2 | 0.519 | 0.038 | 0.459 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'KDM4B' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
KDM4B | Tissue | 1a - 2 | 0.4334072 | 0.0277851 | 0.4365136 | NS or FC < 0.5 |
Let's look at the distribution for KDM4B.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_3.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
MYH11 | 1a - 3 | -1.447 | 0.000 | 0.015 |
TPM2 | 1a - 3 | -1.205 | 0.000 | 0.015 |
TPM1 | 1a - 3 | -0.595 | 0.000 | 0.022 |
TAGLN | 1a - 3 | -1.622 | 0.000 | 0.022 |
SMIM17 | 1a - 3 | -0.588 | 0.000 | 0.023 |
FLNA | 1a - 3 | -1.673 | 0.000 | 0.023 |
ANTXR1 | 1a - 3 | -0.605 | 0.000 | 0.023 |
MYL9 | 1a - 3 | -1.591 | 0.000 | 0.023 |
NKX1-2 | 1a - 3 | -0.818 | 0.000 | 0.023 |
CSRP1 | 1a - 3 | -0.720 | 0.000 | 0.023 |
CNN1 | 1a - 3 | -0.753 | 0.001 | 0.029 |
ACTN1 | 1a - 3 | -0.693 | 0.001 | 0.040 |
ACTA2 | 1a - 3 | -1.362 | 0.001 | 0.040 |
ITGA8 | 1a - 3 | -0.710 | 0.001 | 0.040 |
CCN3 | 1a - 3 | -0.889 | 0.001 | 0.041 |
HLA-B | 1a - 3 | 1.027 | 0.001 | 0.045 |
LTBP1 | 1a - 3 | -0.763 | 0.001 | 0.049 |
S100A6 | 1a - 3 | -0.971 | 0.002 | 0.057 |
DUSP1 | 1a - 3 | -0.796 | 0.002 | 0.057 |
IGHA1 | 1a - 3 | 0.683 | 0.002 | 0.057 |
MFAP4 | 1a - 3 | -0.817 | 0.002 | 0.058 |
MYH9 | 1a - 3 | -0.936 | 0.002 | 0.058 |
COL6A2 | 1a - 3 | -0.882 | 0.002 | 0.059 |
ELOVL5 | 1a - 3 | 0.543 | 0.003 | 0.078 |
TUBA1C | 1a - 3 | -0.522 | 0.003 | 0.078 |
ID2 | 1a - 3 | -0.523 | 0.004 | 0.081 |
ITGB1 | 1a - 3 | -0.763 | 0.004 | 0.084 |
SPARC | 1a - 3 | -0.707 | 0.004 | 0.084 |
S100A4 | 1a - 3 | -0.540 | 0.004 | 0.085 |
RPS21 | 1a - 3 | -0.621 | 0.006 | 0.101 |
RHOB | 1a - 3 | -1.306 | 0.006 | 0.101 |
VIM | 1a - 3 | -1.200 | 0.006 | 0.103 |
FBLN5 | 1a - 3 | -0.680 | 0.007 | 0.103 |
HBA1 | 1a - 3 | 0.726 | 0.006 | 0.103 |
CCL2 | 1a - 3 | 0.511 | 0.008 | 0.124 |
COL4A2 | 1a - 3 | -0.604 | 0.008 | 0.124 |
CALD1 | 1a - 3 | -0.753 | 0.009 | 0.126 |
GYPC | 1a - 3 | -0.627 | 0.010 | 0.135 |
RGS5 | 1a - 3 | -0.969 | 0.010 | 0.136 |
GSN | 1a - 3 | -0.678 | 0.012 | 0.136 |
PFN1 | 1a - 3 | -0.585 | 0.013 | 0.136 |
OGN | 1a - 3 | -0.772 | 0.012 | 0.136 |
MYL6 | 1a - 3 | -0.670 | 0.011 | 0.136 |
PRMT8 | 1a - 3 | 0.585 | 0.014 | 0.137 |
BTG2 | 1a - 3 | 0.533 | 0.016 | 0.148 |
MFGE8 | 1a - 3 | -0.593 | 0.018 | 0.160 |
H2BC5 | 1a - 3 | 0.593 | 0.020 | 0.167 |
CNN3 | 1a - 3 | -0.607 | 0.024 | 0.185 |
RPLP1 | 1a - 3 | -0.622 | 0.025 | 0.186 |
HSPB1 | 1a - 3 | -0.945 | 0.025 | 0.186 |
DSTN | 1a - 3 | -0.540 | 0.024 | 0.186 |
RPL22 | 1a - 3 | -0.550 | 0.026 | 0.188 |
FHL1 | 1a - 3 | -0.548 | 0.026 | 0.189 |
PTMA | 1a - 3 | -0.819 | 0.026 | 0.189 |
C11orf96 | 1a - 3 | -0.814 | 0.027 | 0.189 |
SIRT6 | 1a - 3 | 0.717 | 0.028 | 0.191 |
IGHG2 | 1a - 3 | 2.111 | 0.028 | 0.192 |
CCN1 | 1a - 3 | -0.500 | 0.031 | 0.199 |
IGHG4 | 1a - 3 | 2.259 | 0.031 | 0.199 |
HBA2 | 1a - 3 | 0.793 | 0.033 | 0.207 |
TIMP2 | 1a - 3 | -0.615 | 0.035 | 0.212 |
EFEMP1 | 1a - 3 | -0.592 | 0.035 | 0.212 |
RPL9 | 1a - 3 | -0.570 | 0.038 | 0.226 |
A2M | 1a - 3 | -0.548 | 0.045 | 0.253 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PDLIM3' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
PDLIM3 | Tissue | 1a - 3 | -0.4703381 | 0.0003127 | 0.0228538 | NS or FC < 0.5 |
Let's look at the distribution for PDLIM3.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1a_vs_2+3.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
NKX1-2 | 1a - 2+3 | -0.787 | 0.000 | 0.005 |
SMIM17 | 1a - 2+3 | -0.533 | 0.000 | 0.085 |
FBLN5 | 1a - 2+3 | -0.623 | 0.001 | 0.094 |
GYPC | 1a - 2+3 | -0.656 | 0.001 | 0.094 |
HBA1 | 1a - 2+3 | 0.715 | 0.001 | 0.094 |
MYH11 | 1a - 2+3 | -1.101 | 0.002 | 0.096 |
CNN3 | 1a - 2+3 | -0.660 | 0.002 | 0.096 |
HBA2 | 1a - 2+3 | 0.805 | 0.004 | 0.099 |
LTBP1 | 1a - 2+3 | -0.616 | 0.002 | 0.099 |
ACTN1 | 1a - 2+3 | -0.593 | 0.004 | 0.099 |
COL6A2 | 1a - 2+3 | -0.735 | 0.004 | 0.099 |
RHOB | 1a - 2+3 | -1.058 | 0.004 | 0.099 |
MYL9 | 1a - 2+3 | -1.185 | 0.004 | 0.099 |
BTG2 | 1a - 2+3 | 0.527 | 0.003 | 0.099 |
HLA-B | 1a - 2+3 | 0.871 | 0.003 | 0.099 |
H2BC5 | 1a - 2+3 | 0.553 | 0.004 | 0.099 |
IGHG2 | 1a - 2+3 | 1.987 | 0.006 | 0.111 |
IGHG4 | 1a - 2+3 | 2.124 | 0.006 | 0.121 |
TPM2 | 1a - 2+3 | -0.873 | 0.007 | 0.124 |
PFN1 | 1a - 2+3 | -0.515 | 0.007 | 0.124 |
ITGB1 | 1a - 2+3 | -0.572 | 0.009 | 0.138 |
CCN3 | 1a - 2+3 | -0.642 | 0.009 | 0.138 |
TAGLN | 1a - 2+3 | -1.140 | 0.009 | 0.138 |
CSRP1 | 1a - 2+3 | -0.528 | 0.012 | 0.158 |
IGHG3 | 1a - 2+3 | 1.763 | 0.012 | 0.158 |
FLNA | 1a - 2+3 | -1.233 | 0.013 | 0.163 |
MYH9 | 1a - 2+3 | -0.730 | 0.014 | 0.165 |
ITGA8 | 1a - 2+3 | -0.535 | 0.013 | 0.165 |
GSN | 1a - 2+3 | -0.652 | 0.015 | 0.168 |
CNN1 | 1a - 2+3 | -0.552 | 0.016 | 0.168 |
S100A6 | 1a - 2+3 | -0.695 | 0.016 | 0.168 |
MFAP4 | 1a - 2+3 | -0.574 | 0.017 | 0.171 |
CALD1 | 1a - 2+3 | -0.544 | 0.020 | 0.185 |
IGKC | 1a - 2+3 | 1.700 | 0.023 | 0.205 |
IGHG1 | 1a - 2+3 | 1.801 | 0.027 | 0.219 |
COL4A2 | 1a - 2+3 | -0.502 | 0.029 | 0.224 |
RGS5 | 1a - 2+3 | -0.674 | 0.029 | 0.225 |
VIM | 1a - 2+3 | -0.905 | 0.036 | 0.260 |
PTMA | 1a - 2+3 | -0.620 | 0.037 | 0.261 |
TIMP2 | 1a - 2+3 | -0.535 | 0.040 | 0.270 |
SPARC | 1a - 2+3 | -0.505 | 0.044 | 0.278 |
IGFBP5 | 1a - 2+3 | -0.560 | 0.044 | 0.278 |
SPARCL1 | 1a - 2+3 | -0.679 | 0.047 | 0.291 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'NKX1-2' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
NKX1-2 | Tissue | 1a - 2+3 | -0.7874966 | 7.8e-06 | 0.0052511 | FDR < 0.05 |
Let's look at the distribution for NKX1-2.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_2.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
PITX1 | 1b - 2 | 1.549 | 0.000 | 0.001 |
USP17L11 | 1b - 2 | -0.525 | 0.000 | 0.008 |
ANK1 | 1b - 2 | 1.084 | 0.000 | 0.008 |
MGARP | 1b - 2 | 1.026 | 0.000 | 0.017 |
USP17L3 | 1b - 2 | -0.531 | 0.000 | 0.040 |
PPIG | 1b - 2 | 0.879 | 0.001 | 0.050 |
AREG | 1b - 2 | 0.681 | 0.001 | 0.064 |
TRIM36 | 1b - 2 | 0.577 | 0.004 | 0.185 |
ATP5MC2 | 1b - 2 | 0.548 | 0.010 | 0.298 |
DHRS1 | 1b - 2 | -0.532 | 0.011 | 0.298 |
SUPT7L | 1b - 2 | 0.764 | 0.015 | 0.359 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PITX1' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
PITX1 | Tissue | 1b - 2 | 1.54882 | 2e-06 | 0.001315 | FDR < 0.05 |
Let's look at the distribution for PITX1.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_3.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
MYH11 | 1b - 3 | -1.083 | 0.000 | 0.004 |
TPM2 | 1b - 3 | -1.051 | 0.000 | 0.004 |
CNN1 | 1b - 3 | -0.736 | 0.000 | 0.005 |
CSRP1 | 1b - 3 | -0.638 | 0.000 | 0.005 |
MFAP4 | 1b - 3 | -0.692 | 0.000 | 0.007 |
TAGLN | 1b - 3 | -1.141 | 0.000 | 0.007 |
ACTA2 | 1b - 3 | -1.231 | 0.000 | 0.008 |
MYL9 | 1b - 3 | -1.044 | 0.000 | 0.008 |
C4orf46 | 1b - 3 | 0.554 | 0.000 | 0.009 |
RGS5 | 1b - 3 | -0.769 | 0.000 | 0.011 |
FLNA | 1b - 3 | -1.178 | 0.000 | 0.013 |
CCN3 | 1b - 3 | -0.677 | 0.000 | 0.013 |
ACTN1 | 1b - 3 | -0.556 | 0.000 | 0.014 |
KANK3 | 1b - 3 | 0.569 | 0.001 | 0.017 |
MYH9 | 1b - 3 | -0.678 | 0.001 | 0.018 |
ITGA8 | 1b - 3 | -0.505 | 0.001 | 0.018 |
RHOB | 1b - 3 | -0.695 | 0.001 | 0.018 |
DUSP1 | 1b - 3 | -0.614 | 0.001 | 0.018 |
S100A6 | 1b - 3 | -0.643 | 0.001 | 0.019 |
SPARC | 1b - 3 | -0.634 | 0.001 | 0.019 |
NR4A1 | 1b - 3 | -0.718 | 0.001 | 0.026 |
RPL37A | 1b - 3 | -0.612 | 0.002 | 0.032 |
OGN | 1b - 3 | -0.507 | 0.003 | 0.039 |
CALD1 | 1b - 3 | -0.505 | 0.003 | 0.041 |
C11orf96 | 1b - 3 | -0.610 | 0.003 | 0.043 |
HLA-B | 1b - 3 | 0.566 | 0.003 | 0.043 |
RPLP1 | 1b - 3 | -0.554 | 0.005 | 0.049 |
SIRT6 | 1b - 3 | 0.514 | 0.005 | 0.053 |
LYRM2 | 1b - 3 | 0.551 | 0.005 | 0.053 |
FCF1 | 1b - 3 | 0.575 | 0.007 | 0.058 |
IGHG3 | 1b - 3 | 0.563 | 0.007 | 0.058 |
HSPB1 | 1b - 3 | -0.693 | 0.008 | 0.064 |
DHRS1 | 1b - 3 | 0.881 | 0.009 | 0.067 |
SPARCL1 | 1b - 3 | -0.625 | 0.013 | 0.080 |
IGHG4 | 1b - 3 | 0.889 | 0.013 | 0.080 |
BGN | 1b - 3 | -0.888 | 0.014 | 0.082 |
VIM | 1b - 3 | -0.695 | 0.016 | 0.087 |
IGFBP7 | 1b - 3 | -0.615 | 0.021 | 0.103 |
IGHG1 | 1b - 3 | 0.705 | 0.026 | 0.114 |
TMEM106C | 1b - 3 | -0.672 | 0.030 | 0.118 |
PITX1 | 1b - 3 | 0.801 | 0.033 | 0.124 |
IGHG2 | 1b - 3 | 0.688 | 0.033 | 0.124 |
PTBP1 | 1b - 3 | 0.732 | 0.040 | 0.139 |
IGKC | 1b - 3 | 0.641 | 0.044 | 0.146 |
MZT2B | 1b - 3 | 0.803 | 0.046 | 0.150 |
ANK1 | 1b - 3 | 0.575 | 0.050 | 0.156 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'TPM2' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
TPM2 | Tissue | 1b - 3 | -1.050616 | 1.69e-05 | 0.0038044 | FDR < 0.05 |
Let's look at the distribution for TPM2.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.
We can show the top differentially expressed genes. The unfiltered table (including all genes) with the statistics is saved in differential_analysis.xls/1b_vs_2+3.
Gene | Contrast | Estimate | Pr(>|t|) | FDR |
---|---|---|---|---|
MYH11 | 1b - 2+3 | -0.754 | 0.001 | 0.061 |
CNN1 | 1b - 2+3 | -0.557 | 0.001 | 0.061 |
PITX1 | 1b - 2+3 | 1.122 | 0.000 | 0.061 |
TPM2 | 1b - 2+3 | -0.748 | 0.001 | 0.061 |
ANK1 | 1b - 2+3 | 0.793 | 0.001 | 0.061 |
IGHG3 | 1b - 2+3 | 0.515 | 0.002 | 0.061 |
MGARP | 1b - 2+3 | 0.714 | 0.004 | 0.109 |
IGHG4 | 1b - 2+3 | 0.757 | 0.006 | 0.156 |
PPIG | 1b - 2+3 | 0.604 | 0.008 | 0.169 |
MYL9 | 1b - 2+3 | -0.639 | 0.011 | 0.169 |
IGHG1 | 1b - 2+3 | 0.612 | 0.011 | 0.169 |
FLNA | 1b - 2+3 | -0.757 | 0.018 | 0.175 |
ACTA2 | 1b - 2+3 | -0.718 | 0.017 | 0.175 |
PTBP1 | 1b - 2+3 | 0.672 | 0.016 | 0.175 |
TAGLN | 1b - 2+3 | -0.658 | 0.017 | 0.175 |
IGHG2 | 1b - 2+3 | 0.565 | 0.021 | 0.185 |
IGKC | 1b - 2+3 | 0.529 | 0.034 | 0.237 |
HSPB1 | 1b - 2+3 | -0.558 | 0.035 | 0.238 |
A canonical visualization for interpreting differential gene expression results is the volcano plot.
A simple and effective plot to view individual genes is the violin plot. I selected a significant differentially expressed gene 'PITX1' as example.
First let's review the model results for these targets:
Gene | Subset | Contrast | Estimate | Pr(>|t|) | FDR | Color |
---|---|---|---|---|---|---|
PITX1 | Tissue | 1b - 2+3 | 1.121588 | 0.0004347 | 0.0609458 | P < 0.05 |
Let's look at the distribution for PITX1.
In addition to generating individual gene boxplots or volcano plots, we can again create a heatmap from our data. Here, we plot with all genes with a P < 0.05.