I’ve heard biologists ask the question above countless times. (You may replace ChIP-seq with ATAC-seq, bisulphite-seq or any other epigenomic data type.)
It makes a lot of sense to ask that question.
Generating and analyzing data from a single NGS-based assay such as RNA-seq or ChIP-seq is not as rare a skill as it was a few years ago. This is due to a new “NGS native” generation of biologists who have acquired basic ’omics data analysis skills early in their training. They largely no longer need to walk to the department of statistics or (god forbid!) computer science to knock on the doors of code-savvy researchers, suggesting a “collaboration” to get their data analyzed.
However, integrating several data modalities is another matter, and this is the phase where research projects often stall.
The idea is simple: if you see, smell and taste a wine, your brain may be able to integrate these multi-sensory inputs and infer just which river valley the grapes originate from, way better than if it had to rely on just one sense.
So, what is the multiomic brain that takes all possible NGS data you generate and spits out insight?
The wrong answer, I have learned, is “it really depends on your research question”. The correct one is “correlation”. That is the short answer — the longer one is “careful analysis of individual data types, correlation, filtering, visualization, interpretation — iterate a couple times — and you might arrive at some very fine results!”
For a walk-through of this data integration, let us assume the experiment visualized below, with RNA-sequencing and an epigenomic sequencing assay performed at a few timepoints, and a treatment administered after the first timepoint. The epigenomic assay could be ChIP-seq (or CUT&Tag) for one or more histone modifications, or a chromatin accessibility assay such as ATAC-seq.
(The integrative analyses discussed here do not require time series data; one could analyze the expression and epigenetic states across a pseudotime trajectory using single-cell data or, simply, compare single-timepoint data from bulk experiments in different conditions.)
The question is, what molecular mechanisms link the treatment to the altered cellular state at the end? Can we give a multi-step description of events cascading through the network of genes and gene products that reprograms the cell to adapt to the perturbation?
In the context of translational research, identifying the critical elements that enable a cell's progression to a diseased state, such as transcription factors or enhancers, offers possible targets for new therapies.
Below we see a workflow for identifying active cis- and trans-regulatory paths in such a cascade. It begins with processing epigenomic and transcriptomic data separately, and brings the two modalities together by correlating the expression of each gene with the signal from its putative cis-regulatory elements (CREs). It then proceeds to identify the transcription factors (TFs) that drive the chromatin changes, by finding TF binding motifs within the CREs and correlating TF expression with the state of these putative binding sites.
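To make the two correlation steps concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the pipeline behind the workflow: the names (`cis_links`, `trans_links`, `GENE_A`, `TF_X`, `cre1`), the toy values and the 0.8 threshold are all hypothetical. It also assumes the signals are already normalized per sample, that CREs have been assigned to genes beforehand (for instance by proximity), and that motif scanning has been done with a separate tool.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def cis_links(expression, cre_signal, cre_to_gene, min_r=0.8):
    """Keep CRE-gene pairs whose signal and expression correlate across samples."""
    links = {}
    for cre, gene in cre_to_gene.items():
        r = pearson(cre_signal[cre], expression[gene])
        # Positive correlation suits activating marks or accessibility;
        # a repressive mark would call for the opposite sign.
        if r >= min_r:
            links[cre] = (gene, r)
    return links


def trans_links(expression, cre_signal, cre_motifs, cis, min_r=0.8):
    """For each retained CRE, keep TFs with a motif in it whose expression tracks its signal."""
    paths = []
    for cre, (gene, _) in cis.items():
        for tf in cre_motifs.get(cre, []):
            if pearson(expression[tf], cre_signal[cre]) >= min_r:
                paths.append((tf, cre, gene))
    return paths


# Hypothetical toy data: one value per timepoint.
expression = {
    "GENE_A": [1.0, 2.0, 4.0, 8.0],
    "TF_X": [0.5, 1.0, 2.0, 4.0],
    "TF_Y": [3.0, 3.0, 3.0, 3.0],
}
cre_signal = {  # e.g. ATAC-seq or ChIP-seq signal per CRE
    "cre1": [10.0, 20.0, 40.0, 80.0],
    "cre2": [50.0, 40.0, 45.0, 48.0],
}
cre_to_gene = {"cre1": "GENE_A", "cre2": "GENE_A"}    # putative assignment
cre_motifs = {"cre1": ["TF_X", "TF_Y"], "cre2": ["TF_Y"]}  # TF motifs found in each CRE

cis = cis_links(expression, cre_signal, cre_to_gene)
paths = trans_links(expression, cre_signal, cre_motifs, cis)
print(paths)  # [('TF_X', 'cre1', 'GENE_A')]
```

Each resulting tuple reads as a candidate regulatory path: TF, then CRE, then target gene. With only a handful of timepoints such correlations are noisy, so in practice the threshold would be paired with filtering and visual inspection, in keeping with the iterative approach described earlier.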
The central steps in the workflow include:
The approach described above results in a rich description of regulatory programs involved in a studied process. There are several ways to further enrich and validate the findings, such as:
Above we presented an example approach to integrating epigenomic and gene expression data, specifically to uncover cis- and trans-regulatory interactions.
© 2023 XGenes.com