cfDNA Sequencing: Technological Approaches and Bioinformatic Issues

gene_x

Tags: research

Cell-free circulating DNA (cfDNA) refers to DNA fragments found outside cells in body fluids such as plasma, urine, and cerebrospinal fluid (CSF). cfDNA was first identified in 1948 in the plasma of healthy individuals [1]. Later studies showed that cfDNA levels in blood increase in pathological conditions such as autoimmune diseases [2] and cancer [3]. In 1989, Philippe Anker and Maurice Stroun of the University of Geneva demonstrated that cfDNA from cancer patients carries characteristics of tumor-cell DNA [4]. Using the then newly developed PCR technique, David Sidransky and his team subsequently found the same TP53 mutations in bladder tumor samples and in urine pellets from the same patients [5]. Research into cancer-type-specific genomic anomalies in circulating DNA, such as NRAS and KRAS mutations or HER-2 amplifications [6,7,8], then expanded, and the term circulating tumor DNA (ctDNA) appeared for the first time.

Since the discovery of circulating DNA of tumor origin, advances in molecular biology, from quantitative and digital PCR to next-generation sequencing (NGS), have turned it into a powerful liquid biopsy tool. In the era of precision medicine, identifying molecular alterations that can guide the therapeutic management of patients is crucial. Because tumors release DNA into the blood and other body fluids such as urine, ctDNA carrying the molecular characteristics of the tumor can be collected with a simple body fluid sample. Because it is minimally invasive, this liquid biopsy can easily be repeated during follow-up and at relapse. It is also of major interest in cancers where a tumor biopsy is difficult to obtain, such as primary central nervous system lymphoma [9]. It is equally useful in cancer subtypes whose tissue biopsies contain very few tumor cells, such as Hodgkin lymphoma (HL), in which Reed–Sternberg cells represent only 0.1 to 2% of the tumor mass [10,11]. In these settings, ctDNA sequencing in body fluids could serve as a surrogate for a tumor biopsy. Body fluids other than blood are often chosen according to tumor location, such as urine for bladder cancer or CSF for brain tumors [9,12], but blood remains the most commonly used body fluid in studies.

In healthy individuals, plasma cfDNA concentration ranges from 0 to 100 ng/mL, with an average of 30 ng/mL. In cancer patients it is significantly higher, ranging from 0 to 1000 ng/mL with an average of 180 ng/mL [13]. This concentration correlates with cancer stage and tumor size, increasing as both increase. Circulating DNA of tumor origin accounts for anywhere from 0.01% to more than 90% of total cfDNA in blood [14]. Across different cancer types, a large-scale ctDNA sequencing study showed an association between ctDNA levels and tumor mutational burden [15]. Moreover, tumor tissue is spatially heterogeneous and a single biopsy samples only part of it. ctDNA analysis can therefore capture a more complete molecular landscape of a patient's tumor and provide additional information on druggable alterations and resistance variants [16]. ctDNA kinetics during follow-up correlate with prognosis. A sharp drop in ctDNA levels after treatment is associated with a better prognosis, whereas a rise usually reflects the emergence of drug-resistant clones and eventual therapeutic failure [17,18,19,20].
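
Simple arithmetic turns the concentrations and fractions above into an expected number of tumor-derived DNA copies. The sketch below is illustrative only and does not come from the cited studies. It assumes about 3.3 pg of DNA per haploid human genome. The 4 mL plasma volume and the tumor fractions are hypothetical inputs, and fragment sampling is modelled as a Poisson process. `genome_equivalents` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustrative sketch: convert a plasma cfDNA concentration into haploid genome
# equivalents (GE) and estimate the tumor-derived copies in one draw.
# Assumptions: ~3.3 pg DNA per haploid human genome; plasma volume and tumor
# fraction are hypothetical inputs; fragment sampling is Poisson.
genome_equivalents() {
  conc_ng_per_ml=$1   # cfDNA concentration (ng per mL plasma)
  plasma_ml=$2        # plasma volume processed (mL)
  tumor_fraction=$3   # ctDNA fraction of total cfDNA (0.0001 = 0.01 %)
  awk -v c="$conc_ng_per_ml" -v v="$plasma_ml" -v f="$tumor_fraction" 'BEGIN {
    total = c * v * 1000 / 3.3      # ng -> pg, then divide by pg per haploid genome
    tumor = total * f               # expected tumor-derived GE in the draw
    p_any = 1 - exp(-tumor)         # Poisson P(at least one tumor copy in the tube)
    printf "total_GE=%.0f tumor_GE=%.1f p_at_least_one=%.3f\n", total, tumor, p_any
  }'
}

genome_equivalents 30 4 0.0001    # 30 ng/mL, 4 mL plasma, 0.01 % ctDNA
genome_equivalents 30 4 0.00001   # same draw, 0.001 % ctDNA
```

Under these assumptions, a 4 mL draw at 30 ng/mL holds roughly 36,000 genome equivalents. A 0.01% ctDNA fraction then corresponds to only about 3 to 4 tumor-derived copies. At 0.001% there is roughly a 70% chance that not a single tumor copy is in the tube, however deep the sequencing.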

Detecting ctDNA remains a challenge in two settings: during minimal residual disease (MRD) follow-up to predict early relapse, and at diagnosis in early-stage cancer. In both, the tumor-derived fraction of total circulating DNA may be below 0.01% [21,22]. Increasingly sensitive sequencing technologies allow alterations in cfDNA to be detected at very low variant allele frequencies (VAF). This serves mutational profiling at diagnosis as well as early detection of recurrence and monitoring of therapy response. However, several parameters affect the sensitivity of ctDNA detection. First, adequate handling of the blood sample, from collection to quality control of the extracted cfDNA, is crucial. Next, an important step is the choice of the biomarker(s) and of the sequencing technology used to detect them. Finally, bioinformatic analysis with error-suppression algorithms is the ultimate tool for discriminating true variants from false positives.
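
A minimal illustration of the error-suppression idea follows. It is one simple approach, not any specific published algorithm: a candidate variant is kept only if its alternate-read count is unlikely to come from background sequencing errors alone. The sketch assumes a per-position background error rate `err_rate`, which would have to be estimated, for example from healthy-donor cfDNA. It also assumes a Poisson model for error reads and an illustrative cutoff `alpha`. `call_variant` is a hypothetical name. Real pipelines add further layers, such as UMI-based consensus and strand and read-quality filters.

```shell
#!/usr/bin/env bash
# Sketch: keep a candidate variant only if its alt-read count is unlikely under a
# per-position background error rate (Poisson model). Not a published method.
# Assumptions: err_rate is estimated elsewhere (e.g. healthy-donor cfDNA);
# alpha (default 0.001) is an illustrative significance cutoff.
call_variant() {
  alt=$1; depth=$2; err_rate=$3; alpha=${4:-0.001}
  awk -v a="$alt" -v d="$depth" -v e="$err_rate" -v alpha="$alpha" 'BEGIN {
    a += 0; alpha += 0              # force numeric comparison below
    lambda = d * e                  # expected number of error reads at this depth
    pk = exp(-lambda); cdf = 0      # pk starts at P(X = 0)
    for (k = 0; k < a; k++) {       # cdf accumulates P(X <= a - 1)
      cdf += pk
      pk *= lambda / (k + 1)
    }
    pval = 1 - cdf                  # P(X >= a): chance all alt reads are errors
    if (pval < 0) pval = 0          # guard against rounding below zero
    printf "VAF=%.4f p=%.3g %s\n", a / d, pval, (pval < alpha ? "PASS" : "FILTERED")
  }'
}

call_variant 12 10000 0.0002   # 12 alt reads at 10,000x, background 0.02 %
call_variant 3 10000 0.0002    # 3 alt reads at the same depth and background
```

With a background of 0.02% at 10,000x depth, about 2 error reads are expected. Twelve alternate reads (VAF 0.12%) therefore pass, while 3 reads (VAF 0.03%) cannot be told apart from noise. Lowering the background error rate is what makes lower VAFs callable.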

Using DAMIAN to analyse the cfDNA sequencing data

cd ~/Tools/damian/databases/blast

damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R2_001.fastq.gz --sample neg_control_S2_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
damian_report.rb
zip -r neg_control_S2_megablast.zip neg_control_S2_megablast/
echo -e "xxxx" | mutt -a "./neg_control_S2_megablast.zip" -s "New results from DAMIAN" -- "xxx@xxx.com"

damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R2_001.fastq.gz --sample 635724976_S_aureus_epidermidis_S3_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
damian_report.rb
zip -r 635724976_S_aureus_epidermidis_S3_megablast.zip 635724976_S_aureus_epidermidis_S3_megablast/
echo -e "xxxx" | mutt -a "./635724976_S_aureus_epidermidis_S3_megablast.zip" -s "New results from DAMIAN" -- "xxx@xxx.com"

damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R2_001.fastq.gz --sample 635290002_CMV_S4_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
damian_report.rb
zip -r 635290002_CMV_S4_megablast.zip 635290002_CMV_S4_megablast/
echo -e "xxxx " | mutt -a "./635290002_CMV_S4_megablast.zip" -s "New results from DAMIAN" -- "xxx@xxx.com"

damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R2_001.fastq.gz --sample 635850623_EBV_S5_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
damian_report.rb
zip -r 635850623_EBV_S5_megablast.zip 635850623_EBV_S5_megablast/
echo -e "xxxx " | mutt -a "./635850623_EBV_S5_megablast.zip" -s "New results from DAMIAN" -- "xxx@xxx.com"

damian.rb --host human3 --type dna -1 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R1_001.fastq.gz -2 ./231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R2_001.fastq.gz --sample 635031018_E_faecium_S1_megablast --blastn never --blastp never --min_contiglength 200 --threads 55 --force
damian_report.rb
zip -r 635031018_E_faecium_S1_megablast.zip 635031018_E_faecium_S1_megablast/
echo -e "xxxx" | mutt -a "./635031018_E_faecium_S1_megablast.zip" -s "New results from DAMIAN" -- "xxx@xxx.com"

#END
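
The five DAMIAN runs above differ only in the project folder and the sample prefix, so one loop can drive them. This sketch assumes the same run folder, sample IDs and `damian.rb` flags as above and adds nothing beyond what those commands already use. `run` and `run_damian_all` are hypothetical helper names. With `DRY_RUN=1` (the default) it only prints each command, and the `mutt` e-mail step is left out.

```shell
#!/usr/bin/env bash
# Sketch: the five DAMIAN runs above driven by one loop.
# Assumptions: same run folder, sample IDs and damian.rb flags as the commands above;
# DRY_RUN=1 (default) prints each command, DRY_RUN=0 executes it; mutt step omitted.
RUN_DIR=./231114_VH00358_62_AACYCYWM5_cfDNA
DRY_RUN=${DRY_RUN:-1}

run() {  # print or execute one command; </dev/null keeps it off the sample list on stdin
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@" </dev/null; fi
}

run_damian_all() {
  # one "project_folder sample_prefix" pair per line, copied from the commands above
  while read -r project sample; do
    outdir="${sample}_megablast"
    run damian.rb --host human3 --type dna \
      -1 "$RUN_DIR/$project/${sample}_R1_001.fastq.gz" \
      -2 "$RUN_DIR/$project/${sample}_R2_001.fastq.gz" \
      --sample "$outdir" --blastn never --blastp never \
      --min_contiglength 200 --threads 55 --force
    run damian_report.rb
    run zip -r "$outdir.zip" "$outdir/"
  done <<'EOF'
p20428 neg_control_S2
p20429 635724976_S_aureus_epidermidis_S3
p20430 635290002_CMV_S4
p20431 635850623_EBV_S5
p20427 635031018_E_faecium_S1
EOF
}

run_damian_all
```

Checking the dry-run output first is cheap. Rerunning with `DRY_RUN=0` then executes exactly the printed commands.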

Using vrap to analyse the cfDNA sequencing data

conda activate vrap
mkdir -p vrap_outputs
cd vrap_outputs
ln -s ~/Tools/vrap .

vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20427/635031018_E_faecium_S1_R2_001.fastq.gz -o E_faecium_S1_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20428/neg_control_S2_R2_001.fastq.gz -o neg_control_S2_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20429/635724976_S_aureus_epidermidis_S3_R2_001.fastq.gz -o S_aureus_epidermidis_S3_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
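
The three `vrap.py` calls above share every option except the input files and the output folder. This sketch of a small wrapper assumes it is run from `vrap_outputs/` with the `vrap` symlink in place. It reuses exactly the paths and options shown above. `run_vrap` is a hypothetical name, and `DRY_RUN=1` (the default) prints the command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch: one wrapper for the vrap.py calls above.
# Assumptions: run from vrap_outputs/ with the vrap symlink; paths and options copied
# from the commands above; DRY_RUN=1 (default) prints, DRY_RUN=0 executes.
RUN_DIR=../231114_VH00358_62_AACYCYWM5_cfDNA
HOST_FA=/home/jhuang/REFs/genome.fa
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

run_vrap() {  # usage: run_vrap <project_folder> <sample_prefix> <output_dir> [host_fasta]
  run vrap/vrap.py \
    -1 "$RUN_DIR/$1/${2}_R1_001.fastq.gz" \
    -2 "$RUN_DIR/$1/${2}_R2_001.fastq.gz" \
    -o "$3" --host "${4:-$HOST_FA}" \
    -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
}

run_vrap p20427 635031018_E_faecium_S1 E_faecium_S1_vrap_out
run_vrap p20428 neg_control_S2 neg_control_S2_vrap_out
run_vrap p20429 635724976_S_aureus_epidermidis_S3 S_aureus_epidermidis_S3_vrap_out
```

The optional fourth argument replaces the human reference as `--host`. This also covers runs against a viral genome, e.g. `run_vrap p20430 635290002_CMV_S4 CMV_S4_vrap_out_host_CMV vrap/database/viral_db_CMV/nucleotide.fa`.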

# NCBI taxonomy IDs: txid10239 = Viruses (default in vrap/download_db.py)
# txid10358 = Cytomegalovirus (https://www.ncbi.nlm.nih.gov/nuccore/?term=Cytomegalovirus)
# txid10376 = Epstein-Barr virus (https://www.ncbi.nlm.nih.gov/nuccore/?term=Epstein-Barr-Virus)
# back up the default viral database; it is restored after the CMV and EBV runs
mv vrap/database/viral_db vrap/database/viral_db_orig
# build a CMV-only viral database and run the CMV sample against it
sed -i -e 's/txid10239/txid10358/g' vrap/download_db.py
sed -i -e 's/retmax=100000/retmax=10000000/g' vrap/download_db.py
vrap/vrap.py -u
vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R2_001.fastq.gz -o CMV_S4_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200 #[--virus=Cytomegalovirus.fasta]
mv vrap/database/viral_db vrap/database/viral_db_CMV
# build an EBV-only viral database (the database must be rebuilt after changing the taxid) and run the EBV sample against it
sed -i -e 's/txid10358/txid10376/g' vrap/download_db.py
vrap/vrap.py -u
vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R2_001.fastq.gz -o EBV_S5_vrap_out --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200 #[--virus=Epstein-Barr-Virus.fasta]
mv vrap/database/viral_db vrap/database/viral_db_EBV

mv vrap/database/viral_db_orig vrap/database/viral_db
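
The CMV and EBV steps follow one pattern. Point `download_db.py` at a taxon, rebuild the viral database, run the sample, and set the database aside under its own name. This sketch of the pattern rests on assumptions inferred from the commands in this section rather than from vrap documentation. First, `vrap/vrap.py -u` rebuilds `vrap/database/viral_db` from the taxid in `vrap/download_db.py`. Second, the script's default query is txid10239 (Viruses). `with_viral_db`, `set_taxid` and `run` are hypothetical helpers, and `DRY_RUN=1` (the default) prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the CMV/EBV database switching (hypothetical helper names).
# Assumptions (inferred, not from vrap docs): `vrap/vrap.py -u` rebuilds
# vrap/database/viral_db from the taxid in vrap/download_db.py (default txid10239).
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
RUN_DIR=../231114_VH00358_62_AACYCYWM5_cfDNA
DB=vrap/database
DRY_RUN=${DRY_RUN:-1}
taxid_in_script=10239   # taxid currently written in download_db.py

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

set_taxid() {  # rewrite the taxid queried by download_db.py
  run sed -i -e "s/txid${taxid_in_script}/txid$1/g" vrap/download_db.py
  taxid_in_script=$1
}

with_viral_db() {  # usage: with_viral_db <taxid> <label> <project_folder> <sample_prefix>
  set_taxid "$1"
  run vrap/vrap.py -u                              # build the taxon-restricted viral_db
  run vrap/vrap.py -1 "$RUN_DIR/$3/${4}_R1_001.fastq.gz" \
    -2 "$RUN_DIR/$3/${4}_R2_001.fastq.gz" -o "${4#*_}_vrap_out" \
    --host /home/jhuang/REFs/genome.fa -n /mnt/h1/jhuang/blast/nt \
    -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
  run mv "$DB/viral_db" "$DB/viral_db_$2"          # keep it, e.g. viral_db_CMV
}

run mv "$DB/viral_db" "$DB/viral_db_orig"          # back up the default database
run sed -i -e 's/retmax=100000/retmax=10000000/g' vrap/download_db.py
with_viral_db 10358 CMV p20430 635290002_CMV_S4
with_viral_db 10376 EBV p20431 635850623_EBV_S5
set_taxid 10239                                    # restore the default query
run mv "$DB/viral_db_orig" "$DB/viral_db"          # restore the default database
```

Tracking the current taxid in a variable also restores `download_db.py` to its default query at the end. This way a later `vrap/vrap.py -u` does not silently rebuild an EBV-only database.
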
vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20430/635290002_CMV_S4_R2_001.fastq.gz -o CMV_S4_vrap_out_host_CMV --host vrap/database/viral_db_CMV/nucleotide.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
vrap/vrap.py -1 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R1_001.fastq.gz -2 ../231114_VH00358_62_AACYCYWM5_cfDNA/p20431/635850623_EBV_S5_R2_001.fastq.gz -o EBV_S5_vrap_out_host_EBV --host vrap/database/viral_db_EBV/nucleotide.fa -n /mnt/h1/jhuang/blast/nt -a /mnt/h1/jhuang/blast/nr -t 40 -l 200
# TODO: show her the samtools flagstat mapping statistics and an IGV screenshot of the mapped reads, and send her the BAM and FASTA files.
#END
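
For the TODO above, this sketch summarises the mapped reads with `samtools flagstat` and prepares a sorted, indexed BAM that IGV can load. It assumes `samtools` is on the PATH. The BAM path is hypothetical because the location of vrap's alignments in its output folder is not shown here. `mapped_summary` and `prepare_for_igv` are hypothetical names. The parser keys on the `N + M mapped (P% ...)` line of the flagstat report and skips the `primary mapped` line that newer samtools versions also print.

```shell
#!/usr/bin/env bash
# Sketch: summarise mapped reads from `samtools flagstat` and prepare a BAM for IGV.
# Assumptions: samtools on PATH; the BAM path passed in is hypothetical.
mapped_summary() {  # reads `samtools flagstat` text on stdin
  awk '$4 == "mapped" {        # the "N + M mapped (P% ...)" line, not "primary mapped"
    pct = $5
    sub(/^\(/, "", pct)        # drop the opening parenthesis
    sub(/%.*/, "%", pct)       # keep only the QC-passed percentage
    print "mapped=" $1 " (" pct ")"
  }'
}

prepare_for_igv() {  # usage: prepare_for_igv <bam>; writes <name>.sorted.bam and its .bai
  sorted="${1%.bam}.sorted.bam"
  samtools sort -o "$sorted" "$1" && samtools index "$sorted"
  samtools flagstat "$sorted" | mapped_summary
}

# example (hypothetical path):
# prepare_for_igv CMV_S4_vrap_out_host_CMV/<alignment>.bam
```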
