What is Combined Annotation Dependent Depletion (CADD)?
CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.
While many variant annotation and scoring tools exist, most exploit a single type of information (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Thus, a broadly applicable metric that objectively weights and integrates diverse information is needed. Combined Annotation Dependent Depletion (CADD) is a framework that integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations.
C-scores strongly correlate with allelic diversity, pathogenicity of both coding and non-coding variants, and experimentally measured regulatory effects, and also highly rank causal variants within individual genome sequences. Finally, C-scores of complex trait-associated variants from genome-wide association studies (GWAS) are significantly higher than matched controls and correlate with study sample size, likely reflecting the increased accuracy of larger GWAS.
CADD can quantitatively prioritize functional, deleterious, and disease causal variants across a wide range of functional categories, effect sizes and genetic architectures, and can be used to prioritize causal variation in both research and clinical settings.
In addition to this website, CADD has been described in three publications. The most recent manuscript describes CADD-Splice (CADD v1.6), the latest extension of CADD to improve its predictions of splicing effects:
Rentzsch P, Schubach M, Shendure J, Kircher M.
CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores.
Genome Med. 2021 Feb 22. doi: 10.1186/s13073-021-00835-9.
PubMed PMID: 33618777.
Our second manuscript describes the updates between the initial publication and CADD v1.4, introduces CADD for GRCh38 and explains how we envision the use of CADD. It was published by Nucleic Acids Research in 2018:
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M.
CADD: predicting the deleteriousness of variants throughout the human genome.
Nucleic Acids Res. 2018 Oct 29. doi: 10.1093/nar/gky1016.
PubMed PMID: 30371827.
The original manuscript describing the method was published by Nature Genetics in 2014:
Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J.
A general framework for estimating the relative pathogenicity of human genetic variants.
Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892.
PubMed PMID: 24487276.
How can I obtain CADD scores?
CADD scores are freely available for all non-commercial applications. If you are planning on using them in a commercial application, you can obtain a license through the UW CoMotion Express Licensing System. If in doubt about whether you need a license for your application, please contact Jay Shendure and Gregory M. Cooper. CADD is currently developed by Martin Kircher, Philipp Rentzsch, Daniela M. Witten, Gregory M. Cooper, and Jay Shendure.
We have pre-computed CADD-based scores (C-scores) for all approximately 9 billion possible single nucleotide variants (SNVs) of the reference genome, a selection of short insertions/deletions, as well as some large variant sets (e.g. gnomAD, ExAC, 1000 Genomes, ESP). We also provide a simple lookup for SNVs and enable scoring of short insertions/deletions. Ranges of scores can be natively visualized in the UCSC Genome Browser or using our custom tracks (for hg19/GRCh37 and hg38/GRCh38).
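As a rough illustration of how a pre-computed score table could be used for SNV lookup, the Python sketch below indexes a small tab-delimited table by chromosome, position, reference allele and alternative allele. The column layout (Chrom, Pos, Ref, Alt, RawScore, PHRED), the function names and all score values are assumptions made for this example; the values are invented placeholders, not real CADD output. Check the header of the file you actually download before relying on this layout.

```python
import io

# Illustrative excerpt only: the column layout is an assumption and the
# score values are invented placeholders, not real CADD scores.
SAMPLE_TSV = (
    "## hypothetical score table\n"
    "#Chrom\tPos\tRef\tAlt\tRawScore\tPHRED\n"
    "1\t10001\tT\tA\t0.10\t4.0\n"
    "1\t10001\tT\tC\t0.20\t6.0\n"
    "1\t10001\tT\tG\t0.30\t8.0\n"
)


def load_scores(handle):
    """Read a tab-delimited score table into a dict keyed by variant.

    Lines starting with '#' (comments and the column header) are skipped.
    Keys are (chrom, pos, ref, alt); values are (raw_score, scaled_score).
    """
    scores = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ref, alt, raw, scaled = line.rstrip("\n").split("\t")
        scores[(chrom, int(pos), ref, alt)] = (float(raw), float(scaled))
    return scores


def lookup(scores, chrom, pos, ref, alt):
    """Return (raw_score, scaled_score) for an SNV, or None if absent."""
    return scores.get((chrom, pos, ref, alt))


scores = load_scores(io.StringIO(SAMPLE_TSV))
hit = lookup(scores, "1", 10001, "T", "C")
if hit is not None:
    raw, scaled = hit
    print(f"1:10001 T>C raw={raw} scaled={scaled}")
```

An in-memory dictionary is only practical for small variant sets. For genome-wide files, an indexed on-disk lookup would be the more realistic choice.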