Curated variation benchmarks for challenging medically relevant autosomal genes


Abstract

The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.
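As a reading aid for the recall figures above, the following minimal sketch computes recall as the fraction of benchmark variants that a callset recovers, TP / (TP + FN). The counts are hypothetical and were chosen only so that the two results land on the 8% and 100% endpoints quoted in the abstract; they are not taken from the paper.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of benchmark variants recovered by a callset: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no benchmark variants in the region")
    return true_positives / total


# Hypothetical counts for one gene (illustration only, not from the paper).
before = recall(true_positives=2, false_negatives=23)   # unmasked reference
after = recall(true_positives=25, false_negatives=0)    # false duplication masked

print(f"{before:.0%} -> {after:.0%}")  # prints "8% -> 100%"
```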

Data availability

The PacBio HiFi reads used to generate the hifiasm assembly for the benchmark are in the NCBI Sequence Read Archive with accession numbers SRR10382245, SRR10382244, SRR10382249, SRR10382248, SRR10382247 and SRR10382246. The v1.00 benchmark VCF and BED files, as well as Liftoff gene annotations, assembly–assembly alignments and variant calls, are available at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/CMRG_v1.00/, and as a DOI at https://doi.org/10.18434/mds2-2475. This is released as a separate benchmark from v4.2.1 because it includes a small fraction of the genome, it has different characteristics from the mapping-based v4.2.1, and v4.2.1 only includes small variants. Using v4.2.1 and the CMRG benchmarks as two separate benchmarks enables users to obtain broader performance metrics for most of the genome and for a small set of particularly challenging genes, respectively. The masked GRCh38 reference, recently updated to v2 with additional false duplications from the Telomere-to-Telomere Consortium, is under https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references. We recommend using v3.0 GA4GH/GIAB stratification BED files intended for use with hap.py when benchmarking, which are available at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/genome-stratifications/. These stratifications include BED files corresponding to false duplications and collapsed duplications in GRCh38. All data have no restrictions, as the HG002 sample has an open consent from the Personal Genome Project.
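The recommendation above to benchmark with hap.py and the stratification BED files can be sketched as follows. This is a minimal illustration that only assembles a command line and runs nothing. All file names are hypothetical placeholders rather than the actual names in the release directories, and the flags (`-f`, `-r`, `--stratification`, `-o`) reflect our understanding of the hap.py command-line interface and should be checked against `hap.py --help` for the installed version.

```python
import shlex


def build_happy_command(truth_vcf, query_vcf, benchmark_bed,
                        reference_fasta, stratification_tsv, output_prefix):
    """Assemble a hap.py invocation as an argument list (nothing is executed)."""
    return [
        "hap.py", truth_vcf, query_vcf,
        "-f", benchmark_bed,                     # benchmark (confident) regions
        "-r", reference_fasta,                   # reference the calls were made against
        "--stratification", stratification_tsv,  # TSV listing stratification BED files
        "-o", output_prefix,                     # prefix for the output reports
    ]


# Hypothetical local file names, for illustration only.
cmd = build_happy_command(
    truth_vcf="cmrg_benchmark.vcf.gz",
    query_vcf="my_calls.vcf.gz",
    benchmark_bed="cmrg_benchmark.bed",
    reference_fasta="GRCh38.fa",
    stratification_tsv="stratifications.tsv",
    output_prefix="happy_cmrg",
)
print(shlex.join(cmd))
```

Running the same comparison once against v4.2.1 and once against the CMRG benchmark, each with its own VCF and BED pair, keeps the two sets of metrics separate, as the text suggests.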

Code availability

Scripts used to develop the CMRG benchmark and generate figures and tables for the manuscript are available at https://github.com/usnistgov/cmrg-benchmarkset-manuscript. The previously developed assembly, which was used as the basis of this benchmark, was from hifiasm v0.11.

A variety of open source software was used for variant calling for the evaluations of the benchmark, including NextDenovo2.2-beta.0, DRAGEN 3.6.3, NeuSomatic's submission for the precisionFDA Truth Challenge V2 (ref. 12) (BWA-MEM (ref. 50) version 0.7.17-r1188 (https://github.com/lh3/bwa) and GATK version gatk-4.1.4.1 (https://gatk.broadinstitute.org/hc/en-us)), Parabricks_DeepVariant (Parabricks Pipelines DeepVariant v3.0.0_2 (https://developer.nvidia.com/clara-parabricks)), Sentieon (DNAscope) version sentieon_release_201911 (https://www.sentieon.com/products/#dnaseq), BWA-MEM and Strelka2 (BWA-MEM version 0.7.17-r1188 (https://github.com/lh3/bwa) and Strelka2 version 2.9.10 (https://github.com/Illumina/strelka)), BWA-MEM (ref. 50) (v0.7.8), Picard tools (https://broadinstitute.github.io/picard/) (ver. 1.83), GATK (ref. 52) (v3.4-0), GATK (v3.5), BWA-MEM v0.7.15-r1140, SAMtools (ref. 53) v1.3, Picard v2.10.10, GATK v3.8, DELLY (ref. 54) v0.8.5, GRIDSS (ref. 55) v2.9.4, LUMPY (ref. 56) v0.3.1, Manta (ref. 57) v1.6.0, Wham (ref. 58) v1.7.0, NanoPlot (ref. 60) v1.27.0, Filtlong v0.2.0, minimap2 (refs. 40,60) v2.17-r941, cuteSV v1.0.8, Sniffles (ref. 61) v1.0.12, SURVIVOR (ref. 59) v1.0.7, BWA v0.7.15, GATK v3.6, Java v1.8.0_74 (OpenJDK), Picard Tools v2.6.0, Sambamba (ref. 63) v0.6.7, Samblaster (ref. 64) v0.1.24, Samtools v1.9, DeepVariant v1.0 and Liftoff (ref. 32) v1.4.0.

References

  1. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).

  2. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  3. Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).

  4. Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38, 1044–1053 (2020).

  5. Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).

  6. De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).

  7. Mandelker, D. et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet. Med. 18, 1282–1289 (2016).

  8. Ebbert, M. T. W. et al. Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight. Genome Biol. 20, 1–23 (2019).

  9. Lincoln, S. E. et al. One in seven pathogenic variants can be challenging to detect by NGS: an analysis of 450,000 patients with implications for clinical sensitivity and genetic test implementation. Genet. Med. 23, 1673–1680 (2021).

  10. Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).

  11. Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020); erratum 38, 1357 (2020).

  12. Olson, N. D. et al. precisionFDA Truth Challenge V2: calling variants from short- and long-reads in difficult-to-map regions. Preprint at bioRxiv https://doi.org/10.1101/2020.11.13.380741 (2020).

  13. Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Preprint at bioRxiv https://doi.org/10.1101/2020.07.24.212712 (2020).

  14. Chin, C.-S. et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat. Commun. 11, 4794 (2020).

  15. Goldfeder, R. L. et al. Medical implications of technical accuracy in genome sequencing. Genome Med. 8, 24 (2016).

  16. Ball, M. P. et al. A public resource facilitating clinical use of genomes. Proc. Natl Acad. Sci. USA 109, 11920–11927 (2012).

  17. Tate, J. G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 47, D941–D947 (2019).

  18. Ross, M. G. et al. Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).

  19. Prior, T. W., Leach, M. E. & Finanger, E. Spinal muscular atrophy. In GeneReviews [Internet] (University of Washington, 2020).

  20. Biros, I. & Forrest, S. Spinal muscular atrophy: untangling the knot? J. Med. Genet. 36, 1–8 (1999).

  21. Leiding, J. W. & Holland, S. M. Chronic granulomatous disease. In GeneReviews [Internet] (University of Washington, 2016).

  22. Innan, H. A two-locus gene conversion model with selection and its application to the human RHCE and RHD genes. Proc. Natl Acad. Sci. USA 100, 8793–8798 (2003).

  23. Hayakawa, T. et al. Coevolution of Siglec-11 and Siglec-16 via gene conversion in primates. BMC Evol. Biol. 17, 228 (2017).

  24. Garg, P. et al. Pervasive cis effects of variation in copy number of large tandem repeats on local DNA methylation and gene expression. Am. J. Hum. Genet. https://doi.org/10.1016/j.ajhg.2021.03.016 (2021).

  25. Lennerz, J. K. et al. Addition of H19 ‘loss of methylation testing’ for Beckwith-Wiedemann syndrome (BWS) increases the diagnostic yield. J. Mol. Diagn. 12, 576–588 (2010).

  26. Nurk, S. et al. The complete sequence of a human genome. Preprint at bioRxiv https://doi.org/10.1101/2021.05.26.445798 (2021).

  27. Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Preprint at bioRxiv https://doi.org/10.1101/2021.07.12.452063 (2021).

  28. Boisson, B. et al. Rescue of recurrent deep intronic mutation underlying cell type–dependent quantitative NEMO deficiency. J. Clin. Invest. 129, 583–597 (2018).

  29. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  30. Schmidt, K., Noureen, A., Kronenberg, F. & Utermann, G. Structure, function, and genetics of lipoprotein (a). J. Lipid Res. 57, 1339–1359 (2016).

  31. Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).

  32. Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2020).

  33. Theunissen, F. et al. Structural variants may be a source of missing heritability in sALS. Front. Neurosci. 14, 47 (2020).

  34. Guo, Y. et al. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis. Genomics 109, 83–90 (2017).

  35. Pan, B. et al. Similarities and differences between variants called with human reference genome HG19 or HG38. BMC Bioinform. 20, 101 (2019).

  36. Miller, C. A. et al. Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence. Preprint at bioRxiv https://doi.org/10.1101/2021.05.07.442430 (2021).

  37. Li, H. et al. Exome variant discrepancies due to reference-genome differences. Am. J. Hum. Genet. 108, 1239–1250 (2021).

  38. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 590, E55 (2021).

  39. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  40. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  41. Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

  42. Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (O’Reilly Media, 2020).

  43. Farek, J. et al. xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments. Preprint at bioRxiv https://doi.org/10.1101/295071 (2018).

  44. Edge, P. & Bansal, V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat. Commun. 10, 4660 (2019).

  45. Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).

  46. Sahraeian, S. M. E. et al. Deep convolutional neural networks for accurate somatic mutation detection. Nat. Commun. 10, 1041 (2019).

  47. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).

  48. Patterson, M. et al. WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 6, 498–509 (2015).

  49. Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).

  50. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  51. Regier, A. A. et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nat. Commun. 9, 4038 (2018).

  52. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2018).

  53. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  54. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, 333–339 (2012).

  55. Cameron, D. L. et al. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res. 27, 2050–2060 (2017).

  56. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).

  57. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).

  58. Kronenberg, Z. N. et al. Wham: identifying structural variants of biological consequence. PLoS Comput. Biol. 11, e1004572 (2015).

  59. Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).

  60. De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34, 2666–2669 (2018).

  61. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

  62. Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).

  63. Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).

  64. Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).

  65. Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).

Acknowledgements

We thank the Genome Reference Consortium for their curation efforts of GRCh37 and GRCh38 (https://www.genomereference.org), especially V.A. Schneider and P.A. Kitts from the National Institutes of Health (NIH)/NCBI for developing the falsely duplicated regions that should be masked in GRCh38. We thank S. Miller at NIST for helping make available benchmark sets and READMEs. Certain commercial equipment, instruments or materials are identified to adequately specify experimental conditions or reported results. Such identification does not imply recommendation or endorsement by NIST, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. C.F. was funded by Instituto de Salud Carlos III (PI20/00876) and Ministerio de Ciencia e Innovación (RTC-2017-6471-1; AEI/FEDER, UE), cofinanced by the European Regional Development Fund ‘A Way of Making Europe’ from the European Union, and Cabildo Insular de Tenerife (CGIEU0000219140). J.M.L.-S. was funded by Consejería de Educación-Gobierno de Canarias and Cabildo Insular de Tenerife (BOC 163, 24/08/2017). F.J.S. and M.M. were supported by the NIH (UM1 HG008898). C.X. was supported by the Intramural Research Program of the National Library of Medicine, NIH. K.H.M. was supported by the NIH/National Human Genome Research Institute (R01 1R01HG011274-01 and U01 1U01HG010971). H.L. was supported by the NIH (R01 HG010040 and U01 HG010961). C.E.M. thanks funding from the WorldQuant Foundation, NASA (NNX14AH50G), the National Institutes of Health (R01MH117406, R01CA249054, R01AI151059, P01CA214274) and the Leukemia and Lymphoma Society (LLS) (MCL7001-18, LLS 9238-16, LLS-MCL7001-18).

Author information

Author notes

  1. These authors contributed equally: Chen-Shan Chin, Justin M. Zook, Fritz J. Sedlazeck.

Affiliations

  1. Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA

    Justin Wagner, Nathan D. Olson, Lindsay Harris, Jennifer McDaniel & Justin M. Zook

  2. Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA

    Haoyu Cheng & Heng Li

  3. DNAnexus, Inc., Mountain View, CA, USA

    Arkarachai Fungtammasan, Yih-Chii Hwang, Richa Gupta & Chen-Shan Chin

  4. Pacific Biosciences, Menlo Park, CA, USA

    Aaron M. Wenger & William J. Rowell

  5. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA

    Ziad M. Khan, Jesse Farek, Yiming Zhu, Aishwarya Pisupati, Medhat Mahmoud & Fritz J. Sedlazeck

  6. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

    Chunlin Xiao

  7. Genomic Medicine Center, Children’s Mercy Kansas City, Kansas City, MO, USA

    Byunggil Yoo

  8. Roche Sequencing Solutions, Santa Clara, CA, USA

    Sayed Mohammad Ebrahim Sahraeian

  9. Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital, Seattle, WA, USA

    Danny E. Miller

  10. Department of Genome Sciences, University of Washington, Seattle, WA, USA

    Danny E. Miller

  11. Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain

    David Jáspez, José M. Lorenzo-Salazar, Adrián Muñoz-Barrera, Luis A. Rubio-Rodríguez & Carlos Flores

  12. CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain

    Carlos Flores

  13. Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain

    Carlos Flores

  14. New York Genome Center, New York, NY, USA

    Giuseppe Narzisi, Uday Shanker Evani & Wayne E. Clarke

  15. Bionano Genomics, San Diego, CA, USA

    Joyce Lee

  16. Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA

    Christopher E. Mason

  17. Invitae, San Francisco, CA, USA

    Stephen E. Lincoln

  18. UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA

    Karen H. Miga

  19. Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA

    Mark T. W. Ebbert

  20. Department of Internal Medicine, Division of Biomedical Informatics, University of Kentucky, Lexington, KY, USA

    Mark T. W. Ebbert

  21. Department of Neuroscience, University of Kentucky, Lexington, KY, USA

    Mark T. W. Ebbert

  22. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA

    Alaina Shumate

  23. Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA

    Alaina Shumate

Contributions

Conceptualization: J.W., N.D.O., A.F., K.H.M., S.E.L., M.T.W.E., H.L., C.-S.C., J.M.Z. and F.J.S. Data curation: J.W., N.D.O. and J.M. Formal analysis – benchmark: J.W., N.D.O., J.M. and J.M.Z. Formal analysis – assembly: H.C., A.S., H.L. and C.-S.C. Methodology: J.W., H.C., H.L., C.-S.C., J.M.Z. and F.J.S. Project administration: J.W., J.M.Z. and F.J.S. Resources: C.X. Software: J.W. and N.D.O. Supervision: C.-S.C., J.M.Z. and F.J.S. Validation: J.W., N.D.O., L.H., J.M., H.C., A.F., Y.-C.H., R.G., A.M.W., W.J.R., Z.M.K., J.F., Y.Z., A.P., M.M., C.X., B.Y., S.M.E.S., D.J., J.M.L.-S., A.M.-B., L.A.R.-R., C.F., G.N., U.S.E., S.E.C., J.L., H.L., C.-S.C., J.M.Z. and F.J.S. Visualization: J.W., N.D.O., H.C., H.L. and C.-S.C. Writing – original draft: J.W., L.H., C.-S.C., J.M.Z. and F.J.S. Writing – review and editing: J.W., N.D.O., D.E.M., J.L., C.E.M., S.E.L., M.T.W.E., C.-S.C., J.M.Z. and F.J.S.

Corresponding authors

Correspondence to Chen-Shan Chin, Justin M. Zook or Fritz J. Sedlazeck.

Ethics declarations

Competing interests

A.M.W. and W.J.R. are employees and shareholders of Pacific Biosciences. A.F., Y.-C.H., R.G. and C.-S.C. are employees and shareholders of DNAnexus. S.M.E.S. is an employee of Roche. J.L. is a former employee and shareholder of Bionano Genomics. S.E.L. was an employee of Invitae. F.J.S. has sponsored travel from Pacific Biosciences and Oxford Nanopore Technologies. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Adam Ameur, Christian Marshall and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–17, Notes 1–5 and Table 1.

Supplementary Data 1

Additional characteristics of high-priority clinical genes.

Supplementary Data 2

Overlaps of the 5,038 genes on GRCh38 primary assembly between both HG002 GRCh38 v4.2.1 and HG002 hifiasm v0.11.

Supplementary Data 3

Benchmarking of the hifiasm v0.11 assembly-based variants called with dipcall against the GIAB v4.2.1 benchmark for HG002.

Supplementary Data 4

Benchmarking statistics against CMRG benchmark and evaluation callsets.

Supplementary Data 5

Manual curation results for evaluation and common errors in v0.02.03 small variant benchmark.

Supplementary Data 6

Primer designs and reaction conditions for Long-Range PCR and Sanger confirmation.

Supplementary Data 7

Genes excluded from the CMRG benchmarks, with likely reasons for exclusion annotated for GRCh38 in the last column.

Supplementary Data 8

Instructions for BWA-GATK variant calling on original GRCh38 reference.

Supplementary Data 9

Instructions for BWA-GATK variant calling on v1 masked GRCh38 reference.

About this article

Cite this article

Wagner, J., Olson, N.D., Harris, L. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01158-1

  • DOI: https://doi.org/10.1038/s41587-021-01158-1