Abstract
Purpose: Widespread application of next-generation sequencing, combined with data exchange platforms, has provided molecular diagnoses for countless families. To maximize diagnostic yield, we implemented an unbiased, semi-automated gene-matching algorithm based on genotype and phenotype matching.

Methods: Rare homozygous variants identified in 2 or more affected individuals, but not in healthy individuals, were extracted from our local database of ∼12,000 exomes. Using HPOsim, phenotype similarity scores (PSS) based on Human Phenotype Ontology (HPO) terms were assigned to each pair of individuals matched at the genotype level.

Results: We discovered 33,792 genotype-matched pairs, representing variants in 7,567 unique genes. PSS ≥0.1 was enriched among pathogenic/likely pathogenic variant-level pairs (94.3% of those matches vs 34.75% of all matches). We highlighted founder or region-specific variants as an internal positive control and proceeded to identify candidate disease genes. Variant-level matches were particularly helpful in cases involving in-frame indels and splice region variants beyond the canonical splice sites, which might otherwise have been disregarded, allowing for detection of candidate disease genes such as KAT2A, RPAIN, and LAMP3.

Conclusion: Semi-automated genotype matching combined with PSS is a powerful tool to resolve variants of uncertain significance and to identify candidate disease genes.
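The matching step outlined in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout (a list of homozygous rare-variant calls per individual), and the Jaccard overlap used as the similarity score are all assumptions made for illustration. The study computed PSS with HPOsim, whose scores are ontology-aware and not comparable to a plain set overlap. The 0.1 threshold mirrors the PSS ≥0.1 cutoff reported in the Results, where it describes an enrichment rather than a stated filter.

```python
from collections import defaultdict
from itertools import combinations


def genotype_matched_pairs(calls, affected, healthy):
    """Pair affected individuals who share a homozygous rare variant.

    calls: iterable of (individual_id, variant_id) tuples (hypothetical layout).
    affected, healthy: sets of individual ids.
    """
    carriers = defaultdict(set)
    for individual, variant in calls:
        carriers[variant].add(individual)

    pairs = []
    for variant, people in sorted(carriers.items()):
        if people & healthy:
            continue  # variant is homozygous in a healthy individual: skip
        cases = sorted(people & affected)
        # combinations() yields nothing when fewer than 2 cases carry the variant
        for a, b in combinations(cases, 2):
            pairs.append((variant, a, b))
    return pairs


def jaccard_similarity(terms_a, terms_b):
    """Stand-in score: overlap of two HPO term sets (NOT the HPOsim measure)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def filter_by_pss(pairs, hpo_terms, threshold=0.1, score=jaccard_similarity):
    """Keep genotype-matched pairs whose phenotype score reaches the threshold."""
    kept = []
    for variant, a, b in pairs:
        pss = score(hpo_terms[a], hpo_terms[b])
        if pss >= threshold:
            kept.append((variant, a, b, pss))
    return kept
```

In this sketch, any ontology-aware scorer can be passed through the `score` parameter in place of the Jaccard stand-in, provided it takes two sets of HPO term identifiers and returns a number.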
Original language | English |
---|---|
Article number | 101068 |
Journal | Genetics in Medicine |
Volume | 26 |
Issue number | 4 |
State | Published - Apr 2024 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2024 American College of Medical Genetics and Genomics
Funding
The authors thank the families who participated in this study and the referring physicians. No external funding to be declared.

Conceptualization of study and methodology: J.R., T.H.; Analysis of exome data: J.R., O.E., H.M.-S., T.H.; Molecular experiments and data analysis: O.H., A.F., S.Y.D., S.E., I.S.; Bioinformatics and computational approaches: Z.L., T.S., S.G.-N.; Clinical and genomic data: B.A.-L., S.E., S.S., O.B., M.H., M.S., N.S.D., A.A., M.E.S., O.S.B.; NGS experiments: C.R.; In silico modeling: J.V., O.S.-F.; Writing-original draft: J.R., T.H.; Writing-review and editing: all authors.

The study was conducted in accordance with Hadassah Medical Organization's IRB-approved protocol (No. 0306-10-HMO). All families signed informed consent for genomic testing. Families who provided photographs gave additional consent for photo publication. DNA numbers were coded.
Keywords
- Exome sequencing
- Genotype matching
- HPO terms
- KAT2A
- Phenotype similarity scores
- RPAIN
- Variants of uncertain significance