Combining globally search for a regular expression and print matching lines with bibliographic monitoring of genomic database improves diagnosis

Front Genet. 2023 Apr 20:14:1122985. doi: 10.3389/fgene.2023.1122985. eCollection 2023.

Abstract

Introduction: Exome sequencing (ES) has a diagnostic yield ranging from 25% to 70% in rare diseases and regularly implicates genes in novel disorders. Retrospective data reanalysis has proven effective at improving diagnosis, but it poses organizational difficulties for clinical laboratories.

Patients and methods: We applied a reanalysis strategy that combines intensive prospective bibliographic monitoring with direct application of the GREP command-line tool (to "globally search for a regular expression and print matching lines") to a large ES database. For 18 months, we submitted the same five keywords of interest (intellectual disability, (neuro)developmental delay, and (neuro)developmental disorder) to PubMed daily to identify recently published novel disease-gene associations or new phenotypes in genes already implicated in human pathology. We used the Linux GREP tool and an in-house script to collect all variants in these genes from our database of 5,459 exomes.

Results: After GREP queries and variant filtration, we identified 128 genes of interest and collected 56 candidate variants from 53 individuals. We confirmed a causal diagnosis for 19/128 genes (15%) in 21 individuals and identified variants of unknown significance for 19/128 genes (15%) in 23 individuals. Altogether, GREP queries for only 128 genes over 18 months established a causal diagnosis in 21/2,875 undiagnosed affected probands (0.7%).

Conclusion: The GREP query strategy is efficient and less tedious than complete periodic reanalysis, making it a useful reanalysis strategy for improving diagnosis.
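The gene-lookup step described in the methods can be illustrated with a minimal Python sketch that imitates a whole-word GREP search over annotated variant lines. This is not the authors' in-house script: the function name `grep_genes`, the tab-delimited line layout, and the gene symbols `GENE1`, `GENE2`, and `GENE10` are all hypothetical placeholders chosen for illustration.

```python
import re
from typing import Iterable, Iterator


def grep_genes(lines: Iterable[str], genes: Iterable[str]) -> Iterator[str]:
    """Yield lines containing any gene symbol as a whole word (similar to `grep -w`)."""
    symbols = sorted(set(genes))
    if not symbols:
        return  # nothing to search for, so nothing matches
    # Escape each symbol so characters such as '-' or '.' are matched literally.
    alternation = "|".join(re.escape(s) for s in symbols)
    # The lookarounds require that the symbol is not embedded in a longer word,
    # so a query for GENE1 does not also return GENE10.
    pattern = re.compile(rf"(?<![A-Za-z0-9_])(?:{alternation})(?![A-Za-z0-9_])")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Hypothetical annotated variant lines: sample, chrom, pos, ref, alt, gene, effect.
variants = [
    "sampleA\tchr1\t12345\tA\tG\tGENE1\tmissense",
    "sampleB\tchr2\t67890\tC\tT\tGENE10\tnonsense",
    "sampleC\tchr3\t13579\tG\tA\tGENE2\tsynonymous",
]

# Genes newly reported in the literature (placeholder symbols).
hits = list(grep_genes(variants, ["GENE1", "GENE2"]))
for hit in hits:
    print(hit)
```

Whole-word matching matters here because many gene symbols are prefixes of others. In practice the matched lines would still need the variant filtration that the abstract mentions before any candidate is interpreted.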

Keywords: GREP; data reanalysis; developmental anomalies; diagnostic improvement; exome sequencing (ES); genomic database; intellectual disability.