An efficient algorithm for the extraction of HGVS variant descriptions from sequences

Jonathan K Vis; Martijn Vermaat; Peter E M Taschner; Joost N Kok; Jeroen F J Laros

doi:10.1093/bioinformatics/btv443

Bioinformatics. 2015 Dec 1;31(23):3751-7. doi: 10.1093/bioinformatics/btv443. Epub 2015 Jul 31.

Authors

Jonathan K Vis¹, Martijn Vermaat², Peter E M Taschner³, Joost N Kok¹, Jeroen F J Laros⁴

Affiliations

¹ Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands; Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands.
² Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.
³ Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Generade Center of Expertise Genomics, University of Applied Sciences Leiden, Leiden, The Netherlands.
⁴ Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands.

PMID: 26231427

Abstract

Motivation: Unambiguous sequence variant descriptions are important in reporting the outcome of clinical diagnostic DNA tests. The standard nomenclature of the Human Genome Variation Society (HGVS) describes the observed variant sequence relative to a given reference sequence. We propose an efficient algorithm for the extraction of HGVS descriptions from two sequences with three main requirements in mind: minimizing the length of the resulting descriptions, minimizing the computation time and keeping the unambiguous descriptions biologically meaningful.
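To illustrate what describing an observed sequence relative to a reference involves, here is a minimal sketch. It is not the algorithm proposed in the paper: it only trims the common prefix and suffix of the two strings and reports whatever remains as a single HGVS-style substitution, deletion, insertion or deletion-insertion. The function name `describe` is hypothetical, positions are 1-based, and the reference accession and coordinate-system prefix (such as `g.`) are omitted. HGVS normalization rules (3' shifting, duplications, inversions) are not handled.

```python
def describe(reference: str, observed: str) -> str:
    """Return one HGVS-style description of observed relative to reference.

    Illustrative assumption: the whole difference is reported as a single
    variant, which is rarely the shortest description for long sequences.
    """
    if reference == observed:
        return "="  # no change

    # Length of the longest common prefix.
    limit = min(len(reference), len(observed))
    prefix = 0
    while prefix < limit and reference[prefix] == observed[prefix]:
        prefix += 1

    # Length of the longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and reference[-1 - suffix] == observed[-1 - suffix]):
        suffix += 1

    deleted = reference[prefix:len(reference) - suffix]
    inserted = observed[prefix:len(observed) - suffix]
    start = prefix + 1            # first affected reference position
    end = prefix + len(deleted)   # last affected reference position

    if len(deleted) == 1 and len(inserted) == 1:
        return f"{start}{deleted}>{inserted}"          # substitution
    if not deleted:
        # Insertion between two flanking positions; an insertion before the
        # first base (prefix == 0) is not handled properly by this sketch.
        return f"{prefix}_{prefix + 1}ins{inserted}"
    position = str(start) if start == end else f"{start}_{end}"
    if not inserted:
        return f"{position}del"                        # deletion
    return f"{position}delins{inserted}"               # deletion-insertion


if __name__ == "__main__":
    print(describe("ATCGATCG", "ATCGTTCG"))
    print(describe("ATCGATCG", "ATCGTCG"))
    print(describe("ATCG", "ATGGCG"))
```

A single prefix/suffix trim like this collapses several distant changes into one long deletion-insertion, which is why the requirement of minimizing description length calls for a more careful approach on large sequences.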

Results: Our algorithm is able to compute the HGVS descriptions of complete chromosomes or other large DNA strings in a reasonable amount of computation time and its resulting descriptions are relatively small. Additional applications include updating of gene variant database contents and reference sequence liftovers.

Availability: The algorithm is accessible as an experimental service in the Mutalyzer program suite (https://mutalyzer.nl). The C++ source code and Python interface are accessible at: https://github.com/mutalyzer/description-extractor.

Contact: j.k.vis@lumc.nl.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Genetic Variation*
Genome, Human
Humans
Sequence Analysis, DNA / methods*