Heart sounds have been widely studied and have been demonstrated to have value for detecting pathologies in clinical applications. Over the last few decades, the use of heart sound signals has become increasingly uncommon and its role in modern medical practice has diminished, although research into automated analysis has continued. Unfortunately, comparative analyses of algorithms in the literature have been hindered by the lack of high-quality, rigorously validated, and standardized open databases of heart sound recordings. The 2016 PhysioNet/Computing in Cardiology (CinC) Challenge addressed this issue by assembling the largest public heart sound database, aggregated from eight sources obtained by seven independent research groups around the world. The database comprises a total of 4,430 recordings collected from 1,072 healthy subjects and patients with a variety of conditions, including heart valve disease and coronary artery disease.
This editorial reviews the background issues for this Challenge, the design of the Challenge itself, the key achievements, and the follow-up research generated as a result of the Challenge, published in the concurrent special issue of Physiological Measurement. Additionally, we make some recommendations for future changes in the field of heart sound signal processing as a result of the Challenge.
In the Challenge, participants were asked to classify recordings as normal, abnormal, or unsure. The overall score for an entry was based on a weighted sensitivity and specificity score with respect to manual expert annotations. To aid researchers, we provided a simple baseline classification method and a complex open source code base for segmenting the heart sounds, based on a hidden semi-Markov model.
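As a rough illustration of a score built from sensitivity and specificity, the following minimal Python sketch compares predicted labels with reference annotations. The weights are not stated in this abstract, so the equal weighting (`w_se = w_sp = 0.5`), the function name `weighted_score`, and the choice to count an "unsure" prediction as an error are all assumptions made for illustration, not the official Challenge scoring code.

```python
from typing import Sequence, Tuple

NORMAL, ABNORMAL, UNSURE = "normal", "abnormal", "unsure"


def weighted_score(
    truth: Sequence[str],
    predicted: Sequence[str],
    w_se: float = 0.5,  # assumed weight on sensitivity
    w_sp: float = 0.5,  # assumed weight on specificity
) -> Tuple[float, float, float]:
    """Return (score, Se, Sp) for predictions against reference labels.

    Assumption: an "unsure" prediction is counted as an error for
    whichever reference class the recording belongs to.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == ABNORMAL:
            if p == ABNORMAL:
                tp += 1
            else:
                fn += 1
        elif t == NORMAL:
            if p == NORMAL:
                tn += 1
            else:
                fp += 1
    # Guard against empty classes to avoid division by zero.
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return w_se * se + w_sp * sp, se, sp


if __name__ == "__main__":
    truth = [ABNORMAL] * 4 + [NORMAL] * 4
    predicted = [ABNORMAL, ABNORMAL, ABNORMAL, UNSURE,
                 NORMAL, NORMAL, ABNORMAL, NORMAL]
    score, se, sp = weighted_score(truth, predicted)
    print(f"Se={se:.3f} Sp={sp:.3f} score={score:.3f}")
```

Under the equal-weight assumption the score is simply the mean of Se and Sp. For example, Se=0.942 and Sp=0.778 average to 0.860, which agrees with the top official-phase score quoted in this abstract.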
During the official phase of the Challenge, a total of 48 teams submitted 348 open source entries, with a highest score of 0.860 (Se=0.942, Sp=0.778). Subsequently, for this special issue, researchers reported a highest score of 0.855 (Se=0.890, Sp=0.816) in the follow-up phase of the Challenge. This indicates that the Challenge entrants achieved exceptional results, which were extremely difficult to improve upon (even when there is a trade-off between Sp and Se) in the 4 months available post-Challenge. We expect that future researchers will be able to use the extensive database generated for the Challenge to significantly improve on the approaches detailed here.