The proposed method focuses on speaker disentanglement for depression detection from speech signals. Previous approaches require patient/speaker labels, suffer from training instability due to loss maximization, and introduce unnecessary parameters for adversarial domain prediction. In contrast, the proposed unsupervised approach reduces the cosine similarity between the latent spaces of the depression model and a pre-trained speaker classification model. This method outperforms baseline models and matches or exceeds adversarial methods in performance, without relying on speaker labels or introducing additional model parameters, thereby reducing model complexity. A higher speaker de-identification score (DeID) indicates that the depression detection system better masks a patient's identity, enhancing the privacy properties of depression detection systems. On the DAIC-WOZ dataset with ComParE16 features and an LSTM-only model, our method achieves an F1-Score of 0.776 and a DeID score of 92.87%, outperforming its adversarial counterpart, which achieves an F1-Score of 0.762 and a DeID score of 68.37%. Furthermore, we demonstrate that speaker-disentanglement methods are complementary to text-based approaches: a score-level fusion with a Word2vec-based depression detection model further improves the overall performance to an F1-Score of 0.830.
Keywords: DAIC-WOZ; Depression detection; Healthcare AI; Privacy; Speaker disentanglement.
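The following is a minimal, hypothetical PyTorch sketch of the kind of unsupervised disentanglement objective described in the abstract: a cosine-similarity penalty between the depression model's latent representation and a frozen, pre-trained speaker embedding. The module names, feature dimensions, and the `lambda_dis` weight are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepressionLSTM(nn.Module):
    """LSTM-based depression classifier over frame-level acoustic features."""
    def __init__(self, feat_dim=130, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden_dim)
        latent = h_n[-1]                 # utterance-level latent representation
        logit = self.classifier(latent)  # depression logit
        return logit, latent

def disentanglement_loss(depression_latent, speaker_embedding):
    """Absolute cosine similarity between the depression latent and the frozen
    speaker embedding; minimizing it pushes the two spaces toward orthogonality
    without requiring any speaker labels. Assumes both vectors share the same
    dimensionality."""
    return F.cosine_similarity(depression_latent, speaker_embedding, dim=-1).abs().mean()

def training_step(model, speaker_model, features, labels, lambda_dis=0.1):
    """Hypothetical joint objective: depression BCE loss plus the weighted
    disentanglement term. `speaker_model` is a pre-trained speaker-classification
    network kept frozen and assumed to return a fixed-size embedding."""
    logit, latent = model(features)
    with torch.no_grad():
        spk_emb = speaker_model(features)
    bce = F.binary_cross_entropy_with_logits(logit.squeeze(-1), labels.float())
    dis = disentanglement_loss(latent, spk_emb)
    return bce + lambda_dis * dis
```

Because the penalty is a simple similarity term against a frozen embedding, it adds no trainable parameters and avoids the min-max optimization of adversarial speaker prediction, which is consistent with the stability and complexity advantages claimed above.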