Sparse canonical correlation analysis between an alcohol biomarker and self-reported alcohol consumption

Commun Stat Simul Comput. 2017;46(10):7924-7941. doi: 10.1080/03610918.2016.1255971. Epub 2017 May 9.

Abstract

While investigating the correlation between an alcohol biomarker and self-reported alcohol consumption, we developed a method to estimate the canonical correlation between two high-dimensional random vectors when the sample size is small. On reviewing the relevant literature, we found that our method is somewhat similar to an existing method, but that the existing method has been criticized as lacking theoretical grounding in comparison with an alternative approach. We provide theoretical and empirical grounding for our method, and we customize it for our application to produce a novel method, which selects linear combinations that are step functions with only a few steps.
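To make the general idea of an L1-penalized canonical correlation concrete, the following is a minimal sketch and not the method developed in this paper. It assumes a simplified setting in which the within-set covariance matrices are treated as identity, and it alternates soft-thresholded updates of the two weight vectors, in the spirit of penalized matrix decomposition approaches. The function names (`soft_threshold`, `sparse_cca`) and the penalty parameters (`lam_u`, `lam_v`) are hypothetical, and the step-function customization described above is not implemented here.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding (proximal operator of the L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def unit(v):
    """Scale v to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_cca(X, Y, lam_u=0.3, lam_v=0.3, n_iter=100):
    """Illustrative sparse CCA: first pair of sparse canonical weight vectors.

    Assumption (not from the paper): within-set covariances are replaced
    by the identity, so only the cross-covariance matrix is used.
    """
    # Standardize each column (assumes no column is constant).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]  # sample cross-correlation matrix, shape (p, q)

    # Start from the leading right singular vector of C.
    v = np.linalg.svd(C, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = unit(soft_threshold(C @ v, lam_u))
        v = unit(soft_threshold(C.T @ u, lam_v))

    # Sample correlation between the two selected linear combinations.
    # If the penalties are large enough to zero out u or v, this is undefined.
    rho = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, rho
```

Larger values of `lam_u` and `lam_v` force more weights to exactly zero. In practice they would be chosen by cross-validation or a similar procedure, which this sketch omits.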

Keywords: L1 penalty; Partial canonical correlation; Regularized canonical correlation analysis; Repeated measures.

Mathematics Subject Classification: Primary 62H20; Secondary 62G08.