Exploring statistical approaches to diminish subjectivity of cluster analysis to derive dietary patterns: The Tomorrow Project

Geraldine Lo Siou; Yutaka Yasui; Ilona Csizmadi; S Elizabeth McGregor; Paula J Robson

doi:10.1093/aje/kwq458

Am J Epidemiol. 2011 Apr 15;173(8):956-67. doi: 10.1093/aje/kwq458. Epub 2011 Mar 18.

Authors

Geraldine Lo Siou¹, Yutaka Yasui, Ilona Csizmadi, S Elizabeth McGregor, Paula J Robson

Affiliation

¹ Department of Population Health Research, Alberta Health Services—Cancer Care, c/o Holy Cross Site, Box ACB, 2210 2nd Street SW, Calgary, Alberta, Canada T2S 3C3. geraldine.losiou@albertahealthservices.ca

PMID: 21421742
DOI: 10.1093/aje/kwq458

Abstract

Dietary patterns derived by cluster analysis are commonly reported with little information describing how decisions are made at each step of the analytical process. Using food frequency questionnaire data obtained in 2001-2007 on Albertan men (n = 6,445) and women (n = 10,299) aged 35-69 years, the authors explored the use of statistical approaches to diminish the subjectivity inherent in cluster analysis. They evaluated the reproducibility of cluster solutions, defined as agreement between 2 cluster assignments, for 3 clustering methods (Ward's minimum variance, flexible beta, K means). They also examined ratios of between- versus within-cluster variances and described health-related variables across clusters in the final solution. K means produced cluster solutions with the highest reproducibility. For men, 4 clusters were chosen on the basis of ratios of between- versus within-cluster variances, but for women, 3 clusters were chosen on the basis of interpretability of cluster labels and descriptive statistics. Compared with those in other clusters, greater proportions of men and women in the "healthy" clusters reported normal body mass index, smaller waist circumference, and lower energy intakes. The authors' approach appeared helpful when choosing the clustering method for both sexes and the optimal number of clusters for men, but additional analyses are required to understand why it performed differently for women.
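
The two quantities named in the abstract, agreement between 2 cluster assignments and the ratio of between- versus within-cluster variance, can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say which agreement statistic or which form of the variance ratio was used, so the Rand index and a pseudo-F style ratio below are assumptions, as are the function names (`kmeans`, `rand_index`, `variance_ratio`) and the toy data.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's K-means (illustrative only); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct observations as starting centers
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean_point(members)
    return labels


def rand_index(labels_a, labels_b):
    """Assumed agreement measure: share of pairs on which two assignments agree.

    A pair agrees when both assignments put the two observations together,
    or both put them apart. Cluster numbering does not matter.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def variance_ratio(points, labels):
    """Assumed pseudo-F form: [between SS / (k - 1)] / [within SS / (n - k)]."""
    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")
    grand = mean_point(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = mean_point(members)
        between += len(members) * sq_dist(center, grand)
        within += sum(sq_dist(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Hypothetical toy data, not dietary intake data from the study.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    run_a = kmeans(data, k=2, seed=0)
    run_b = kmeans(data, k=2, seed=1)
    print("agreement between runs:", rand_index(run_a, run_b))
    print("between/within ratio:", variance_ratio(data, run_a))
```

In this sketch, a higher agreement value across repeated runs would indicate a more reproducible solution, and comparing the variance ratio across candidate numbers of clusters is one way to inform the choice of k. How the study actually generated the 2 cluster assignments being compared is not stated in the abstract.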

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Age Factors
Aged
Canada / epidemiology
Cluster Analysis
Data Interpretation, Statistical*
Diet*
Female
Humans
Male
Middle Aged
Nutrition Surveys
Sex Factors