Exploring statistical approaches to diminish subjectivity of cluster analysis to derive dietary patterns: The Tomorrow Project

Am J Epidemiol. 2011 Apr 15;173(8):956-67. doi: 10.1093/aje/kwq458. Epub 2011 Mar 18.

Abstract

Dietary patterns derived by cluster analysis are commonly reported with little information describing how decisions are made at each step of the analytical process. Using food frequency questionnaire data obtained in 2001-2007 on Albertan men (n = 6,445) and women (n = 10,299) aged 35-69 years, the authors explored the use of statistical approaches to diminish the subjectivity inherent in cluster analysis. Reproducibility of cluster solutions, defined as agreement between 2 cluster assignments, was evaluated for 3 clustering methods (Ward's minimum variance, flexible beta, K means). Ratios of between- versus within-cluster variances were examined, and health-related variables across clusters in the final solution were described. K means produced cluster solutions with the highest reproducibility. For men, 4 clusters were chosen on the basis of ratios of between- versus within-cluster variances, but for women, 3 clusters were chosen on the basis of interpretability of cluster labels and descriptive statistics. Compared with those in other clusters, greater proportions of men and women in the "healthy" clusters reported normal body mass index, smaller waist circumference, and lower energy intakes. The authors' approach appeared helpful when choosing the clustering method for both sexes and the optimal number of clusters for men, but additional analyses are required to understand why it performed differently for women.
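
The two quantities named in the abstract, agreement between 2 cluster assignments and the ratio of between- versus within-cluster variance, can be illustrated with a minimal Python sketch. The abstract does not say which agreement statistic or which variance scaling was used, so the Rand index (share of participant pairs treated the same way by both assignments) and an unscaled sum-of-squares ratio are assumptions made here for illustration only. The function names `rand_index` and `variance_ratio` and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of participant pairs on which two cluster assignments agree.

    Assumption: the Rand index stands in for the abstract's unspecified
    "agreement between 2 cluster assignments". Needs at least 2 participants.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("assignments must cover the same participants")
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees when both assignments put it together, or both split it.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def variance_ratio(points, labels):
    """Between-cluster over within-cluster sum of squares.

    Assumption: no degrees-of-freedom scaling is applied. Raises
    ZeroDivisionError if every point sits exactly on its cluster centroid.
    """
    dims = len(points[0])
    grand = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid, grand))
        within += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    return between / within


# Toy data (hypothetical): 4 participants, 2 standardized food-group intakes.
toy_points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]  # same partition as run_1, cluster labels swapped

print(rand_index(run_1, run_2))           # 1.0: label names do not matter
print(variance_ratio(toy_points, run_1))  # 32.0: compact, well-separated clusters
```

In a workflow like the one described, a higher agreement across repeated runs would favor one clustering method over another, and the variance ratio would be compared across candidate numbers of clusters. The pairwise loop grows quadratically with sample size, so for cohorts of the size reported here a contingency-table formulation of the same index would be more practical.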

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Age Factors
  • Aged
  • Canada / epidemiology
  • Cluster Analysis
  • Data Interpretation, Statistical*
  • Diet*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Nutrition Surveys
  • Sex Factors