BACKGROUND To review the accuracy of multivariate models for the prediction of ovarian reserve and pregnancy in women undergoing IVF, compared with the antral follicle count (AFC) as a single test.

METHODS We performed a computerized MEDLINE and EMBASE search to identify articles published on multivariate models for ovarian reserve testing in patients undergoing IVF. To be selected, articles had to contain data on the outcome of IVF in terms of pregnancy, poor response, or both, and on the prediction of these events based on a multivariate model. For the selected studies, the sensitivity and specificity of the model in the prediction of poor ovarian response and non-pregnancy were calculated. Overall performance was assessed by estimating a summary receiver operating characteristic (ROC) curve, which was compared with the ROC curve for the AFC as the current best single test.

RESULTS We identified 11 studies reporting on the predictive capacity of multivariate models in ovarian reserve testing. All studies reported on the prediction of poor ovarian response, whereas none reported on the occurrence of pregnancy. The sensitivity for prediction of poor ovarian response varied between 39% and 97%, and the specificity between 50% and 96%. Logistic regression analysis indicated that cohort studies provided a significantly better discriminative performance than case-control studies. As cohort studies are superior to case-control studies, further analysis was limited to the cohort studies. For the cohort studies, a summary ROC curve could be estimated, which had a shape similar to that previously estimated for the AFC.

CONCLUSIONS The accuracy of multivariate models for the prediction of ovarian response in women undergoing IVF is similar to the accuracy of the AFC. No data are available on the capacity of these models to predict pregnancy, let alone live birth. On the basis of these findings, the use of more than a single test for the assessment of ovarian reserve cannot currently be supported.
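
As an illustration of the per-study accuracy measures named in the methods, the sketch below computes sensitivity and specificity from a 2x2 table of predicted versus observed poor response. The class and function names are hypothetical, and the counts are invented for the example; they are not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study, with poor ovarian response as the target condition."""
    tp: int  # poor responders correctly predicted as poor responders
    fn: int  # poor responders predicted as normal responders
    fp: int  # normal responders predicted as poor responders
    tn: int  # normal responders correctly predicted as normal responders


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of true poor responders that the model identifies."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Proportion of true normal responders that the model identifies."""
    return t.tn / (t.tn + t.fp)


# Hypothetical counts, for illustration only (not data from the review).
example = TwoByTwo(tp=30, fn=10, fp=20, tn=140)

print(f"sensitivity = {sensitivity(example):.3f}")
print(f"specificity = {specificity(example):.3f}")
```

Each study contributes one such sensitivity and specificity pair, and a summary ROC curve is fitted across those pairs; the fitting step itself is not shown here.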