The evaluation of the quality of segmentations of an image, and the assessment of intra- and inter-expert variability in segmentation performance, have long been recognized as difficult tasks. Recently, an Expectation Maximization (EM) algorithm for Simultaneous Truth and Performance Level Estimation (STAPLE) was developed to compute both an estimate of the reference standard segmentation and performance parameters from a set of segmentations of an image. The performance is characterized by the rate of detection of each segmentation label by each expert in comparison to the estimated reference standard. This previous work provides estimates of performance parameters but does not provide any information regarding their uncertainty. An estimate of this inferential uncertainty, if available, would allow estimation of confidence intervals for the values of the parameters, aid in the interpretation of the performance of segmentation generators, and help determine whether the data size and the number of segmentations obtained are sufficient to accurately characterize the performance parameters. We present a new algorithm to estimate the inferential uncertainty of the performance parameters for binary segmentations. It is derived for the special case of the STAPLE algorithm, based on established theory for general-purpose covariance matrix estimation for EM algorithms. The bounds on performance estimates are obtained by computing the observed information matrix. We use this algorithm to study the bounds on performance estimates from simulated images with specified performance parameters, and from interactive segmentations of neonatal brain MRIs. We demonstrate that confidence intervals for expert segmentation performance parameters can be estimated with our algorithm.
We investigate the influence of the number of experts and of the image size on these bounds, showing that it is possible to determine the number of image segmentations and the size of images necessary to achieve a chosen level of accuracy in segmentation performance assessment.
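
As a rough illustration of how an observed information value translates into a confidence interval, the sketch below treats a single detection rate as a binomial proportion. It assumes the reference standard is known, which is a deliberate simplification: in STAPLE the reference standard is hidden, and the observed information matrix must account for that. The function name `wald_interval` and the voxel counts are hypothetical and serve only to show how the interval narrows as the number of voxels grows.

```python
import math


def wald_interval(detected, total, z=1.96):
    """Wald interval for a detection rate, assuming the true labels are known.

    Simplified stand-in, not the STAPLE derivation: the rate is treated as a
    binomial proportion, so the observed information is a scalar.
    """
    if not 0 < detected < total:
        raise ValueError("need 0 < detected < total for a finite information value")
    p = detected / total
    # Observed information of the binomial log-likelihood at the estimate p.
    info = total / (p * (1.0 - p))
    # The inverse of the information approximates the variance of the estimate.
    se = math.sqrt(1.0 / info)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical counts: the same detection rate observed on 1000 and 4000 voxels.
for detected, total in ((900, 1000), (3600, 4000)):
    p, lo, hi = wald_interval(detected, total)
    print(f"n={total}: rate={p:.3f}, interval width={hi - lo:.4f}")
```

Under this simplification the interval width scales as the inverse square root of the number of voxels, so quadrupling the image size halves the width.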