Direct or systematic social observation is increasingly popular as a method for evaluating the mechanisms by which neighborhood environments affect health and contribute to health disparities. Developing measures with adequate inter-rater and test-retest reliability is essential for this research. In this paper, drawing on our experience conducting direct observation of neighborhoods in Detroit, MI, we describe strategies to promote high inter-rater and test-retest reliability, along with methods to evaluate reliability. We then present the results and discuss implications for future research using direct observation in four areas: methods to evaluate reliability, instrument content and design, observer training, and data collection.