Breast cancer is characterized by substantial histoclinical heterogeneity that currently hampers the selection of the most appropriate treatment for each case. This problem could be addressed by identifying new parameters that better predict the natural history of the disease and its sensitivity to treatment. A large-scale molecular characterization of breast cancer could help in this context. Using cDNA arrays, we measured the quantitative mRNA expression levels of 176 candidate genes in 34 primary breast carcinomas and analyzed them along three directions: comparison of tumor samples, correlation of molecular data with conventional histoclinical prognostic features, and correlations between genes. The study revealed extensive heterogeneity of breast tumors at the transcriptional level. A hierarchical clustering algorithm identified two molecularly distinct subgroups of tumors with different clinical outcomes after chemotherapy. This outcome could not have been predicted from the commonly used histoclinical parameters. No correlation was found with patient age, tumor size, histological type, or grade. However, gene expression differed according to lymph node metastasis and estrogen receptor status: ERBB2 expression was strongly correlated with lymph node status (P < 0.0001) and GATA3 expression with the presence of estrogen receptors (P < 0.001). Thus, our results identify new ways to group tumors according to outcome and new potential targets of carcinogenesis. They show that the systematic use of cDNA array testing holds great promise for improving the classification of breast cancer in terms of prognosis and chemosensitivity and for providing new potential therapeutic targets.
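To make the clustering step concrete, the following is a minimal sketch of how hierarchical clustering can split expression profiles into two groups. The abstract does not state the distance metric or linkage rule that was used, so the choices here (distance 1 - Pearson r, average linkage) are assumptions. The data are synthetic random numbers shaped like the study (34 samples, 176 genes), not the study's measurements, and the names `pearson`, `average_linkage`, and `make_synthetic` are hypothetical helpers written for this illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering, stopping when n_clusters remain.

    Assumed settings (not taken from the abstract): distance is
    1 - Pearson r, and clusters are merged by average linkage.
    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


def make_synthetic(n_per_group=17, n_genes=176, seed=0):
    """Synthetic data only: two groups with opposite halves of genes raised."""
    rng = random.Random(seed)
    half = n_genes // 2
    samples = []
    for group in (0, 1):
        for _ in range(n_per_group):
            profile = []
            for g in range(n_genes):
                # Group 0 has the first half of genes shifted up, group 1 the second.
                shift = 3.0 if (g < half) == (group == 0) else 0.0
                profile.append(shift + rng.gauss(0.0, 1.0))
            samples.append(profile)
    return samples


samples = make_synthetic()          # 34 synthetic samples x 176 synthetic genes
groups = average_linkage(samples, n_clusters=2)
print([len(g) for g in groups])
```

On real array data, the resulting groups would then be compared against clinical outcome and histoclinical parameters; that comparison is outside the scope of this sketch.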