CAD is a 243-kDa multidomain polypeptide that catalyzes the first three steps in mammalian de novo pyrimidine biosynthesis. The largest cDNA clone obtained thus far, pCAD142 (Shigesada, K., Stark, G. R., Maley, J. A., Niswander, L. A., and Davidson, J. N. (1985) Mol. Cell. Biol. 5, 1735), lacks the 5' end of the mRNA, which encodes the amino terminus of CAD. To clone this missing segment, a synthetic oligonucleotide complementary to pCAD142 was used to prime cDNA synthesis on a poly(A)+ RNA template isolated from a Syrian hamster cell line that overproduces the CAD mRNA. The resulting clone, pKB11, which has a 1369-base pair (bp) cDNA insert overlapping pCAD142 by 781 bp, was identified by hybridization methods and sequence analysis and found to contain the entire cDNA sequence for the amino end of the CAD polypeptide. The deduced amino acid sequence is homologous to seven carbamyl phosphate synthetases. Primer extension, oligonucleotide-directed RNase H digestion, and RNA sequencing indicated that pKB11 extends to within 68 bases of the 5' end of the CAD mRNA. This conclusion was confirmed by Northern blotting analysis of the 5'-flanking region of the CAD gene. The probable 3' end of an unidentified gene that codes for a 1-kilobase (kb) transcript was identified immediately upstream of the CAD gene. Northern analysis using probes complementary to the region between the CAD and the 1-kb genes detected a small transcript of less than 300 nucleotides. The sequence revealed three potential translation initiation sites, raising the possibility of more than one CAD translation product. The major translation start codon was identified as the first ATG in pKB11 by sequence homology, in vitro transcription and translation, and protein studies. Starting from this ATG within pKB11, the clone encodes a 143-residue domain of unknown function. This study completes the determination of the primary structure of the CAD polypeptide.
The CAD mRNA is 7.5 kb in length and has 6675 bp of coding sequence and about 200 bp and 600 bp of untranslated sequence at the 5' and 3' ends, respectively.
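The lengths quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch rather than part of the reported analysis: the constant and function names are illustrative, the mean residue mass of about 110 Da is a common rule of thumb assumed here (it is not a figure from this study), and whether the 6675 bp of coding sequence includes the stop codon is not stated, so the codon count is an upper bound on the residue count.

```python
# Figures quoted in the text.
CODING_BP = 6675                  # coding sequence of the CAD mRNA
UTR5_BP = 200                     # approximate 5' untranslated length
UTR3_BP = 600                     # approximate 3' untranslated length
PKB11_INSERT_BP = 1369            # pKB11 cDNA insert
PKB11_OVERLAP_BP = 781            # overlap between pKB11 and pCAD142

# Assumption (not from the source): rule-of-thumb mean residue mass in Da.
MEAN_RESIDUE_DA = 110


def mrna_length_bp():
    """Sum of coding and untranslated lengths, to compare with ~7.5 kb."""
    return CODING_BP + UTR5_BP + UTR3_BP


def codon_count():
    """Number of codons in the coding sequence (may include the stop codon)."""
    return CODING_BP // 3


def approx_mass_kda():
    """Rough polypeptide mass estimate, to compare with the quoted 243 kDa."""
    return codon_count() * MEAN_RESIDUE_DA / 1000


def novel_pkb11_bp():
    """Bases in pKB11 that extend beyond its overlap with pCAD142."""
    return PKB11_INSERT_BP - PKB11_OVERLAP_BP


if __name__ == "__main__":
    print("mRNA length (bp):", mrna_length_bp())
    print("codons:", codon_count())
    print("approximate mass (kDa):", approx_mass_kda())
    print("new sequence contributed by pKB11 (bp):", novel_pkb11_bp())
```

Under these assumptions the summed length is close to the quoted 7.5 kb, and the rough mass estimate falls within a few kilodaltons of 243 kDa, so the reported figures are mutually consistent.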