The pneumococcus is a leading global pathogen. A key virulence factor possessed by the majority of pneumococci is an antigenic polysaccharide capsule ('serotype'), which is encoded by the capsular (cps) locus. Approximately 100 different serotypes are known, but the extent of sequence diversity within the cps loci of individual serotypes is not well understood. Investigating serotype-specific sequence variation is crucial for designing sequence-based serotyping methods, understanding pneumococcal conjugate vaccine (PCV) effectiveness and designing future PCVs. Large genome datasets now make it possible to assess population-level variation among pneumococcal serotypes; in this study, 5405 pneumococcal genomes were used to investigate cps locus diversity among 49 different serotypes. The pneumococci had been recovered between 1916 and 2014 from people of all ages living in 51 countries. Serotypes were deduced bioinformatically, cps locus sequences were extracted, and within-locus variation was assessed in the context of pneumococcal genetic lineages. Overall, cps locus sequence diversity varied markedly: serogroups/types 1, 3, 7, 9, 11 and 22 showed low to moderate diversity, whereas serogroups/types 6, 14, 15, 18, 19, 23, 33 and 35 displayed high diversity. Putative novel and/or hybrid cps loci were identified in all serogroups/types except 1, 3 and 9. This study demonstrated that cps locus sequence diversity varied widely between serogroups/types. The biochemical structure of the capsular polysaccharide of major variants, particularly those of PCV-related serotypes and those that appear to be novel or hybrid, warrants investigation.
Keywords: molecular epidemiology; pneumococcal capsular locus; sequence-based serotyping; vaccine impact.