H37Ra is a virulence-attenuated strain of Mycobacterium tuberculosis that is widely employed as a model to investigate virulence mechanisms. Earlier comparative high-throughput studies correlated its avirulence with the presence of specific mutations or the absence of certain proteins. However, a recent sequencing study of H37Ra has disproved several genomic differences previously reported to be associated with virulence. This warrants further investigation of the H37Ra proteome as well. In this study, we carried out an integrated analysis of the genome, transcriptome, and proteome of H37Ra. In addition to confirming single nucleotide variations (SNVs) and insertion-deletions that were reported earlier, our study provides novel insights into the mutation spectrum in the promoter regions of 7 genes. We also provide transcriptional and proteomic evidence for 3,900 genes, representing ~80% of the total predicted gene count, including 408 proteins that have not been identified previously. We identified 9 genes whose coding potential was hitherto reported to be absent in H37Ra. These include 2 putative virulence factors belonging to the ESAT-6-like family of proteins. Furthermore, proteogenomic analysis enabled us to identify 63 novel protein-coding genes and correct 25 existing gene models in the H37Ra genome. A majority of these were found to be conserved in the virulent strain H37Rv as well as in other mycobacterial species, suggesting that the differences between the virulent and avirulent strains of M. tuberculosis do not depend entirely on the expression or absence of certain proteins but may instead be ascribed to functional changes.
Keywords: genome annotation; multiomics; next-generation sequencing; orbitrap; virulence attenuation.