Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra-and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB a-1,3 lipopoly-saccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains. IMPORTANCE Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish "the genome" of a bacterial strain. Variability is usually reduced ("only sequence from a single colony"), ignored ("just publish the consensus"), or placed in the "too-hard" basket ("analysis of raw read data is more robust"). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as “the genome” of a bacterial strain may be misleading.
ASJC Scopus subject areas