Fallacy of the unique genome: Sequence diversity within single Helicobacter pylori strains

Jenny L. Draper, Lori M. Hansen, David L. Bernick, Samar Abedrabbo, Jason G. Underwood, Nguyet Kong, Bihua C. Huang, Allison M. Weis, Bart C. Weimer, Arnoud H. M. Van Vliet, Nader Pourmand, Jay V. Solnick, Kevin Karplus, Karen M. Ottemann

Research output: Contribution to journal (Article)

24 Citations (Scopus)

Abstract

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains.

IMPORTANCE Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish "the genome" of a bacterial strain. Variability is usually reduced ("only sequence from a single colony"), ignored ("just publish the consensus"), or placed in the "too-hard" basket ("analysis of raw read data is more robust"). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as "the genome" of a bacterial strain may be misleading.

Original language: English (US)
Article number: e02321-16
Journal: mBio
Volume: 8
Issue number: 1
DOI: https://doi.org/10.1128/mBio.02321-16
State: Published - Jan 1 2017

Fingerprint

Bacterial Genomes
Helicobacter pylori
Genome
Fucosyltransferases
Genes
Proteins
DNA Transposable Elements
Islands
Single Nucleotide Polymorphism
Disease Outbreaks
Virulence
Carrier Proteins
Population

ASJC Scopus subject areas

  • Microbiology
  • Virology

Cite this

Draper, J. L., Hansen, L. M., Bernick, D. L., Abedrabbo, S., Underwood, J. G., Kong, N., ... Ottemann, K. M. (2017). Fallacy of the unique genome: Sequence diversity within single Helicobacter pylori strains. mBio, 8(1), [e02321-16]. https://doi.org/10.1128/mBio.02321-16

Fallacy of the unique genome : Sequence diversity within single Helicobacter pylori strains. / Draper, Jenny L.; Hansen, Lori M.; Bernick, David L.; Abedrabbo, Samar; Underwood, Jason G.; Kong, Nguyet; Huang, Bihua C.; Weis, Allison M.; Weimer, Bart C; Van Vliet, Arnoud H M; Pourmand, Nader; Solnick, Jay V; Karplus, Kevin; Ottemann, Karen M.

In: mBio, Vol. 8, No. 1, e02321-16, 01.01.2017.

Draper, JL, Hansen, LM, Bernick, DL, Abedrabbo, S, Underwood, JG, Kong, N, Huang, BC, Weis, AM, Weimer, BC, Van Vliet, AHM, Pourmand, N, Solnick, JV, Karplus, K & Ottemann, KM 2017, 'Fallacy of the unique genome: Sequence diversity within single Helicobacter pylori strains', mBio, vol. 8, no. 1, e02321-16. https://doi.org/10.1128/mBio.02321-16
Draper JL, Hansen LM, Bernick DL, Abedrabbo S, Underwood JG, Kong N et al. Fallacy of the unique genome: Sequence diversity within single Helicobacter pylori strains. mBio. 2017 Jan 1;8(1). e02321-16. https://doi.org/10.1128/mBio.02321-16
Draper, Jenny L. ; Hansen, Lori M. ; Bernick, David L. ; Abedrabbo, Samar ; Underwood, Jason G. ; Kong, Nguyet ; Huang, Bihua C. ; Weis, Allison M. ; Weimer, Bart C ; Van Vliet, Arnoud H M ; Pourmand, Nader ; Solnick, Jay V ; Karplus, Kevin ; Ottemann, Karen M. / Fallacy of the unique genome : Sequence diversity within single Helicobacter pylori strains. In: mBio. 2017 ; Vol. 8, No. 1.
@article{0ff2d1c8153e40269276c0a9662993f6,
title = "Fallacy of the unique genome: Sequence diversity within single Helicobacter pylori strains",
author = "Draper, {Jenny L.} and Hansen, {Lori M.} and Bernick, {David L.} and Samar Abedrabbo and Underwood, {Jason G.} and Nguyet Kong and Huang, {Bihua C.} and Weis, {Allison M.} and Weimer, {Bart C} and {Van Vliet}, {Arnoud H M} and Nader Pourmand and Solnick, {Jay V} and Kevin Karplus and Ottemann, {Karen M.}",
year = "2017",
month = "1",
day = "1",
doi = "10.1128/mBio.02321-16",
language = "English (US)",
volume = "8",
journal = "mBio",
issn = "2161-2129",
publisher = "American Society for Microbiology",
number = "1",

}

TY - JOUR

T1 - Fallacy of the unique genome

T2 - Sequence diversity within single Helicobacter pylori strains

AU - Draper, Jenny L.

AU - Hansen, Lori M.

AU - Bernick, David L.

AU - Abedrabbo, Samar

AU - Underwood, Jason G.

AU - Kong, Nguyet

AU - Huang, Bihua C.

AU - Weis, Allison M.

AU - Weimer, Bart C

AU - Van Vliet, Arnoud H M

AU - Pourmand, Nader

AU - Solnick, Jay V

AU - Karplus, Kevin

AU - Ottemann, Karen M.

PY - 2017/1/1

Y1 - 2017/1/1

UR - http://www.scopus.com/inward/record.url?scp=85014784858&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014784858&partnerID=8YFLogxK

U2 - 10.1128/mBio.02321-16

DO - 10.1128/mBio.02321-16

M3 - Article

C2 - 28223462

AN - SCOPUS:85014784858

VL - 8

JO - mBio

JF - mBio

SN - 2161-2129

IS - 1

M1 - e02321-16

ER -