Functional Biogeography of Ocean Microbes Revealed through Non-Negative Matrix Factorization

Xingpeng Jiang, Morgan G I Langille, Russell Y. Neches, Marie Elliot, Simon A. Levin, Jonathan A Eisen, Joshua S. Weitz, Jonathan Dushoff

Research output: Contribution to journal › Article

Abstract

The direct "metagenomic" sequencing of genomic material from complex assemblages of bacteria, archaea, viruses and microeukaryotes has yielded new insights into the structure of microbial communities. For example, analysis of metagenomic data has revealed the existence of previously unknown microbial taxa whose spatial distributions are limited by environmental conditions, ecological competition, and dispersal mechanisms. However, differences in genotype that might lead biologists to designate two microbes as taxonomically distinct need not imply differences in ecological function. Hence, there is a growing need for large-scale analysis of the distribution of microbial function across habitats. Here, we present a framework for investigating the biogeography of microbial function by analyzing the distribution of protein families inferred from environmental sequence data across a global collection of sites. We map over 6,000,000 protein sequences from unassembled reads in the Global Ocean Survey dataset to 8214 protein families, generating a protein-family relative-abundance matrix that describes the distribution of each protein family across sites. We then use non-negative matrix factorization (NMF) to approximate these protein family profiles as linear combinations of a small number of ecological components. Each component has a characteristic functional profile and site profile. Our approach identifies common functional signatures within several of the components. We use our method as a filter to estimate functional distance between sites, and find that an NMF-filtered measure of functional distance is more strongly correlated with environmental distance than a comparable PCA-filtered measure. We also find that functional distance is more strongly correlated with environmental distance than with geographic distance, in agreement with prior studies.
We identify similar protein functions in several components and suggest that functional co-occurrence across metagenomic samples could lead to future methods for de novo functional prediction. We conclude by discussing how NMF and other dimension-reduction methods can help enable a macroscopic functional description of marine ecosystems.
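The abstract compresses a three-step pipeline: normalize per-site protein-family counts into a relative-abundance matrix X (families × sites), factor X ≈ WH with non-negative W (families × k, the functional profiles) and H (k × sites, the site profiles), then compare a functional distance between sites against an environmental distance. The stdlib-only Python below is a minimal sketch of those steps under stated assumptions, not the authors' implementation: it uses the Lee-Seung multiplicative updates for the Frobenius-norm objective (the NMF variant, rank selection, and normalization used in the paper are not stated here), takes "NMF-filtered" distance to mean Euclidean distance between site columns of the rank-k reconstruction WH, and reports a Mantel-style Pearson r without the permutation test. All function names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def relative_abundance(counts: Matrix) -> Matrix:
    """Scale each site (column) so its protein-family counts sum to 1."""
    n_rows, n_cols = len(counts), len(counts[0])
    totals = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[counts[i][j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
            for i in range(n_rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; a's column count must equal b's row count."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """||X - WH||_F, the quantity the updates below drive down."""
    wh = matmul(w, h)
    return math.sqrt(sum((x[i][j] - wh[i][j]) ** 2
                         for i in range(len(x)) for j in range(len(x[0]))))


def nmf(x: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Lee-Seung multiplicative updates for min ||X - WH||_F with W, H >= 0.

    W (families x k) holds functional profiles; H (k x sites) holds site profiles.
    """
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return w, h


def site_distances(profiles: Matrix) -> Matrix:
    """Euclidean distance between every pair of site (column) profiles."""
    cols = transpose(profiles)
    return [[math.dist(p, q) for q in cols] for p in cols]


def mantel_r(d1: Matrix, d2: Matrix) -> float:
    """Pearson r over the upper-triangular entries of two distance matrices
    (the Mantel statistic; no permutation test here)."""
    n = len(d1)
    a = [d1[i][j] for i in range(n) for j in range(i + 1, n)]
    b = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Toy counts (hypothetical): 4 protein families x 5 sites.
    counts = [[10, 12, 1, 0, 2],
              [8, 9, 2, 1, 1],
              [1, 0, 9, 11, 10],
              [0, 2, 8, 10, 12]]
    x = relative_abundance(counts)
    w, h = nmf(x, k=2)
    d_func = site_distances(matmul(w, h))  # NMF-filtered functional distance
    temps = [25.0, 24.5, 12.0, 11.0, 10.5]  # hypothetical per-site temperature
    d_env = [[abs(p - q) for q in temps] for p in temps]
    print(f"Mantel r (functional vs environmental): {mantel_r(d_func, d_env):.3f}")
```

The pure-Python loops only make each step explicit. At the scale described (8214 families across a global set of sites), a vectorized implementation such as scikit-learn's `NMF` together with a permutation-based Mantel test would be the practical choice, and a PCA-filtered comparison would swap the rank-k NMF reconstruction for a rank-k PCA reconstruction.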

Original language: English (US)
Article number: e43866
Journal: PLoS One
Volume: 7
Issue number: 9
DOI: 10.1371/journal.pone.0043866
State: Published - Sep 18 2012

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Jiang, X., Langille, M. G. I., Neches, R. Y., Elliot, M., Levin, S. A., Eisen, J. A., ... Dushoff, J. (2012). Functional Biogeography of Ocean Microbes Revealed through Non-Negative Matrix Factorization. PLoS One, 7(9), [e43866]. https://doi.org/10.1371/journal.pone.0043866
@article{c5ac1662779f48ccb7774d06c4e15e02,
title = "Functional Biogeography of Ocean Microbes Revealed through Non-Negative Matrix Factorization",
abstract = "The direct {"}metagenomic{"} sequencing of genomic material from complex assemblages of bacteria, archaea, viruses and microeukaryotes has yielded new insights into the structure of microbial communities. For example, analysis of metagenomic data has revealed the existence of previously unknown microbial taxa whose spatial distributions are limited by environmental conditions, ecological competition, and dispersal mechanisms. However, differences in genotypes that might lead biologists to designate two microbes as taxonomically distinct need not necessarily imply differences in ecological function. Hence, there is a growing need for large-scale analysis of the distribution of microbial function across habitats. Here, we present a framework for investigating the biogeography of microbial function by analyzing the distribution of protein families inferred from environmental sequence data across a global collection of sites. We map over 6,000,000 protein sequences from unassembled reads from the Global Ocean Survey dataset to 8214 protein families, generating a protein family relative abundance matrix that describes the distribution of each protein family across sites. We then use non-negative matrix factorization (NMF) to approximate these protein family profiles as linear combinations of a small number of ecological components. Each component has a characteristic functional profile and site profile. Our approach identifies common functional signatures within several of the components. We use our method as a filter to estimate functional distance between sites, and find that an NMF-filtered measure of functional distance is more strongly correlated with environmental distance than a comparable PCA-filtered measure. We also find that functional distance is more strongly correlated with environmental distance than with geographic distance, in agreement with prior studies. 
We identify similar protein functions in several components and suggest that functional co-occurrence across metagenomic samples could lead to future methods for de-novo functional prediction. We conclude by discussing how NMF, and other dimension reduction methods, can help enable a macroscopic functional description of marine ecosystems.",
author = "Xingpeng Jiang and Langille, {Morgan G I} and Neches, {Russell Y.} and Marie Elliot and Levin, {Simon A.} and Eisen, {Jonathan A} and Weitz, {Joshua S.} and Jonathan Dushoff",
year = "2012",
month = "9",
day = "18",
doi = "10.1371/journal.pone.0043866",
language = "English (US)",
volume = "7",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",

}