Biomolecular network motif counting and discovery by color coding

Noga Alon, Phuong Dao, Iman Hajirasouliha, Fereydoun Hormozdiari, S. Cenk Sahinalp

Research output: Contribution to journal, Article

84 Citations (Scopus)

Abstract

Protein-protein interaction (PPI) networks of many organisms share global topological features such as degree distribution, k-hop reachability, betweenness and closeness. Yet some of these networks differ significantly from the others in their local structure: for example, the number of occurrences of specific network motifs can vary significantly among PPI networks. Counting network motifs is a major challenge in comparing biomolecular networks. Recently developed algorithms can count the induced occurrences of subgraphs with k ≤ 7 vertices, yet no practical algorithm exists for counting non-induced occurrences, or for counting subgraphs with k ≥ 8 vertices. Counting non-induced occurrences of network motifs is not only challenging but also quite desirable, as available PPI networks include several false interactions and miss many others. In this article, we show how to apply the 'color coding' technique to count non-induced occurrences of subgraph topologies in the form of trees and bounded treewidth subgraphs. Our algorithm can count all occurrences of a motif G′ with k vertices in a network G with n vertices in time polynomial in n, provided k = O(log n). We use our algorithm to obtain 'treelet' distributions for k ≤ 10 of available PPI networks of unicellular organisms (Saccharomyces cerevisiae, Escherichia coli and Helicobacter pylori), which are all quite similar, and of a multicellular organism (Caenorhabditis elegans), which is significantly different. Furthermore, the treelet distributions of the unicellular organisms are similar to that obtained by the 'duplication model' but quite different from that of the 'preferential attachment model'. The treelet distribution is robust with respect to sparsification at a bait/edge coverage of 70%, but differences can be observed when bait/edge coverage drops to 50%.
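The color coding idea in the abstract can be illustrated on the simplest treelet, a path on k vertices. The sketch below is not the authors' implementation and does not cover general trees or bounded treewidth subgraphs; it assumes an undirected graph given as an adjacency dictionary, and the function names `count_colorful_paths` and `estimate_path_count` are hypothetical. Each trial colors the vertices uniformly at random with k colors, counts the "colorful" paths (all colors distinct) by dynamic programming over color subsets, and rescales by the probability k!/k^k that a fixed path is colorful. The table has at most 2^k color subsets per vertex, which stays polynomial in n when k = O(log n).

```python
import random
from math import factorial


def count_colorful_paths(adj, coloring, k):
    """Count non-induced simple paths on k vertices whose colors are all distinct.

    adj: dict mapping each vertex to an iterable of its neighbours (undirected).
    coloring: dict mapping each vertex to a color in range(k).
    """
    # table[v][mask] = number of colorful paths ending at v that use color set `mask`.
    table = {v: {1 << coloring[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for u in adj:
            for mask, cnt in table[u].items():
                for v in adj[u]:
                    bit = 1 << coloring[v]
                    if mask & bit:
                        continue  # color already used: extension would not be colorful
                    nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        table = nxt
    total = sum(cnt for per_vertex in table.values() for cnt in per_vertex.values())
    # For k > 1 every undirected path is reached once from each of its two endpoints.
    return total // 2 if k > 1 else total


def estimate_path_count(adj, k, trials=200, seed=0):
    """Estimate the number of k-vertex paths by averaging over random colorings."""
    rng = random.Random(seed)
    p_colorful = factorial(k) / k ** k  # chance that a fixed k-vertex path is colorful
    acc = 0
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        acc += count_colorful_paths(adj, coloring, k)
    return acc / (trials * p_colorful)


# Toy example (made-up graph, not PPI data): a 4-cycle with one chord.
toy = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(estimate_path_count(toy, k=3, trials=500))
```

Because distinct colors force distinct vertices, the dynamic program never has to remember which vertices a partial path has visited, only which colors; that is what replaces the exponential dependence on n with one on k. The estimate is unbiased, and its variance shrinks as `trials` grows.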

Original language: English (US)
Journal: Bioinformatics
Volume: 24
Issue number: 13
DOI: 10.1093/bioinformatics/btn163
State: Published - Jan 1 2008
Externally published: Yes

Fingerprint

Protein Interaction Maps
Counting
Color
Protein Interaction Networks
Coding
Protein-protein Interaction
Proteins
Subgraph
Helicobacter
Humulus
Caenorhabditis elegans
Count
Coverage
Saccharomyces cerevisiae
Preferential Attachment
Bounded Treewidth
Betweenness
Local Structure
Degree Distribution

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Biomolecular network motif counting and discovery by color coding. / Alon, Noga; Dao, Phuong; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Sahinalp, S. Cenk.

In: Bioinformatics, Vol. 24, No. 13, 01.01.2008.

@article{23b8f33891744215b0388b48f5594641,
title = "Biomolecular network motif counting and discovery by color coding",
author = "Noga Alon and Phuong Dao and Iman Hajirasouliha and Fereydoun Hormozdiari and Sahinalp, {S. Cenk}",
year = "2008",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/btn163",
language = "English (US)",
volume = "24",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",

}

TY - JOUR

T1 - Biomolecular network motif counting and discovery by color coding

AU - Alon, Noga

AU - Dao, Phuong

AU - Hajirasouliha, Iman

AU - Hormozdiari, Fereydoun

AU - Sahinalp, S. Cenk

PY - 2008/1/1

Y1 - 2008/1/1

UR - http://www.scopus.com/inward/record.url?scp=84975860562&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84975860562&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btn163

DO - 10.1093/bioinformatics/btn163

M3 - Article

C2 - 18586721

AN - SCOPUS:84975860562

VL - 24

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

ER -