G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes

Danielle G. Lemay, William F. Martin, Angie S. Hinrichs, Monique Rijnkels, J. Bruce German, Ian F Korf, Katherine S. Pollard

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Background: In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously.

Results: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods.

Conclusions: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains.

The software is available at http://docpollard.org/software.html.
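
To make the idea of scoring "across all possible window sizes" concrete, the following is a minimal sketch and not G-NEST's actual scoring method. It assumes a hypothetical score, the mean pairwise Pearson correlation of expression profiles within each contiguous window of genes, and it ignores the sequence conservation and statistical significance components described in the abstract. The function names and the toy data are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def score_windows(expression):
    """Score every contiguous window of two or more genes.

    `expression` is a list of per-gene expression profiles, ordered by
    chromosomal position. Returns {(start, size): mean pairwise correlation}.
    This score is an assumption for illustration, not G-NEST's formula.
    """
    n = len(expression)
    # Compute each pairwise correlation once and reuse it for every window.
    corr = {(i, j): pearson(expression[i], expression[j])
            for i, j in combinations(range(n), 2)}
    scores = {}
    for size in range(2, n + 1):
        for start in range(n - size + 1):
            pairs = list(combinations(range(start, start + size), 2))
            scores[(start, size)] = sum(corr[p] for p in pairs) / len(pairs)
    return scores


# Hypothetical toy data: four adjacent genes measured in four tissues.
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],  # anti-correlated with genes 0 and 1
    [1.0, 3.0, 2.0, 4.0],
]
scores = score_windows(toy)
best = max(scores, key=scores.get)  # (start index, window size) of the top window
```

Scoring every window rather than fixing one size avoids choosing an arbitrary window length up front, at the cost of evaluating a number of windows that grows quadratically with the number of genes.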

Original language: English (US)
Article number: 253
Journal: BMC Bioinformatics
Volume: 13
Issue number: 1
DOI: https://doi.org/10.1186/1471-2105-13-253
State: Published - Sep 28 2012

Keywords

  • Bioinformatics
  • Cluster analysis
  • Computational biology
  • Evolution
  • Gene cluster
  • Gene duplication
  • Gene expression
  • Gene neighborhood
  • Genomics
  • Transcription

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Lemay, D. G., Martin, W. F., Hinrichs, A. S., Rijnkels, M., German, J. B., Korf, I. F., & Pollard, K. S. (2012). G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes. BMC Bioinformatics, 13(1), [253]. https://doi.org/10.1186/1471-2105-13-253

G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes. / Lemay, Danielle G.; Martin, William F.; Hinrichs, Angie S.; Rijnkels, Monique; German, J. Bruce; Korf, Ian F; Pollard, Katherine S.

In: BMC Bioinformatics, Vol. 13, No. 1, 253, 28.09.2012.

Lemay, DG, Martin, WF, Hinrichs, AS, Rijnkels, M, German, JB, Korf, IF & Pollard, KS 2012, 'G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes', BMC Bioinformatics, vol. 13, no. 1, 253. https://doi.org/10.1186/1471-2105-13-253

Lemay, Danielle G.; Martin, William F.; Hinrichs, Angie S.; Rijnkels, Monique; German, J. Bruce; Korf, Ian F; Pollard, Katherine S. / G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes. In: BMC Bioinformatics. 2012; Vol. 13, No. 1.

@article{747043206be049278f93e8403c83b890,
title = "G-NEST: A gene neighborhood scoring tool to identify co-conserved, co-expressed genes",
abstract = "Background: In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. Results: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. Conclusions: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains. The software is available at http://docpollard.org/software.html.",
keywords = "Bioinformatics, Cluster analysis, Computational biology, Evolution, Gene cluster, Gene duplication, Gene expression, Gene neighborhood, Genomics, Transcription",
author = "Lemay, {Danielle G.} and Martin, {William F.} and Hinrichs, {Angie S.} and Rijnkels, Monique and German, {J. Bruce} and Korf, {Ian F} and Pollard, {Katherine S.}",
year = "2012",
month = "9",
day = "28",
doi = "10.1186/1471-2105-13-253",
language = "English (US)",
volume = "13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - G-NEST

T2 - A gene neighborhood scoring tool to identify co-conserved, co-expressed genes

AU - Lemay, Danielle G.

AU - Martin, William F.

AU - Hinrichs, Angie S.

AU - Rijnkels, Monique

AU - German, J. Bruce

AU - Korf, Ian F

AU - Pollard, Katherine S.

PY - 2012/9/28

Y1 - 2012/9/28

N2 - Background: In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. Results: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. Conclusions: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains. The software is available at http://docpollard.org/software.html.

KW - Bioinformatics

KW - Cluster analysis

KW - Computational biology

KW - Evolution

KW - Gene cluster

KW - Gene duplication

KW - Gene expression

KW - Gene neighborhood

KW - Genomics

KW - Transcription

UR - http://www.scopus.com/inward/record.url?scp=84866652688&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866652688&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-253

DO - 10.1186/1471-2105-13-253

M3 - Article

C2 - 23020263

AN - SCOPUS:84866652688

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 253

ER -