Discovering transcription factor regulatory targets using gene expression and binding data

Mark Maienschein-Cline, Jie Zhou, Kevin P. White, Roger Sciammas, Aaron R. Dinner

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Motivation: Identifying the target genes regulated by transcription factors (TFs) is the most basic step in understanding gene regulation. Recent advances in high-throughput sequencing technology, together with chromatin immunoprecipitation (ChIP), enable mapping TF binding sites genome wide, but it is not possible to infer function from binding alone. This is especially true in mammalian systems, where regulation often occurs through long-range enhancers in gene-rich neighborhoods, rather than proximal promoters, preventing straightforward assignment of a binding site to a target gene.

Results: We present EMBER (Expectation Maximization of Binding and Expression pRofiles), a method that integrates high-throughput binding data (e.g. ChIP-chip or ChIP-seq) with gene expression data (e.g. DNA microarray) via an unsupervised machine learning algorithm for inferring the gene targets of sets of TF binding sites. Genes selected are those that match overrepresented expression patterns, which can be used to provide information about multiple TF regulatory modes. We apply the method to genome-wide human breast cancer data and demonstrate that EMBER confirms a role for the TFs estrogen receptor alpha, retinoic acid receptors alpha and gamma in breast cancer development, whereas the conventional approach of assigning regulatory targets based on proximity does not. Additionally, we compare several predicted target genes from EMBER to interactions inferred previously, examine combinatorial effects of TFs on gene regulation and illustrate the ability of EMBER to discover multiple modes of regulation.
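The abstract does not spell out the EMBER model, so the following is only a toy illustration of the general idea it describes: using expectation maximization to pick out candidate genes whose expression patterns are overrepresented relative to a background. It is not the authors' implementation. The function name `em_target_scores`, the pattern labels ("up", "flat"), the fixed background distribution and the two-component mixture are all assumptions made for this sketch.

```python
from collections import Counter


def em_target_scores(candidates, background, n_iter=100, eps=1e-6):
    """Toy EM: return P(target | pattern) for each candidate gene.

    candidates: dict mapping gene name -> discretized expression pattern
                (hypothetical labels such as "up" or "flat").
    background: dict mapping pattern -> assumed genome-wide frequency.
    """
    if not candidates:
        return {}
    counts = Counter(candidates.values())
    n = sum(counts.values())
    patterns = set(counts) | set(background)
    # Fixed background component, floored to avoid division by zero.
    bg = {k: max(background.get(k, 0.0), eps) for k in patterns}
    # Initialise the target component at the observed candidate frequencies.
    theta = {k: max(counts[k] / n, eps) for k in patterns}
    pi = 0.5  # mixing weight: prior probability that a candidate is a target

    resp = {}
    for _ in range(n_iter):
        # E-step: responsibility of the target component for each pattern.
        resp = {
            k: pi * theta[k] / (pi * theta[k] + (1.0 - pi) * bg[k])
            for k in patterns
        }
        # M-step: re-estimate the mixing weight and the target distribution.
        weight = {k: counts[k] * resp[k] for k in patterns}
        total = sum(weight.values())
        pi = total / n
        theta = {k: max(weight[k] / total, eps) for k in patterns}

    return {gene: resp[pattern] for gene, pattern in candidates.items()}


# Hypothetical genes near a set of binding sites and their expression patterns.
example_candidates = {"g1": "up", "g2": "up", "g3": "up", "g4": "flat"}
example_background = {"up": 0.1, "flat": 0.9}
example_scores = em_target_scores(example_candidates, example_background)
```

In this sketch, genes whose pattern is more common among the candidates than in the background receive higher scores than genes whose pattern looks like background. A proximity-only assignment would instead treat every nearby gene as a target regardless of its expression.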

Original language: English (US)
Article number: btr628
Pages (from-to): 206-213
Number of pages: 8
Journal: Bioinformatics
Volume: 28
Issue number: 2
DOI: 10.1093/bioinformatics/btr628
State: Published - Jan 2012
Externally published: Yes

Fingerprint

Transcription Factors
Gene Expression
Genes
Target
Expectation Maximization
Chromatin Immunoprecipitation
Binding Sites
Chromatin
Gene Regulation
Breast Cancer
High Throughput
Throughput
Genome
Breast Neoplasms
Retinoic Acid Receptors

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Discovering transcription factor regulatory targets using gene expression and binding data. / Maienschein-Cline, Mark; Zhou, Jie; White, Kevin P.; Sciammas, Roger; Dinner, Aaron R.

In: Bioinformatics, Vol. 28, No. 2, btr628, 01.2012, p. 206-213.


@article{6edd8e21fa2044ce860519f5a9064a4f,
title = "Discovering transcription factor regulatory targets using gene expression and binding data",
author = "Mark Maienschein-Cline and Jie Zhou and White, {Kevin P.} and Roger Sciammas and Dinner, {Aaron R.}",
year = "2012",
month = "1",
doi = "10.1093/bioinformatics/btr628",
language = "English (US)",
volume = "28",
pages = "206--213",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Discovering transcription factor regulatory targets using gene expression and binding data

AU - Maienschein-Cline, Mark

AU - Zhou, Jie

AU - White, Kevin P.

AU - Sciammas, Roger

AU - Dinner, Aaron R.

PY - 2012/1

Y1 - 2012/1

UR - http://www.scopus.com/inward/record.url?scp=84862963292&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862963292&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btr628

DO - 10.1093/bioinformatics/btr628

M3 - Article

C2 - 22084256

AN - SCOPUS:84862963292

VL - 28

SP - 206

EP - 213

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 2

M1 - btr628

ER -