Advances in high-throughput technologies, such as ChIP-chip, and the completion of human and mouse genomic sequences now allow analysis of the mechanisms of gene regulation on a systems level. In this study, we have developed a computational genomics approach (termed ChIPModules), which begins with experimentally determined binding sites and integrates positional weight matrices constructed from transcription factor binding sites, a comparative genomics approach, and statistical learning methods to identify transcriptional regulatory modules. We began with E2F1 binding site information obtained from ChIP-chip analyses of ENCODE regions, from both HeLa and MCF7 cells. Our approach not only distinguished targets from nontargets with a high specificity, but it also identified five regulatory modules for E2F1. One of the identified modules predicted a colocalization of E2F1 and AP-2α on a set of target promoters with an intersite distance of <270 bp. We tested this prediction using ChIP-chip assays with arrays containing ∼14,000 human promoters. We found that both E2F1 and AP-2α bind within the predicted distance to a large number of human promoters, demonstrating the strength of our sequence-based, unbiased, and universal protocol. Finally, we have used our ChIPModules approach to develop a database that includes thousands of computationally identified and/or experimentally verified E2F1 target promoters.
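As a rough illustration of two ingredients named above (scoring candidate sites with a positional weight matrix, then checking whether sites for two factors fall within 270 bp of each other), a minimal sketch might look like the following. It is not the ChIPModules implementation: the function names, the pseudocount, the uniform background, the score thresholds, and the toy aligned sites are all assumptions made for illustration, and the comparative genomics and statistical learning steps are omitted.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned, equal-length sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def scan(seq, pwm, threshold):
    """Return start positions whose window log-odds score meets the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits

def colocalized(hits_a, hits_b, max_distance=270):
    """Pair up sites of two factors whose start positions are < max_distance apart."""
    return [(a, b) for a in hits_a for b in hits_b if abs(a - b) < max_distance]

# Toy aligned sites, made up for this sketch (not data from the study).
factor_a_sites = ["TTTCGCGC", "TTTGGCGC", "TTTCCCGC"]
factor_b_sites = ["GCCTGAGGC", "GCCCAGGGC", "GCCAGTGGC"]
pwm_a = build_pwm(factor_a_sites)
pwm_b = build_pwm(factor_b_sites)

# A hypothetical promoter with one site for each factor, 58 bp apart.
promoter = "AAAA" + "TTTCGCGC" + "A" * 50 + "GCCTGAGGC" + "AAAA"
pairs = colocalized(scan(promoter, pwm_a, 6.0), scan(promoter, pwm_b, 6.0))
print(pairs)
```

In a fuller pipeline, features such as these scores and intersite distances would be the inputs to a classifier that separates targets from nontargets; the fixed thresholds used here only stand in for that step.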