Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

Robert S. Coyne, Mathangi Thiagarajan, Kristie M. Jones, Jennifer R. Wortman, Luke J. Tallon, Brian J. Haas, Donna M. Cassidy-Hanley, Emily A. Wiley, Joshua J. Smith, Kathleen Collins, Suzanne R. Lee, Mary T. Couvillion, Yifan Liu, Jyoti Garg, Ronald E. Pearlman, Eileen P. Hamilton, Eduardo Orias, Jonathan A Eisen, Barbara A. Methé

Research output: Contribution to journal › Article

65 Citations (Scopus)

Abstract

Background: Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and a somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal; achieving it would require standard closure methods as well as removal of minor MIC contamination from the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, which limited the value of the genomic information and left certain questions unanswered, such as the frequency of alternative splicing.

Results: We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole-genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. To improve annotation, we sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. Comparing EST abundance identified many genes showing apparent differential expression between these conditions. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified.

Conclusion: We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. Our experience to date suggests that complete closure of the MAC genome is attainable. Using the new EST evidence, automated and manual curation has resulted in substantial improvements to the more than 24,000 gene models, which will be valuable to researchers studying this model organism and for comparative genomics.

Original language: English (US)
Article number: 562
Journal: BMC Genomics
Volume: 9
DOI: https://doi.org/10.1186/1471-2164-9-562
State: Published - Nov 26 2008

Fingerprint

Tetrahymena thermophila
Comparative Genomic Hybridization
Expressed Sequence Tags
Macronucleus
Genome
Alternative Splicing
Germline Micronucleus
Selenocysteine
Tetrahymena
DNA Probes
Genomics
Oligonucleotide Array Sequence Analysis
Growth and Development
Epigenomics
Genes
Cell Biology
Molecular Biology
Complementary DNA
Research Personnel

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure. / Coyne, Robert S.; Thiagarajan, Mathangi; Jones, Kristie M.; Wortman, Jennifer R.; Tallon, Luke J.; Haas, Brian J.; Cassidy-Hanley, Donna M.; Wiley, Emily A.; Smith, Joshua J.; Collins, Kathleen; Lee, Suzanne R.; Couvillion, Mary T.; Liu, Yifan; Garg, Jyoti; Pearlman, Ronald E.; Hamilton, Eileen P.; Orias, Eduardo; Eisen, Jonathan A; Methé, Barbara A.

In: BMC Genomics, Vol. 9, 562, 26.11.2008.

Coyne, RS, Thiagarajan, M, Jones, KM, Wortman, JR, Tallon, LJ, Haas, BJ, Cassidy-Hanley, DM, Wiley, EA, Smith, JJ, Collins, K, Lee, SR, Couvillion, MT, Liu, Y, Garg, J, Pearlman, RE, Hamilton, EP, Orias, E, Eisen, JA & Methé, BA 2008, 'Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure', BMC Genomics, vol. 9, 562. https://doi.org/10.1186/1471-2164-9-562
@article{b6be65a0c52e4571ab451a3489322049,
title = "Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure",
author = "Coyne, {Robert S.} and Mathangi Thiagarajan and Jones, {Kristie M.} and Wortman, {Jennifer R.} and Tallon, {Luke J.} and Haas, {Brian J.} and Cassidy-Hanley, {Donna M.} and Wiley, {Emily A.} and Smith, {Joshua J.} and Kathleen Collins and Lee, {Suzanne R.} and Couvillion, {Mary T.} and Yifan Liu and Jyoti Garg and Pearlman, {Ronald E.} and Hamilton, {Eileen P.} and Eduardo Orias and Eisen, {Jonathan A} and Meth{\'e}, {Barbara A.}",
year = "2008",
month = "11",
day = "26",
doi = "10.1186/1471-2164-9-562",
language = "English (US)",
volume = "9",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

AU - Coyne, Robert S.

AU - Thiagarajan, Mathangi

AU - Jones, Kristie M.

AU - Wortman, Jennifer R.

AU - Tallon, Luke J.

AU - Haas, Brian J.

AU - Cassidy-Hanley, Donna M.

AU - Wiley, Emily A.

AU - Smith, Joshua J.

AU - Collins, Kathleen

AU - Lee, Suzanne R.

AU - Couvillion, Mary T.

AU - Liu, Yifan

AU - Garg, Jyoti

AU - Pearlman, Ronald E.

AU - Hamilton, Eileen P.

AU - Orias, Eduardo

AU - Eisen, Jonathan A

AU - Methé, Barbara A.

PY - 2008/11/26

Y1 - 2008/11/26

UR - http://www.scopus.com/inward/record.url?scp=58149307926&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58149307926&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-9-562

DO - 10.1186/1471-2164-9-562

M3 - Article

C2 - 19036158

AN - SCOPUS:58149307926

VL - 9

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 562

ER -