Assessing the gene space in draft genomes

Genis Parra, Keith Bradnam, Zemin Ning, Thomas Keane, Ian F Korf

Research output: Contribution to journal › Article

290 Citations (Scopus)

Abstract

Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
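
The three assembly metrics named in the abstract can be illustrated with a short Python sketch. This is not the authors' software: the function names, gene identifiers, and input values are hypothetical, and it assumes the usual definitions (N50 as the largest length L such that sequences of length L or longer hold at least half of the assembled bases, x-fold coverage as total read bases divided by genome size, and the CEG proportion as the share of a core gene set found in the assembly).

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L
    contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(total_read_bases, genome_size):
    """x-fold coverage: sequenced bases per base of genome."""
    return total_read_bases / genome_size


def ceg_fraction(core_genes, mapped_genes):
    """Proportion of the core gene set that was mapped in the assembly."""
    core = set(core_genes)
    if not core:
        raise ValueError("core gene set is empty")
    return len(core & set(mapped_genes)) / len(core)


# Hypothetical inputs, for illustration only.
contig_lengths = [80, 70, 50, 40, 30, 20]
core = ["ceg1", "ceg2", "ceg3", "ceg4"]
mapped = ["ceg1", "ceg3", "other_gene"]

print(n50(contig_lengths))            # N50 of the toy assembly
print(fold_coverage(3000, 1000))      # coverage depth
print(ceg_fraction(core, mapped))     # share of core genes mapped
```

N50 and coverage describe contiguity and sequencing depth, while the CEG proportion asks directly how much of an expected gene set is recoverable, which is why the abstract presents it as a complement to the other two rather than a replacement.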

Original language: English (US)
Pages (from-to): 289-297
Number of pages: 9
Journal: Nucleic Acids Research
Volume: 37
Issue number: 1
DOIs: https://doi.org/10.1093/nar/gkn916
State: Published - 2009

Fingerprint

Genome
Genes
Eukaryota

ASJC Scopus subject areas

  • Genetics

Cite this

Assessing the gene space in draft genomes. / Parra, Genis; Bradnam, Keith; Ning, Zemin; Keane, Thomas; Korf, Ian F.

In: Nucleic Acids Research, Vol. 37, No. 1, 2009, p. 289-297.

Parra, G, Bradnam, K, Ning, Z, Keane, T & Korf, IF 2009, 'Assessing the gene space in draft genomes', Nucleic Acids Research, vol. 37, no. 1, pp. 289-297. https://doi.org/10.1093/nar/gkn916
Parra, Genis ; Bradnam, Keith ; Ning, Zemin ; Keane, Thomas ; Korf, Ian F. / Assessing the gene space in draft genomes. In: Nucleic Acids Research. 2009 ; Vol. 37, No. 1. pp. 289-297.
@article{8dd2d92b75e248e5a854f1d9301889b9,
title = "Assessing the gene space in draft genomes",
abstract = "Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.",
author = "Genis Parra and Keith Bradnam and Zemin Ning and Thomas Keane and Korf, {Ian F}",
year = "2009",
doi = "10.1093/nar/gkn916",
language = "English (US)",
volume = "37",
pages = "289--297",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "1",

}

TY - JOUR
T1 - Assessing the gene space in draft genomes
AU - Parra, Genis
AU - Bradnam, Keith
AU - Ning, Zemin
AU - Keane, Thomas
AU - Korf, Ian F
PY - 2009
Y1 - 2009
N2 - Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.

UR - http://www.scopus.com/inward/record.url?scp=58549121169&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58549121169&partnerID=8YFLogxK
U2 - 10.1093/nar/gkn916
DO - 10.1093/nar/gkn916
M3 - Article
VL - 37
SP - 289
EP - 297
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - 1
ER -