A probabilistic model of 3′ end formation in Caenorhabditis elegans

Ashwin Hajarnavis, Ian F Korf, Richard Durbin

Research output: Contribution to journal › Article

47 Citations (Scopus)

Abstract

The 3′ ends of mRNAs terminate with a poly(A) tail. This post-transcriptional modification is directed by sequence features present in the 3′-untranslated region (3′-UTR). We have undertaken a computational analysis of 3′ end formation in Caenorhabditis elegans. By aligning cDNAs that diverge from genomic sequence at the poly(A) tract, we accurately identified a large set of true cleavage sites. When there are many transcripts aligned to a particular locus, local variation of the cleavage site over a span of a few bases is frequently observed. We find that in addition to the well-known AAUAAA motif there are several regions with distinct nucleotide compositional biases. We propose a generalized hidden Markov model that describes sequence features in C. elegans 3′-UTRs. We find that a computer program employing this model accurately predicts experimentally observed 3′ ends even when there are multiple AAUAAA motifs and multiple cleavage sites. We have made available a complete set of polyadenylation site predictions for the C. elegans genome, including a subset of 6570 supported by aligned transcripts.
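To make the idea of locating a cleavage site from a cDNA that diverges from the genome at its poly(A) tract more concrete, here is a minimal Python sketch. It is not the authors' pipeline or their generalized hidden Markov model: the function names, the gap-free alignment starting at genomic position 0, the `min_tail` threshold, and the 40-base search window are all assumptions chosen for illustration. The motif is written as AATAAA because the sequences are DNA.

```python
from typing import List, Optional


def find_cleavage_site(genomic: str, cdna: str, min_tail: int = 8) -> Optional[int]:
    """Return the number of templated bases before the poly(A) tail, or None.

    Toy assumption: the cDNA aligns to the genome from index 0 without gaps.
    The returned value is the 0-based genomic index of the first base that
    the cDNA no longer matches, i.e. cleavage happens just before it.
    """
    genomic = genomic.upper()
    cdna = cdna.upper()

    # Count the leading bases where cDNA and genome agree.
    match_len = 0
    for g, c in zip(genomic, cdna):
        if g != c:
            break
        match_len += 1

    # The unmatched remainder must be a pure run of A's of sufficient length.
    tail = cdna[match_len:]
    if len(tail) >= min_tail and set(tail) == {"A"}:
        return match_len
    return None


def upstream_motifs(genomic: str, site: int, window: int = 40,
                    motif: str = "AATAAA") -> List[int]:
    """Return genomic start indices of `motif` within `window` bases upstream of `site`."""
    genomic = genomic.upper()
    start = max(0, site - window)
    region = genomic[start:site]

    hits = []
    i = region.find(motif)
    while i != -1:
        hits.append(start + i)
        i = region.find(motif, i + 1)  # step by one so overlapping hits are kept
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, not real C. elegans data.
    genomic = "CCGAATAAACTGTTTCATTGCGGTCC"
    cdna = genomic[:19] + "A" * 12
    site = find_cleavage_site(genomic, cdna)
    print(site, upstream_motifs(genomic, site))
```

When the genome itself has A's immediately downstream of the true site, the matched stretch extends into them, so this sketch reports the rightmost position consistent with the transcript. That ambiguity is one simple reason a cleavage site may be placed anywhere within a span of a few bases.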

Original language: English (US)
Pages (from-to): 3392-3399
Number of pages: 8
Journal: Nucleic Acids Research
Volume: 32
Issue number: 11
DOI: 10.1093/nar/gkh656
State: Published - 2004
Externally published: Yes

Fingerprint

Caenorhabditis elegans
3' Untranslated Regions
Statistical Models
Messenger RNA
Polyadenylation
Poly A
Software
Nucleotides
Complementary DNA
Genome

ASJC Scopus subject areas

  • Genetics

Cite this

A probabilistic model of 3′ end formation in Caenorhabditis elegans. / Hajarnavis, Ashwin; Korf, Ian F; Durbin, Richard.

In: Nucleic Acids Research, Vol. 32, No. 11, 2004, p. 3392-3399.

Hajarnavis, Ashwin ; Korf, Ian F ; Durbin, Richard. / A probabilistic model of 3′ end formation in Caenorhabditis elegans. In: Nucleic Acids Research. 2004 ; Vol. 32, No. 11. pp. 3392-3399.
@article{fdfc355e181e46a8b673f7c6c92af447,
title = "A probabilistic model of 3′ end formation in Caenorhabditis elegans",
author = "Ashwin Hajarnavis and Korf, {Ian F} and Richard Durbin",
year = "2004",
doi = "10.1093/nar/gkh656",
language = "English (US)",
volume = "32",
pages = "3392--3399",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "11",

}

TY - JOUR

T1 - A probabilistic model of 3′ end formation in Caenorhabditis elegans

AU - Hajarnavis, Ashwin

AU - Korf, Ian F

AU - Durbin, Richard

PY - 2004

Y1 - 2004

AB - The 3′ ends of mRNAs terminate with a poly(A) tail. This post-transcriptional modification is directed by sequence features present in the 3′-untranslated region (3′-UTR). We have undertaken a computational analysis of 3′ end formation in Caenorhabditis elegans. By aligning cDNAs that diverge from genomic sequence at the poly(A) tract, we accurately identified a large set of true cleavage sites. When there are many transcripts aligned to a particular locus, local variation of the cleavage site over a span of a few bases is frequently observed. We find that in addition to the well-known AAUAAA motif there are several regions with distinct nucleotide compositional biases. We propose a generalized hidden Markov model that describes sequence features in C.elegans 3′-UTRs. We find that a computer program employing this model accurately predicts experimentally observed 3′ ends even when there are multiple AAUAAA motifs and multiple cleavage sites. We have made available a complete set of polyadenylation site predictions for the C.elegans genome, including a subset of 6570 supported by aligned transcripts.

UR - http://www.scopus.com/inward/record.url?scp=3242672609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3242672609&partnerID=8YFLogxK

U2 - 10.1093/nar/gkh656

DO - 10.1093/nar/gkh656

M3 - Article

C2 - 15247332

AN - SCOPUS:3242672609

VL - 32

SP - 3392

EP - 3399

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 11

ER -