The 3′ ends of mRNAs terminate with a poly(A) tail. This post-transcriptional modification is directed by sequence features present in the 3′-untranslated region (3′-UTR). We have undertaken a computational analysis of 3′ end formation in Caenorhabditis elegans. By aligning cDNAs that diverge from genomic sequence at the poly(A) tract, we accurately identified a large set of true cleavage sites. When there are many transcripts aligned to a particular locus, local variation of the cleavage site over a span of a few bases is frequently observed. We find that, in addition to the well-known AAUAAA motif, there are several regions with distinct nucleotide compositional biases. We propose a generalized hidden Markov model that describes sequence features in C. elegans 3′-UTRs. We find that a computer program employing this model accurately predicts experimentally observed 3′ ends even when there are multiple AAUAAA motifs and multiple cleavage sites. We have made available a complete set of polyadenylation site predictions for the C. elegans genome, including a subset of 6570 supported by aligned transcripts.
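
The idea of locating a cleavage site where a cDNA diverges from the genome into a poly(A) tract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes an ungapped alignment with a known genomic start (`offset`), and the function names, the minimum tail length (`min_tail`), and the upstream search window (`window`) are hypothetical choices made only for illustration. Sequences are written as DNA, so the AAUAAA signal appears as AATAAA.

```python
def find_cleavage_site(genomic, cdna, offset, min_tail=8):
    """Return the genomic index of the first base not covered by the
    transcript, or None if the cDNA does not end in a poly(A) tract.

    Assumes an ungapped alignment starting at genomic[offset]
    (an illustrative simplification).
    """
    i = 0
    # Walk along the alignment while the cDNA matches the genome.
    while (i < len(cdna) and offset + i < len(genomic)
           and cdna[i] == genomic[offset + i]):
        i += 1
    tail = cdna[i:]
    # The unaligned remainder must be a pure run of A of sufficient length.
    # If the genome itself carries A at this position, the match extends and
    # the exact site is ambiguous by a few bases.
    if len(tail) >= min_tail and set(tail) == {"A"}:
        return offset + i
    return None


def upstream_signals(genomic, site, motif="AATAAA", window=40):
    """List genomic start positions of `motif` within `window` bases
    upstream of a candidate cleavage site."""
    start = max(0, site - window)
    region = genomic[start:site]
    hits = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(start + pos)
        pos = region.find(motif, pos + 1)
    return hits


# Toy example with made-up sequence data.
genomic = "GGCC" + "AATAAA" + "GTCTGACTTGCATC" + "GGTTCC"
cdna = genomic[2:24] + "A" * 10
site = find_cleavage_site(genomic, cdna, offset=2)
signals = upstream_signals(genomic, site)
```

In this toy case the transcript stops matching the genome at the point where the appended run of A begins, and the single AATAAA motif upstream is reported as a candidate signal. A real analysis would use a gapped aligner and would have to handle genomic A-rich stretches, which this sketch does not attempt.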