Important patterns can be found in strings of characters, such as the nucleotides in a DNA sequence, by examining how often specific character combinations, or words, occur. The abundance of words can reveal the presence of underlying trends governing the order of characters, even if the biological reasons for those trends remain mysterious. As an example of one way in which word frequencies have provided insight, we describe the IMEter, a word-based algorithm for analyzing introns and their effect on gene expression. The IMEter demonstrates that introns located near the beginning of genes are compositionally distinct from later introns and that these differences are closely related to the ability of some introns to increase gene expression. This word-based approach has proven more successful than deletion analysis at identifying the sequences responsible for elevating expression, because those sequences are dispersed throughout stimulatory introns rather than confined to a single region.
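The general idea of comparing word frequencies between two groups of sequences can be sketched in a few lines of Python. This is an illustrative sketch only, not the IMEter's published implementation: the function names, the log-ratio scoring, the word length `k`, and the pseudocount value are all assumptions chosen for the example, and the abstract does not specify any of them.

```python
from collections import Counter
from math import log2


def count_words(seq, k):
    """Count every overlapping word of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_frequencies(seqs, k):
    """Pool word counts over a group of sequences and normalize to frequencies."""
    counts = Counter()
    for s in seqs:
        counts.update(count_words(s, k))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def log_ratio_table(group_a, group_b, k, pseudo=1e-6):
    """Log2 ratio of each word's frequency in group_a versus group_b.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for words seen in only one group.
    """
    fa = word_frequencies(group_a, k)
    fb = word_frequencies(group_b, k)
    return {
        word: log2((fa.get(word, 0.0) + pseudo) / (fb.get(word, 0.0) + pseudo))
        for word in set(fa) | set(fb)
    }


def score(seq, table, k):
    """Sum the log ratios of all words in seq; unseen words contribute 0."""
    return sum(table.get(word, 0.0) * n for word, n in count_words(seq, k).items())
```

Under these assumptions, `group_a` might hold introns near the start of genes and `group_b` later introns; a positive `score` then means a sequence's word composition resembles the first group more than the second. Because the score sums over every word in the sequence, it can pick up a signal that is spread along the whole intron.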
|Original language||English (US)|
|Number of pages||15|
|Journal||Methods in molecular biology (Clifton, N.J.)|
|State||Published - 2009|