Data-directed RNA secondary structure prediction using probabilistic modeling

Fei Deng, Mirko Ledda, Sana Vaziri, Sharon Aviran

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Structure dictates the function of many RNAs, but secondary RNA structure analysis is either labor intensive and costly or relies on computational predictions that are often inaccurate. These limitations are alleviated by integration of structure probing data into prediction algorithms. However, existing algorithms are optimized for a specific type of probing data. Recently, new chemistries combined with advances in sequencing have facilitated structure probing at unprecedented scale and sensitivity. These novel technologies and anticipated wealth of data highlight a need for algorithms that readily accommodate more complex and diverse input sources. We implemented and investigated a recently outlined probabilistic framework for RNA secondary structure prediction and extended it to accommodate further refinement of structural information. This framework utilizes direct likelihood-based calculations of pseudo-energy terms per considered structural context and can readily accommodate diverse data types and complex data dependencies. We use real data in conjunction with simulations to evaluate performances of several implementations and to show that proper integration of structural contexts can lead to improvements. Our tests also reveal discrepancies between real data and simulations, which we show can be alleviated by refined modeling. We then propose statistical preprocessing approaches to standardize data interpretation and integration into such a generic framework. We further systematically quantify the information content of data subsets, demonstrating that high reactivities are major drivers of SHAPE-directed predictions and that better understanding of less informative reactivities is key to further improvements. Finally, we provide evidence for the adaptive capability of our framework using mock probe simulations.
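The likelihood-based pseudo-energy idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a per-nucleotide term takes the form of a log-likelihood ratio between two structural contexts (paired versus unpaired), scaled by -RT. The exponential reactivity models, their rate parameters, and the function names are hypothetical and chosen only for illustration.

```python
import math

# Thermal energy RT in kcal/mol at 37 °C (R = 1.9872e-3 kcal/(mol*K), T = 310.15 K).
RT = 1.9872e-3 * 310.15

# ASSUMPTION: hypothetical exponential reactivity models, not fitted to any data.
PAIRED_RATE = 5.0     # mean reactivity 0.2 for paired nucleotides
UNPAIRED_RATE = 1.25  # mean reactivity 0.8 for unpaired nucleotides


def exp_pdf(x: float, rate: float) -> float:
    """Density of an exponential distribution with the given rate at x >= 0."""
    if x < 0:
        raise ValueError("reactivity must be non-negative in this toy model")
    return rate * math.exp(-rate * x)


def pairing_pseudo_energy(reactivity: float) -> float:
    """Pseudo-energy (kcal/mol) for treating a nucleotide as paired.

    Computed as -RT * ln[ p(r | paired) / p(r | unpaired) ]: negative values
    favor pairing, positive values penalize it.
    """
    ratio = exp_pdf(reactivity, PAIRED_RATE) / exp_pdf(reactivity, UNPAIRED_RATE)
    return -RT * math.log(ratio)


if __name__ == "__main__":
    for r in (0.0, 0.2, 0.5, 1.0, 2.0):
        print(f"reactivity={r:.1f}  pseudo-energy={pairing_pseudo_energy(r):+.3f} kcal/mol")
```

Under these assumed parameters, low reactivities yield a negative (pairing-favorable) term and high reactivities yield a positive penalty, with the sign change near a reactivity of 0.37. Swapping in different per-context distributions changes only `exp_pdf` and the two rate constants, which is the sense in which such a likelihood-based formulation can accommodate different data types.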

Original language: English (US)
Pages (from-to): 1109-1119
Number of pages: 11
Journal: RNA
Volume: 22
Issue number: 8
DOI: 10.1261/rna.055756.115
State: Published - Aug 1 2016

Keywords

  • Data-directed
  • Minimum free energy
  • Probabilistic models
  • RNA secondary structure
  • Statistical inference

ASJC Scopus subject areas

  • Molecular Biology

Cite this

Data-directed RNA secondary structure prediction using probabilistic modeling. / Deng, Fei; Ledda, Mirko; Vaziri, Sana; Aviran, Sharon.

In: RNA, Vol. 22, No. 8, 01.08.2016, p. 1109-1119.

@article{278cc7876a2d49e0b71160c29158d0c2,
title = "Data-directed RNA secondary structure prediction using probabilistic modeling",
keywords = "Data-directed, Minimum free energy, Probabilistic models, RNA secondary structure, Statistical inference",
author = "Fei Deng and Mirko Ledda and Sana Vaziri and Sharon Aviran",
year = "2016",
month = "8",
day = "1",
doi = "10.1261/rna.055756.115",
language = "English (US)",
volume = "22",
pages = "1109--1119",
journal = "RNA",
issn = "1355-8382",
publisher = "Cold Spring Harbor Laboratory Press",
number = "8",

}

TY - JOUR

T1 - Data-directed RNA secondary structure prediction using probabilistic modeling

AU - Deng, Fei

AU - Ledda, Mirko

AU - Vaziri, Sana

AU - Aviran, Sharon

PY - 2016/8/1

Y1 - 2016/8/1

KW - Data-directed

KW - Minimum free energy

KW - Probabilistic models

KW - RNA secondary structure

KW - Statistical inference

UR - http://www.scopus.com/inward/record.url?scp=84979664009&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979664009&partnerID=8YFLogxK

U2 - 10.1261/rna.055756.115

DO - 10.1261/rna.055756.115

M3 - Article

C2 - 27251549

AN - SCOPUS:84979664009

VL - 22

SP - 1109

EP - 1119

JO - RNA

JF - RNA

SN - 1355-8382

IS - 8

ER -