Pegasys: Software for executing and integrating analyses of biological sequences

Sohrab P. Shah, David Y.M. He, Jessica N. Sawkins, Jeffrey C. Druce, Gerald Quon, Drew Lett, Grace X.Y. Zheng, Tao Xu, B. F. Francis Ouellette

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Background: We present Pegasys, a flexible, modular and customizable software system that facilitates the execution of heterogeneous biological sequence analysis tools and the integration of their data. Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, and masking of repetitive sequences in genomic DNA, as well as filters for database formatting and for processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store their results. The software allows users to create analysis workflows dynamically at run-time by manipulating a graphical user interface. All analyses that are not serially dependent on one another are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow the results of heterogeneous programs included in a workflow to be integrated and exported into General Feature Format (GFF) for further analysis in GFF-dependent tools, or into GAME XML for import into the Apollo genome editor. The modularity of the design allows new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation are available for download at http://bioinformatics.ubc.ca/pegasys/.
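
The abstract states that analyses with no serial dependency on one another run in parallel. The following is a minimal sketch of that idea, not the Pegasys implementation: it assumes the workflow can be modelled as a directed acyclic graph and uses Python's standard `graphlib` module to group analyses into batches whose members could run concurrently. The function name `parallel_batches` and the analysis names in `workflow` are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group analyses into ordered batches.

    `dependencies` maps each analysis name to the set of analyses it must
    wait for. Every analysis in a batch depends only on analyses in earlier
    batches, so the members of one batch could be dispatched concurrently.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


# Hypothetical workflow: names are illustrative, not Pegasys identifiers.
workflow = {
    "repeat_masking": set(),
    "rna_gene_detection": set(),
    "gene_prediction": {"repeat_masking"},
    "alignment": {"repeat_masking"},
    "gff_export": {"gene_prediction", "alignment", "rna_gene_detection"},
}

for step, batch in enumerate(parallel_batches(workflow), start=1):
    print(step, batch)
```

In this sketch the two analyses without prerequisites form the first batch, the two that wait on repeat masking form the second, and the export step runs last. A real scheduler would start each analysis as soon as its own prerequisites finish rather than waiting for a whole batch.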

Original language: English (US)
Title of host publication: Data Structure and Software Engineering
Subtitle of host publication: Challenges and Improvements
Publisher: Apple Academic Press
Pages: 286-310
Number of pages: 25
ISBN (Electronic): 9781466562608
ISBN (Print): 9781926692975
State: Published - Apr 19, 2016

Fingerprint

Work Flow
Software
Data structures
Genes
Sequence Analysis
Data Model
Gene
Multiple Sequence Alignment
Dependent
Data integration
Masking
Graphical User Interface
Bioinformatics
Graphical user interfaces
Relational Database
Modularity
RNA
Application programming interfaces (API)
XML

ASJC Scopus subject areas

  • Computer Science(all)
  • Mathematics(all)

Cite this

Shah, S. P., He, D. Y. M., Sawkins, J. N., Druce, J. C., Quon, G., Lett, D., ... Francis Ouellette, B. F. (2016). Pegasys: Software for executing and integrating analyses of biological sequences. In Data Structure and Software Engineering: Challenges and Improvements (pp. 286-310). Apple Academic Press.

@inbook{a9e424d4d406428382bde4812f1f64ef,
title = "Pegasys: Software for executing and integrating analyses of biological sequences",
author = "Shah, {Sohrab P.} and He, {David Y.M.} and Sawkins, {Jessica N.} and Druce, {Jeffrey C.} and Gerald Quon and Drew Lett and Zheng, {Grace X.Y.} and Tao Xu and {Francis Ouellette}, {B. F.}",
year = "2016",
month = "4",
day = "19",
language = "English (US)",
isbn = "9781926692975",
pages = "286--310",
booktitle = "Data Structure and Software Engineering",
publisher = "Apple Academic Press",

}

TY - CHAP

T1 - Pegasys

T2 - Software for executing and integrating analyses of biological sequences

AU - Shah, Sohrab P.

AU - He, David Y.M.

AU - Sawkins, Jessica N.

AU - Druce, Jeffrey C.

AU - Quon, Gerald

AU - Lett, Drew

AU - Zheng, Grace X.Y.

AU - Xu, Tao

AU - Francis Ouellette, B. F.

PY - 2016/4/19

Y1 - 2016/4/19

UR - http://www.scopus.com/inward/record.url?scp=85052652631&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052652631&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:85052652631

SN - 9781926692975

SP - 286

EP - 310

BT - Data Structure and Software Engineering

PB - Apple Academic Press

ER -