Pegasys: Software for executing and integrating analyses of biological sequences

Sohrab P. Shah, David Y.M. He, Jessica N. Sawkins, Jeffrey C. Druce, Gerald Quon, Drew Lett, Grace X.Y. Zheng, Tao Xu, B. F. Francis Ouellette

Research output: Contribution to journal (Article)

66 Citations (Scopus)

Abstract

Background: We present Pegasys - a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools.

Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries.

Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License.
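The abstract describes workflows in which analyses that do not depend on one another run in parallel, with the combined results exported to General Feature Format. The Python sketch below illustrates that general idea only; it is not Pegasys code and makes no claim about how Pegasys is implemented. The step names (mask_repeats, align, predict_genes, export_gff), the run_workflow and to_gff_line helpers, and the sample features are all hypothetical, and a local thread pool stands in for the compute cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(steps, deps, max_workers=4):
    """Run each callable in `steps` once all of its prerequisites have finished.

    `deps` maps a step name to the set of step names it depends on.
    Steps that become ready at the same time are submitted together,
    so independent analyses run concurrently.
    Returns (results by step name, list of batches in execution order).
    """
    remaining = {name: set(deps.get(name, ())) for name in steps}
    results, batches = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # A step is ready when every prerequisite already has a result.
            ready = sorted(n for n, d in remaining.items() if d.issubset(results))
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            snapshot = dict(results)  # read-only view handed to worker threads
            futures = {n: pool.submit(steps[n], snapshot) for n in ready}
            for name, future in futures.items():
                results[name] = future.result()
                del remaining[name]
            batches.append(ready)
    return results, batches


def to_gff_line(seqname, source, feature, start, end,
                score=".", strand="+", frame=".", attributes="."):
    """Format one feature as a nine-column, tab-separated GFF record."""
    fields = (seqname, source, feature, start, end, score, strand, frame, attributes)
    return "\t".join(str(v) for v in fields)


# Hypothetical analyses: each receives the results of the steps finished so far.
steps = {
    "mask_repeats": lambda done: "masked-sequence",
    "align": lambda done: [("contig1", "align", "match", 10, 90)],
    "predict_genes": lambda done: [("contig1", "predict", "gene", 5, 200)],
    "export_gff": lambda done: [
        to_gff_line(*f) for f in done["align"] + done["predict_genes"]
    ],
}
deps = {
    "align": {"mask_repeats"},
    "predict_genes": {"mask_repeats"},
    "export_gff": {"align", "predict_genes"},
}

results, batches = run_workflow(steps, deps)
print(batches)
for line in results["export_gff"]:
    print(line)
```

In this sketch, align and predict_genes share a batch because both depend only on mask_repeats, while export_gff waits for both of them. A real system would replace the thread pool with a cluster scheduler and the lambdas with calls to external analysis tools.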

Original language: English (US)
Article number: 40
Journal: BMC Bioinformatics
Volume: 5
DOI: https://doi.org/10.1186/1471-2105-5-40
State: Published - Apr 19 2004
Externally published: Yes

Fingerprint

Workflow
Sequence Analysis
Software
Work Flow
Data structures
Genes
Database Management Systems
Data Model
Databases
Sequence Alignment
Nucleic Acid Repetitive Sequences
Gene
Licensure
Multiple Sequence Alignment
Dependent
Data integration
Masking
Graphical User Interface
Graphical user interfaces

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Shah, S. P., He, D. Y. M., Sawkins, J. N., Druce, J. C., Quon, G., Lett, D., ... Ouellette, B. F. F. (2004). Pegasys: Software for executing and integrating analyses of biological sequences. BMC Bioinformatics, 5, [40]. https://doi.org/10.1186/1471-2105-5-40

Pegasys: Software for executing and integrating analyses of biological sequences. / Shah, Sohrab P.; He, David Y.M.; Sawkins, Jessica N.; Druce, Jeffrey C.; Quon, Gerald; Lett, Drew; Zheng, Grace X.Y.; Xu, Tao; Ouellette, B. F. Francis.

In: BMC Bioinformatics, Vol. 5, 40, 19.04.2004.

Shah, SP, He, DYM, Sawkins, JN, Druce, JC, Quon, G, Lett, D, Zheng, GXY, Xu, T & Ouellette, BFF 2004, 'Pegasys: Software for executing and integrating analyses of biological sequences', BMC Bioinformatics, vol. 5, 40. https://doi.org/10.1186/1471-2105-5-40
Shah, Sohrab P.; He, David Y.M.; Sawkins, Jessica N.; Druce, Jeffrey C.; Quon, Gerald; Lett, Drew; Zheng, Grace X.Y.; Xu, Tao; Ouellette, B. F. Francis. / Pegasys: Software for executing and integrating analyses of biological sequences. In: BMC Bioinformatics. 2004; Vol. 5.

@article{cf9d651ebb1142ec80ec4d35f780c711,
title = "Pegasys: Software for executing and integrating analyses of biological sequences",
abstract = "Background: We present Pegasys - a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools. Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License.",
author = "Shah, {Sohrab P.} and He, {David Y.M.} and Sawkins, {Jessica N.} and Druce, {Jeffrey C.} and Gerald Quon and Drew Lett and Zheng, {Grace X.Y.} and Tao Xu and Ouellette, {B. F. Francis}",
year = "2004",
month = "4",
day = "19",
doi = "10.1186/1471-2105-5-40",
language = "English (US)",
volume = "5",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Pegasys

T2 - Software for executing and integrating analyses of biological sequences

AU - Shah, Sohrab P.

AU - He, David Y.M.

AU - Sawkins, Jessica N.

AU - Druce, Jeffrey C.

AU - Quon, Gerald

AU - Lett, Drew

AU - Zheng, Grace X.Y.

AU - Xu, Tao

AU - Ouellette, B. F. Francis

PY - 2004/4/19

Y1 - 2004/4/19

N2 - Background: We present Pegasys - a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools. Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License.

AB - Background: We present Pegasys - a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools. Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License.

UR - http://www.scopus.com/inward/record.url?scp=2942544409&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2942544409&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-5-40

DO - 10.1186/1471-2105-5-40

M3 - Article

C2 - 15096276

AN - SCOPUS:2942544409

VL - 5

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 40

ER -