Toward fully automated high performance computing drug discovery: A massively parallel virtual screening pipeline for docking and molecular mechanics/generalized born surface area rescoring to improve enrichment

Xiaohua Zhang, Sergio E. Wong, Felice C Lightstone

Research output: Contribution to journalArticle

41 Citations (Scopus)

Abstract

In this work we announce and evaluate a high throughput virtual screening pipeline for in-silico screening of virtual compound databases using high performance computing (HPC). Notable features of this pipeline are an automated receptor preparation scheme with unsupervised binding site identification. The pipeline includes receptor/target preparation, ligand preparation, VinaLC docking calculation, and molecular mechanics/generalized Born surface area (MM/GBSA) rescoring using the GB model by Onufriev and co-workers [J. Chem. Theory Comput. 2007, 3, 156-169]. Furthermore, we leverage HPC resources to perform an unprecedented, comprehensive evaluation of MM/GBSA rescoring when applied to the DUD-E data set (Directory of Useful Decoys: Enhanced), in which we selected 38 protein targets and a total of ∼0.7 million actives and decoys. The computer wall time for virtual screening has been reduced drastically on HPC machines, which increases the feasibility of extremely large ligand database screening with more accurate methods. HPC resources allowed us to rescore 20 poses per compound and evaluate the optimal number of poses to rescore. We find that keeping 5-10 poses is a good compromise between accuracy and computational expense. Overall the results demonstrate that MM/GBSA rescoring has higher average receiver operating characteristic (ROC) area under curve (AUC) values and consistently better early recovery of actives than Vina docking alone. Specifically, the enrichment performance is target-dependent. MM/GBSA rescoring significantly out performs Vina docking for the folate enzymes, kinases, and several other enzymes. The more accurate energy function and solvation terms of the MM/GBSA method allow MM/GBSA to achieve better enrichment, but the rescoring is still limited by the docking method to generate the poses with the correct binding modes.

Original languageEnglish (US)
Pages (from-to)324-337
Number of pages14
JournalJournal of Chemical Information and Modeling
Volume54
Issue number1
DOIs
StatePublished - Jan 27 2014
Externally publishedYes

Fingerprint

Computing Methodologies
Molecular mechanics
Drug Discovery
Mechanics
mechanic
Screening
Pipelines
drug
performance
Enzymes
Ligands
Databases
Directories
Solvation
Binding sites
Folic Acid
ROC Curve
co-worker
Computer Simulation
Area Under Curve

ASJC Scopus subject areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences
  • Medicine(all)

Cite this

@article{3e8a2e30731e4128812e67860bc637b0,
title = "Toward fully automated high performance computing drug discovery: A massively parallel virtual screening pipeline for docking and molecular mechanics/generalized born surface area rescoring to improve enrichment",
abstract = "In this work we announce and evaluate a high throughput virtual screening pipeline for in-silico screening of virtual compound databases using high performance computing (HPC). Notable features of this pipeline are an automated receptor preparation scheme with unsupervised binding site identification. The pipeline includes receptor/target preparation, ligand preparation, VinaLC docking calculation, and molecular mechanics/generalized Born surface area (MM/GBSA) rescoring using the GB model by Onufriev and co-workers [J. Chem. Theory Comput. 2007, 3, 156-169]. Furthermore, we leverage HPC resources to perform an unprecedented, comprehensive evaluation of MM/GBSA rescoring when applied to the DUD-E data set (Directory of Useful Decoys: Enhanced), in which we selected 38 protein targets and a total of ∼0.7 million actives and decoys. The computer wall time for virtual screening has been reduced drastically on HPC machines, which increases the feasibility of extremely large ligand database screening with more accurate methods. HPC resources allowed us to rescore 20 poses per compound and evaluate the optimal number of poses to rescore. We find that keeping 5-10 poses is a good compromise between accuracy and computational expense. Overall the results demonstrate that MM/GBSA rescoring has higher average receiver operating characteristic (ROC) area under curve (AUC) values and consistently better early recovery of actives than Vina docking alone. Specifically, the enrichment performance is target-dependent. MM/GBSA rescoring significantly out performs Vina docking for the folate enzymes, kinases, and several other enzymes. The more accurate energy function and solvation terms of the MM/GBSA method allow MM/GBSA to achieve better enrichment, but the rescoring is still limited by the docking method to generate the poses with the correct binding modes.",
author = "Xiaohua Zhang and Wong, {Sergio E.} and Lightstone, {Felice C}",
year = "2014",
month = "1",
day = "27",
doi = "10.1021/ci4005145",
language = "English (US)",
volume = "54",
pages = "324--337",
journal = "Journal of Chemical Information and Computer Sciences",
issn = "0095-2338",
publisher = "American Chemical Society",
number = "1",

}

TY - JOUR

T1 - Toward fully automated high performance computing drug discovery

T2 - A massively parallel virtual screening pipeline for docking and molecular mechanics/generalized born surface area rescoring to improve enrichment

AU - Zhang, Xiaohua

AU - Wong, Sergio E.

AU - Lightstone, Felice C

PY - 2014/1/27

Y1 - 2014/1/27

N2 - In this work we announce and evaluate a high throughput virtual screening pipeline for in-silico screening of virtual compound databases using high performance computing (HPC). Notable features of this pipeline are an automated receptor preparation scheme with unsupervised binding site identification. The pipeline includes receptor/target preparation, ligand preparation, VinaLC docking calculation, and molecular mechanics/generalized Born surface area (MM/GBSA) rescoring using the GB model by Onufriev and co-workers [J. Chem. Theory Comput. 2007, 3, 156-169]. Furthermore, we leverage HPC resources to perform an unprecedented, comprehensive evaluation of MM/GBSA rescoring when applied to the DUD-E data set (Directory of Useful Decoys: Enhanced), in which we selected 38 protein targets and a total of ∼0.7 million actives and decoys. The computer wall time for virtual screening has been reduced drastically on HPC machines, which increases the feasibility of extremely large ligand database screening with more accurate methods. HPC resources allowed us to rescore 20 poses per compound and evaluate the optimal number of poses to rescore. We find that keeping 5-10 poses is a good compromise between accuracy and computational expense. Overall the results demonstrate that MM/GBSA rescoring has higher average receiver operating characteristic (ROC) area under curve (AUC) values and consistently better early recovery of actives than Vina docking alone. Specifically, the enrichment performance is target-dependent. MM/GBSA rescoring significantly out performs Vina docking for the folate enzymes, kinases, and several other enzymes. The more accurate energy function and solvation terms of the MM/GBSA method allow MM/GBSA to achieve better enrichment, but the rescoring is still limited by the docking method to generate the poses with the correct binding modes.

AB - In this work we announce and evaluate a high throughput virtual screening pipeline for in-silico screening of virtual compound databases using high performance computing (HPC). Notable features of this pipeline are an automated receptor preparation scheme with unsupervised binding site identification. The pipeline includes receptor/target preparation, ligand preparation, VinaLC docking calculation, and molecular mechanics/generalized Born surface area (MM/GBSA) rescoring using the GB model by Onufriev and co-workers [J. Chem. Theory Comput. 2007, 3, 156-169]. Furthermore, we leverage HPC resources to perform an unprecedented, comprehensive evaluation of MM/GBSA rescoring when applied to the DUD-E data set (Directory of Useful Decoys: Enhanced), in which we selected 38 protein targets and a total of ∼0.7 million actives and decoys. The computer wall time for virtual screening has been reduced drastically on HPC machines, which increases the feasibility of extremely large ligand database screening with more accurate methods. HPC resources allowed us to rescore 20 poses per compound and evaluate the optimal number of poses to rescore. We find that keeping 5-10 poses is a good compromise between accuracy and computational expense. Overall the results demonstrate that MM/GBSA rescoring has higher average receiver operating characteristic (ROC) area under curve (AUC) values and consistently better early recovery of actives than Vina docking alone. Specifically, the enrichment performance is target-dependent. MM/GBSA rescoring significantly out performs Vina docking for the folate enzymes, kinases, and several other enzymes. The more accurate energy function and solvation terms of the MM/GBSA method allow MM/GBSA to achieve better enrichment, but the rescoring is still limited by the docking method to generate the poses with the correct binding modes.

UR - http://www.scopus.com/inward/record.url?scp=84893388298&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893388298&partnerID=8YFLogxK

U2 - 10.1021/ci4005145

DO - 10.1021/ci4005145

M3 - Article

C2 - 24358939

AN - SCOPUS:84893388298

VL - 54

SP - 324

EP - 337

JO - Journal of Chemical Information and Computer Sciences

JF - Journal of Chemical Information and Computer Sciences

SN - 0095-2338

IS - 1

ER -