Accurately predicting small-molecule partitioning and hydrophobicity is critical in the drug discovery process. There are many heterogeneous chemical environments within a cell and across the entire human body. For example, drugs must be able to cross the hydrophobic cellular membrane to reach their intracellular targets, and hydrophobicity is an important driving force for drug-protein binding. Atomistic molecular dynamics (MD) simulations are routinely used to calculate the free energies of small molecules binding to proteins, crossing lipid membranes, and solvating, but they are computationally expensive. Machine learning (ML) and empirical methods are also used throughout drug discovery, but they rely on experimental data, which limits their domain of applicability. We present atomistic MD simulations calculating the free energies of transfer of 15,000 small molecules from water to cyclohexane. This large data set is used to train ML models that predict the free energies of transfer. We show that a spatial graph neural network model achieves the highest accuracy, followed closely by a 3D convolutional neural network, whereas shallow learning based on chemical fingerprints is significantly less accurate. Our best ML model achieved a mean absolute error of ∼4 kJ/mol relative to the MD calculations. We also show that including data from the MD simulations improves the predictions, test the transferability of each model to a diverse set of molecules, and show that multitask learning improves the predictions. This work provides insight into the hydrophobicity of small molecules and into ML cheminformatics modeling, and our data set will be useful for designing and testing future ML cheminformatics methods.
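To put the ∼4 kJ/mol error on the more familiar log P scale, a water-to-cyclohexane transfer free energy can be converted to a partition coefficient. The sketch below assumes the standard relation ΔG_transfer = -RT ln P with P = c_cyclohexane / c_water and a temperature of 298.15 K (the simulation temperature is not stated here). The helper names and the numbers in the usage lines are hypothetical and illustrative only; they are not data or code from the study.

```python
import math

# Molar gas constant in kJ/(mol*K)
R_KJ = 8.314462618e-3


def log10_partition(dg_transfer_kj: float, temperature_k: float = 298.15) -> float:
    """Convert a water->cyclohexane transfer free energy (kJ/mol) to log10 P.

    Assumes dG_transfer = -RT ln P with P = c_cyclohexane / c_water, so a
    negative (favourable) transfer free energy gives log10 P > 0 (hydrophobic).
    """
    return -dg_transfer_kj / (R_KJ * temperature_k * math.log(10))


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical MD reference values and ML predictions (kJ/mol), for illustration only.
md_dg = [-12.0, 3.5, -20.0]
ml_dg = [-8.0, -0.5, -24.0]

mae_kj = mean_absolute_error(ml_dg, md_dg)
# Because log10 P is linear in dG, an error in kJ/mol maps to log units by
# dividing by RT ln 10 (about 5.71 kJ/mol at 298.15 K).
mae_log = mae_kj / (R_KJ * 298.15 * math.log(10))
print(f"MAE = {mae_kj:.2f} kJ/mol = {mae_log:.2f} log10 units")
```

Under these assumptions, an error of 4 kJ/mol corresponds to roughly 0.7 log10 units of the cyclohexane/water partition coefficient.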