Thanks to advances in biophysics, we are often able to find the
structure of proteins from experimental techniques like Cryo-EM or
X-ray crystallography. These structures can be powerful aids in
designing small molecules. The technique of molecular docking performs
geometric calculations to find a “binding pose” in which the small
molecule interacts with the protein in question in a suitable
binding pocket (that is, a region on the protein which has a groove in
which the small molecule can rest). For more information about
docking, check out the AutoDock Vina paper:
Trott, Oleg, and Arthur J. Olson. “AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.” Journal of Computational Chemistry 31.2 (2010): 455-461.
DeepChem has some utilities to help find binding pockets on proteins
automatically. For now, these utilities are simple, but we will
improve these in future versions of DeepChem.
Many times when working with a new protein or other macromolecule,
it’s not clear what zones of the macromolecule may be good targets
for potential ligands or other molecules to interact with. This
abstract class provides a template for child classes that
algorithmically locate potential binding pockets that are good
potential interaction sites.
Note that potential interaction sites can be found by many
different methods, and that this abstract class doesn’t specify the
technique to be used.
This function computes putative binding pockets on this protein.
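The template idea can be sketched with Python's abc module. This is a minimal illustration, not DeepChem's implementation: the names PocketFinderSketch and WholeMoleculePocketFinder, the coordinate-list input, and the box representation are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

# Hypothetical box type: ((x_min, x_max), (y_min, y_max), (z_min, z_max))
Box = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]
Coord = Tuple[float, float, float]


class PocketFinderSketch(ABC):
    """Hypothetical template: subclasses choose the pocket-finding technique."""

    @abstractmethod
    def find_pockets(self, coords: Sequence[Coord]) -> List[Box]:
        """Return putative binding pockets as coordinate boxes."""


class WholeMoleculePocketFinder(PocketFinderSketch):
    """Trivial child class: treats the whole molecule as a single pocket."""

    def find_pockets(self, coords: Sequence[Coord]) -> List[Box]:
        xs, ys, zs = zip(*coords)
        return [((min(xs), max(xs)), (min(ys), max(ys)), (min(zs), max(zs)))]
```

The abstract parent cannot be instantiated, so every concrete finder has to supply its own find_pockets, which mirrors the statement that the technique is left to the child class.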
This class uses the ConvexHull to compute binding pockets. Each
face of the hull is converted into a coordinate box used for
binding.
Parameters:
macromolecule_file (str) – Location of the macromolecule file to load
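The face-to-box conversion can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses a tetrahedron, whose convex hull faces are simply its four triangles, so no hull routine is needed. The function face_to_box and its pad parameter are hypothetical.

```python
from itertools import combinations

import numpy as np


def face_to_box(face_vertices, pad=0.0):
    """Axis-aligned box around one hull face, grown by `pad` on every side."""
    v = np.asarray(face_vertices, dtype=float)
    lo = v.min(axis=0) - pad
    hi = v.max(axis=0) + pad
    return tuple((float(a), float(b)) for a, b in zip(lo, hi))


# Toy "macromolecule": four atoms forming a tetrahedron.
tetra = np.array([[0.0, 0.0, 0.0],
                  [4.0, 0.0, 0.0],
                  [0.0, 4.0, 0.0],
                  [0.0, 0.0, 4.0]])

# Every triple of vertices is a face of a tetrahedron's convex hull.
boxes = [face_to_box(tetra[list(idx)], pad=1.0)
         for idx in combinations(range(4), 3)]
```

For a real structure the faces would come from a convex hull routine run over the atomic coordinates; the per-face box computation stays the same.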
Pose generation is the task of finding a “pose”, that is, a geometric
configuration of a small molecule interacting with a protein. Pose
generation is a complex process, so for now DeepChem relies on
external software to perform pose generation. This software is
installed and invoked under the hood.
A Pose Generator computes low energy conformations for molecular complexes.
Many questions in structural biophysics reduce to that of computing
the binding free energy of molecular complexes. A key step towards
computing the binding free energy of two complexes is to find low
energy “poses”, that is energetically favorable conformations of
molecules with respect to each other. One application of this
technique is to find low energy poses for protein-ligand
interactions.
Generates a list of low energy poses for a molecular complex.
Parameters:
molecular_complexes (Tuple[str, str]) – A representation of a molecular complex. This tuple is
(protein_file, ligand_file).
centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock.
If not specified, it is set to the size of the molecular complex plus 5 angstroms.
exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose
generation.
num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at
each invocation.
num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only
generate poses for the first num_pockets returned by
self.pocket_finder.
out_dir (str, optional (default None)) – If specified, write generated poses to this directory.
generate_score (bool, optional (default False)) – If True, the pose generator will return scores for complexes.
This is used typically when invoking external docking programs
that compute scores.
Return type:
A list of molecular complexes in energetically favorable poses.
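The defaults described for centroid and box_dims can be sketched with NumPy. This is an assumption-laden illustration: the helper default_box is hypothetical, the centroid is taken as the unweighted mean of the atomic coordinates, and the box is the coordinate extent plus the 5 angstrom margin mentioned above.

```python
import numpy as np


def default_box(coords, padding=5.0):
    """Return (centroid, box_dims) for an (N, 3) array of atomic coordinates."""
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)                 # unweighted geometric center
    extent = coords.max(axis=0) - coords.min(axis=0)
    box_dims = extent + padding                    # shape (3,), in angstroms
    return centroid, box_dims


coords = np.array([[0.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0],
                   [0.0, 0.0, 2.0]])
centroid, box_dims = default_box(coords)
```

Passing explicit centroid and box_dims values restricts the search to a known pocket instead of the whole complex.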
This class requires RDKit and vina to be installed. As of March 9, 2022,
Vina is not available on Windows. Hence, this utility is currently
available only on Ubuntu and macOS.
Generates the docked complex and outputs files for docked complex.
Parameters:
molecular_complexes (Tuple[str, str]) – A representation of a molecular complex. This tuple is
(protein_file, ligand_file). The protein should be a pdb file
and the ligand should be an sdf file.
centroid (np.ndarray, optional) – The centroid to dock against. Is computed if not specified.
box_dims (np.ndarray, optional) – A numpy array of shape (3,) holding the size of the box to dock. If not
specified, it is set to the size of the molecular complex plus 5 angstroms.
exhaustiveness (int, optional (default 10)) – Tells Autodock Vina how exhaustive it should be with pose generation. A
higher value of exhaustiveness implies more computation effort for the
docking experiment.
num_modes (int, optional (default 9)) – Tells Autodock Vina how many binding modes it should generate at
each invocation.
num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only
generate poses for the first num_pockets returned by
self.pocket_finder.
out_dir (str, optional) – If specified, write generated poses to this directory.
generate_score (bool, optional (default False)) – If True, the pose generator will return scores for complexes.
This is used typically when invoking external docking programs
that compute scores.
Tuple of (docked_poses, scores), docked_poses, or scores. docked_poses
is a list of docked molecular complexes. Each entry in this list
contains a (protein_mol, ligand_mol) pair of RDKit molecules.
scores is a list of binding free energies predicted by Vina.
Return type:
Tuple[docked_poses, scores] or docked_poses or scores
This class uses GNINA (a deep learning framework for molecular
docking) to generate binding poses. It downloads the GNINA
executable to DEEPCHEM_DATA_DIR (an environment variable you set)
and invokes the executable to perform pose generation.
GNINA uses pre-trained convolutional neural network (CNN) scoring
functions to rank binding poses based on learned representations of
3D protein-ligand interactions. It has been shown to outperform
AutoDock Vina in virtual screening applications [1].
Generates the docked complex and outputs files for docked complex.
Parameters:
molecular_complexes (Tuple[str, str]) – A representation of a molecular complex. This tuple is
(protein_file, ligand_file).
centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock.
If not specified, it is set to the size of the molecular complex plus 4 angstroms.
exhaustiveness (int (default 8)) – Tells GNINA how exhaustive it should be with pose
generation.
num_modes (int (default 9)) – Tells GNINA how many binding modes it should generate at
each invocation.
out_dir (str, optional) – If specified, write generated poses to this directory.
generate_scores (bool, optional (default True)) – If True, the pose generator will return scores for complexes.
This is used typically when invoking external docking programs
that compute scores.
Tuple of (docked_poses, scores) or docked_poses. docked_poses
is a list of docked molecular complexes. Each entry in this list
contains a (protein_mol, ligand_mol) pair of RDKit molecules.
scores is an array of binding affinities (kcal/mol),
CNN pose scores, and CNN affinities predicted by GNINA.
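A short sketch of how such a scores array might be used to pick a pose. The layout is an assumption: one row per binding mode, with columns in the order listed above (binding affinity, CNN pose score, CNN affinity). The numbers are made up for illustration. The sketch also assumes that a more negative binding affinity is better and that a higher CNN pose score is better.

```python
import numpy as np

# Made-up scores for three binding modes.
# Assumed columns: [affinity (kcal/mol), CNN pose score, CNN affinity]
scores = np.array([
    [-7.2, 0.61, 6.1],
    [-8.0, 0.35, 5.4],
    [-6.5, 0.88, 6.9],
])

# Rank by the empirical binding affinity: lowest (most negative) first.
best_by_affinity = int(np.argmin(scores[:, 0]))

# Rank by the CNN pose score: highest first.
best_by_cnn_pose = int(np.argmax(scores[:, 1]))
```

The two criteria can disagree, as they do here, so it is worth deciding up front which score drives pose selection.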
The dc.dock.docking module provides a generic docking
implementation that depends on provided pose generation and pose
scoring utilities to perform docking.
This class provides a docking engine which uses provided models for
featurization, pose generation, and scoring. Most pieces of docking
software are command line tools that are invoked from the shell. The
goal of this class is to provide a clean Python API for invoking
molecular docking programmatically.
The implementation of this class is lightweight and generic. It’s
expected that the majority of the heavy lifting will be done by pose
generation and scoring classes that are provided to this class.
This docking function uses this object’s featurizer, pose
generator, and scoring model to make docking predictions. This
function is written in a generic style.
Parameters:
molecular_complex (Tuple[str, str]) – A representation of a molecular complex. This tuple is
(protein_file, ligand_file).
centroid (np.ndarray, optional (default None)) – The centroid to dock against. Is computed if not specified.
box_dims (np.ndarray, optional (default None)) – A numpy array of shape (3,) holding the size of the box to dock. If not
specified, it is set to the size of the molecular complex plus 5 angstroms.
exhaustiveness (int, optional (default 10)) – Tells pose generator how exhaustive it should be with pose
generation.
num_modes (int, optional (default 9)) – Tells pose generator how many binding modes it should generate at
each invocation.
num_pockets (int, optional (default None)) – If specified, self.pocket_finder must be set. Will only
generate poses for the first num_pockets returned by
self.pocket_finder.
out_dir (str, optional (default None)) – If specified, write generated poses to this directory.
use_pose_generator_scores (bool, optional (default False)) – If True, ask pose generator to generate scores. This cannot be
True if self.featurizer and self.scoring_model are set
since those will be used to generate scores in that case.
Returns:
A generator. If use_pose_generator_scores==True or
self.scoring_model is set, it yields tuples
(posed_complex, score). Otherwise it yields posed_complex.
Return type:
Generator[Tuple[posed_complex, score]] or Generator[posed_complex]
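The yielding behavior described above can be sketched in plain Python. This is a simplified stand-in, not the class's actual code: dock_sketch, its arguments, and the use of a single scoring_fn callable in place of a featurizer plus scoring model are all hypothetical.

```python
from typing import Any, Callable, Iterator, Optional, Sequence


def dock_sketch(poses: Sequence[Any],
                pose_scores: Optional[Sequence[float]] = None,
                scoring_fn: Optional[Callable[[Any], float]] = None,
                use_pose_generator_scores: bool = False) -> Iterator[Any]:
    """Yield (pose, score) tuples when scores are available, else bare poses."""
    if use_pose_generator_scores and scoring_fn is not None:
        # The two score sources are mutually exclusive.
        raise ValueError("Cannot use pose generator scores together with "
                         "a scoring model.")
    for i, pose in enumerate(poses):
        if scoring_fn is not None:
            yield pose, scoring_fn(pose)       # score from the supplied model
        elif use_pose_generator_scores:
            yield pose, pose_scores[i]         # score from the pose generator
        else:
            yield pose                         # no score requested
```

Because this is a generator, nothing runs until it is iterated, so the ValueError for conflicting options surfaces on the first iteration rather than at call time.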
Computes the Vina Energy function for two molecular conformations
Parameters:
coords1 (np.ndarray) – Molecular coordinates of shape (N, 3)
coords2 (np.ndarray) – Molecular coordinates of shape (M, 3)
weights (np.ndarray) – A numpy array of shape (5,). The 5 values are weights for repulsion interaction term,
hydrophobic interaction term, hydrogen bond interaction term,
first Gaussian interaction term and second Gaussian interaction term.
wrot (float) – The scaling factor for nonlinearity
Nrot (int) – Number of rotatable bonds in this calculation
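The way these parameters combine can be sketched with NumPy. Treat this as a rough illustration under assumptions rather than the library's implementation: the function vina_energy_sketch is hypothetical, the piecewise forms of the five terms follow one reading of the Trott and Olson paper, and they are applied to raw interatomic distances with no atom typing or distance cutoff. With raw, non-negative distances the repulsion and hydrogen bond terms are always zero; they only become active if a surface distance (raw distance minus atomic radii) is used instead.

```python
import numpy as np


def pairwise_distances(coords1, coords2):
    """Distance matrix of shape (N, M) between two coordinate sets."""
    diff = coords1[:, None, :] - coords2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vina_energy_sketch(coords1, coords2, weights, wrot, Nrot):
    """Weighted sum of five interaction terms, damped by rotatable bonds."""
    d = pairwise_distances(coords1, coords2)
    # Assumed functional forms; term order matches the weights description.
    repulsion = np.where(d < 0, d ** 2, 0.0)
    hydrophobic = np.where(d < 0.5, 1.0, np.where(d < 1.5, 1.5 - d, 0.0))
    hbond = np.where(d < -0.7, 1.0, np.where(d < 0, -d / 0.7, 0.0))
    gauss1 = np.exp(-(d / 0.5) ** 2)
    gauss2 = np.exp(-((d - 3.0) / 2.0) ** 2)
    terms = np.stack([repulsion, hydrophobic, hbond, gauss1, gauss2])
    weighted = np.tensordot(weights, terms, axes=1)   # shape (N, M)
    # Nonlinearity: more rotatable bonds shrink the magnitude of the energy.
    return float(np.sum(weighted / (1.0 + wrot * Nrot)))


coords1 = np.array([[0.0, 0.0, 0.0]])
coords2 = np.array([[3.0, 0.0, 0.0]])
energy = vina_energy_sketch(coords1, coords2, np.ones(5), wrot=0.5, Nrot=2)
```

In this toy case only the second Gaussian term contributes appreciably, since it peaks at a distance of 3, and the nonlinearity halves the result.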