Deep learning models are often said to be made up of “layers”.
Intuitively, a “layer” is a function that transforms one
tensor into another. DeepChem maintains an extensive
collection of layers which perform various scientifically useful
transformations. For now, most layers are Keras-only, but over
time we expect this support to expand to other types of models
and layers.
The “layers cheatsheet” lists various scientifically relevant differentiable layers implemented in DeepChem.
Note that some layers are implemented for specific model architectures (such as GROVER
and attention layers); this is indicated in the Model column of the table.
In order to use the layers, make sure that the corresponding backend (Keras/TensorFlow, PyTorch, or JAX) is installed.
TensorFlow Keras Layers
These layers are subclasses of the tensorflow.keras.layers.Layer class.
This layer implements the graph convolution introduced in [1]_. The graph
convolution combines per-node feature vectors in a nonlinear fashion with
the feature vectors of neighboring nodes. This “blends” information in
local neighborhoods of a graph.
out_channel (int) – The number of output channels per graph node.
min_deg (int, optional (default 0)) – The minimum allowed degree for each graph node.
max_deg (int, optional (default 10)) – The maximum allowed degree for each graph node. Note that this
is set to 10 to handle complex molecules (some organometallic
compounds have strange structures). If you’re using this for
non-molecular applications, you may need to set this much higher
depending on your dataset.
activation_fn (function) – A nonlinear activation function to apply. If you’re not sure,
tf.nn.relu is probably a good default for your application.
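As a rough illustration of what such a graph convolution computes, here is a minimal pure-Python sketch (not DeepChem's implementation; the scalar weights `w_self` and `w_nbr` are hypothetical stand-ins for the per-degree weight matrices the real layer learns):

```python
def graph_conv(node_feats, adj_list, w_self, w_nbr):
    """Toy graph convolution: each node's new feature vector is a
    nonlinear blend of its own features and the sum of its neighbors'.
    w_self / w_nbr are scalar stand-ins for learned weight matrices."""
    out = []
    for i, feats in enumerate(node_feats):
        nbr_sum = [0.0] * len(feats)
        for j in adj_list[i]:
            for k, v in enumerate(node_feats[j]):
                nbr_sum[k] += v
        # relu(w_self * x_i + w_nbr * sum_j x_j): "blends" local info
        out.append([max(0.0, w_self * a + w_nbr * b)
                    for a, b in zip(feats, nbr_sum)])
    return out

# Path graph 0-1-2 with 2-dim node features
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[1], [0, 2], [1]]
print(graph_conv(feats, adj, w_self=1.0, w_nbr=0.5))
# -> [[1.0, 0.5], [1.0, 1.5], [1.0, 1.5]]
```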
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
- If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).
A GraphPool gathers data from local neighborhoods of a graph.
This layer does a max-pooling over the feature vectors of atoms in a
neighborhood. You can think of this layer as analogous to a max-pooling
layer for 2D convolutions but which operates on graphs instead. This
technique is described in [1]_.
min_deg (int, optional (default 0)) – The minimum allowed degree for each graph node.
max_deg (int, optional (default 10)) – The maximum allowed degree for each graph node. Note that this
is set to 10 to handle complex molecules (some organometallic
compounds have strange structures). If you’re using this for
non-molecular applications, you may need to set this much higher
depending on your dataset.
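The pooling operation itself is simple to sketch in pure Python; this toy `graph_pool` helper (not DeepChem's implementation) takes an elementwise max over each node and its neighbors:

```python
def graph_pool(node_feats, adj_list):
    """Toy GraphPool: elementwise max over each node's own feature
    vector and those of its neighbors, analogous to max-pooling
    in 2D CNNs but on a graph."""
    out = []
    for i, feats in enumerate(node_feats):
        neighborhood = [feats] + [node_feats[j] for j in adj_list[i]]
        out.append([max(col) for col in zip(*neighborhood)])
    return out

feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
adj = [[1], [0, 2], [1]]
print(graph_pool(feats, adj))  # -> [[1.0, 2.0], [3.0, 2.0], [3.0, 2.0]]
```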
A GraphGather layer pools node-level feature vectors to create a graph feature vector.
Many graph convolutional networks manipulate feature vectors per
graph-node. For a molecule for example, each node might represent an
atom, and the network would manipulate atomic feature vectors that
summarize the local chemistry of the atom. However, at the end of
the application, we will likely want to work with a molecule level
feature representation. The GraphGather layer creates a graph level
feature vector by combining all the node-level feature vectors.
One subtlety about this layer is that it depends on the
batch_size. This is done for internal implementation reasons. The
GraphConv and GraphPool layers pool all nodes from all graphs
in a batch that’s being processed. The GraphGather reassembles
these jumbled node feature vectors into per-graph feature vectors.
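A minimal sketch of that reassembly step, assuming a membership vector that maps each node to its graph index (a toy sum-pool; the real layer also handles the batch bookkeeping and activation described above):

```python
def graph_gather(node_feats, membership, n_graphs):
    """Toy GraphGather: sum node feature vectors into one vector per
    graph, where membership[i] is the graph index of node i."""
    n_feat = len(node_feats[0])
    graph_feats = [[0.0] * n_feat for _ in range(n_graphs)]
    for feats, g in zip(node_feats, membership):
        for k, v in enumerate(feats):
            graph_feats[g][k] += v
    return graph_feats

# Batch of two graphs: nodes 0-1 belong to graph 0, node 2 to graph 1
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(graph_gather(feats, membership=[0, 0, 1], n_graphs=2))
# -> [[4.0, 6.0], [5.0, 6.0]]
```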
inputs (list) – This list should consist of inputs = [atom_features, deg_slice,
membership, deg_adj_list placeholders…]. These are all
tensors that are created/processed by GraphConv and GraphPool.
Graph convolution layer used in MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
Not used directly, higher level layers like MolGANMultiConvolutionLayer use it.
This layer performs basic convolution on one-hot encoded matrices containing
atom and bond information. It also accepts three inputs, for the case
when convolution is performed more than once and the results of the previous
convolution need to be used. This was done to avoid creating another layer that
accepts three inputs rather than two. The last input is the so-called
hidden_layer, which holds the results of the convolution, while the first two are
the unchanged input tensors.
Example
See MolGANMultiConvolutionLayer for usage within other layers.
Graph Aggregation layer used in MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
Performs aggregation on the tensor resulting from the convolution layers.
Given its simple nature, it might be removed in the future and folded into
MolGANEncoderLayer.
Multiple pass convolution layer used in MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
It takes the outputs of the previous convolution layer and uses
them as inputs for the next one.
It simplifies the overall framework, but might be moved into
MolGANEncoderLayer in the future in order to reduce the number of layers.
units (Tuple, optional (default=(128,64)), min_length=2) – List of dimensions used by consecutive convolution layers.
The more values provided, the more convolution layers are invoked.
activation (function, optional (default=tanh)) – Activation function used across the model; default is Tanh.
dropout_rate (float, optional (default=0.0)) – Rate used by the dropout layer.
edges (int, optional (default=0)) – Controls how many dense layers are used for a single convolution unit.
Typically matches the number of bond types used in the molecule.
name (string, optional (default="")) – Name of the layer.
Main learning layer used by MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
Its role is to further simplify the model.
This layer can be manually built by stacking graph convolution layers
followed by graph aggregation.
units (List, optional (default=[(128, 64), 128])) – List of units for MolGANMultiConvolutionLayer and GraphAggregationLayer,
i.e. [(128,64),128] means two convolution layers with dims = [128,64],
followed by an aggregation layer with dims=128.
activation (function, optional (default=Tanh)) – Activation function used across the model; default is Tanh.
dropout_rate (float, optional (default=0.0)) – Rate used by the dropout layer.
edges (int, optional (default=0)) – Controls how many dense layers are used for a single convolution unit.
Typically matches the number of bond types used in the molecule.
name (string, optional (default="")) – Name of the layer.
This layer performs a single step LSTM update. Note that it is not
a full LSTM recurrent network. The LSTMStep layer is useful as a
primitive for designing layers such as the AttnLSTMEmbedding or the
IterRefLSTMEmbedding below.
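For intuition, a single LSTM update can be sketched in scalar form as follows (a toy sketch, not the layer's vectorized implementation; the gate weights in `W` are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM update for scalar inputs. W maps each gate name to a
    (w_x, w_h, bias) triple; real layers use weight matrices."""
    def gate(name, act):
        w_x, w_h, b = W[name]
        return act(w_x * x + w_h * h_prev + b)
    i = gate("input", sigmoid)    # how much new information to admit
    f = gate("forget", sigmoid)   # how much old cell state to keep
    o = gate("output", sigmoid)   # how much of the cell state to expose
    g = gate("cell", math.tanh)   # candidate cell update
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state
    return h, c

# Hypothetical shared weights, just for illustration
W = {k: (1.0, 0.5, 0.0) for k in ("input", "forget", "output", "cell")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W)
```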
Implements AttnLSTM as described in the matching networks paper.
The AttnLSTM embedding adjusts two sets of vectors, the “test” and
“support” sets. The “support” consists of a set of evidence vectors.
Think of these as the small training set for low-data machine
learning. The “test” consists of the queries we wish to answer with
the small amounts of available data. The AttnLSTMEmbedding allows us to
modify the embedding of the “test” set depending on the contents of
the “support”. The AttnLSTMEmbedding is thus a type of learnable
metric that allows a network to modify its internal notion of
distance.
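The core idea, stripped of the LSTM machinery, can be sketched as a single attention pass in which each test vector receives an additive, softmax-weighted read of the support set (a simplification of the actual layer, which iterates this with an LSTM):

```python
import math

def attend(test, support):
    """One attention pass: update each test vector additively with a
    softmax-weighted combination of the support vectors."""
    out = []
    for x in test:
        # dot-product similarity of x against each support vector
        scores = [sum(a * b for a, b in zip(x, s)) for s in support]
        m = max(scores)
        exps = [math.exp(sc - m) for sc in scores]
        z = sum(exps)
        attn = [e / z for e in exps]
        # weighted "read" of the support set
        read = [sum(w * s[k] for w, s in zip(attn, support))
                for k in range(len(x))]
        out.append([a + r for a, r in zip(x, read)])  # additive update
    return out

test = [[1.0, 0.0]]
support = [[1.0, 0.0], [0.0, 1.0]]
updated = attend(test, support)
```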
inputs (list) – List of two tensors (X, Xp). X should be of shape (n_test,
n_feat) and Xp should be of shape (n_support, n_feat) where
n_test is the size of the test set, n_support that of the support
set, and n_feat is the number of per-atom features.
Returns:
Returns two tensors of the same shape as the inputs. Namely, the output
shape will be [(n_test, n_feat), (n_support, n_feat)].
Much like AttnLSTMEmbedding, the IterRefLSTMEmbedding is another type
of learnable metric which adjusts “test” and “support.” Recall that
“support” is the small amount of data available in a low data machine
learning problem, and that “test” is the query. The AttnLSTMEmbedding
only modifies the “test” based on the contents of the support.
However, the IterRefLSTM modifies both the “support” and “test” based
on each other. This allows the learnable metric to be more malleable
than that from AttnLSTMEmbedding.
Unlike the AttnLSTM model which only modifies the test vectors
additively, this model allows for an additive update to be
performed to both test and support using information from each
other.
Parameters:
n_support (int) – Size of support set.
n_test (int) – Size of test set.
n_feat (int) – Number of input atom features
max_depth (int) – Number of LSTM Embedding layers.
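A heavily simplified sketch of this mutual-update idea (using the other set's mean in place of the attention-plus-LSTM read that the real layer computes):

```python
def mean(vectors):
    """Elementwise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def iter_ref(test, support, steps=3, lr=0.1):
    """Toy mutual update: unlike AttnLSTM, which only adjusts the test
    set, BOTH sets receive additive updates computed from the other set,
    repeated for several steps (cf. max_depth)."""
    for _ in range(steps):
        read_s, read_t = mean(support), mean(test)
        test = [[x + lr * r for x, r in zip(v, read_s)] for v in test]
        support = [[x + lr * r for x, r in zip(v, read_t)] for v in support]
    return test, support

t, s = iter_ref([[0.0, 0.0]], [[1.0, 1.0]])
```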
inputs (list) – List of two tensors (X, Xp). X should be of shape (n_test,
n_feat) and Xp should be of shape (n_support, n_feat) where
n_test is the size of the test set, n_support that of the
support set, and n_feat is the number of per-atom features.
Returns:
Returns two tensors of the same shape as the inputs. Namely, the output
shape will be [(n_test, n_feat), (n_support, n_feat)].
This is required for uncertainty prediction. The standard Keras
Dropout layer only performs dropout during training, but we
sometimes need to do it during prediction. The second input to this
layer should be a scalar equal to 0 or 1, indicating whether to
perform dropout.
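The behavior can be sketched in pure Python as an "inverted dropout" whose second argument switches it on or off (a toy sketch, not the layer's TensorFlow implementation):

```python
import random

def switched_dropout(x, rate, is_training):
    """Toy switchable dropout: the is_training flag (0 or 1) controls
    whether dropout is applied, so it can also be active at prediction
    time, e.g. for Monte Carlo uncertainty estimates."""
    if not is_training:
        return list(x)
    keep = 1.0 - rate
    # Inverted dropout: scale kept activations so the expectation matches
    return [v / keep if random.random() < keep else 0.0 for v in x]

x = [1.0, 2.0, 3.0, 4.0]
print(switched_dropout(x, rate=0.5, is_training=0))
# -> [1.0, 2.0, 3.0, 4.0] (unchanged when the switch is off)
```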
This layer should have two inputs with the same shape, and its
output also has the same shape. Each element of the output is a
Gaussian distributed random number whose mean is the corresponding
element of the first input, and whose standard deviation is the
corresponding element of the second input.
Parameters:
training_only (bool) – if True, noise is only generated during training. During
prediction, the output is simply equal to the first input (that
is, the mean of the distribution used during training).
noise_epsilon (float) – The noise is scaled by this factor
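A pure-Python sketch of this sampling behavior (not the layer's TensorFlow implementation):

```python
import random

def sample_mean_std(means, stds, training=True, noise_epsilon=1.0):
    """Toy sketch: output[i] is Gaussian with mean means[i] and standard
    deviation stds[i], with the noise scaled by noise_epsilon; at
    prediction time (training=False) it simply returns the means."""
    if not training:
        return list(means)
    return [m + noise_epsilon * s * random.gauss(0.0, 1.0)
            for m, s in zip(means, stds)]

means, stds = [0.0, 5.0], [1.0, 0.0]
print(sample_mean_std(means, stds, training=False))  # -> [0.0, 5.0]
```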
Neighbor-lists (also called Verlet Lists) are a tool for grouping
atoms which are close to each other spatially. This layer computes a
Neighbor List from a provided tensor of atomic coordinates. You can
think of this as a general “k-means” layer, but optimized for the
case k==3.
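Conceptually, a neighbor list within a cutoff can be computed by brute force; the following O(N^2) pure-Python sketch shows the output format, whereas the real layer uses spatial cells for efficiency:

```python
def neighbor_list(coords, cutoff):
    """Naive neighbor list: for each atom, the indices of all other
    atoms within `cutoff` of it (coords are 3D tuples)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[j for j in range(len(coords))
             if j != i and dist2(coords[i], coords[j]) <= cutoff ** 2]
            for i in range(len(coords))]

coords = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(neighbor_list(coords, cutoff=1.0))  # -> [[1], [0], []]
```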
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value
of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as
tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method)
using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
If a layer has tensor arguments in *args or **kwargs, their
casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs
only.
- Integration with various ecosystem packages like TFMOT, TFLite,
TF.js, etc is only supported for inputs and not for tensors in
positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although
this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although
this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating
whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a
mask argument, its default value will be set to the mask
generated for inputs by the previous layer (if input did come
from a layer that generated a corresponding mask, i.e. if it came
from a Keras layer with masking support).
# TODO(rbharath): Do we need to handle periodic boundary conditions properly here?
# TODO(rbharath): This doesn’t handle boundaries well. We hard-code
# looking for n_nbr_cells neighbors, which isn’t right for boundary cells in
# the cube.
Suppose start is -10 Angstroms, stop is 10 Angstroms, and nbr_cutoff is 1.
Then this method would return a list of length 20^3 whose entries are
[(-10, -10, -10), (-10, -10, -9), …, (9, 9, 9)]
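Assuming a simple cubic tiling with integer-valued start, stop, and nbr_cutoff, this enumeration can be sketched as:

```python
import itertools

def neighbor_cells(start, stop, nbr_cutoff):
    """Enumerate the lower corners of cubic cells tiling [start, stop)^3."""
    edges = range(int(start), int(stop), int(nbr_cutoff))  # 20 values here
    # Lexicographic order: (-10, -10, -10), (-10, -10, -9), ..., (9, 9, 9)
    return list(itertools.product(edges, repeat=3))

cells = neighbor_cells(-10, 10, 1)
print(len(cells))            # 8000 == 20**3
print(cells[0], cells[-1])   # (-10, -10, -10) (9, 9, 9)
```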
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
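The deferred state-creation pattern described above can be sketched without any framework. The toy class below only mimics how Keras defers weight creation to build() until the input shape is known on the first call; all names and shapes are illustrative:

```python
import numpy as np

class Affine:
    """Toy layer mimicking Keras' deferred build() semantics."""

    def __init__(self, units):
        self.units = units
        self.built = False

    def build(self, input_shape):
        # State (the weight matrix) is created once the input shape is known.
        rng = np.random.default_rng(0)
        self.w = rng.standard_normal((input_shape[-1], self.units))
        self.built = True

    def __call__(self, inputs):
        if not self.built:          # Keras invokes build() automatically like this
            self.build(inputs.shape)
        return inputs @ self.w

layer = Affine(units=4)
out = layer(np.ones((2, 3)))   # first call triggers build() with shape (2, 3)
print(out.shape)               # (2, 4)
```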
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value
of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as
tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method)
using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
If a layer has tensor arguments in *args or **kwargs, their
casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs
only.
- Integration with various ecosystem packages like TFMOT, TFLite,
TF.js, etc is only supported for inputs and not for tensors in
positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although
this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although
this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating
whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a
mask argument, its default value will be set to the mask
generated for inputs by the previous layer (if input did come
from a layer that generated a corresponding mask, i.e. if it came
from a Keras layer with masking support).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value
of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as
tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method)
using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
If a layer has tensor arguments in *args or **kwargs, their
casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs
only.
- Integration with various ecosystem packages like TFMOT, TFLite,
TF.js, etc is only supported for inputs and not for tensors in
positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although
this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although
this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating
whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a
mask argument, its default value will be set to the mask
generated for inputs by the previous layer (if input did come
from a layer that generated a corresponding mask, i.e. if it came
from a Keras layer with masking support).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
Spatial-domain convolutions can be defined as
H = h_0 I + h_1 A + h_2 A^2 + … + h_k A^k,  H ∈ R^(N×N)
We approximate it by
H ≈ h_0 I + h_1 A
We can define a convolution as applying multiple of these linear filters
over edges of different types (think up, down, left, right, diagonal in images),
where each edge type has its own adjacency matrix:
H ≈ h_0 I + h_1 A_1 + h_2 A_2 + … + h_(L−1) A_(L−1)
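A minimal NumPy sketch of this first-order, per-edge-type filter; the coefficients and the two small adjacency matrices below are illustrative, not taken from any real model:

```python
import numpy as np

def graph_conv(x, adjacencies, h0, hs):
    """x: (N, F) node features; adjacencies: one (N, N) matrix per edge type."""
    out = h0 * x                        # h_0 I x — the self term
    for h, A in zip(hs, adjacencies):
        out = out + h * (A @ x)         # h_i A_i x — one term per edge type
    return out

N = 3
x = np.eye(N)                           # trivial one-hot node features
A1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # edge type 1
A2 = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)  # edge type 2
out = graph_conv(x, [A1, A2], h0=1.0, hs=[0.5, 0.25])
print(out[0])  # node 0 mixes itself (1.0) with neighbors via each edge type
```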
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value
of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as
tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method)
using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
If a layer has tensor arguments in *args or **kwargs, their
casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs
only.
- Integration with various ecosystem packages like TFMOT, TFLite,
TF.js, etc is only supported for inputs and not for tensors in
positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although
this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although
this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating
whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a
mask argument, its default value will be set to the mask
generated for inputs by the previous layer (if input did come
from a layer that generated a corresponding mask, i.e. if it came
from a Keras layer with masking support).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero
arguments, and inputs cannot be provided via the default value
of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as
tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method)
using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
If a layer has tensor arguments in *args or **kwargs, their
casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs
only.
- Integration with various ecosystem packages like TFMOT, TFLite,
TF.js, etc is only supported for inputs and not for tensors in
positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although
this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although
this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating
whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a
mask argument, its default value will be set to the mask
generated for inputs by the previous layer (if input did come
from a layer that generated a corresponding mask, i.e. if it came
from a Keras layer with masking support).
This class implements the core Weave convolution from the
Google graph convolution paper [1]_
This model handles atom features and bond features
separately. Here, bond features are also called pair features.
This layer implements four types of transformation: atom->atom,
atom->pair, pair->atom, and pair->pair.
Examples
This layer expects 4 inputs in a list of the form [atom_features,
pair_features, pair_split, atom_to_pair]. We’ll walk through the structure
of these inputs. Let’s start with some basic definitions.
>>> import deepchem as dc
>>> import numpy as np
Suppose you have a batch of molecules
>>> smiles = ["CCC", "C"]
Note that there are 4 atoms in total in this system. This layer expects its
input molecules to be batched together.
>>> total_n_atoms = 4
Let’s suppose that we have a featurizer that computes n_atom_feat features
per atom.
>>> n_atom_feat = 75
Then conceptually, atom_feat is the array of shape (total_n_atoms,
n_atom_feat) of atomic features. For simplicity, let’s just go with a
random such matrix.
Let’s suppose we have n_pair_feat pairwise features
>>> n_pair_feat = 14
For each molecule, we compute a matrix of shape (n_atoms*n_atoms,
n_pair_feat) of pairwise features for each pair of atoms in the molecule.
Let’s construct this conceptually for our example.
pair_split is an index into pair_feat which tells us which atom each row belongs to. In our case, we have
>>> pair_split = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])
That is, the first 9 entries belong to “CCC” and the last entry to “C”. The
final entry, atom_to_pair, goes into a little more depth than pair_split
and tells us the precise pair each pair feature belongs to. In our case
The 4 is total_num_atoms and the 10 is the total number of pairs. Where
does 50 come from? It’s from the default arguments n_atom_input_feat and
n_pair_input_feat.
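The atom_to_pair index for this example can be constructed explicitly. The make_atom_to_pair helper below is hypothetical, not part of DeepChem; each molecule contributes n_atoms**2 (atom_i, atom_j) pairs, offset by the atoms of preceding molecules:

```python
import numpy as np

def make_atom_to_pair(n_atoms_per_mol):
    """Enumerate (atom_i, atom_j) pairs molecule by molecule, with offsets."""
    pairs, offset = [], 0
    for n in n_atoms_per_mol:
        for i in range(n):
            for j in range(n):
                pairs.append((offset + i, offset + j))
        offset += n
    return np.array(pairs)

atom_to_pair = make_atom_to_pair([3, 1])  # "CCC" has 3 atoms, "C" has 1
print(atom_to_pair.shape)   # (10, 2): 3*3 + 1*1 pairs in total
print(atom_to_pair[-1])     # the lone carbon paired with itself: [3 3]
```

Note that the first column of atom_to_pair reproduces the pair_split array above.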
n_atom_input_feat (int, optional (default 75)) – Number of features for each atom in input.
n_pair_input_feat (int, optional (default 14)) – Number of features for each pair of atoms in input.
n_atom_output_feat (int, optional (default 50)) – Number of features for each atom in output.
n_pair_output_feat (int, optional (default 50)) – Number of features for each pair of atoms in output.
n_hidden_AA (int, optional (default 50)) – Number of units (convolution depth) in the corresponding hidden layer.
n_hidden_PA (int, optional (default 50)) – Number of units (convolution depth) in the corresponding hidden layer.
n_hidden_AP (int, optional (default 50)) – Number of units (convolution depth) in the corresponding hidden layer.
n_hidden_PP (int, optional (default 50)) – Number of units (convolution depth) in the corresponding hidden layer.
update_pair (bool, optional (default True)) – Whether to calculate pair features;
can be turned off for the last layer.
init (str, optional (default 'glorot_uniform')) – Weight initialization for filters.
activation (str, optional (default 'relu')) – Activation function applied
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying
activation functions on convolutional layers.
batch_normalize_kwargs (Dict, optional (default {renorm=True})) – Batch normalization is a complex layer which has many potential
arguments which change behavior. This layer accepts user-defined
parameters which are passed to all BatchNormalization layers in
WeaveModel, WeaveLayer, and WeaveGather.
Implements the weave-gathering section of weave convolutions.
Implements the gathering layer from [1]_. The weave gathering layer gathers
per-atom features to create a molecule-level fingerprint in a weave
convolutional network. This layer can also perform Gaussian histogram
expansion as detailed in [1]_. Note that the gathering function here is
simply addition, as in [1]_.
Examples
This layer expects 2 inputs in a list of the form [atom_features,
pair_features]. We’ll walk through the structure
of these inputs. Let’s start with some basic definitions.
>>> import deepchem as dc
>>> import numpy as np
Suppose you have a batch of molecules
>>> smiles = ["CCC", "C"]
Note that there are 4 atoms in total in this system. This layer expects its
input molecules to be batched together.
>>> total_n_atoms = 4
Let’s suppose that we have n_atom_feat features per atom.
>>> n_atom_feat = 75
Then conceptually, atom_feat is the array of shape (total_n_atoms,
n_atom_feat) of atomic features. For simplicity, let’s just go with a
random such matrix.
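The gathering-by-addition step can be sketched in NumPy. weave_gather_sum below is a hypothetical helper (not DeepChem's API) that sums each molecule's atom features into a single fingerprint:

```python
import numpy as np

def weave_gather_sum(atom_feat, atom_split, n_mols):
    """atom_feat: (total_n_atoms, F); atom_split[i] = molecule index of atom i."""
    n_feat = atom_feat.shape[1]
    out = np.zeros((n_mols, n_feat))
    for mol, feat in zip(atom_split, atom_feat):
        out[mol] += feat                    # the gathering function is addition
    return out

atom_feat = np.arange(8, dtype=float).reshape(4, 2)  # 4 atoms, 2 features each
atom_split = np.array([0, 0, 0, 1])                  # "CCC" then "C"
fps = weave_gather_sum(atom_feat, atom_split, n_mols=2)
print(fps)  # molecule fingerprints: [6. 9.] and [6. 7.]
```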
n_input (int, optional (default 128)) – number of features for each input molecule
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the
original dimensions of the input by using a linear layer with specified
activation function. Note that this compression was not in the original
paper, but was present in the original DeepChem implementation so is
left present for backwards compatibility.
init (str, optional (default 'glorot_uniform')) – Weight initialization for filters if compress_post_gaussian_expansion
is True.
activation (str, optional (default 'tanh')) – Activation function applied for filters if
compress_post_gaussian_expansion is True. Should be recognizable by
tf.keras.activations.
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
We construct a Gaussian at gaussian_memberships[i][0] with standard
deviation gaussian_memberships[i][1]. Each feature in x is assigned
the probability of falling in each Gaussian, and probabilities are
normalized across the 11 different Gaussians.
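The membership computation can be sketched as follows; the three (mean, std) pairs used here are purely illustrative, not the layer's actual eleven Gaussians:

```python
import numpy as np

def gaussian_memberships(x, params):
    """x: (N,) feature values; params: list of (mean, std). Returns (N, len(params))."""
    x = np.asarray(x, dtype=float)[:, None]
    means = np.array([m for m, _ in params])[None, :]
    stds = np.array([s for _, s in params])[None, :]
    # Gaussian density of each feature under each (mean, std)
    dens = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    # Normalize across the Gaussians so each row is a probability vector
    return dens / dens.sum(axis=1, keepdims=True)

probs = gaussian_memberships([0.0, 1.2], [(-1.0, 0.5), (0.0, 0.5), (1.0, 0.5)])
print(probs.shape)         # (2, 3)
print(probs.sum(axis=1))   # each row sums to 1
```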
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
This layer generates a directed acyclic graph for each atom
in a molecule. This layer is based on the algorithm from the
following paper:
Lusci, Alessandro, Gianluca Pollastri, and Pierre Baldi. “Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.” Journal of chemical information and modeling 53.7 (2013): 1563-1575.
This layer performs a sort of inward sweep. Recall that for
each atom, a DAG is generated that “points inward” to that
atom from the undirected molecule graph. Picture this as
“picking up” the atom as the vertex and using the natural
tree structure that forms from gravity. The layer “sweeps
inwards” from the leaf nodes of the DAG upwards to the
atom. This is batched so the transformation is done for
each atom.
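The inward sweep can be illustrated on a single DAG. The combine rule below (summing child results into the parent) is only a stand-in for the layer's learned transformation, and all names are illustrative:

```python
def inward_sweep(children, feats, root):
    """children[v] lists v's child nodes; feats[v] is v's scalar feature."""
    def visit(v):
        # Process leaves first, then combine child results on the way up
        return feats[v] + sum(visit(c) for c in children[v])
    return visit(root)

# A 4-atom chain "picked up" at atom 0, so edges point inward: 0 <- 1 <- 2 <- 3
children = {0: [1], 1: [2], 2: [3], 3: []}
feats = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
print(inward_sweep(children, feats, root=0))  # 10.0
```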
n_graph_feat (int, optional) – Number of features for each node (and the whole graph).
n_atom_feat (int, optional) – Number of features listed per atom.
max_atoms (int, optional) – Maximum number of atoms in molecules.
layer_sizes (list of int, optional(default=[100])) – List of hidden layer size(s):
length of this list represents the number of hidden layers,
and each element is the width of the corresponding hidden layer.
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied.
dropout (float, optional) – Dropout probability in hidden layer(s).
batch_size (int, optional) – number of molecules in a batch.
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
n_graph_feat (int, optional) – Number of features for each atom.
n_outputs (int, optional) – Number of features for each molecule.
max_atoms (int, optional) – Maximum number of atoms in molecules.
layer_sizes (list of int, optional) – List of hidden layer size(s):
length of this list represents the number of hidden layers,
and each element is the width of the corresponding hidden layer.
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied.
dropout (float, optional) – Dropout probability in the hidden layer(s).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
Creates the variables of the layer (for subclass implementers).
This is a method that implementers of subclasses of Layer or Model
can override if they need a state-creation step in-between
layer instantiation and layer call. It is invoked automatically before
the first execution of call().
This is typically used to create the weights of Layer subclasses
(at the discretion of the subclass implementer).
Parameters:
input_shape – Instance of TensorShape, or list of instances of
TensorShape if the layer expects a list of inputs
(one instance per input).
A layer config is a Python dictionary (serializable)
containing the configuration of a layer.
The same layer can be reinstantiated later
(without its trained weights) from this configuration.
The config of a layer does not include connectivity
information, nor the layer class name. These are handled
by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of
dict every time it is called. The callers should make a copy of the
returned dict if they want to modify it.
The call() method may not create state (except in its first
invocation, wrapping the creation of variables or other resources in
tf.init_scope()). It is recommended to create state, including
tf.Variable instances and nested Layer instances,
in __init__(), or in the build() method that is
called automatically before call() executes for the first time.
Parameters:
inputs –
Input tensor, or dict/list/tuple of input tensors.
The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero
  arguments, and inputs cannot be provided via the default value
  of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs.
  If a layer has tensor arguments in *args or **kwargs, their
  casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite,
  TF.js, etc. is only supported for inputs and not for tensors in
  positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although
this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although
this is not recommended, for the reasons above.
The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating
  whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a
  mask argument, its default value will be set to the mask
  generated for inputs by the previous layer (if input did come
  from a layer that generated a corresponding mask, i.e. if it came
  from a Keras layer with masking support).
The atomic convolutional networks function as a variant of
graph convolutions. The difference is that the “graph” here is
the nearest neighbors graph in 3D space [1]. The AtomicConvModule
leverages these connections in 3D space to train models that
learn to predict energetic states starting from the spatial
geometry of the molecule.
frag1_num_atoms (int) – Number of atoms in first fragment
frag2_num_atoms (int) – Number of atoms in second fragment
max_num_neighbors (int) – Maximum number of neighbors possible for an atom. Recall neighbors
are spatial neighbors.
atom_types (list) – List of atoms recognized by the model. Atoms are indicated by their
atomic numbers.
radial (list) – Radial parameters used in the atomic convolution transformation.
layer_sizes (list) – the size of each dense layer in the network. The length of
this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight
initialization of each layer. The length of this list should
equal len(layer_sizes). Alternatively, this may be a single
value instead of a list, where the same value is used
for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer. The
length of this list should equal len(layer_sizes).
Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes).
Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
activation_fns (list or object) – the Tensorflow activation function to apply to each layer. The length of this list should equal
len(layer_sizes). Alternatively, this may be a single value instead of a list, where the
same value is used for every layer.
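Several of these parameters (weight_init_stddevs, bias_init_consts, dropouts, activation_fns) accept either a per-layer list or a single value applied to every layer. A minimal sketch of that broadcasting convention, using a hypothetical helper rather than DeepChem's actual implementation:

```python
from collections.abc import Sequence

def broadcast_per_layer(value, n_layers):
    """Expand a single value into a per-layer list; validate list lengths."""
    if isinstance(value, Sequence) and not isinstance(value, str):
        if len(value) != n_layers:
            raise ValueError("expected one value per layer")
        return list(value)
    return [value] * n_layers

layer_sizes = [128, 64, 32]
dropouts = broadcast_per_layer(0.25, len(layer_sizes))  # [0.25, 0.25, 0.25]
```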
A 1, 2, or 3 dimensional convolutional network for either regression or classification.
The network consists of the following sequence of layers:
- A configurable number of convolutional layers
- A global pooling layer (either max pool or average pool)
- A final fully connected layer to compute the output
It optionally can compose the model from pre-activation residual blocks, as
described in https://arxiv.org/abs/1603.05027, rather than a simple stack of
convolution layers. This often leads to easier training, especially when using a
large number of layers. Note that residual blocks can only be used when
successive layers have the same output shape. Wherever the output shape changes, a
simple convolution layer will be used even if residual=True.
dims (int) – the number of dimensions to apply convolutions over (1, 2, or 3)
layer_filters (list) – the number of output filters for each convolutional layer in the network.
The length of this list determines the number of layers.
kernel_size (int, tuple, or list) – a list giving the shape of the convolutional kernel for each layer. Each
element may be either an int (use the same kernel width for every dimension)
or a tuple (the kernel width along each dimension). Alternatively this may
be a single int or tuple instead of a list, in which case the same kernel
shape is used for every layer.
strides (int, tuple, or list) – a list giving the stride between applications of the kernel for each layer.
Each element may be either an int (use the same stride for every dimension)
or a tuple (the stride along each dimension). Alternatively this may be a
single int or tuple instead of a list, in which case the same stride is
used for every layer.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization
of each layer. The length of this list should equal len(layer_filters)+1,
where the final element corresponds to the dense layer. Alternatively this
may be a single value instead of a list, in which case the same value is used
for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this
list should equal len(layer_filters)+1, where the final element corresponds
to the dense layer. Alternatively this may be a single value instead of a
list, in which case the same value is used for every layer.
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_filters).
Alternatively this may be a single value instead of a list, in which case the same value is used for every layer
activation_fns (str or list) – the torch activation function to apply to each layer. The length of this list should equal
len(layer_filters). Alternatively this may be a single value instead of a list, in which case the
same value is used for every layer, ‘relu’ by default
pool_type (str) – the type of pooling layer to use, either ‘max’ or ‘average’
mode (str) – Either ‘classification’ or ‘regression’
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty
in outputs to be predicted
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead
of a simple stack of convolutional layers.
padding (str, int or tuple) – the padding to use for convolutional layers, either ‘valid’ or ‘same’
The ScaleNorm layer first computes the square root of the scale, then computes the matrix/vector norm of the input tensor.
The norm value is calculated as sqrt(scale) / matrix norm.
Finally, the result is returned as input_tensor * norm value.
This layer can be used instead of LayerNorm when a scaled version of the norm is required.
Instead of performing the scaling operation (scale / norm) in a lambda-like layer, we are defining it within this layer to make prototyping more efficient.
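The computation described above is small enough to state directly. A hedged NumPy sketch of the arithmetic (the actual layer is a torch module whose scale is a learnable parameter; the eps floor here is an assumption to guard against division by zero):

```python
import numpy as np

def scale_norm(x, scale, eps=1e-5):
    # norm value = sqrt(scale) / ||x||, with a floor to avoid division by zero
    norm = np.sqrt(scale) / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
    return x * norm

x = np.array([[3.0, 4.0]])       # single row with L2 norm 5
out = scale_norm(x, scale=25.0)  # sqrt(25) / 5 = 1, so the row is unchanged
```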
Although the recipe for forward pass needs to be defined within
this function, one should call the Module instance afterwards
instead of this since the former takes care of running the
registered hooks while the latter silently ignores them.
Encoder layer for use in the Molecular Attention Transformer [1]_.
The MATEncoder layer primarily consists of a self-attention layer (MultiHeadedMATAttention) and a feed-forward layer (PositionwiseFeedForward).
This layer can be stacked multiple times to form an encoder.
dist_kernel (str) – Kernel activation to be used. Can be either ‘softmax’ for softmax or ‘exp’ for exponential, for the self-attention layer.
lambda_attention (float) – Constant to be multiplied with the attention matrix in the self-attention layer.
lambda_distance (float) – Constant to be multiplied with the distance matrix in the self-attention layer.
h (int) – Number of attention heads for the self-attention layer.
sa_hsize (int) – Size of dense layer in the self-attention layer.
sa_dropout_p (float) – Dropout probability for the self-attention layer.
output_bias (bool) – If True, dense layers will use bias vectors in the self-attention layer.
d_input (int) – Size of input layer in the feed-forward layer.
d_hidden (int) – Size of hidden layer in the feed-forward layer.
d_output (int) – Size of output layer in the feed-forward layer.
activation (str) – Activation function to be used in the feed-forward layer.
Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU,
‘tanh’ for TanH, ‘selu’ for SELU, ‘elu’ for ELU and ‘linear’ for linear activation.
n_layers (int) – Number of layers in the feed-forward layer.
dropout_p (float) – Dropout probability in the feed-forward layer.
encoder_hsize (int) – Size of Dense layer for the encoder itself.
encoder_dropout_p (float) – Dropout probability for connections in the encoder layer.
In the MATEncoderLayer initialization, self.sublayer is defined as an nn.ModuleList of 2 layers. We will be passing our computation through these layers sequentially.
nn.ModuleList is subscriptable and thus we can access it as self.sublayer[0], for example.
Parameters:
x (torch.Tensor) – Input tensor.
mask (torch.Tensor) – Masks out padding values so that they are not taken into account when computing the attention score.
adj_matrix (torch.Tensor) – Adjacency matrix of a molecule.
distance_matrix (torch.Tensor) – Distance matrix of a molecule.
sa_dropout_p (float) – Dropout probability for the self-attention layer (MultiHeadedMATAttention).
First constructs an attention layer tailored to the Molecular Attention Transformer [1]_ and then converts it into Multi-Headed Attention.
In multi-headed attention, the attention mechanism is applied multiple times in parallel through the multiple attention heads.
Thus, different subsequences of a given sequence can be processed differently.
The query, key and value parameters are split multiple ways and each split is passed separately through a different attention head.
Initialize a multi-headed attention layer.
:param dist_kernel: Kernel activation to be used. Can be either ‘softmax’ for softmax or ‘exp’ for exponential.
:type dist_kernel: str
:param lambda_attention: Constant to be multiplied with the attention matrix.
:type lambda_attention: float
:param lambda_distance: Constant to be multiplied with the distance matrix.
:type lambda_distance: float
:param h: Number of attention heads.
:type h: int
:param hsize: Size of dense layer.
:type hsize: int
:param dropout_p: Dropout probability.
:type dropout_p: float
:param output_bias: If True, dense layers will use bias vectors.
:type output_bias: bool
Output computation for the MultiHeadedAttention layer.
:param query: Standard query parameter for attention.
:type query: torch.Tensor
:param key: Standard key parameter for attention.
:type key: torch.Tensor
:param value: Standard value parameter for attention.
:type value: torch.Tensor
:param mask: Masks out padding values so that they are not taken into account when computing the attention score.
:type mask: torch.Tensor
:param adj_matrix: Adjacency matrix of the input molecule, returned from dc.feat.MATFeaturizer()
:type adj_matrix: torch.Tensor
:param dist_matrix: Distance matrix of the input molecule, returned from dc.feat.MATFeaturizer()
:type dist_matrix: torch.Tensor
:param dropout_p: Dropout probability.
:type dropout_p: float
:param eps: Epsilon value
:type eps: float
:param inf: Value of infinity to be used.
:type inf: float
The SublayerConnection normalizes and adds dropout to the output tensor of an arbitrary layer.
It further adds a residual connection between the input of the arbitrary layer and the dropout-adjusted layer output.
Output computation for the SublayerConnection layer.
Takes an input tensor x, then adds the dropout-adjusted sublayer output for normalized x to it.
This is done to add a residual connection followed by LayerNorm.
Parameters:
x (torch.Tensor) – Input tensor.
output (torch.Tensor) – Layer whose normalized output will be added to x.
PositionwiseFeedForward is a layer used to define the position-wise feed-forward (FFN) algorithm for the Molecular Attention Transformer [1]_
Each layer in the MAT encoder contains a fully connected feed-forward network which applies two linear transformations and the given activation function.
This is done in addition to the SublayerConnection module.
Note: This modified version of PositionwiseFeedForward class contains dropout_at_input_no_act condition to facilitate its use in defining
the feed-forward (FFN) algorithm for the Directed Message Passing Neural Network (D-MPNN) [2]_
d_hidden (int (same as d_input if d_output = 0)) – Size of hidden layer.
d_output (int (same as d_input if d_output = 0)) – Size of output layer.
activation (str) – Activation function to be used. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU,
‘tanh’ for TanH, ‘selu’ for SELU, ‘elu’ for ELU and ‘linear’ for linear activation.
n_layers (int) – Number of layers.
dropout_p (float) – Dropout probability.
dropout_at_input_no_act (bool) – If true, dropout is applied on the input tensor. For single layer, it is not passed to an activation function.
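The two-linear-transformation pattern described above can be sketched in NumPy (illustrative only; the real layer is a torch module that also applies dropout and a configurable activation, and the weights here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
d_input, d_hidden = 8, 32
# weights of the two linear transformations (randomly initialized here)
W1, b1 = rng.normal(size=(d_input, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_input)), np.zeros(d_input)

def positionwise_ffn(x):
    # FFN(x) = activation(x W1 + b1) W2 + b2, applied independently at each position
    hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU activation
    return hidden @ W2 + b2

x = rng.normal(size=(5, d_input))  # 5 positions in a sequence
y = positionwise_ffn(x)            # same shape as the input: (5, 8)
```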
In an embedding layer, input is taken and converted to a vector representation for each input.
In the MATEmbedding layer, an input tensor is processed through a dropout-adjusted linear layer and the resultant vector is returned.
MATGenerator defines the linear and softmax generator step for the Molecular Attention Transformer [1]_.
In the MATGenerator, a Generator is defined which performs the Linear + Softmax generation step.
Depending on the type of aggregation selected, the attention output layer performs different operations.
Computes the inner product (cosine similarity) between two tensors.
This assumes that the two input tensors contain rows of vectors where
each column represents a different feature. The output tensor will have
elements that represent the inner product between pairs of normalized vectors
in the rows of x and y. The two tensors need to have the same number of
columns, because one cannot take the dot product between vectors of different
lengths. For example, in sentence similarity and sentence classification tasks,
the number of columns is the embedding size. In these tasks, the rows of the
input tensors would be different test vectors or sentences. The input tensors
themselves could be different batches. Using vectors or tensors of all 0s
should be avoided.
The cosine similarity between two equivalent vectors will be 1. The cosine
similarity between two equivalent tensors (tensors where all the elements are
the same) will be a tensor of 1s. In this scenario, if the input tensors x and
y are each of shape (n,p), where each element in x and y is the same, then
the output tensor would be a tensor of shape (n,n) with 1 in every entry.
x and y_same are the same tensor (equivalent at every element, in this
case 1). As such, the pairwise inner product of the rows in x and y will
always be 1. The output tensor will be of shape (6,6).
The cosine similarity between two orthogonal vectors will be 0 (by definition).
If every row in x is orthogonal to every row in y, then the output will be a
tensor of 0s. In the following example, each row in the tensor x1 is orthogonal
to each row in x2 because they are halves of an identity matrix.
Each row in x1 is orthogonal to each row in x2. As such, the pairwise inner
product of the rows in x1 and x2 will always be 0. Furthermore, because the
input tensors are both of shape (256,512), the output tensor will
be of shape (256,256).
x (tf.Tensor) – Input Tensor of shape (n, p).
The shape of this input tensor should be n rows by p columns.
Note that n need not equal m (the number of rows in y).
y (tf.Tensor) – Input Tensor of shape (m, p)
The shape of this input tensor should be m rows by p columns.
Note that m need not equal n (the number of rows in x).
Returns:
Returns a tensor of shape (n, m), that is, n rows by m columns.
Each (i, j)-th entry of this output tensor is the inner product between
the l2-normalized i-th row of the input tensor x and the
l2-normalized j-th row of the input tensor y.
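The pairwise-normalized inner product described above can be sketched in NumPy (illustrative only; the actual layer operates on tf.Tensor inputs, and the eps floor is an assumption to avoid dividing by zero):

```python
import numpy as np

def cosine_dist(x, y, eps=1e-12):
    # l2-normalize every row, then take all pairwise inner products
    xn = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)
    yn = y / np.maximum(np.linalg.norm(y, axis=1, keepdims=True), eps)
    return xn @ yn.T  # shape (n, m)

ones = np.ones((6, 4))
same = cosine_dist(ones, ones)  # identical rows -> every entry is 1, shape (6, 6)

eye = np.eye(4)
ortho = cosine_dist(eye[:2], eye[2:])  # orthogonal rows -> every entry is 0
```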
A Graph Network [1]_ takes a graph as input and returns an updated graph
as output. The output graph has the same structure as the input graph but
with updated node features, edge features and global state features.
Parameters:
n_node_features (int) – Number of features in a node
n_edge_features (int) – Number of features in an edge
n_global_features (int) – Number of global features
is_undirected (bool, optional (default True)) – Directed or undirected graph
residual_connection (bool, optional (default True)) – If True, the layer uses a residual connection during training
node_features (torch.Tensor) – Input node features of shape \((|\mathcal{V}|, F_n)\)
edge_index (torch.Tensor) – Edge indexes of shape \((2, |\mathcal{E}|)\)
edge_features (torch.Tensor) – Edge features of the graph, shape: \((|\mathcal{E}|, F_e)\)
global_features (torch.Tensor) – Global features of the graph, shape: \((F_g, 1)\), where \(|\mathcal{V}|\) and \(|\mathcal{E}|\) denote the number of nodes and edges in the graph,
and \(F_n\), \(F_e\), \(F_g\) denote the number of node features, edge features and global state features respectively.
batch (torch.LongTensor (optional, default: None)) – A vector that maps each node to its respective graph identifier. The attribute is used only when more than one graph are batched together during a single forward pass.
This transformation is based on the affinity of the base distribution with
the target distribution. A geometric transformation is applied where
the parameters performs changes on the scale and shift of a function
(inputs).
Normalizing flow transformations must be bijective in order to compute
the logarithm of the Jacobian's determinant. For this reason, transformations
must implement both a forward and an inverse pass.
Example
>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Affine
>>> import torch
>>> from torch.distributions import MultivariateNormal
>>> # initialize the transformation layer's parameters
>>> dim = 2
>>> samples = 96
>>> transforms = Affine(dim)
>>> # forward pass based on a given distribution
>>> distribution = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> input = distribution.sample(torch.Size((samples, dim)))
>>> len(transforms.forward(input))
2
>>> # inverse pass based on a distribution
>>> len(transforms.inverse(input))
2
Performs a transformation between two different distributions. This
particular transformation represents the following function:
y = x * exp(a) + b, where a is a scale parameter and b performs a shift.
This class also returns the logarithm of the Jacobian's determinant,
which is useful when inverting a transformation and computing the
probability of the transformation.
Parameters:
x (Sequence) – Tensor sample with the initial distribution data which will pass into
the normalizing flow algorithm.
Returns:
y (torch.Tensor) – Transformed tensor according to Affine layer with the shape of ‘x’.
log_det_jacobian (torch.Tensor) – Tensor which represents the info about the deviation of the initial
and target distribution.
Performs a transformation between two different distributions.
This transformation represents the backward pass of the function
mentioned before. Its mathematical representation is x = (y - b) / exp(a),
where "a" is a scale parameter and "b" performs a shift. This class
also returns the logarithm of the Jacobian's determinant, which is
useful when inverting a transformation and computing the probability of
the transformation.
Parameters:
y (Sequence) – Tensor sample with transformed distribution data which will be used in
the normalizing algorithm inverse pass.
Returns:
x (torch.Tensor) – Transformed tensor according to Affine layer with the shape of ‘y’.
inverse_log_det_jacobian (torch.Tensor) – Tensor which represents the information of the deviation of the initial
and target distribution.
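The forward and inverse passes described above can be sketched in NumPy using the stated formulas; for this elementwise affine map the log-determinant of the Jacobian is simply the sum of the scale parameters (a hedged illustration, not the torch implementation, with arbitrary parameter values):

```python
import numpy as np

a = np.array([0.5, -0.3])   # scale parameters (illustrative values)
b = np.array([1.0, 2.0])    # shift parameters

def affine_forward(x):
    # y = x * exp(a) + b; log|det J| = sum(a) for this elementwise map
    y = x * np.exp(a) + b
    log_det_jacobian = np.sum(a)
    return y, log_det_jacobian

def affine_inverse(y):
    # x = (y - b) / exp(a); the inverse log-determinant is -sum(a)
    x = (y - b) / np.exp(a)
    inverse_log_det_jacobian = -np.sum(a)
    return x, inverse_log_det_jacobian

x = np.array([[0.1, 0.2], [0.3, 0.4]])
y, ldj = affine_forward(x)
x_back, ildj = affine_inverse(y)   # round trip recovers x
```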
This class is a constructor transformation layer used in a
NormalizingFlow model. The Real Non-Volume Preserving (Real NVP)
transformation is a type of normalizing flow layer whose main advantage
is that the inverse pass is easy to compute [1]_, which makes it well
suited to learning a target distribution.
This particular transformation is represented by the following function:
y = x + (1 - x) * exp(s(x)) + t(x), where t and s need an activation
function. This class also returns the logarithm of the Jacobian's
determinant, which is useful when inverting a transformation and
computing the probability of the transformation.
Parameters:
x (Sequence) – Tensor sample with the initial distribution data which will pass into
the normalizing algorithm
Returns:
y (torch.Tensor) – Transformed tensor according to Real NVP layer with the shape of ‘x’.
log_det_jacobian (torch.Tensor) – Tensor which represents the info about the deviation of the initial
and target distribution.
This method performs the inverse of the previous method (forward).
It also returns the logarithm of the Jacobian's determinant,
which is useful for computing the learnable features of the target distribution.
Parameters:
y (Sequence) – Tensor sample with transformed distribution data which will be used in
the normalizing algorithm inverse pass.
Returns:
x (torch.Tensor) – Transformed tensor according to Real NVP layer with the shape of ‘y’.
inverse_log_det_jacobian (torch.Tensor) – Tensor which represents the information of the deviation of the initial
and target distribution.
Let the bonds from atoms 1->2 (B[12]) and 2->1 (B[21]) be considered as 2 different bonds.
Hence, by considering the same for all atoms, the total number of bonds = 8.
Let:
atom features : a1,a2,a3,a4,a5
hidden states of atoms : h1,h2,h3,h4,h5
bond features of bonds : b12,b21,b23,b32,b24,b42,b15,b51
initial hidden states of bonds : (0)h12,(0)h21,(0)h23,(0)h32,(0)h24,(0)h42,(0)h15,(0)h51
The initial hidden state of every bond is a function of a concatenated
feature vector, which contains the features of the bond's initial atom and the bond features.
Example: (0)h21 = func1(concat(a2, b21))
Note
Here func1 is self.W_i
The Message passing phase
The goal of the message-passing phase is to generate hidden states of all the atoms in the molecule.
The hidden state of an atom is a function of concatenation of atom features and messages (at T depth).
A message is a sum of hidden states of bonds coming to the atom (at T depth).
Note
Depth refers to the number of iterations in the message passing phase (here, T iterations). After each iteration, the hidden states of the bonds are updated.
Example:
h1=func3(concat(a1,m1))
Note
Here func3 is self.W_o.
m1 refers to the message coming to the atom.
m1=(T-1)h21+(T-1)h51
(hidden state of bond 2->1 + hidden state of bond 5->1) (at T depth)
for, depth T = 2:
the hidden states of the bonds @ 1st iteration will be => (0)h21, (0)h51
the hidden states of the bonds @ 2nd iteration will be => (1)h21, (1)h51
The hidden states of the bonds in the 1st iteration are already known.
For the hidden states of the bonds in the 2nd iteration, we follow the criterion that:
the hidden state of a bond is a function of the initial hidden state of that bond
and the messages coming to that bond in that iteration
Example:
(1)h21=func2((0)h21,(1)m21)
Note
Here func2 is self.W_h.
(1)m21 refers to the messages coming to the bond 2->1 in that 2nd iteration.
The messages coming to a bond in an iteration are the sum of the hidden states of the bonds (from the previous iteration) coming into this bond.
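The bond-update rule above can be sketched in plain Python for the example molecule, using scalar hidden states and a placeholder for func2 (the real encoder uses the learned weight matrix self.W_h followed by an activation; the names here are illustrative):

```python
# Directed bonds of the example molecule: 1-2, 2-3, 2-4, 1-5, each in both directions
bonds = ["12", "21", "23", "32", "24", "42", "15", "51"]
h0 = {bond: 1.0 for bond in bonds}   # initial hidden state (0)h of every bond
h = dict(h0)

def incoming(bond):
    # bonds arriving at the initial atom of `bond`, excluding its reverse bond
    start = bond[0]
    return [b for b in bonds if b[1] == start and b != bond[::-1]]

def func2(h0_bond, message):
    # placeholder for self.W_h followed by an activation
    return h0_bond + message

# one message-passing iteration: (1)h_vw = func2((0)h_vw, (1)m_vw)
h = {bond: func2(h0[bond], sum(h[b] for b in incoming(bond)))
     for bond in bonds}
```

For bond 2->1, the incoming bonds at atom 2 are 3->2 and 4->2 (1->2 is excluded as the reverse bond), matching the criterion described above.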
use_default_fdim (bool) – If True, self.atom_fdim and self.bond_fdim are initialized using values from the GraphConvConstants class. If False, self.atom_fdim and self.bond_fdim are initialized from the values provided.
atom_fdim (int) – Dimension of atom feature vector.
bond_fdim (int) – Dimension of bond feature vector.
d_hidden (int) – Size of hidden layer in the encoder layer.
depth (int) – Number of message passing steps.
bias (bool) – If True, dense layers will use bias vectors.
activation (str) – Activation function to be used in the encoder layer.
Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU,
‘tanh’ for TanH, ‘selu’ for SELU, and ‘elu’ for ELU.
dropout_p (float) – Dropout probability for the encoder layer.
aggregation (str) – Aggregation type to be used in the encoder layer.
Can choose between ‘mean’, ‘sum’, and ‘norm’.
aggregation_norm (Union[int, float]) – Value required if aggregation type is ‘norm’.
f_ini_atoms_bonds (torch.Tensor) – Tensor containing concatenated feature vector which contains concatenation of initial atom and bond features.
atom_to_incoming_bonds (torch.Tensor) – Tensor containing mapping from atom index to list of indices of incoming bonds.
mapping (torch.Tensor) – Tensor containing the mapping that maps bond index to ‘array of indices of the bonds’
incoming at the initial atom of the bond (excluding the reverse bonds).
The encoder for the InfoGraph model. It is a message passing graph convolutional
network that produces encoded representations for molecular graph inputs.
Parameters:
num_features (int) – Number of node features for each input
edge_features (int) – Number of edge features for each input
This module is responsible for the graph neural network layers in the GNNModular model.
Parameters:
node_type_embedding (torch.nn.Embedding) – Embedding layer for node types.
chirality_embedding (torch.nn.Embedding) – Embedding layer for chirality tags.
gconvs (torch.nn.ModuleList) – ModuleList of graph convolutional layers.
batch_norms (torch.nn.ModuleList) – ModuleList of batch normalization layers.
dropout (int) – Dropout probability.
jump_knowledge (str) – The type of jump knowledge to use [1]. Must be one of “last”, “sum”, “max”, “concat” or “none”.
“last”: Use the node representation from the last GNN layer.
“concat”: Concatenate the node representations from all GNN layers.
“max”: Take the element-wise maximum of the node representations from all GNN layers.
“sum”: Take the element-wise sum of the node representations from all GNN layers.
init_emb (bool) – Whether to initialize the embedding layers with Xavier uniform initialization.
data (tuple) – A tuple containing the node representations and the input graph data.
node_representation is a torch.Tensor created after passing input through the GNN layers.
input_batch is the original input BatchGraphData.
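The jump-knowledge modes listed above can be sketched in plain NumPy; this is a simplified illustration of the aggregation step, not the layer's actual API, which operates on the per-layer node representations produced by the GNN:

```python
import numpy as np

def jump_knowledge(layer_outputs, mode="concat"):
    """Combine per-layer node representations, each of shape (n_nodes, dim)."""
    if mode == "last":
        return layer_outputs[-1]
    if mode == "sum":
        return np.sum(np.stack(layer_outputs), axis=0)
    if mode == "max":
        return np.max(np.stack(layer_outputs), axis=0)
    if mode == "concat":
        return np.concatenate(layer_outputs, axis=1)
    raise ValueError(f"unknown mode: {mode}")
```

Note that "concat" multiplies the output width by the number of GNN layers, while the other modes preserve it.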
This discriminator module is a linear layer without bias, used to measure the similarity between local node representations (x) and global graph representations (summary).
The goal of the discriminator is to distinguish between positive and negative pairs of local and global representations.
Examples
>>> import torch
>>> from deepchem.models.torch_models.gnn import LocalGlobalDiscriminator
>>> discriminator = LocalGlobalDiscriminator(hidden_dim=64)
>>> x = torch.randn(32, 64)  # Local node representations
>>> summary = torch.randn(32, 64)  # Global graph representations
>>> similarity_scores = discriminator(x, summary)
>>> print(similarity_scores.shape)
torch.Size([32])
Computes the product of summary and self.weight, and then calculates the element-wise product of x and the resulting matrix h.
It then sums over the hidden_dim dimension, resulting in a tensor of shape (batch_size,), which represents the similarity scores between the local and global representations.
Parameters:
x (torch.Tensor) – Local node representations of shape (batch_size, hidden_dim).
summary (torch.Tensor) – Global graph representations of shape (batch_size, hidden_dim).
Returns:
A tensor of shape (batch_size,), representing the similarity scores between the local and global representations.
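The computation described above amounts to a bilinear form. A minimal NumPy sketch, where `W` stands in for the layer's bias-free weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 64
W = rng.normal(size=(hidden_dim, hidden_dim))  # stands in for the layer's weight
x = rng.normal(size=(32, hidden_dim))          # local node representations
summary = rng.normal(size=(32, hidden_dim))    # global graph representations

h = summary @ W                # product of summary and the weight
scores = (x * h).sum(axis=1)   # element-wise product, summed over hidden_dim
```

The result is one similarity score per graph in the batch, equivalent to the bilinear form summary·W·x computed row-wise.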
Message passing neural network for graph representation learning [1]_.
Parameters:
hidden_dim (int) – Hidden dimension size.
target_dim (int) – Dimensionality of the output, for example for binary classification target_dim = 1.
aggregators (List[str]) – Type of message passing functions. Options are ‘mean’,’sum’,’max’,’min’,’std’,’var’,’moment3’,’moment4’,’moment5’.
scalers (List[str]) – Type of normalization layers in the message passing network. Options are ‘identity’,’amplification’,’attenuation’.
readout_aggregators (List[str]) – Type of aggregators in the readout network.
readout_hidden_dim (int, default None) – The dimension of the hidden layer in the readout network. If not provided, the readout has the same dimensionality of the final layer of the PNA layer, which is the hidden dimension size.
readout_layers (int, default 1) – The number of linear layers in the readout network.
residual (bool, default True) – Whether to use residual connections.
pairwise_distances (bool, default False) – Whether to use pairwise distances.
activation (Union[Callable, str]) – Activation function to use.
batch_norm (bool, default True) – Whether to use batch normalization in the layers before the aggregator.
batch_norm_momentum (float, default 0.1) – Momentum for the batch normalization layers.
propagation_depth (int) – Number of propagation layers.
dropout (float, default 0.0) – Dropout probability in the message passing layers.
posttrans_layers (int, default 1) – Number of post-transformation layers.
pretrans_layers (int, default 1) – Number of pre-transformation layers.
Although the recipe for forward pass needs to be defined within
this function, one should call the Module instance afterwards
instead of this since the former takes care of running the
registered hooks while the latter silently ignores them.
Net3DLayer is a single layer of a 3D graph neural network based on the 3D Infomax architecture [1].
This class expects a DGL graph with node features stored under the name ‘feat’ and edge features stored under the name ‘d’ (representing 3D distances). The edge features are updated by the message network and the node features are updated by the update network.
Parameters:
edge_dim (int) – The dimension of the edge features.
hidden_dim (int) – The dimension of the hidden layers.
reduce_func (str) – The reduce function to use for aggregating messages. Can be either ‘sum’ or ‘mean’.
batch_norm (bool, optional (default=False)) – Whether to use batch normalization.
batch_norm_momentum (float, optional (default=0.1)) – The momentum for the batch normalization layers.
dropout (float, optional (default=0.0)) – The dropout rate for the layers.
mid_activation (str, optional (default='SiLU')) – The activation function to use in the network.
message_net_layers (int, optional (default=2)) – The number of message network layers.
update_net_layers (int, optional (default=2)) – The number of update network layers.
Net3D is a 3D graph neural network that expects a DGL graph input with 3D coordinates stored under the name ‘d’ and node features stored under the name ‘feat’. It is based on the 3D Infomax architecture [1].
Parameters:
hidden_dim (int) – The dimension of the hidden layers.
target_dim (int) – The dimension of the output layer.
readout_aggregators (List[str]) – A list of aggregator functions for the readout layer. Options are ‘sum’, ‘max’, ‘min’, ‘mean’.
batch_norm (bool, optional (default=False)) – Whether to use batch normalization.
node_wise_output_layers (int, optional (default=2)) – The number of output layers for each node.
readout_batchnorm (bool, optional (default=True)) – Whether to use batch normalization in the readout layer.
batch_norm_momentum (float, optional (default=0.1)) – The momentum for the batch normalization layers.
reduce_func (str, optional (default='sum')) – The reduce function to use for aggregating messages.
dropout (float, optional (default=0.0)) – The dropout rate for the layers.
propagation_depth (int, optional (default=4)) – The number of propagation layers in the network.
readout_layers (int, optional (default=2)) – The number of readout layers in the network.
readout_hidden_dim (int, optional (default=None)) – The dimension of the hidden layers in the readout network.
fourier_encodings (int, optional (default=0)) – The number of Fourier encodings to use.
activation (str, optional (default='SiLU')) – The activation function to use in the network.
update_net_layers (int, optional (default=2)) – The number of update network layers.
message_net_layers (int, optional (default=2)) – The number of message network layers.
use_node_features (bool, optional (default=False)) – Whether to use node features as input.
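The fourier_encodings option maps scalar edge distances to sin/cos features. A sketch under the assumption of geometrically spaced frequency scales (the implementation's exact scheme may differ):

```python
import numpy as np

def fourier_encode_dist(d, num_encodings=4):
    # Divide distances by geometrically spaced scales, then take sin and cos.
    scales = 2.0 ** np.arange(num_encodings)   # 1, 2, 4, 8
    scaled = d[:, None] / scales               # (n_edges, num_encodings)
    return np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
```

Each distance becomes a 2 * num_encodings vector, giving the message network a multi-scale view of the 3D geometry.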
This layer creates ‘n’ embeddings as initial atomic descriptors, according to the specified weight initializer and periodic_table_length (the total number of unique atoms).
References
[1] Schütt, Kristof T., et al. “Quantum-chemical insights from deep tensor neural networks.” Nature Communications 8 (2017): 13890.
Implements the gradient penalty loss term for WGANs.
This class implements the gradient penalty loss term for WGANs as described in
Gulrajani et al., “Improved Training of Wasserstein GANs” [1]_. It is used
internally by WGANModel.
>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = self.dense2(x)
...         return output
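The penalty itself evaluates the critic on random interpolates between real and generated samples and penalizes gradient norms away from 1. A NumPy sketch using a toy critic f(x) = 0.5·||x||², whose gradient is simply x; in the real layer the gradient comes from autograd:

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(size=(16, 8))            # real samples
fake = rng.normal(size=(16, 8))            # generated samples
eps = rng.uniform(size=(16, 1))
interp = eps * real + (1 - eps) * fake     # random points between real and fake

# Toy critic f(x) = 0.5 * ||x||^2 has gradient x, so no autograd is needed here.
grad = interp
grad_norm = np.sqrt((grad ** 2).sum(axis=1))
penalty = np.mean((grad_norm - 1.0) ** 2)  # pushes critic gradient norms toward 1
```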
Graph convolution layer used in MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
Not used directly, higher level layers like MolGANMultiConvolutionLayer use it.
This layer performs basic convolution on one-hot encoded matrices containing
atom and bond information. This layer also accepts three inputs for the case
when convolution is performed more than once and the results of a previous
convolution need to be used. It was done this way to avoid creating another
layer that accepts three inputs rather than two. The last input is the
so-called hidden_layer, which holds the results of the previous convolution,
while the first two are unchanged input tensors.
Examples
See: MolGANMultiConvolutionLayer for using in layers.
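Conceptually, the convolution applies one dense transform per bond type to neighbour features and combines them with a self-connection. A simplified NumPy sketch; the names and the exact combination rule here are illustrative, not the layer's API:

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_feat, n_bond_types, out_feat = 5, 4, 3, 6
adjacency = rng.integers(0, 2, size=(n_bond_types, n_atoms, n_atoms)).astype(float)
nodes = rng.normal(size=(n_atoms, n_feat))
W_bond = rng.normal(size=(n_bond_types, n_feat, out_feat))  # one dense layer per bond type
W_self = rng.normal(size=(n_feat, out_feat))

# Aggregate neighbour features separately for each bond type, add a
# self-connection, then apply the default Tanh activation.
msg = sum(adjacency[e] @ nodes @ W_bond[e] for e in range(n_bond_types))
hidden = np.tanh(msg + nodes @ W_self)                      # (n_atoms, out_feat)
```

The number of per-bond-type dense layers corresponds to the edges parameter described below.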
Graph Aggregation layer used in MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
Performs aggregation on tensor resulting from convolution layers.
Given its simple nature, it might be removed in the future and moved to
MolGANEncoderLayer.
Multiple pass convolution layer used in MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
It takes outputs of previous convolution layer and uses
them as inputs for the next one.
It simplifies the overall framework, but might be moved to
MolGANEncoderLayer in the future in order to reduce number of layers.
units (Tuple, optional (default=(128,64)), min_length=2) – List of dimensions used by consecutive convolution layers.
The more values the more convolution layers invoked.
nodes (int, optional (default=5)) – Number of features in node tensor
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
edges (int, optional (default=5)) – Controls how many dense layers are used for a single convolution unit.
Typically matches number of bond types used in the molecule.
name (string, optional (default="")) – Name of the layer
Main learning layer used by MolGAN model.
MolGAN is a WGAN type model for generation of small molecules.
Its role is to further simplify the model.
This layer can be manually built by stacking graph convolution layers
followed by graph aggregation.
units (List, optional (default=[(128,64),128])) – List of dimensions used by consecutive convolution layers.
The more values the more convolution layers invoked.
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
edges (int, optional (default=5)) – Controls how many dense layers are used for a single convolution unit.
Typically matches number of bond types used in the molecule.
nodes (int, optional (default=5)) – Number of features in node tensor
name (string, optional (default="")) – Name of the layer
inputs (List[torch.Tensor]) – The length of atom_to_pair should be the same as n_pair_features.
Returns:
result – Tensor containing the mapping of the edge vector to a d × d matrix, where d denotes the dimension of the internal hidden representation of each node in the graph.
This class implements the core Weave convolution from the Google graph convolution paper [1]_
This is the Torch equivalent of the original implementation using Keras.
This model contains atom features and bond features separately. Here, bond features are also called pair features.
There are 4 types of transformations that this model implements: atom->atom, atom->pair, pair->atom, and pair->pair.
Examples
This layer expects 4 inputs in a list of the form [atom_features,
pair_features, pair_split, atom_to_pair]. We’ll walk through the structure
of these inputs. Let’s start with some basic definitions.
>>> import deepchem as dc
>>> import numpy as np
Suppose you have a batch of molecules
>>> smiles=["CCC","C"]
Note that there are 4 atoms in total in this system. This layer expects its input molecules to be batched together.
>>> total_n_atoms=4
Let’s suppose that we have a featurizer that computes n_atom_feat features per atom.
>>> n_atom_feat=75
Then conceptually, atom_feat is the array of shape (total_n_atoms,
n_atom_feat) of atomic features. For simplicity, let’s just go with a
random such matrix.
Let’s suppose we have n_pair_feat pairwise features
>>> n_pair_feat=14
For each molecule, we compute a matrix of shape (n_atoms*n_atoms,n_pair_feat) of pairwise features for each pair of atoms in the molecule.
Let’s construct this conceptually for our example.
pair_split is an index into pair_feat which tells us which atom each row belongs to. In our case, we have
>>> pair_split=np.array([0,0,0,1,1,1,2,2,2,3])
That is, the first 9 entries belong to “CCC” and the last entry to “C”. The
final entry atom_to_pair goes a little more in-depth than pair_split
and tells us the precise pair each pair feature belongs to.
The 4 is total_n_atoms and the 10 is the total number of pairs. Where
does 50 come from? It’s from the default arguments n_atom_output_feat and
n_pair_output_feat.
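To make the indexing concrete, here is one way to construct atom_to_pair and pair_split for the “CCC”/“C” batch above. This is a sketch; DeepChem's featurizers build these arrays for you:

```python
import numpy as np

def build_atom_to_pair(n_atoms_per_mol):
    """Enumerate all (i, j) atom pairs within each molecule of a batch."""
    pairs, start = [], 0
    for n in n_atoms_per_mol:
        for i in range(n):
            for j in range(n):
                pairs.append((start + i, start + j))
        start += n
    return np.array(pairs)

atom_to_pair = build_atom_to_pair([3, 1])  # "CCC" has 3 atoms, "C" has 1
pair_split = atom_to_pair[:, 0]            # which atom each pair row belongs to
```

This reproduces the pair_split array shown above: nine rows for “CCC” followed by one row for “C”.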
n_atom_input_feat (int, optional (default 75)) – Number of features for each atom in input.
n_pair_input_feat (int, optional (default 14)) – Number of features for each pair of atoms in input.
n_atom_output_feat (int, optional (default 50)) – Number of features for each atom in output.
n_pair_output_feat (int, optional (default 50)) – Number of features for each pair of atoms in output.
n_hidden_AA (int, optional (default 50)) – Number of units(convolution depths) in corresponding hidden layer
n_hidden_PA (int, optional (default 50)) – Number of units(convolution depths) in corresponding hidden layer
n_hidden_AP (int, optional (default 50)) – Number of units(convolution depths) in corresponding hidden layer
n_hidden_PP (int, optional (default 50)) – Number of units(convolution depths) in corresponding hidden layer
update_pair (bool, optional (default True)) – Whether to calculate for pair features,
could be turned off for last layer
init (str, optional (default ‘xavier_uniform_’)) – Weight initialization for filters.
activation (str, optional (default 'relu')) – Activation function applied
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying
activation functions on convolutional layers.
Implements the weave-gathering section of weave convolutions.
This is the Torch equivalent of the original implementation using Keras.
Implements the gathering layer from [1]_. The weave gathering layer gathers
per-atom features to create a molecule-level fingerprint in a weave
convolutional network. This layer can also perform Gaussian histogram
expansion as detailed in [1]_. Note that the gathering function here is
simply addition as in [1]_.
Examples
This layer expects 2 inputs in a list of the form [atom_features,
pair_features]. We’ll walk through the structure
of these inputs. Let’s start with some basic definitions.
>>> import deepchem as dc
>>> import numpy as np
Suppose you have a batch of molecules
>>> smiles=["CCC","C"]
Note that there are 4 atoms in total in this system. This layer expects its
input molecules to be batched together.
>>> total_n_atoms=4
Let’s suppose that we have n_atom_feat features per atom.
>>> n_atom_feat=75
Then conceptually, atom_feat is the array of shape (total_n_atoms,
n_atom_feat) of atomic features. For simplicity, let’s just go with a
random such matrix.
n_input (int, optional (default 128)) – number of features for each input molecule
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by gaussian histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the
original dimensions of the input by using a linear layer with specified
activation function. Note that this compression was not in the original
paper, but was present in the original DeepChem implementation so is
left present for backwards compatibility.
init (str, optional (default ‘xavier_uniform_’)) – Weight initialization for filters if compress_post_gaussian_expansion
is True.
activation (str, optional (default 'tanh')) – Activation function applied for filters if
compress_post_gaussian_expansion is True.
We construct a Gaussian at gaussian_memberships[i][0] with standard
deviation gaussian_memberships[i][1]. Each feature in x is assigned
the probability of falling in each Gaussian, and probabilities are
normalized across the 11 different Gaussians.
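The membership computation can be sketched as follows. The centers and widths below are illustrative stand-ins; the actual layer hard-codes its own list of 11 (mean, std) pairs:

```python
import numpy as np

# Illustrative (mean, std) pairs; the layer's actual 11 values differ.
memberships = [(m, 0.5) for m in np.linspace(-2.0, 2.0, 11)]

def gaussian_histogram(x):
    """Expand features of shape (n_atoms, n_feat) to (n_atoms, n_feat * 11)."""
    dist = [np.exp(-((x - m) ** 2) / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
            for m, s in memberships]
    dist = np.stack(dist, axis=-1)                   # (n_atoms, n_feat, 11)
    dist = dist / dist.sum(axis=-1, keepdims=True)   # normalize across the Gaussians
    return dist.reshape(x.shape[0], -1)
```

After normalization, the 11 memberships for each input feature sum to 1, forming a soft histogram over the Gaussian bins.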
This class implements the Global Message Passing Layer from the Molecular Mechanics-Driven Graph Neural Network
with Multiplex Graph for Molecular Structures (MXMNet) paper [1]_.
This layer consists of two message passing steps and an update step between them.
Message passing and message aggregation (sum) are handled by propagate().
References
Examples
The provided example demonstrates how to use the GlobalMessagePassing layer by creating an instance, passing input tensors (node_features, edge_attributes, edge_indices) through it, and checking the shape of the output.
Initializes variables and creates a configuration dictionary with specific values.
This layer implements a basis layer for the MXMNet model using Bessel functions.
The basis layer is used to model radial symmetry in molecular systems.
The output of the layer is given by:
output = envelope(dist / cutoff) * (freq * dist / cutoff).sin()
Reset and initialize the learnable parameters of the MXMNet Bessel Basis Layer.
The ‘freq’ tensor, representing the frequencies of the Bessel functions, is set up with initial values proportional to π (PI) and becomes a learnable parameter.
The ‘freq’ tensor will be updated during the training process to optimize the performance of the MXMNet model for the specific task it is being trained on.
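The output formula above can be sketched with fixed (untrained) frequencies freq = π·(1, …, num_radial) and a DimeNet-style polynomial envelope; the exact envelope form is an assumption here:

```python
import numpy as np

def bessel_basis(dist, num_radial=6, cutoff=5.0, envelope_exponent=5):
    freq = np.pi * np.arange(1, num_radial + 1)  # learnable in the actual layer
    x = dist[:, None] / cutoff                   # scaled distances, shape (n, 1)
    # Polynomial envelope that smoothly decays to zero at x = 1 (the cutoff).
    p = envelope_exponent + 1
    a, b, c = -(p + 1) * (p + 2) / 2, p * (p + 2), -p * (p + 1) / 2
    env = 1 / x + a * x ** (p - 1) + b * x ** p + c * x ** (p + 1)
    return env * np.sin(freq * x)                # (n, num_radial)
```

At the cutoff distance the envelope vanishes, so the basis smoothly suppresses interactions beyond it.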
DTNN is based on the many-body Hamiltonian concept, which is a fundamental principle in quantum mechanics.
The DTNN receives a molecule’s distance matrix and the membership of its atoms from its Coulomb Matrix representation.
Then, it iteratively refines the representation of each atom by considering its interactions with neighboring atoms.
Finally, it predicts the energy of the molecule by summing up the energies of the individual atoms.
In this class, we establish a sequential model for the Deep Tensor Neural Network (DTNN) [1]_.
Add random noise to the embedding and include a corresponding loss.
This adds random noise to the encoder, and also adds a constraint term to
the loss that forces the embedding vector to have a unit Gaussian distribution.
We can then pick random vectors from a Gaussian distribution, and the output
sequences should follow the same distribution as the training data.
We can use this layer with an AutoEncoder, which makes it a Variational
AutoEncoder. The constraint term in the loss is initially set to 0, so the
optimizer just tries to minimize the reconstruction loss. Once it has made
reasonable progress toward that, the constraint term can be gradually turned
back on. The range of steps over which this happens is configured by modifying
the annealing_start_step and annealing_final_step parameters.
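A minimal sketch of the noise-plus-constraint idea, using the reparameterization trick with a standard Gaussian KL term; the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=(8, 16))        # embedding means from the encoder
log_var = rng.normal(size=(8, 16))   # embedding log-variances from the encoder

# Add random noise: z = mu + sigma * eps, with eps ~ N(0, I).
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence from N(0, I): the constraint term added to the loss.
kl = -0.5 * np.mean(np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))
```

Minimizing the KL term pulls the embedding distribution toward a unit Gaussian, which is what later allows sampling random vectors at generation time.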
It takes input sequences and converts them into a fixed-size context vector
called the “embedding”. This vector contains all relevant information from
the input sequence. This context vector is then used by the decoder to
generate the output sequence and can also be used as a representation of the
input sequence for other Models.
The decoder transforms the embedding vector into the output sequence.
It is trained to predict the next token in the sequence given the previous
tokens in the sequence. It uses the context vector from the encoder to
help generate the correct token in the sequence.
Implements sequence to sequence translation models.
The model is based on the description in Sutskever et al., “Sequence to
Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215),
although this implementation uses GRUs instead of LSTMs. The goal is to
take sequences of tokens as input, and translate each one into a different
output sequence. The input and output sequences can both be of variable
length, and an output sequence need not have the same length as the input
sequence it was generated from. For example, these models were originally
developed for use in natural language processing. In that context, the
input might be a sequence of English words, and the output might be a
sequence of French words. The goal would be to train the model to translate
sentences from English to French.
The model consists of two parts called the “encoder” and “decoder”. Each one
consists of a stack of recurrent layers. The job of the encoder is to
transform the input sequence into a single, fixed length vector called the
“embedding”. That vector contains all relevant information from the input
sequence. The decoder then transforms the embedding vector into the output
sequence.
These models can be used for various purposes. First and most obviously,
they can be used for sequence to sequence translation. In any case where you
have sequences of tokens, and you want to translate each one into a different
sequence, a SeqToSeq model can be trained to perform the translation.
Another possible use case is transforming variable length sequences into
fixed length vectors. Many types of models require their inputs to have a
fixed shape, which makes it difficult to use them with variable sized inputs
(for example, when the input is a molecule, and different molecules have
different numbers of atoms). In that case, you can train a SeqToSeq model as
an autoencoder, so that it tries to make the output sequence identical to the
input one. That forces the embedding vector to contain all information from
the original sequence. You can then use the encoder for transforming
sequences into fixed length embedding vectors, suitable to use as inputs to
other types of models.
Another use case is to train the decoder for use as a generative model. Here
again you begin by training the SeqToSeq model as an autoencoder. Once
training is complete, you can supply arbitrary embedding vectors, and
transform each one into an output sequence. When used in this way, you
typically train it as a variational autoencoder. This adds random noise to
the encoder, and also adds a constraint term to the loss that forces the
embedding vector to have a unit Gaussian distribution. You can then pick
random vectors from a Gaussian distribution, and the output sequences should
follow the same distribution as the training data.
When training as a variational autoencoder, it is best to use KL cost
annealing, as described in https://arxiv.org/abs/1511.06349. The constraint
term in the loss is initially set to 0, so the optimizer just tries to
minimize the reconstruction loss. Once it has made reasonable progress
toward that, the constraint term can be gradually turned back on. The range
of steps over which this happens is configurable.
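The annealing schedule can be sketched as a simple linear ramp between the two configured steps (a sketch of the idea, not necessarily the exact schedule used internally):

```python
def kl_weight(step, annealing_start_step=5000, annealing_final_step=10000):
    """Weight on the KL constraint term: 0 before the start step, then a
    linear ramp up to 1 at the final step."""
    if step < annealing_start_step:
        return 0.0
    if step >= annealing_final_step:
        return 1.0
    return (step - annealing_start_step) / (annealing_final_step - annealing_start_step)
```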
In this class, we establish a sequential model for Sequence to Sequence (SeqToSeq) translation [1]_.
max_output_length (int) – Maximum length of output sequence.
encoder_layers (int (default 4)) – Number of recurrent layers in the encoder
decoder_layers (int (default 4)) – Number of recurrent layers in the decoder
embedding_dimension (int (default 512)) – Width of the embedding vector. This also is the width of all recurrent
layers.
dropout (float (default 0.0)) – Dropout probability to use during training.
variational (bool (default False)) – If True, train the model as a variational autoencoder. This adds random
noise to the encoder, and also constrains the embedding to follow a unit
Gaussian distribution.
annealing_start_step (int (default 5000)) – the step (that is, batch) at which to begin turning on the constraint
term for KL cost annealing.
annealing_final_step (int (default 10000)) – the step (that is, batch) at which to finish turning on the constraint
term for KL cost annealing.
A PyTorch Module implementing the Ferminet’s electron features interaction layer [1]_. This is a helper class for the Ferminet model.
The layer consists of 2 types of linear layers - v for the one electron features and w for the two electron features. The number and dimensions
of each layer depend on the number of atoms and electrons in the molecule system.
one_electron (torch.Tensor) – The one electron feature which has the shape (batch_size, number of electrons, number of atoms * 4). Here the last dimension contains
the electron’s distance from each of the atoms as a vector concatenated with the norm of that vector.
two_electron (torch.Tensor) – The two electron feature which has the shape (batch_size, number of electrons, number of electrons, 4). Here the last dimension contains
the electron’s distance from the other electrons as a vector concatenated with the norm of that vector.
Returns:
one_electron (torch.Tensor) – The one electron feature after passing through the layer which has the shape (batch_size, number of electrons, n_one shape).
two_electron (torch.Tensor) – The two electron feature after passing through the layer which has the shape (batch_size, number of electrons, number of electrons, n_two shape).
A PyTorch Module implementing the Ferminet’s envelope layer [1]_, which is used to calculate the spin up and spin down orbital values.
This is a helper class for the Ferminet model.
The layer consists of 4 types of parameter lists - envelope_w, envelope_g, sigma and pi, which help to calculate the orbital values.
one_electron (torch.Tensor) – Torch tensor which is the output from the FerminElectronFeature layer, in the shape (batch_size, number of electrons, n_one layer size).
one_electron_vector_permuted (torch.Tensor) – Torch tensor which is shape permuted vector of the original one_electron vector tensor. shape of the tensor should be (batch_size, number of atoms, number of electrons, 3).
Returns:
psi_up – Torch tensor with a scalar value containing the sampled wavefunction value for each batch.
The MXMNetLocalMessagePassing class defines a local message passing layer used in the MXMNet model [1]_.
This layer integrates cross-layer mappings inside the local message passing, allowing for the transformation
of input tensors representing pairwise distances and angles between atoms in a molecular system.
The layer aggregates information using message passing and updates atom representations accordingly.
The 3-step message passing scheme is proposed in the paper [1]_.
Step 1 contains Message Passing 1 that captures the two-hop angles and related pairwise distances to update edge-level embeddings {mji}.
Step 2 contains Message Passing 2 that captures the one-hop angles and related pairwise distances to further update {mji}.
Step 3 finally aggregates {mji} to update the node-level embedding hi.
These steps in the t-th iteration are formulated precisely in [1]_.
dim (int) – The dimension of the input and output tensors for the local message passing layer.
activation_fn (Union[Callable, str], optional (default: 'silu')) – The activation function to be used in the multilayer perceptrons (MLPs) within the layer.
The forward method performs the computation for the MXMNetLocalMessagePassing Layer.
This method processes the input tensors representing atom features, radial basis functions (RBF), and spherical basis functions (SBF) using message passing over the molecular graph. The message passing updates the atom representations, and the resulting tensor represents the updated atom feature after local message passing.
Parameters:
node_features (torch.Tensor) – Input tensor representing atom features.
It takes pairwise distances and angles between atoms as input and combines radial basis functions with spherical harmonic
functions to generate a fixed-size representation that captures both radial and orientation information. This type of
representation is commonly used in molecular modeling and simulations to capture the behavior of atoms and molecules in
chemical systems.
Inside the initialization, Bessel basis functions and real spherical harmonic functions are generated.
The Bessel basis functions capture the radial information, and the spherical harmonic functions capture the orientation information.
These functions are generated based on the provided num_spherical and num_radial parameters.
num_spherical (int) – The number of spherical harmonic functions to use. These functions capture orientation information related to atom positions.
num_radial (int) – The number of radial basis functions to use. These functions capture information about pairwise distances between atoms.
cutoff (float, optional (default 5.0)) – The cutoff distance for the radial basis functions. It specifies the distance beyond which the interactions are ignored.
envelope_exponent (int, optional (default 5)) – The exponent for the envelope function. It controls the degree of damping for the radial basis functions.
The following layers are used for implementing the GROVER model as described in the paper `Self-Supervised Graph Transformer on Large-Scale Molecular Data <https://drug.ai.tencent.com/publications/GROVER.pdf>`_.
Performs Message Passing to generate encodings for the molecule.
Parameters:
atom_messages (bool) – True if encoding atom-messages else False.
init_message_dim (int) – Dimension of embedding message.
attach_feats (bool) – Set to True if additional features are passed along with node/edge embeddings.
attached_feat_fdim (int) – Dimension of additional features when attach_feats is True.
undirected (bool) – If set to True, the graph is considered as an undirected graph.
depth (int) – number of hops in a message passing iteration
dynamic_depth (str, default: none) – If set to uniform, dynamic depth is randomly sampled from a uniform distribution; if set to truncnorm, it is sampled from a truncated normal distribution.
input_layer (str) – If set to fc, adds an initial feed-forward layer. If set to none, does not add an initial feed forward layer.
Although the recipe for forward pass needs to be defined within
this function, one should call the Module instance afterwards
instead of this since the former takes care of running the
registered hooks while the latter silently ignores them.
The GroverTransEncoder layer is used for encoding a molecular graph.
The layer returns 4 outputs. They are atom messages aggregated from atom hidden states,
atom messages aggregated from bond hidden states, bond messages aggregated from atom hidden
states, bond messages aggregated from bond hidden states.
Parameters:
hidden_size (int) – the hidden size of the model.
edge_fdim (int) – the dimension of additional feature for edge/bond.
node_fdim (int) – the dimension of additional feature for node/atom.
depth (int) – Dynamic message passing depth for use in MPNEncoder
undirected (bool) – Whether the message passing is undirected
dropout (float) – the dropout ratio
activation (str) – the activation function
num_mt_block (int) – the number of MT blocks.
num_head (int) – the number of attention heads (AttentionHead).
bias (bool) – enable bias term in all linear layers.
res_connection (bool) – enables the skip-connection in MTBlock.
graph_batch (List[torch.Tensor]) – A list containing f_atoms, f_bonds, a2b, b2a, b2revb, a_scope, b_scope, a2a
Returns:
embedding – Returns a dictionary of embeddings. The embeddings are:
- atom_from_atom: node messages aggregated from node hidden states
- bond_from_atom: bond messages aggregated from atom hidden states
- atom_from_bond: node message aggregated from bond hidden states
- bond_from_bond: bond messages aggregated from bond hidden states.
graph_batch (List[torch.Tensor]) – A list containing f_atoms, f_bonds, a2b, b2a, b2revb, a_scope, b_scope, a2a
Returns:
embedding – Returns a dictionary of embeddings. The embeddings are:
- atom_from_atom: node messages aggregated from node hidden states
- bond_from_atom: bond messages aggregated from bond hidden states
- atom_from_bond: node message aggregated from bond hidden states
- bond_from_bond: bond messages aggregated from bond hidden states.
The GroverAtomVocabPredictor module predicts the atom vocabulary
for the self-supervision task in the Grover architecture. One of the self-supervision
tasks is to learn the contextual information of nodes (atoms).
Contextual information is encoded as a string, like C_N-DOUBLE1_O-SINGLE1.
The module accepts an atom encoding and learns to predict the contextual information
of the atom as a multi-class classification problem.
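Such a multi-class head can be sketched as a single linear classifier over atom embeddings. This is a minimal illustration, not DeepChem's actual implementation; the embedding size (128) and vocabulary size (50) are hypothetical:

```python
import torch
import torch.nn as nn

class AtomVocabHead(nn.Module):
    """Hypothetical sketch: map each atom embedding to vocabulary log-probabilities."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # One linear layer producing a score per vocabulary entry.
        self.linear = nn.Linear(hidden_size, vocab_size)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, atom_embeddings: torch.Tensor) -> torch.Tensor:
        # atom_embeddings: (num_atoms, hidden_size) -> (num_atoms, vocab_size)
        return self.logsoftmax(self.linear(atom_embeddings))

head = AtomVocabHead(hidden_size=128, vocab_size=50)
logits = head(torch.randn(6, 128))   # 6 atoms in the batch
```

Each row of `logits` is a log-probability distribution over the contextual vocabulary, suitable for a negative log-likelihood loss.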
Layer for learning contextual information for bonds.
The layer is used in the Grover architecture to learn the contextual information of a bond by predicting
the context of a bond from the bond embedding in a multi-class classification setting.
The contextual information of a bond is encoded as a string (ex: ‘(DOUBLE-STEREONONE-NONE)_C-(SINGLE-STEREONONE-NONE)2’).
The functional group prediction task for self-supervised learning.
Molecules contain functional groups. This module is used for predicting
the functional groups, and the problem is formulated as a multi-label classification problem.
Parameters:
functional_group_size (int) – size of functional group
The forward function for the GroverFunctionalGroupPredictor (semantic motif prediction) layer.
It takes atom/bond embeddings produced from node and bond hidden states by the GroverEmbedding module,
along with the atom and bond scopes, and produces prediction logits for each embedding.
The scopes are used to differentiate atoms/bonds belonging to each molecule in a batched molecular graph.
Parameters:
embedding (Dict) – The input embeddings organized as a dictionary. The input embeddings are the output of the GroverEmbedding layer.
atom_scope (List) – The scope for atoms.
bond_scope (List) – The scope for bonds.
Returns:
preds (Dict) – A dictionary containing the predicted logits of the functional group from four different types of input embeddings. The keys and their corresponding predictions
are described below.
atom_from_atom - prediction logits from atom embeddings generated via node hidden states
atom_from_bond - prediction logits from atom embeddings generated via bond hidden states
bond_from_atom - prediction logits from bond embeddings generated via node hidden states
bond_from_bond - prediction logits from bond embeddings generated via bond hidden states
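To make the role of the scopes concrete, here is a hedged sketch: each (start, size) pair selects one molecule's atoms from the batched embedding matrix, which are pooled and scored against a functional-group vocabulary. The head `fg_head` and all sizes are hypothetical, not DeepChem's exact API:

```python
import torch
import torch.nn as nn

# Assumed sizes: 8-dim atom embeddings, 85 functional groups in the vocabulary.
fg_head = nn.Linear(8, 85)
atom_embeddings = torch.randn(5, 8)  # 2 molecules batched: 3 atoms + 2 atoms
atom_scope = [(0, 3), (3, 2)]        # (start, size) rows for each molecule

mol_logits = []
for start, size in atom_scope:
    # Mean-pool this molecule's atom rows, then score each functional group.
    mol_emb = atom_embeddings.narrow(0, start, size).mean(dim=0)
    mol_logits.append(fg_head(mol_emb))
preds = torch.stack(mol_logits)      # (num_molecules, num_functional_groups)
```

Because the task is multi-label, these logits would typically feed a sigmoid/BCE-with-logits loss rather than a softmax.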
The GroverPretrain module is used for training an embedding based on the Grover Pretraining task.
Grover pretraining is a self-supervised task where an embedding is trained to learn the contextual
information of atoms and bonds along with graph-level properties, which are functional groups
in case of molecular graphs.
Parameters:
embedding (nn.Module) – An embedding layer to generate embedding from input molecular graph
atom_vocab_task_atom (nn.Module) – A layer used for predicting atom vocabulary from atom features generated via atom hidden states.
atom_vocab_task_bond (nn.Module) – A layer used for predicting atom vocabulary from atom features generated via bond hidden states.
bond_vocab_task_atom (nn.Module) – A layer used for predicting bond vocabulary from bond features generated via atom hidden states.
bond_vocab_task_bond (nn.Module) – A layer used for predicting bond vocabulary from bond features generated via bond hidden states.
Returns:
prediction_logits (Tuple) – A tuple of prediction logits containing prediction logits of atom vocabulary task from atom hidden state,
prediction logits for atom vocabulary task from bond hidden states, prediction logits for bond vocabulary task
from atom hidden states, prediction logits for bond vocabulary task from bond hidden states, functional
group prediction logits from atom embedding generated from atom and bond hidden states, functional group
prediction logits from bond embedding generated from atom and bond hidden states.
For a graph-level prediction task, the GroverFinetune model uses the node/edge embeddings
output by the GroverEmbedding layer, applies a readout function on them to get
graph embeddings, and uses additional MLP layers to predict the property of the molecular graph.
Parameters:
embedding (nn.Module) – An embedding layer to generate embedding from input molecular graph
readout (nn.Module) – A readout layer to perform readout on atom and bond hidden states
mol_atom_from_atom_ffn (nn.Module) – A feed forward network which learns representation from atom messages generated via atom hidden states of a molecular graph
mol_atom_from_bond_ffn (nn.Module) – A feed forward network which learns representation from atom messages generated via bond hidden states of a molecular graph
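The finetuning pattern above can be sketched as two parallel feed-forward networks, one per message branch, whose outputs are combined (averaging is one simple choice). All names and sizes here are illustrative, not DeepChem's implementation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Graph embeddings for 4 molecules, one tensor per readout branch.
emb_from_atom = torch.randn(4, 16)   # readout of atom-hidden-state messages
emb_from_bond = torch.randn(4, 16)   # readout of bond-hidden-state messages

# One MLP per branch, mirroring mol_atom_from_atom_ffn / mol_atom_from_bond_ffn.
ffn_atom = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
ffn_bond = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

# Average the two branch predictions to get one property value per molecule.
property_pred = (ffn_atom(emb_from_atom) + ffn_bond(emb_from_bond)) / 2
```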
The Scaled Dot Product Attention operation from the `Attention Is All You Need <https://arxiv.org/abs/1706.03762>`_ paper.
Example
>>> import torch
>>> import torch.nn as nn
>>> from deepchem.models import ScaledDotProductAttention as SDPA
>>> attn = SDPA()
>>> x = torch.ones(1, 5)
>>> # Linear layers for making query, key, value
>>> Q, K, V = nn.Parameter(torch.ones(5)), nn.Parameter(torch.ones(5)), nn.Parameter(torch.ones(5))
>>> query, key, value = Q * x, K * x, V * x
>>> x_out, attn_score = attn(query, key, value)
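As a reference for what the layer computes, here is a from-scratch sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The function name and shapes are illustrative, not DeepChem's API:

```python
import math
import torch

def scaled_dot_product_attention(query, key, value):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    attn = torch.softmax(scores, dim=-1)   # each row sums to 1 over the keys
    return attn @ value, attn

q = torch.randn(2, 4, 8)   # (batch, seq_len, d_k)
k = torch.randn(2, 4, 8)
v = torch.randn(2, 4, 8)
out, attn_weights = scaled_dot_product_attention(q, k, v)
```

Dividing by √d_k keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.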
Given $X \in \mathbb{R}^{n \times in\_feature}$, the attention is calculated by: $a = \mathrm{softmax}(W_2 \tanh(W_1 X^\top))$, where
$W_1 \in \mathbb{R}^{hidden \times in\_feature}$ and $W_2 \in \mathbb{R}^{out\_feature \times hidden}$.
The final output is $y = aX$, where $y \in \mathbb{R}^{out\_feature \times in\_feature}$.
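A direct transcription of this formula, with illustrative sizes; W1 and W2 here are plain random tensors rather than learned parameters:

```python
import torch

torch.manual_seed(0)
n, in_feature, hidden, out_feature = 6, 10, 16, 4
X = torch.randn(n, in_feature)
W1 = torch.randn(hidden, in_feature)
W2 = torch.randn(out_feature, hidden)

# a = softmax(W2 tanh(W1 X^T)); the softmax runs over the n nodes,
# so each of the out_feature attention rows sums to 1.
a = torch.softmax(W2 @ torch.tanh(W1 @ X.T), dim=-1)  # (out_feature, n)
y = a @ X                                             # (out_feature, in_feature)
```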
The readout module is used for performing readouts on batched graphs to
convert node embeddings/edge embeddings into graph embeddings. It is used
in the Grover architecture to generate a graph embedding from node and edge
embeddings. The generated embedding can be used in downstream tasks like graph
classification or graph prediction problems.
Parameters:
rtype (str) – Readout type, can be ‘mean’ or ‘self-attention’
in_features (int) – Size of input features
attn_hidden_size (int) – If readout type is self-attention, size of the hidden layer in the attention network.
attn_out_size (int) – If readout type is self-attention, size of the attention output layer.
Given a batched node/edge embedding and a scope list, produce the graph-level embedding by scope.
Parameters:
embeddings (torch.Tensor) – The embedding matrix, num_nodes x in_features or num_edges x in_features.
scope (List[List]) – A list in which each element is a list [start, range]: start is the starting index and
range is the length of the scope (start + range = end).
Returns:
graph_embeddings – A stacked tensor containing graph embeddings of shape len(scope) x in_features if readout type is mean or len(scope) x attn_out_size when readout type is self-attention.
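For example, a mean readout over two graphs (2 and 4 nodes) batched into one embedding matrix; the scope format follows the [start, range] convention above:

```python
import torch

# 6 nodes with in_features = 2; rows 0-1 belong to graph 1, rows 2-5 to graph 2.
embeddings = torch.arange(12.).reshape(6, 2)
scope = [[0, 2], [2, 4]]

# Mean readout by scope: narrow(0, start, length) selects one graph's rows.
graph_embeddings = torch.stack(
    [embeddings.narrow(0, start, length).mean(dim=0) for start, length in scope]
)
# graph_embeddings has shape (len(scope), in_features) == (2, 2)
```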