mdlearn.nn.models.lstm
Warning
LSTM models are still under development, use with caution!
Classes
LSTM | LSTM model to predict the dynamics for a time series of feature vectors. |
LSTMTrainer | Trainer class to fit an LSTM model to a time series of feature vectors. |
- class mdlearn.nn.models.lstm.LSTM(*args: Any, **kwargs: Any)
LSTM model to predict the dynamics for a time series of feature vectors.
- __init__(input_size: int, hidden_size: Optional[int] = None, num_layers: int = 1, bias: bool = True, dropout: float = 0.0, bidirectional: bool = False)
- Parameters
input_size (int) – The number of expected features in the input x.
hidden_size (Optional[int], default=None) – The number of features in the hidden state h. By default, hidden_size equals input_size in order to propagate the dynamics.
num_layers (int, default=1) – Number of recurrent layers. E.g., setting num_layers=2 stacks two LSTMs, with the second LSTM taking in the outputs of the first and computing the final results.
bias (bool, default=True) – If False, the layer does not use the bias weights b_ih and b_hh.
dropout (float, default=0.0) – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout.
bidirectional (bool, default=False) – If True, becomes a bidirectional LSTM.
- forward(x: torch.Tensor) torch.Tensor
- Parameters
x (torch.Tensor) – Tensor of shape BxNxD for B batches of N examples by D feature dimensions.
- Returns
torch.Tensor – The predicted tensor of size (B, N, hidden_size).
- mse_loss(y_true: torch.Tensor, y_pred: torch.Tensor, reduction: str = 'mean') torch.Tensor
Compute the MSE loss between y_true and y_pred.
- Parameters
y_true (torch.Tensor) – The true data.
y_pred (torch.Tensor) – The prediction.
reduction (str, default="mean") – The reduction strategy for the F.mse_loss function.
- Returns
torch.Tensor – The MSE loss between y_true and y_pred.
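The reduction argument is forwarded to F.mse_loss, which accepts "none", "mean", and "sum". As a rough illustration of what each mode returns, here is a minimal sketch in plain Python; the function below is a hypothetical stand-in written for this example, not the mdlearn or PyTorch implementation, and it works on flat lists rather than tensors.

```python
from typing import List, Union


def mse_loss(
    y_true: List[float], y_pred: List[float], reduction: str = "mean"
) -> Union[float, List[float]]:
    """Plain-Python stand-in for the three reduction modes of F.mse_loss."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Element-wise squared error.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    if reduction == "none":
        return squared_errors  # one value per element
    if reduction == "sum":
        return sum(squared_errors)
    if reduction == "mean":
        return sum(squared_errors) / len(squared_errors)
    raise ValueError(f"unknown reduction: {reduction!r}")


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 4.0, 0.0]
per_element = mse_loss(y_true, y_pred, reduction="none")
total = mse_loss(y_true, y_pred, reduction="sum")
mean = mse_loss(y_true, y_pred)  # default reduction is "mean"
```

With "none" the caller gets the unreduced errors back and can apply its own weighting; "mean" and "sum" return a single scalar.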
- class mdlearn.nn.models.lstm.LSTMTrainer(input_size: int, hidden_size: Optional[int] = None, num_layers: int = 1, bias: bool = True, dropout: float = 0.0, bidirectional: bool = False, window_size: int = 10, horizon: int = 1, seed: int = 42, in_gpu_memory: bool = False, num_data_workers: int = 0, prefetch_factor: int = 2, split_pct: float = 0.8, split_method: str = 'partition', batch_size: int = 128, shuffle: bool = True, device: str = 'cpu', optimizer_name: str = 'RMSprop', optimizer_hparams: Dict[str, Any] = {'lr': 0.001, 'weight_decay': 1e-05}, scheduler_name: Optional[str] = None, scheduler_hparams: Dict[str, Any] = {}, epochs: int = 100, verbose: bool = False, clip_grad_max_norm: float = 10.0, checkpoint_log_every: int = 10, plot_log_every: int = 10, plot_n_samples: int = 10000, plot_method: Optional[str] = 'TSNE', train_subsample_pct: float = 1.0, valid_subsample_pct: float = 1.0, use_wandb: bool = False)
Trainer class to fit an LSTM model to a time series of feature vectors.
- __init__(input_size: int, hidden_size: Optional[int] = None, num_layers: int = 1, bias: bool = True, dropout: float = 0.0, bidirectional: bool = False, window_size: int = 10, horizon: int = 1, seed: int = 42, in_gpu_memory: bool = False, num_data_workers: int = 0, prefetch_factor: int = 2, split_pct: float = 0.8, split_method: str = 'partition', batch_size: int = 128, shuffle: bool = True, device: str = 'cpu', optimizer_name: str = 'RMSprop', optimizer_hparams: Dict[str, Any] = {'lr': 0.001, 'weight_decay': 1e-05}, scheduler_name: Optional[str] = None, scheduler_hparams: Dict[str, Any] = {}, epochs: int = 100, verbose: bool = False, clip_grad_max_norm: float = 10.0, checkpoint_log_every: int = 10, plot_log_every: int = 10, plot_n_samples: int = 10000, plot_method: Optional[str] = 'TSNE', train_subsample_pct: float = 1.0, valid_subsample_pct: float = 1.0, use_wandb: bool = False)
- Parameters
input_size (int) – The number of expected features in the input x.
hidden_size (Optional[int], default=None) – The number of features in the hidden state h. By default, hidden_size equals input_size in order to propagate the dynamics.
num_layers (int, default=1) – Number of recurrent layers. E.g., setting num_layers=2 stacks two LSTMs, with the second LSTM taking in the outputs of the first and computing the final results.
bias (bool, default=True) – If False, the layer does not use the bias weights b_ih and b_hh.
dropout (float, default=0.0) – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout.
bidirectional (bool, default=False) – If True, becomes a bidirectional LSTM.
window_size (int, default=10) – Number of timesteps considered for prediction.
horizon (int, default=1) – How many time steps to predict ahead.
seed (int, default=42) – Random seed for torch, numpy, and random module.
in_gpu_memory (bool, default=False) – If True, will pre-load the entire data array to GPU memory.
num_data_workers (int, default=0) – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process.
prefetch_factor (int, default=2) – Number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_data_workers samples prefetched across all workers.
split_pct (float, default=0.8) – Proportion of data set to use for training. The rest goes to validation.
split_method (str, default="partition") – Method to split the data. For a random split, use "random"; for a simple partition, use "partition".
batch_size (int, default=128) – Mini-batch size for training.
shuffle (bool, default=True) – Whether to shuffle training data or not.
device (str, default="cpu") – Specify training hardware, either cpu or cuda for GPU devices.
optimizer_name (str, default="RMSprop") – Name of the PyTorch optimizer to use. Matches PyTorch optimizer class name.
optimizer_hparams (Dict[str, Any], default={"lr": 0.001, "weight_decay": 0.00001}) – Dictionary of hyperparameters to pass to the chosen PyTorch optimizer.
scheduler_name (Optional[str], default=None) – Name of the PyTorch learning rate scheduler to use. Matches PyTorch learning rate scheduler class name.
scheduler_hparams (Dict[str, Any], default={}) – Dictionary of hyperparameters to pass to the chosen PyTorch learning rate scheduler.
epochs (int, default=100) – Number of epochs to train for.
verbose (bool, default=False) – If True, will print training and validation loss at each epoch.
clip_grad_max_norm (float, default=10.0) – Max norm of the gradients for gradient clipping. For more information, see the torch.nn.utils.clip_grad_norm_ documentation.
checkpoint_log_every (int, default=10) – Epoch interval to log a checkpoint file containing the model weights, optimizer, and scheduler parameters.
plot_log_every (int, default=10) – Epoch interval to log a visualization plot of the latent space.
plot_n_samples (int, default=10000) – Number of validation samples to use for plotting.
plot_method (Optional[str], default="TSNE") – The method for visualizing the latent space. To skip visualization, set plot_method=None. If using "TSNE", it will attempt to use the RAPIDS.ai GPU implementation and will fall back to the sklearn CPU implementation if RAPIDS.ai is unavailable.
train_subsample_pct (float, default=1.0) – Percentage of training data to use during hyperparameter sweeps.
valid_subsample_pct (float, default=1.0) – Percentage of validation data to use during hyperparameter sweeps.
use_wandb (bool, default=False) – If True, will log results to wandb.
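To make window_size and horizon concrete, the following is a minimal sketch of one common way to turn a time series into (window, target) pairs. The helper make_windows is hypothetical, and the exact target alignment used inside mdlearn is an assumption here: the sketch pairs each window with the single step that lies horizon steps after the window's last element.

```python
from typing import Any, List, Sequence, Tuple


def make_windows(
    series: Sequence[Any], window_size: int = 10, horizon: int = 1
) -> List[Tuple[List[Any], Any]]:
    """Pair each window of `window_size` steps with the step `horizon` ahead.

    Assumed convention: the target is the element `horizon` steps after
    the last element of the window.
    """
    if window_size < 1 or horizon < 1:
        raise ValueError("window_size and horizon must be positive")
    pairs = []
    # Last valid start index; negative when the series is too short.
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = list(series[start : start + window_size])
        target = series[start + window_size + horizon - 1]
        pairs.append((window, target))
    return pairs


series = list(range(8))  # stand-in for 8 time steps
pairs = make_windows(series, window_size=3, horizon=2)
```

Under this convention a series of N steps yields N - window_size - horizon + 1 pairs, so larger windows or horizons reduce the number of training examples.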
- Raises
ValueError – split_pct should be between 0 and 1.
ValueError – train_subsample_pct should be between 0 and 1.
ValueError – valid_subsample_pct should be between 0 and 1.
ValueError – Specified device as cuda, but it is unavailable.
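The difference between the "partition" and "random" values of split_method, together with the split_pct range check, can be sketched as follows. The function split_indices is a hypothetical illustration, not the mdlearn implementation; in particular, treating the bounds 0 and 1 as exclusive and using int(n * split_pct) for the training size are assumptions.

```python
import random
from typing import List, Tuple


def split_indices(
    n: int, split_pct: float = 0.8, split_method: str = "partition", seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train, validation) index lists for a data set of size n."""
    # Assumption: both bounds are exclusive.
    if not 0.0 < split_pct < 1.0:
        raise ValueError("split_pct should be between 0 and 1.")
    indices = list(range(n))
    if split_method == "random":
        # Shuffle with a private generator so the split is reproducible.
        random.Random(seed).shuffle(indices)
    elif split_method != "partition":
        raise ValueError(f"unknown split_method: {split_method!r}")
    n_train = int(n * split_pct)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(10, split_pct=0.8, split_method="partition")
rand_train, rand_valid = split_indices(10, split_pct=0.8, split_method="random")
```

A partition keeps the temporal order, so the validation set is the tail of the series; a random split mixes time steps from the whole series into both sets.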
- fit(X: numpy.ndarray, scalars: Dict[str, numpy.ndarray] = {}, output_path: Union[str, pathlib.Path] = './', checkpoint: Optional[Union[str, pathlib.Path]] = None)
Trains the LSTM on the input data X.
- Parameters
X (np.ndarray) – Input features vectors of shape (N, D) where N is the number of data examples, and D is the dimension of the feature vector.
scalars (Dict[str, np.ndarray], default={}) – Dictionary of scalar arrays. For instance, the root mean squared deviation (RMSD) for each feature vector can be passed via {"rmsd": np.array(...)}. The dimension of each scalar array should match the number of input feature vectors N.
output_path (PathLike, default="./") – Path to write training results to. Makes an output_path/checkpoints folder to save model checkpoint files, and an output_path/plots folder to store latent space visualizations.
checkpoint (Optional[PathLike], default=None) – Path to a specific model checkpoint file to restore training.
- Raises
ValueError – If X does not have two dimensions. For scalar time series, please reshape to (N, 1).
TypeError – If scalars is not type dict. A common error is to pass output_path as the second argument.
NotImplementedError – If using a learning rate scheduler other than ReduceLROnPlateau, a step function will need to be implemented.
- predict(X: numpy.ndarray, inference_batch_size: int = 512, checkpoint: Optional[Union[str, pathlib.Path]] = None) Tuple[numpy.ndarray, float]
Predict using the LSTM.
- Parameters
X (np.ndarray) – The input data to predict on.
inference_batch_size (int, default=512) – The batch size for inference.
checkpoint (Optional[PathLike], default=None) – Path to a specific model checkpoint file.
- Returns
Tuple[np.ndarray, float] – The predictions and the average MSE loss.
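As a rough picture of how inference_batch_size relates to the returned average loss, the sketch below walks a set of per-sample losses in batches and combines the per-batch means. Both helpers are hypothetical, and weighting each batch by its size is an assumption rather than a statement about what mdlearn does internally.

```python
from typing import Iterator, List, Tuple


def iter_batches(n_samples: int, batch_size: int = 512) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs; the last batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def average_loss(per_sample_losses: List[float], batch_size: int = 512) -> float:
    """Combine per-batch mean losses, weighting each batch by its size."""
    n = len(per_sample_losses)
    total = 0.0
    for start, stop in iter_batches(n, batch_size):
        batch = per_sample_losses[start:stop]
        batch_mean = sum(batch) / len(batch)
        total += batch_mean * len(batch)
    return total / n


losses = [1.0, 1.0, 1.0, 1.0, 4.0]  # made-up per-sample losses
batches = list(iter_batches(len(losses), batch_size=2))
avg = average_loss(losses, batch_size=2)
```

Weighting by batch size makes the result independent of inference_batch_size; an unweighted mean of batch means would give the smaller final batch more influence.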