mdlearn.data.datasets.feature_vector

Classes

FeatureVectorDataset(*args, **kwargs)

PyTorch Dataset class to load vector or scalar data directly from a np.ndarray.

FeatureVectorHDF5Dataset(*args, **kwargs)

PyTorch Dataset class to load vector or scalar data from an HDF5 file.

TimeFeatureVectorDataset(*args, **kwargs)

PyTorch Dataset class to handle time series feature vectors and optional scalars directly from a np.ndarray.

class mdlearn.data.datasets.feature_vector.FeatureVectorDataset(*args: Any, **kwargs: Any)

PyTorch Dataset class to load vector or scalar data directly from a np.ndarray.

__init__(data: numpy.ndarray, scalars: Dict[str, numpy.ndarray] = {}, scalar_requires_grad: bool = False, in_gpu_memory: bool = False)
Parameters
  • data (np.ndarray) – Input feature vectors of shape (N, D), where N is the number of data examples and D is the dimension of each feature vector.

  • scalars (Dict[str, np.ndarray], default={}) – Dictionary of scalar arrays. For instance, the root mean squared deviation (RMSD) for each feature vector can be passed via {"rmsd": np.array(...)}. The dimension of each scalar array should match the number of input feature vectors N.

  • scalar_requires_grad (bool, default=False) – Sets the requires_grad attribute of the torch.Tensor created for each array in scalars. Set to True to use the scalars for multi-task learning. If the scalars are only needed for plotting, leave it as False.

  • in_gpu_memory (bool, default=False) – If True, pre-loads the entire data array into GPU memory.
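
As a minimal sketch of the shape requirements described above, the following prepares a data array of shape (N, D) and a matching scalars dictionary using only NumPy. The sizes, the random values, and the check_inputs helper are hypothetical and exist only for illustration; they are not part of mdlearn. The constructor call is left commented out so that the snippet runs without mdlearn installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N examples, each a D-dimensional feature vector.
N, D = 100, 16
data = rng.normal(size=(N, D)).astype(np.float32)

# One scalar per feature vector, e.g. an RMSD value (random placeholders here).
rmsd = rng.uniform(0.0, 5.0, size=N).astype(np.float32)
scalars = {"rmsd": rmsd}


def check_inputs(data, scalars):
    """Hypothetical helper (not an mdlearn function): verify documented shapes."""
    if data.ndim != 2:
        raise ValueError(f"data must have shape (N, D), got {data.shape}")
    for name, values in scalars.items():
        if len(values) != len(data):
            raise ValueError(
                f"scalar '{name}' has {len(values)} entries, expected {len(data)}"
            )


check_inputs(data, scalars)

# With mdlearn installed, the arrays could then be passed to the
# constructor using the signature documented above:
# from mdlearn.data.datasets.feature_vector import FeatureVectorDataset
# dataset = FeatureVectorDataset(data, scalars=scalars)
```

Checking the lengths up front gives a clear error message before the arrays reach the dataset.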

class mdlearn.data.datasets.feature_vector.FeatureVectorHDF5Dataset(*args: Any, **kwargs: Any)

PyTorch Dataset class to load vector or scalar data from an HDF5 file.

__init__(path: Union[str, pathlib.Path], dataset_name: str, scalar_dset_names: List[str] = [], scalar_requires_grad: bool = False, in_memory: bool = True)
Parameters
  • path (PathLike) – Path to the HDF5 file containing the feature vectors.

  • dataset_name (str) – Name of the dataset inside the HDF5 file that holds the feature vectors.

  • scalar_dset_names (List[str], default=[]) – List of scalar dataset names inside the HDF5 file to be passed to the training logs.

  • scalar_requires_grad (bool, default=False) – Sets the requires_grad attribute of the torch.Tensor created for each dataset named in scalar_dset_names. Set to True to use the scalars for multi-task learning. If the scalars are only needed for plotting, leave it as False.

  • in_memory (bool, default=True) – If True, load the data stored in the HDF5 file from disk into NumPy arrays. Otherwise, read each batch from the HDF5 file on the fly.

class mdlearn.data.datasets.feature_vector.TimeFeatureVectorDataset(*args: Any, **kwargs: Any)

PyTorch Dataset class to handle time series feature vectors and optional scalars directly from a np.ndarray.

__init__(data: numpy.ndarray, scalars: Dict[str, numpy.ndarray] = {}, scalar_requires_grad: bool = False, in_gpu_memory: bool = False, window_size: int = 10, horizon: int = 1)
Parameters
  • window_size (int, default=10) – Number of time steps considered for prediction.

  • horizon (int, default=1) – How many time steps to predict ahead.

Raises

ValueError – If the sum of window_size and horizon exceeds the length of the input data.
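
To illustrate how window_size and horizon interact, the following is a minimal NumPy sketch of one common sliding-window convention: each sample is window_size consecutive time steps, and its target is the frame horizon steps after the end of the window. The make_windows function is a hypothetical helper written for this illustration. The exact indexing used by TimeFeatureVectorDataset may differ; only the length check mirrors the ValueError condition documented above.

```python
import numpy as np


def make_windows(data, window_size=10, horizon=1):
    """Hypothetical helper (not an mdlearn function).

    Splits a time series of shape (T, D) into input windows of shape
    (window_size, D) and targets taken `horizon` steps after each window.
    """
    if window_size + horizon > len(data):
        raise ValueError(
            f"window_size + horizon = {window_size + horizon} "
            f"exceeds the input length {len(data)}"
        )
    n_samples = len(data) - window_size - horizon + 1
    windows = np.stack([data[i : i + window_size] for i in range(n_samples)])
    targets = np.stack(
        [data[i + window_size + horizon - 1] for i in range(n_samples)]
    )
    return windows, targets


# Toy time series: 20 time steps of 2-dimensional feature vectors.
series = np.arange(40, dtype=np.float32).reshape(20, 2)
windows, targets = make_windows(series, window_size=10, horizon=1)
print(windows.shape, targets.shape)  # (10, 10, 2) (10, 2)
```

Under this convention a series of length T yields T - window_size - horizon + 1 samples, which is why the sum of the two parameters cannot exceed T.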