mdlearn.data.datasets.time_contact_map

ContactMapTimeSeriesDataset Dataset.

Classes

ContactMapTimeSeriesDataset(*args, **kwargs)

PyTorch Dataset class to load contact matrix data in the format of a time series.

class mdlearn.data.datasets.time_contact_map.ContactMapTimeSeriesDataset(*args: Any, **kwargs: Any)

PyTorch Dataset class to load contact matrix data in the format of a time series.

__init__(path: Union[str, pathlib.Path], shape: Tuple[int, ...], lag_time: int = 1, dataset_name: str = 'contact_map', scalar_dset_names: List[str] = [], values_dset_name: Optional[str] = None, scalar_requires_grad: bool = False, in_memory: bool = True)

Parameters

path (PathLike) – Path to the HDF5 file containing the contact matrices.
shape (Tuple[int, …]) – Shape of the contact matrices required by the model, either (H, W) or (1, H, W).
lag_time (int) – Delay time (forward or backward) in the input data. Time-lagged correlations are computed between X[t] and X[t+lag_time].
dataset_name (str) – Name of the contact map dataset in the HDF5 file.
scalar_dset_names (List[str]) – Names of the scalar datasets in the HDF5 file to be passed to the training logs.
values_dset_name (str, optional) – Name of the HDF5 dataset field containing optional values for the entries of the distance/contact matrix. By default, all values are assumed to be 1, corresponding to a binary contact map, and are created on the fly.
scalar_requires_grad (bool) – Sets the requires_grad attribute of the torch.Tensor objects created for the scalars named in scalar_dset_names. Set to True to use the scalars for multi-task learning. If the scalars are only needed for plotting, set it to False.
in_memory (bool) – If True, load the data stored in the HDF5 file from disk into numpy arrays. Otherwise, read each batch from the HDF5 file on the fly.
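The pairing controlled by lag_time can be illustrated with a minimal pure-Python sketch of a time-lagged dataset. This is not mdlearn's implementation: the class name LaggedPairs is hypothetical, only the forward direction (a positive lag_time) is shown, and it assumes the dataset length is the number of frames minus lag_time, so that every index t has a partner t + lag_time. The returned keys mirror the X_t, X_t_tau, and index keys shown in the Examples section.

```python
from typing import Any, Dict, Sequence


class LaggedPairs:
    """Hypothetical sketch (not mdlearn's code): pair frame t with frame t + lag_time."""

    def __init__(self, frames: Sequence[Any], lag_time: int = 1) -> None:
        # Assumption: only a positive (forward) lag is supported in this sketch.
        if lag_time < 1:
            raise ValueError("lag_time must be a positive integer")
        self.frames = frames
        self.lag_time = lag_time

    def __len__(self) -> int:
        # The last lag_time frames have no partner at t + lag_time,
        # so they cannot serve as the starting frame of a pair.
        return max(len(self.frames) - self.lag_time, 0)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return {
            "X_t": self.frames[idx],                    # frame at time t
            "X_t_tau": self.frames[idx + self.lag_time],  # frame at time t + lag_time
            "index": idx,
        }


pairs = LaggedPairs(list(range(10)), lag_time=3)
print(len(pairs), pairs[0])
```

With 10 frames and lag_time=3, only indices 0 through 6 have a partner, so the sketch exposes 7 pairs.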

Examples

>>> from mdlearn.data.datasets.time_contact_map import ContactMapTimeSeriesDataset
>>> dataset = ContactMapTimeSeriesDataset("contact_maps.h5", (28, 28))
>>> dataset[0]
{'X_t': torch.Tensor(..., dtype=float32), 'X_t_tau': torch.Tensor(..., dtype=float32), 'index': tensor(0)}
>>> dataset[0]["X_t"].shape
torch.Size([28, 28])
>>> dataset[0]["X_t_tau"].shape
torch.Size([28, 28])
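The example returns dense float32 matrices. One way such a matrix might be assembled, consistent with the values_dset_name description (entries default to 1 when no values dataset is given) and with shape accepting either (H, W) or (1, H, W), is sketched below with NumPy. The sketch assumes each frame is stored as sparse row/column index arrays; the helper coo_to_dense is hypothetical and does not describe how mdlearn actually reads the HDF5 file.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def coo_to_dense(
    row: Sequence[int],
    col: Sequence[int],
    shape: Tuple[int, ...],
    values: Optional[Sequence[float]] = None,
) -> np.ndarray:
    """Hypothetical helper: build a dense float32 contact map from COO indices."""
    # The last two entries of shape give the matrix size, for (H, W) or (1, H, W).
    height, width = shape[-2], shape[-1]
    if values is None:
        # No values dataset: treat every listed entry as a binary contact (1.0).
        values = np.ones(len(row), dtype=np.float32)
    dense = np.zeros((height, width), dtype=np.float32)
    # Integer dtype keeps indexing valid even when the index lists are empty.
    dense[np.asarray(row, dtype=np.intp), np.asarray(col, dtype=np.intp)] = np.asarray(
        values, dtype=np.float32
    )
    # Reshape to the layout the model expects, e.g. (H, W) or (1, H, W).
    return dense.reshape(shape)


print(coo_to_dense([0, 1], [1, 2], (3, 3)))
```

Reshaping to (1, H, W) adds the leading channel dimension that 2D convolutional layers typically expect, while (H, W) keeps a plain matrix.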