mdlearn.data.preprocess.simulation
Module for extracting outputs from molecular dynamics trajectories.
Functions
| parallel_preprocess | Preprocess simulation data from many trajectory files in parallel. |
| preprocess | Preprocess simulation data from a trajectory file. |
Classes
| ContactMapPreprocessor | Process contact maps from an MD trajectory. |
| CoordinatePreprocessor | Process coordinates from an MD trajectory. |
| RmsdPreprocessor | Process RMSD from an MD trajectory. |
| SimulationPreprocessor | Protocol for simulation data preprocessors. |
- class mdlearn.data.preprocess.simulation.ContactMapPreprocessor(top_file: Path | str, traj_file: Path | str, cutoff: float = 8.0, selection: str = 'protein and name CA')
Process contact maps from an MD trajectory.
- __init__(top_file: Path | str, traj_file: Path | str, cutoff: float = 8.0, selection: str = 'protein and name CA') → None
Initialize the contact map preprocessor.
- Parameters:
top_file (Path | str) – Topology file of the simulation.
traj_file (Path | str) – Trajectory file of the simulation.
cutoff (float) – Cutoff distance (in Angstroms) for contact map calculation, defaults to 8.0.
selection (str) – Atom selection string for the atoms used to compute the contact map, defaults to ‘protein and name CA’.
- get() → numpy.ndarray
Get contact maps of a trajectory file.
- Returns:
np.ndarray – Contact maps of the trajectory with shape (n_frames, R), where R is a ragged dimension: each frame stores the concatenated row and column indices of the ones in its contact map, so R can differ from frame to frame.
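The ragged index encoding described above can be illustrated with plain NumPy. This is a minimal sketch, not mdlearn's implementation: the helper names `contact_map_indices` and `decode_contact_map` are hypothetical, and the strict `<` comparison, the inclusion of the diagonal, and the `int16` dtype are assumptions made for the example.

```python
import numpy as np


def contact_map_indices(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Encode one frame's contact map as concatenated row and column indices.

    coords has shape (n_atoms, 3), in Angstroms (hypothetical helper).
    """
    # Pairwise Euclidean distances, shape (n_atoms, n_atoms).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Positions of the ones in the binary contact map (diagonal included here).
    rows, cols = np.nonzero(dist < cutoff)
    # First half of the result holds the rows, second half the columns.
    return np.concatenate([rows, cols]).astype(np.int16)


def decode_contact_map(indices: np.ndarray, n_atoms: int) -> np.ndarray:
    """Rebuild the dense (n_atoms, n_atoms) contact map from the encoding."""
    rows, cols = np.split(indices, 2)
    dense = np.zeros((n_atoms, n_atoms), dtype=np.uint8)
    dense[rows, cols] = 1
    return dense


# Two toy frames with three atoms each; the second frame has more contacts,
# so its encoding is longer. An object array holds the ragged rows.
frames = [
    np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]]),
]
ragged = np.empty(len(frames), dtype=object)
for i, frame in enumerate(frames):
    ragged[i] = contact_map_indices(frame)
```

Storing only the indices of the ones keeps sparse contact maps compact, at the cost of needing `n_atoms` to rebuild the dense matrix.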
- class mdlearn.data.preprocess.simulation.CoordinatePreprocessor(top_file: Path | str, traj_file: Path | str, ref_file: Path | str, selection: str = 'protein and name CA')
Process coordinates from an MD trajectory.
- __init__(top_file: Path | str, traj_file: Path | str, ref_file: Path | str, selection: str = 'protein and name CA') → None
Initialize the coordinate preprocessor.
- Parameters:
top_file (Path | str) – Topology file of the simulation.
traj_file (Path | str) – Trajectory file of the simulation.
ref_file (Path | str) – Reference structure file to align the trajectory.
selection (str) – Atom selection string for the reference structure, defaults to ‘protein and name CA’.
- get() → numpy.ndarray
Get coordinates of a trajectory file.
- Returns:
np.ndarray – Coordinates of the trajectory. The shape of the array is (n_frames, n_atoms, 3), where n_frames is the number of frames in the trajectory, n_atoms is the number of atoms in the selection, and 3 corresponds to x, y, and z.
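To make the role of the reference structure concrete, the following sketch superposes every frame onto a reference with the Kabsch algorithm and returns an array of shape (n_frames, n_atoms, 3). It is an illustration under assumptions, not mdlearn's code: the library presumably delegates alignment to an MD analysis package, and the names `kabsch_align` and `align_trajectory` are hypothetical.

```python
import numpy as np


def kabsch_align(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rigidly superpose mobile (n_atoms, 3) onto reference (n_atoms, 3)."""
    mobile_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    p = mobile - mobile_center
    q = reference - ref_center
    # Cross-covariance matrix between the centered point sets.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered coordinates, then move them onto the reference.
    return p @ rotation.T + ref_center


def align_trajectory(traj: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Align each frame of traj (n_frames, n_atoms, 3) to the reference."""
    return np.stack([kabsch_align(frame, reference) for frame in traj])
```

After alignment, differences between frames reflect internal motion rather than overall translation or rotation of the molecule.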
- class mdlearn.data.preprocess.simulation.RmsdPreprocessor(top_file: Path | str, traj_file: Path | str, ref_file: Path | str, selection: str = 'protein and name CA')
Process RMSD from an MD trajectory.
- __init__(top_file: Path | str, traj_file: Path | str, ref_file: Path | str, selection: str = 'protein and name CA') → None
Initialize the RMSD preprocessor.
- Parameters:
top_file (Path | str) – Topology file of the simulation.
traj_file (Path | str) – Trajectory file of the simulation.
ref_file (Path | str) – Reference structure file to calculate RMSD.
selection (str) – Atom selection string for the reference structure, defaults to ‘protein and name CA’.
- get() → numpy.ndarray
Get RMSD to reference state of a trajectory file.
- Returns:
np.ndarray – RMSD to reference state of the trajectory. The shape of the array is (n_frames,), where n_frames is the number of frames in the trajectory.
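As a rough illustration of the returned quantity, the sketch below computes a per-frame RMSD array of shape (n_frames,) with NumPy. It assumes the frames are already superposed on the reference and applies no mass weighting; whether mdlearn superposes or weights internally is not stated here, and `rmsd_to_reference` is a hypothetical helper name.

```python
import numpy as np


def rmsd_to_reference(traj: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Per-frame RMSD between traj (n_frames, n_atoms, 3) and reference (n_atoms, 3).

    Assumes the frames are already aligned to the reference.
    """
    # Per-atom displacement vectors for every frame.
    diff = traj - reference[None, :, :]
    # Squared displacement per atom, averaged over atoms, then square-rooted.
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))
```

For a frame identical to the reference the value is zero, and a rigid shift of every atom by the same vector gives an RMSD equal to the length of that vector.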
- class mdlearn.data.preprocess.simulation.SimulationPreprocessor(top_file: Path | str, traj_file: Path | str, *args: Any, **kwargs: dict[str, Any])
Protocol for simulation data preprocessors.
- __init__(top_file: Path | str, traj_file: Path | str, *args: Any, **kwargs: dict[str, Any]) → None
Initialize the simulation preprocessor.
- Parameters:
top_file (Path | str) – Topology file of the simulation.
traj_file (Path | str) – Trajectory file of the simulation.
*args (Any) – Positional arguments for the preprocessor.
**kwargs (dict[str, Any]) – Keyword arguments for the preprocessor.
- get() → numpy.ndarray
Get simulation data from a trajectory file.
- Returns:
np.ndarray – Simulation data from the trajectory.
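The interface above, a constructor taking a topology and a trajectory file plus a `get()` method returning an array, can be mirrored with `typing.Protocol`. The sketch below is a stand-in written for illustration: `SimulationPreprocessorSketch` and `DummyPreprocessor` are hypothetical names, the dummy class does not read any files, and a real custom preprocessor would load the trajectory in `get()`.

```python
from pathlib import Path
from typing import Any, Protocol, Union, runtime_checkable

import numpy as np

PathLike = Union[Path, str]


@runtime_checkable
class SimulationPreprocessorSketch(Protocol):
    """Stand-in mirroring the documented interface (hypothetical name)."""

    def __init__(self, top_file: PathLike, traj_file: PathLike, *args: Any, **kwargs: Any) -> None: ...

    def get(self) -> np.ndarray: ...


class DummyPreprocessor:
    """Toy preprocessor that satisfies the protocol without touching disk."""

    def __init__(self, top_file: PathLike, traj_file: PathLike, n_frames: int = 1) -> None:
        self.top_file = Path(top_file)
        self.traj_file = Path(traj_file)
        self.n_frames = n_frames

    def get(self) -> np.ndarray:
        # A real implementation would parse the trajectory here.
        return np.zeros(self.n_frames)


preprocessor = DummyPreprocessor("system.pdb", "system.dcd", n_frames=3)
data = preprocessor.get()
```

Because a protocol is structural, `DummyPreprocessor` does not need to inherit from anything; having the right methods is enough.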
- mdlearn.data.preprocess.simulation.parallel_preprocess(topic: str, input_dir: Path | str, output_dir: Path | str, top_ext: str = '.pdb', traj_ext: str = '.dcd', num_workers: int = 10, **kwargs: Any) → None
Preprocess simulation data from many trajectory files in parallel.
- Parameters:
topic (str) – Topic/name of the preprocessor.
input_dir (Path | str) – Input directory containing the trajectory files.
output_dir (Path | str) – Output directory to save the preprocessed data.
top_ext (str) – Extension of the topology files, defaults to ‘.pdb’.
traj_ext (str) – Extension of the trajectory files, defaults to ‘.dcd’.
num_workers (int) – Number of workers for parallel processing, defaults to 10.
**kwargs (Any) – Keyword arguments for the preprocessor.
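The general pattern behind these parameters, finding topology and trajectory files by extension and fanning the work out to a pool of workers, can be sketched with the standard library. Everything here is an assumption made for illustration: the helpers `pair_files` and `run_parallel` are hypothetical, pairing files by a shared stem is a guessed convention, and a thread pool is used only to keep the sketch simple (CPU-bound preprocessing would normally use processes).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable, List, Tuple, Union

PathLike = Union[Path, str]


def pair_files(
    input_dir: PathLike, top_ext: str = ".pdb", traj_ext: str = ".dcd"
) -> List[Tuple[Path, Path]]:
    """Pair each trajectory with the topology sharing its stem (assumed convention)."""
    pairs = []
    for traj in sorted(Path(input_dir).glob(f"*{traj_ext}")):
        top = traj.with_suffix(top_ext)
        if top.exists():
            pairs.append((top, traj))
    return pairs


def run_parallel(
    worker: Callable[..., Any],
    input_dir: PathLike,
    output_dir: PathLike,
    top_ext: str = ".pdb",
    traj_ext: str = ".dcd",
    num_workers: int = 10,
    **kwargs: Any,
) -> List[Tuple[Path, Path]]:
    """Call worker(top, traj, output_dir, **kwargs) for every file pair."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    pairs = pair_files(input_dir, top_ext, traj_ext)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(worker, top, traj, output_dir, **kwargs)
            for top, traj in pairs
        ]
        # Propagate any exception raised inside a worker.
        for future in futures:
            future.result()
    return pairs
```

In this sketch the `worker` callable plays the role that a single-trajectory preprocessing function would play, receiving one topology file, one trajectory file, the output directory, and the forwarded keyword arguments.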
- mdlearn.data.preprocess.simulation.preprocess(top_file: Path | str, traj_file: Path | str, output_dir: Path | str, topic: str, **kwargs: Any) → None
Preprocess simulation data from a trajectory file.
- Parameters:
top_file (Path | str) – Topology file of the simulation.
traj_file (Path | str) – Trajectory file of the simulation.
output_dir (Path | str) – Output directory to save the preprocessed data.
topic (str) – Topic of the simulation data.
**kwargs (Any) – Keyword arguments for the preprocessor.