Base Dataset Classes
Here are the two base classes for MusPy datasets.
-
class
muspy.
Dataset
Base class for all MusPy datasets.
To build a custom dataset, inherit from this class and override the methods __getitem__ and __len__ as well as the class attribute _info. __getitem__ should return the i-th data sample as a muspy.Music object. __len__ should return the size of the dataset. _info should be a muspy.DatasetInfo instance containing the dataset information.
-
classmethod
citation
()
Print the citation information.
-
classmethod
info
()
Return the dataset information.
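The subclassing contract described for muspy.Dataset can be sketched as follows. To keep the snippet runnable without MusPy installed, `MusicStub`, `DatasetInfoStub` and `DatasetStub` are hypothetical stand-ins for muspy.Music, muspy.DatasetInfo and muspy.Dataset; their fields are assumptions and do not mirror the real constructors. In practice the class would inherit from muspy.Dataset directly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MusicStub:
    """Hypothetical stand-in for muspy.Music (not the real class)."""
    title: str


@dataclass
class DatasetInfoStub:
    """Hypothetical stand-in for muspy.DatasetInfo (fields are assumptions)."""
    name: str
    description: str


class DatasetStub:
    """Hypothetical stand-in for muspy.Dataset; mirrors only info()."""
    _info = None

    @classmethod
    def info(cls):
        # Return the dataset information stored on the class.
        return cls._info


class MyDataset(DatasetStub):
    # Class attribute holding the dataset information.
    _info = DatasetInfoStub(name="My Dataset", description="Two toy songs.")

    def __init__(self, titles: List[str]):
        self.titles = list(titles)

    def __getitem__(self, index: int) -> MusicStub:
        # Return the index-th data sample as a music object.
        return MusicStub(title=self.titles[index])

    def __len__(self) -> int:
        # Return the size of the dataset.
        return len(self.titles)


dataset = MyDataset(["Prelude", "Fugue"])
print(len(dataset), dataset[1].title, MyDataset.info().name)
```

Only the three overridden members are specific to the custom dataset; everything else is meant to come from the base class.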
-
save
(root: Union[str, pathlib.Path], kind: Optional[str] = 'json', n_jobs: int = 1, ignore_exceptions: bool = True)
Save all the music objects to a directory.
The converted files will be named by their indices and saved to root/.
Parameters:
- root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, optional) – File format to save the data. Defaults to 'json'.
- n_jobs (int, optional) – Maximum number of concurrently running jobs in multiprocessing. If equal to 1, disable multiprocessing. Defaults to 1.
- ignore_exceptions (bool, optional) – Whether to ignore errors and skip failed conversions. This can be helpful if some of the source files are known to be corrupted. Defaults to True.
Notes
The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.
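The index-based naming and the effect of ignore_exceptions can be illustrated with a small standalone sketch. This is not MusPy's implementation: `save_by_index` and `to_dict` are hypothetical helpers, and plain strings stand in for music objects. It only shows the idea that item i lands in {i}.json and that a failed conversion is either skipped or re-raised.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, List, Sequence


def save_by_index(
    items: Sequence[Any],
    to_dict: Callable[[Any], dict],
    root: Path,
    ignore_exceptions: bool = True,
) -> List[int]:
    """Write items[i] to root/{i}.json and return the indices that were saved."""
    root.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, item in enumerate(items):
        try:
            data = to_dict(item)
            (root / f"{i}.json").write_text(json.dumps(data))
        except Exception:
            if not ignore_exceptions:
                raise
            continue  # skip the failed conversion and move on
        saved.append(i)
    return saved


def to_dict(item):
    # Hypothetical converter: None plays the role of a corrupted source file.
    if item is None:
        raise ValueError("corrupted source")
    return {"title": item}


with tempfile.TemporaryDirectory() as tmp:
    saved = save_by_index(["a", None, "c"], to_dict, Path(tmp))
    written = sorted(p.name for p in Path(tmp).iterdir())

print(saved, written)
```

Because files are named by index rather than by position in the output, a skipped item leaves a gap in the numbering, which keeps the mapping back to filenames[i] intact.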
-
split
(filename: Union[str, pathlib.Path, None] = None, splits: Optional[Sequence[float]] = None, random_state: Any = None) → Dict[str, List[int]]
Split the dataset and return the sample indices of each split.
Parameters:
- filename (str or Path, optional) – If given and the file exists, the split is read from it. Otherwise, the split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
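How ratios turn into a Dict[str, List[int]] can be sketched as below. This is an illustration under stated assumptions, not the library's code: `split_indices` is a hypothetical function, the standard-library `random.Random` is swapped in for numpy.random.RandomState so the snippet has no dependencies, and the rounding rule at the boundaries is an assumption.

```python
import random
from typing import Dict, List, Sequence, Union


def split_indices(
    n_samples: int,
    splits: Union[float, Sequence[float]],
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Shuffle range(n_samples) and cut it into named index lists."""
    # A single float is read as the train ratio; the rest goes to test.
    ratios = [splits, 1.0 - splits] if isinstance(splits, float) else list(splits)
    if not 2 <= len(ratios) <= 3:
        raise ValueError("Expected a float or a list of two or three floats.")
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("Split ratios must sum to 1.")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, hence reproducible

    names = ("train", "test", "validation")
    result: Dict[str, List[int]] = {}
    start, cumulative = 0, 0.0
    for position, (name, ratio) in enumerate(zip(names, ratios)):
        cumulative += ratio
        # The last split takes whatever is left, avoiding rounding gaps.
        is_last = position == len(ratios) - 1
        end = n_samples if is_last else round(cumulative * n_samples)
        result[name] = indices[start:end]
        start = end
    return result


parts = split_indices(10, [0.8, 0.1, 0.1], seed=42)
print({name: len(idx) for name, idx in parts.items()})
```

Reusing the same seed reproduces the same index lists, which is the role random_state plays in the description above.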
-
to_pytorch_dataset
(factory: Optional[Callable] = None, representation: Optional[str] = None, split_filename: Union[str, pathlib.Path, None] = None, splits: Optional[Sequence[float]] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]
Return the dataset as a PyTorch dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation ({'pitch', 'piano-roll', 'event', 'note'}, optional) – Target representation.
- split_filename (str or Path, optional) – If given and the file exists, the split is read from it. Otherwise, the split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: torch.utils.data.Dataset or Dict of torch.utils.data.Dataset – Converted PyTorch dataset(s).
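The roles of factory and splits can be pictured with a map-style wrapper. The sketch below is an assumption about the general pattern, not MusPy's code: `FactoryDataset` and `pitch_factory` are hypothetical names, lists of MIDI pitches stand in for Music objects, and torch is deliberately not imported. A real PyTorch version would subclass torch.utils.data.Dataset, which relies on the same __getitem__ and __len__ protocol.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class FactoryDataset:
    """Map-style dataset sketch: applies `factory` to each selected sample."""

    def __init__(
        self,
        dataset: Sequence[Any],
        factory: Callable[[Any], Any],
        subset: Optional[Sequence[int]] = None,
    ):
        self.dataset = dataset
        self.factory = factory
        # Without a subset, expose the full dataset as a whole.
        self.subset = list(subset) if subset is not None else list(range(len(dataset)))

    def __getitem__(self, index: int) -> Any:
        # Map the local index to the original one, then convert the sample.
        return self.factory(self.dataset[self.subset[index]])

    def __len__(self) -> int:
        return len(self.subset)


# Toy data: each "music object" is just a list of MIDI pitches here.
songs: List[List[int]] = [[60, 62, 64], [67, 65], [72]]
split: Dict[str, List[int]] = {"train": [0, 2], "test": [1]}


def pitch_factory(song: List[int]) -> List[int]:
    # Hypothetical factory: pitches relative to middle C (MIDI note 60).
    return [pitch - 60 for pitch in song]


datasets = {name: FactoryDataset(songs, pitch_factory, idx) for name, idx in split.items()}
print(len(datasets["train"]), datasets["test"][0])
```

With splits given, the result is a dictionary of datasets keyed by split name; without them, a single dataset covering every sample.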
-
to_tensorflow_dataset
(factory: Optional[Callable] = None, representation: Optional[str] = None, split_filename: Union[str, pathlib.Path, None] = None, splits: Optional[Sequence[float]] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]
Return the dataset as a TensorFlow dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation ({'pitch', 'piano-roll', 'event', 'note'}, optional) – Target representation.
- split_filename (str or Path, optional) – If given and the file exists, the split is read from it. Otherwise, the split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: tensorflow.data.Dataset or Dict of tensorflow.data.Dataset – Converted TensorFlow dataset(s).
-
class
muspy.
RemoteDataset
(root: Union[str, pathlib.Path], download_and_extract: bool = False, cleanup: bool = False)
Base class for remote MusPy datasets.
This class is extended from muspy.Dataset to support remote datasets. To build a custom dataset based on this class, please refer to muspy.Dataset for the documentation of the methods __getitem__ and __len__, and the class attribute _info. In addition, the class attribute _sources containing the URLs to the source files should be properly set (see Notes).
Raises: RuntimeError – If download_and_extract is False but the file {root}/.muspy.success does not exist (see below).
Important
muspy.RemoteDataset.exists() depends solely on a special file named .muspy.success in the folder {root}/, which serves as an indicator for the existence and integrity of the dataset. This file will automatically be created if the dataset is successfully downloaded and extracted by muspy.RemoteDataset.download_and_extract().
If the dataset is downloaded manually, make sure to create the .muspy.success file in the folder {root}/ to prevent errors.
Notes
The class attribute _sources is a dictionary containing the following information of each source file.
- filename (str): Name to save the file.
- url (str): URL to the file.
- archive (bool): Whether the file is an archive.
- md5 (str, optional): Expected MD5 checksum of the file.
Here is an example:
    _sources = {
        "example": {
            "filename": "example.tar.gz",
            "url": "https://www.example.com/example.tar.gz",
            "archive": True,
            "md5": None,
        }
    }
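The optional md5 entry suggests a checksum comparison after download. A minimal sketch of such a check using the standard hashlib module follows; `md5_matches` is a hypothetical helper, and treating a None checksum as "nothing to verify" is an assumption rather than documented MusPy behaviour.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def md5_matches(path: Path, expected: Optional[str], chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's MD5 digest equals `expected` (or none is given)."""
    if expected is None:
        return True  # assumption: no checksum means nothing to verify
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large archives never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.tar.gz"
    archive.write_bytes(b"hello")  # toy content, not a real archive
    good = md5_matches(archive, hashlib.md5(b"hello").hexdigest())
    bad = md5_matches(archive, "0" * 32)
    unchecked = md5_matches(archive, None)

print(good, bad, unchecked)
```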
See also
muspy.Dataset – The base class for all MusPy datasets.
-
download
() → RemoteDatasetType
Download the source datasets.
Returns: Object itself.
-
download_and_extract
(cleanup: bool = False) → RemoteDatasetType
Download the source datasets and extract the downloaded archives.
This is equivalent to RemoteDataset.download().extract(cleanup).
Parameters: cleanup (bool, optional) – Whether to remove the original archive. Defaults to False.
Returns: Object itself.
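The stated equivalence works because each step returns the object itself, which allows the calls to be chained. The sketch below shows only that pattern: `RemoteStub` is a hypothetical stand-in that records calls instead of touching the network, and it is not muspy.RemoteDataset.

```python
from typing import List


class RemoteStub:
    """Hypothetical stand-in that logs calls; not muspy.RemoteDataset."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def download(self) -> "RemoteStub":
        self.log.append("download")
        return self  # returning self enables chaining

    def extract(self, cleanup: bool = False) -> "RemoteStub":
        self.log.append(f"extract(cleanup={cleanup})")
        return self

    def download_and_extract(self, cleanup: bool = False) -> "RemoteStub":
        # Same effect as calling the two steps one after the other.
        return self.download().extract(cleanup)


stub = RemoteStub()
result = stub.download_and_extract(cleanup=True)
print(result is stub, stub.log)
```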
-
exists
() → bool
Return True if the dataset exists, otherwise False.
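Since existence is judged by the .muspy.success marker in {root}/, a manually downloaded dataset needs that file created by hand. A minimal sketch with pathlib is shown below; `marker_exists` and `mark_success` are hypothetical helpers written for illustration, and the directory name is made up.

```python
import tempfile
from pathlib import Path

MARKER = ".muspy.success"  # marker filename named in the documentation


def marker_exists(root: Path) -> bool:
    """Mimic the described check: only the marker file is consulted."""
    return (root / MARKER).is_file()


def mark_success(root: Path) -> None:
    """Create the marker after placing the dataset files in root manually."""
    root.mkdir(parents=True, exist_ok=True)
    (root / MARKER).touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"  # hypothetical dataset folder
    before = marker_exists(root)
    mark_success(root)
    after = marker_exists(root)

print(before, after)
```

Note that the marker says nothing about the files themselves, so it should only be created once the manual download is known to be complete.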
-
extract
(cleanup: bool = False) → RemoteDatasetType
Extract the downloaded archive(s).
Parameters: cleanup (bool, optional) – Whether to remove the original archive. Defaults to False.
Returns: Object itself.
-
source_exists
() → bool
Return True if all the sources exist, otherwise False.