Base Dataset Classes¶
Here are the two base classes for MusPy datasets.
-
class
muspy.
Dataset
[source] Base class for MusPy datasets.
To build a custom dataset, it should inherit this class and overide the methods
__getitem__
and__len__
as well as the class attribute_info
.__getitem__
should return thei
-th data sample as amuspy.Music
object.__len__
should return the size of the dataset._info
should be amuspy.DatasetInfo
instance storing the dataset information.-
classmethod
citation
()[source] Print the citation infomation.
-
classmethod
info
()[source] Return the dataset infomation.
-
save
(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)[source] Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
-
split
(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]][source] Return the dataset as a PyTorch dataset.
Parameters: - filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
-
to_pytorch_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]][source] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: class:torch.utils.data.Dataset` or Dict of :class:torch.utils.data.Dataset`
-
to_tensorflow_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]][source] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: - class:tensorflow.data.Dataset` or Dict of
- class:tensorflow.data.dataset` – Converted TensorFlow dataset(s).
-
classmethod
-
class
muspy.
RemoteDataset
(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, verbose: bool = True)[source] Base class for remote MusPy datasets.
This class extends
muspy.Dataset
to support remote datasets. To build a custom remote dataset, please refer to the documentation ofmuspy.Dataset
for details. In addition, set the class attribute_sources
to the URLs to the source files (see Notes).Parameters: Raises: RuntimeError: – If
download_and_extract
is False but file{root}/.muspy.success
does not exist (see below).Important
muspy.Dataset.exists()
depends solely on a special file named.muspy.success
in directory{root}/_converted/
. This file serves as an indicator for the existence and integrity of the dataset. It will automatically be created if the dataset is successfully downloaded and extracted bymuspy.Dataset.download_and_extract()
. If the dataset is downloaded manually, make sure to create the.muspy.success
file in directory{root}/_converted/
to prevent errors.Notes
The class attribute
_sources
is a dictionary storing the following information of each source file.- filename (str): Name to save the file.
- url (str): URL to the file.
- archive (bool): Whether the file is an archive.
- md5 (str, optional): Expected MD5 checksum of the file.
- sha256 (str, optional): Expected SHA256 checksum of the file.
Here is an example.:
_sources = { "example": { "filename": "example.tar.gz", "url": "https://www.example.com/example.tar.gz", "archive": True, "md5": None, "sha256": None, } }
See also
muspy.Dataset
- Base class for MusPy datasets.
-
classmethod
citation
() Print the citation infomation.
-
download
(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType[source] Download the dataset source(s).
Parameters: Returns: Return type: Object itself.
-
download_and_extract
(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType[source] Download source datasets and extract the downloaded archives.
Parameters: Returns: Return type: Object itself.
-
exists
() → bool[source] Return True if the dataset exists, otherwise False.
-
extract
(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType[source] Extract the downloaded archive(s).
Parameters: Returns: Return type: Object itself.
-
classmethod
info
() Return the dataset infomation.
-
save
(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
-
source_exists
() → bool[source] Return True if all the sources exist, otherwise False.
-
split
(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Return the dataset as a PyTorch dataset.
Parameters: - filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
-
to_pytorch_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: class:torch.utils.data.Dataset` or Dict of :class:torch.utils.data.Dataset`
-
to_tensorflow_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: - class:tensorflow.data.Dataset` or Dict of
- class:tensorflow.data.dataset` – Converted TensorFlow dataset(s).