Local Dataset Classes
Here are the classes for local datasets.
class muspy.FolderDataset(root: Union[str, pathlib.Path], convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None)
Class for datasets storing files in a folder.
This class extends muspy.Dataset to support folder datasets. To build a custom folder dataset, refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _extension to the file extension to look for when building the dataset, and set read to a callable that takes the filename of a source file and returns the converted Music object.
Parameters:
- convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exists. If it does, disable on-the-fly mode. If not, enable on-the-fly mode and emit a warning.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- use_converted (bool, optional) – Force on-the-fly mode to be disabled and use the converted data. Defaults to True if converted data exists, otherwise False.
Important
muspy.FolderDataset.converted_exists() depends solely on a special file named .muspy.success in the folder {root}/_converted/, which serves as an indicator for the existence and integrity of the converted dataset. If the converted dataset is built by muspy.FolderDataset.convert(), the .muspy.success file will be created as well. If the converted dataset is created manually, make sure to create the .muspy.success file in the folder {root}/_converted/ to prevent errors.
Notes
Two modes are available for this dataset. When the on-the-fly mode is enabled, a data sample is converted to a music object on the fly when being indexed. When the on-the-fly mode is disabled, a data sample is loaded from the precomputed converted data.
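The marker-file convention described in the Important note can be sketched with the standard library alone. The helper names mark_converted and looks_converted are hypothetical and are not part of MusPy; the sketch only mirrors the documented rule that the presence of {root}/_converted/.muspy.success signals a complete converted dataset.

```python
from pathlib import Path


def mark_converted(root):
    """Create the marker file {root}/_converted/.muspy.success and return its path."""
    converted_dir = Path(root) / "_converted"
    converted_dir.mkdir(parents=True, exist_ok=True)
    marker = converted_dir / ".muspy.success"
    marker.touch()  # this sketch checks only for presence, so an empty file suffices
    return marker


def looks_converted(root):
    """Return True if the marker file is present under {root}/_converted/."""
    return (Path(root) / "_converted" / ".muspy.success").is_file()
```

A helper like mark_converted would typically be called only after every converted file has been written, so that a partially populated folder is never mistaken for a finished one.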
See also
muspy.Dataset – Base class for MusPy datasets.
classmethod citation()
Print the citation information.
convert(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType
Convert and save the Music objects.
The converted files are named by their index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.
Parameters:
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
Returns: Object itself.
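The index-based naming used by convert can be illustrated as follows. This is a minimal sketch, not MusPy code: converted_path and conversion_plan are hypothetical helpers, and the assumption that the file extension follows kind (so that kind='yaml' gives {i}.yaml) is an inference from the {i}.json example above.

```python
from pathlib import Path


def converted_path(root, index, kind="json"):
    """Return the assumed target path for the sample at position `index`."""
    return Path(root) / "_converted" / f"{index}.{kind}"


def conversion_plan(root, filenames, kind="json"):
    """Pair each source filename with the path its converted file would get."""
    return [
        (Path(name), converted_path(root, i, kind))
        for i, name in enumerate(filenames)
    ]
```

Because the target name depends only on the position in filenames, the order of that list has to stay stable between conversion and loading for index i to refer to the same piece.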
converted_dir
Path to the root directory of the converted dataset.
converted_exists() → bool
Return True if the saved dataset exists, otherwise False.
exists() → bool
Return True if the dataset exists, otherwise False.
get_converted_filenames()
Return a list of converted filenames.
get_raw_filenames()
Return a list of raw filenames.
classmethod info()
Return the dataset information.
load(filename: Union[str, pathlib.Path]) → muspy.music.Music
Load a file into a Music object.
on_the_fly() → FolderDatasetType
Enable on-the-fly mode and convert the data on the fly.
Returns: Object itself.
read(filename: Any) → muspy.music.Music
Read a file into a Music object.
save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)
Save all the music objects to a directory.
Parameters:
- root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]]
Return the dataset as splits, given as lists of sample indices keyed by split name.
Parameters:
- filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
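How split ratios translate into index lists can be sketched as below. This is an illustration only and not MusPy's implementation: split_indices is a hypothetical function, it uses the standard-library random module instead of numpy.random.RandomState, and the requirement that the ratios sum to 1 as well as the key names 'train', 'test' and 'validation' are assumptions of the sketch.

```python
import random


def split_indices(n_samples, splits, seed=None):
    """Shuffle range(n_samples) and cut it into named, disjoint index lists."""
    if isinstance(splits, float):
        # A single ratio is read as the train share; the rest goes to test.
        splits = [splits, 1.0 - splits]
    if len(splits) not in (2, 3):
        raise ValueError("expected a float or a list of two or three ratios")
    if abs(sum(splits) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducible splits

    result, start, cumulative = {}, 0, 0.0
    for name, ratio in zip(("train", "test", "validation"), splits):
        cumulative += ratio
        # Cut at rounded cumulative boundaries so the parts never overlap.
        end = min(n_samples, round(cumulative * n_samples))
        result[name] = indices[start:end]
        start = end
    return result
```

Cutting at cumulative boundaries, rather than rounding each part separately, keeps every index in exactly one split even when the ratios do not divide the dataset size evenly.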
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]
Return the dataset as a PyTorch dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
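The role of factory can be illustrated with a small map-style wrapper. PyTorch map-style datasets are objects that implement __getitem__ and __len__; the hypothetical FactoryView class below follows that protocol without importing PyTorch, so it is a sketch of the idea rather than what to_pytorch_dataset actually returns. The optional indices argument stands in for the index list of one split.

```python
class FactoryView:
    """Hypothetical map-style view that applies `factory` to each sample on access."""

    def __init__(self, dataset, factory, indices=None):
        self.dataset = dataset
        self.factory = factory
        # Restrict the view to one split when a list of indices is given.
        self.indices = (
            list(range(len(dataset))) if indices is None else list(indices)
        )

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, index):
        # The conversion runs lazily, once per access.
        return self.factory(self.dataset[self.indices[index]])
```

With a real dataset, factory would receive a Music object and return an array or tensor, as described in the parameter list above.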
to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]
Return the dataset as a TensorFlow dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
Returns: Converted TensorFlow dataset(s).
Return type: tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
use_converted() → FolderDatasetType
Disable on-the-fly mode and use converted data.
Returns: Object itself.
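The two access modes of a folder dataset, and the roles of _extension and read, can be sketched with a toy class. TinyFolderDataset is hypothetical and does not depend on MusPy: it treats text files as the source format and a list of lines as the "converted object", purely to show how indexing behaves in each mode and why the mode-switching methods return the object itself.

```python
import json
from pathlib import Path


class TinyFolderDataset:
    """Toy stand-in for a folder dataset with on-the-fly and converted modes."""

    _extension = "txt"  # extension to look for when scanning the folder

    def __init__(self, root):
        self.root = Path(root)
        self.converted_dir = self.root / "_converted"
        # Sorted so that index i always refers to the same source file.
        self.filenames = sorted(self.root.rglob("*." + self._extension))
        self._on_the_fly = True

    def read(self, filename):
        """Convert one source file (here: into a list of its lines)."""
        return Path(filename).read_text().splitlines()

    def convert(self):
        """Write every sample to _converted/{i}.json, then the marker file."""
        self.converted_dir.mkdir(exist_ok=True)
        for i, filename in enumerate(self.filenames):
            target = self.converted_dir / f"{i}.json"
            target.write_text(json.dumps(self.read(filename)))
        (self.converted_dir / ".muspy.success").touch()
        return self

    def use_converted(self):
        self._on_the_fly = False
        return self  # returning self allows chaining, e.g. convert().use_converted()

    def on_the_fly(self):
        self._on_the_fly = True
        return self

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, index):
        if self._on_the_fly:
            return self.read(self.filenames[index])  # parse the source now
        path = self.converted_dir / f"{index}.json"
        return json.loads(path.read_text())  # load the precomputed result
```

In converted mode the source files are no longer touched on access, which is the usual reason to convert once up front when parsing is expensive.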
class muspy.MusicDataset(root: Union[str, pathlib.Path], kind: str = None)
Class for datasets of MusPy JSON/YAML files.
Parameters:
- root (str or Path) – Root directory of the dataset.
- kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to include both JSON and YAML files.
root
Root directory of the dataset.
Type: Path
filenames
Paths to the files, relative to root.
Type: list of Path
See also
muspy.Dataset – Base class for MusPy datasets.
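The relationship between root, kind and filenames can be sketched as a directory scan. The function list_music_files is hypothetical, and the assumption that files are selected by the .json and .yaml suffixes is made for illustration; the real class may recognize other suffixes as well.

```python
from pathlib import Path


def list_music_files(root, kind=None):
    """Return paths under `root`, relative to it, whose suffix matches `kind`."""
    root = Path(root)
    # With kind=None, include both formats, as the class description states.
    kinds = ("json", "yaml") if kind is None else (kind,)
    return sorted(
        path.relative_to(root)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lstrip(".") in kinds
    )
```

Storing the paths relative to root keeps the listing valid if the dataset folder is moved as a whole.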
classmethod citation()
Print the citation information.
classmethod info()
Return the dataset information.
save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)
Save all the music objects to a directory.
Parameters:
- root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]]
Return the dataset as splits, given as lists of sample indices keyed by split name.
Parameters:
- filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]
Return the dataset as a PyTorch dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]
Return the dataset as a TensorFlow dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
Returns: Converted TensorFlow dataset(s).
Return type: tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
class muspy.ABCFolderDataset(root: Union[str, pathlib.Path], convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None)
Class for datasets storing ABC files in a folder.
See also
muspy.FolderDataset – Class for datasets storing files in a folder.
classmethod citation()
Print the citation information.
convert(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType
Convert and save the Music objects.
The converted files are named by their index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.
Parameters:
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
Returns: Object itself.
converted_dir
Path to the root directory of the converted dataset.
converted_exists() → bool
Return True if the saved dataset exists, otherwise False.
exists() → bool
Return True if the dataset exists, otherwise False.
get_converted_filenames()
Return a list of converted filenames.
get_raw_filenames()
Return a list of raw filenames.
classmethod info()
Return the dataset information.
load(filename: Union[str, pathlib.Path]) → muspy.music.Music
Load a file into a Music object.
on_the_fly() → FolderDatasetType
Enable on-the-fly mode and convert the data on the fly.
Returns: Object itself.
read(filename: Tuple[str, Tuple[int, int]]) → muspy.music.Music
Read a file into a Music object.
save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)
Save all the music objects to a directory.
Parameters:
- root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]]
Return the dataset as splits, given as lists of sample indices keyed by split name.
Parameters:
- filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]
Return the dataset as a PyTorch dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]
Return the dataset as a TensorFlow dataset.
Parameters:
- factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If a float, return train and test splits. If a list of two floats, return train and test splits. If a list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If an int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If a RandomState, it is used directly to create the splits.
Returns: Converted TensorFlow dataset(s).
Return type: tensorflow.data.Dataset or Dict of tensorflow.data.Dataset
use_converted() → FolderDatasetType
Disable on-the-fly mode and use converted data.
Returns: Object itself.