Local Dataset Classes
Here are the classes for local datasets.
-
class
muspy.FolderDataset(root: Union[str, pathlib.Path], convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None) Class for datasets storing files in a folder.
This class extends muspy.Dataset to support folder datasets. To build a custom folder dataset, refer to the documentation of muspy.Dataset for details. In addition, set the class attribute _extension to the file extension to look for when building the dataset, and set read to a callable that takes the filename of a source file as input and returns the converted Music object.
Parameters: - root (str or Path) – Root directory of the dataset.
- convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exists: if so, disable on-the-fly mode; if not, enable on-the-fly mode and issue a warning.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- use_converted (bool, optional) – Whether to force-disable on-the-fly mode and use the converted data. Defaults to True if converted data exists, otherwise False.
Important
muspy.FolderDataset.converted_exists() depends solely on a special file named .muspy.success in the folder {root}/_converted/, which serves as an indicator of the existence and integrity of the converted dataset. If the converted dataset is built by muspy.FolderDataset.convert(), the .muspy.success file is created as well. If the converted dataset is created manually, make sure to create the .muspy.success file in the folder {root}/_converted/ to prevent errors.
Notes
Two modes are available for this dataset. When the on-the-fly mode is enabled, a data sample is converted to a music object on the fly when being indexed. When the on-the-fly mode is disabled, a data sample is loaded from the precomputed converted data.
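The marker-file check and the resulting choice of mode can be sketched with the standard library alone. This is a minimal illustration of the behavior described above, not MusPy's implementation; the helper names mark_converted and choose_mode are hypothetical, and only the marker name and the _converted folder come from the text.

```python
import tempfile
from pathlib import Path

SUCCESS_MARKER = ".muspy.success"  # marker name taken from the note above


def converted_dir(root: Path) -> Path:
    """Return {root}/_converted, where the converted files live."""
    return Path(root) / "_converted"


def converted_exists(root: Path) -> bool:
    """Mimic the documented check: only the marker file is consulted."""
    return (converted_dir(root) / SUCCESS_MARKER).is_file()


def mark_converted(root: Path) -> Path:
    """Hypothetical helper: create the marker after a manual conversion."""
    marker = converted_dir(root) / SUCCESS_MARKER
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return marker


def choose_mode(root: Path) -> str:
    """Hypothetical helper: pick a mode as the use_converted default is described."""
    return "use_converted" if converted_exists(root) else "on_the_fly"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    mode_before = choose_mode(root)  # no marker yet
    mark_converted(root)
    mode_after = choose_mode(root)   # marker present

print(mode_before, mode_after)
```

Because the check looks only at the marker, a manually assembled _converted folder without it would be treated as missing, which is why the note asks for the file to be created by hand.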
See also
muspy.Dataset - Base class for MusPy datasets.
-
classmethod
citation() Print the citation information.
-
convert(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType Convert and save the Music objects.
Each converted file is named by its index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.
Parameters: - kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
Returns: Object itself.
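The index-based naming and the effect of ignore_exceptions can be illustrated with a small stand-alone loop. This is a sketch under stated assumptions rather than MusPy's code: convert_all and toy_read are hypothetical, plain dictionaries stand in for Music objects, and it is assumed that a failed file simply leaves a gap at its index.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def convert_all(
    root: Path,
    filenames: List[str],
    read: Callable[[Path], dict],
    ignore_exceptions: bool = True,
) -> List[int]:
    """Write {i}.json for each filenames[i]; return the indices that succeeded."""
    out_dir = root / "_converted"
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for i, name in enumerate(filenames):
        try:
            data = read(root / name)
        except Exception:
            if ignore_exceptions:
                continue  # skip the failed source file and keep going
            raise
        (out_dir / f"{i}.json").write_text(json.dumps(data))
        converted.append(i)
    return converted


def toy_read(path: Path) -> dict:
    """Hypothetical reader: fail on empty files, else return a small dict."""
    text = path.read_text()
    if not text:
        raise ValueError(f"empty file: {path}")
    return {"source": path.name, "length": len(text)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    filenames = ["a.txt", "b.txt", "c.txt"]
    (root / "a.txt").write_text("abc")
    (root / "b.txt").write_text("")  # simulates a corrupted file
    (root / "c.txt").write_text("cdefg")
    converted = convert_all(root, filenames, toy_read)
    written = sorted(p.name for p in (root / "_converted").iterdir())

print(converted, written)
```

Keeping the original index in the output name means filenames[i] can still be used to trace a converted file back to its source, even when some conversions were skipped.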
-
converted_dir Path to the root directory of the converted dataset.
-
converted_exists() → bool Return True if the converted dataset exists, otherwise False.
-
exists() → bool Return True if the dataset exists, otherwise False.
-
get_converted_filenames() Return a list of converted filenames.
-
get_raw_filenames() Return a list of raw filenames.
-
classmethod
info() Return the dataset information.
-
load(filename: Union[str, pathlib.Path]) → muspy.music.Music Load a file into a Music object.
-
on_the_fly() → FolderDatasetType Enable on-the-fly mode and convert the data on the fly.
Returns: Object itself.
-
read(filename: Any) → muspy.music.Music Read a file into a Music object.
-
save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
-
split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Split the dataset and return the sample indices of each split.
Parameters: - filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
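A rough picture of how ratios, a seed, and a split file could interact is sketched below. It is not MusPy's implementation: make_split is a hypothetical function, random.Random is swapped in for numpy.random.RandomState to keep the example dependency-free, the key names "train", "test" and "validation" and the JSON file format are assumptions, and the splits=None case is left out.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Union


def make_split(
    n_samples: int,
    filename: Optional[Path] = None,
    splits: Union[float, Sequence[float], None] = None,
    seed: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Return index lists keyed by split name (hypothetical helper)."""
    # Reuse a saved split if the file already exists.
    if filename is not None and Path(filename).is_file():
        return json.loads(Path(filename).read_text())
    if splits is None:
        raise ValueError("splits is required when no saved split exists")

    # A single float means train/test with the remainder going to test.
    ratios = [splits, 1.0 - splits] if isinstance(splits, float) else list(splits)
    if len(ratios) not in (2, 3) or min(ratios) <= 0:
        raise ValueError("expected two or three positive ratios")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    names = ["train", "test", "validation"][: len(ratios)]
    total = sum(ratios)
    result, start, cumulative = {}, 0, 0.0
    for k, (name, ratio) in enumerate(zip(names, ratios)):
        cumulative += ratio
        # The last split takes whatever is left, so no index is dropped.
        is_last = k == len(ratios) - 1
        end = n_samples if is_last else round(n_samples * cumulative / total)
        result[name] = sorted(indices[start:end])
        start = end

    # Save the new split so later calls can reuse it.
    if filename is not None:
        Path(filename).write_text(json.dumps(result))
    return result


with tempfile.TemporaryDirectory() as tmp:
    split_file = Path(tmp) / "split.json"
    first = make_split(10, split_file, splits=[0.8, 0.1, 0.1], seed=0)
    # A second call finds the file and ignores the new arguments.
    second = make_split(10, split_file, splits=0.5, seed=123)

print({name: len(idx) for name, idx in first.items()})
```

Reading the split back from the file when it exists is what makes an experiment reproducible across runs, independent of the random state passed later.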
-
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: torch.utils.data.Dataset or dict of torch.utils.data.Dataset
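The role of factory and of the per-split dictionary can be shown without PyTorch installed. The sketch below is only an illustration of the map-style protocol (__len__ and __getitem__) that torch.utils.data.Dataset relies on; FactoryDataset and to_datasets are hypothetical names, and lists of MIDI pitch numbers stand in for Music objects.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Union


class FactoryDataset:
    """Map-style view: item i is factory(samples[indices[i]])."""

    def __init__(
        self,
        samples: Sequence[Any],
        factory: Callable[[Any], Any],
        indices: Optional[Sequence[int]] = None,
    ) -> None:
        self.samples = samples
        self.factory = factory
        # Without a split, expose every sample in order.
        self.indices = list(range(len(samples))) if indices is None else list(indices)

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> Any:
        # The factory runs lazily, only when an item is requested.
        return self.factory(self.samples[self.indices[i]])


def to_datasets(
    samples: Sequence[Any],
    factory: Callable[[Any], Any],
    split: Optional[Dict[str, List[int]]] = None,
) -> Union[FactoryDataset, Dict[str, FactoryDataset]]:
    """Return one dataset, or one dataset per split name."""
    if split is None:
        return FactoryDataset(samples, factory)
    return {name: FactoryDataset(samples, factory, idx) for name, idx in split.items()}


# Hypothetical stand-in for Music objects: lists of MIDI pitch numbers.
samples = [[60, 62, 64], [67], [60, 60], [72, 71, 69, 67]]


def pitch_range(pitches: List[int]) -> List[int]:
    """Toy factory: reduce a sample to its lowest and highest pitch."""
    return [min(pitches), max(pitches)]


full = to_datasets(samples, pitch_range)
parts = to_datasets(samples, pitch_range, {"train": [0, 2, 3], "test": [1]})

print(len(full), full[0], len(parts["train"]), parts["test"][0])
```

In real use the factory would typically return an array or tensor, as the parameter description says; the list output here only keeps the example free of third-party imports.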
-
to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
Returns: Converted TensorFlow dataset(s).
Return type: tensorflow.data.Dataset or dict of tensorflow.data.Dataset
-
use_converted() → FolderDatasetType Disable on-the-fly mode and use converted data.
Returns: Object itself.
-
class
muspy.MusicDataset(root: Union[str, pathlib.Path], kind: str = None) Class for datasets of MusPy JSON/YAML files.
Parameters: - root (str or Path) – Root directory of the dataset.
- kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to include both JSON and YAML files.
-
root Root directory of the dataset.
Type: Path
-
filenames Paths to the files, relative to root.
Type: list of Path
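How kind might narrow the collected filenames can be sketched with pathlib. This is an illustration only: list_music_files is a hypothetical function, and both the extension table and the recursive search are assumptions rather than documented MusPy behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Assumed mapping from kind to file suffix.
SUFFIXES = {"json": ".json", "yaml": ".yaml"}


def list_music_files(root: Path, kind: Optional[str] = None) -> List[Path]:
    """Return matching files under root, as sorted paths relative to root."""
    if kind is None:
        wanted = set(SUFFIXES.values())  # include both formats by default
    elif kind in SUFFIXES:
        wanted = {SUFFIXES[kind]}
    else:
        raise ValueError(f"kind must be 'json' or 'yaml', got {kind!r}")
    root = Path(root)
    return sorted(
        p.relative_to(root)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in wanted
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.json").write_text("{}")
    (root / "sub" / "b.yaml").write_text("{}")
    (root / "notes.txt").write_text("not a music file")
    both = list_music_files(root)
    json_only = list_music_files(root, kind="json")

print(both, json_only)
```

Storing paths relative to root, as the filenames attribute does, keeps the listing valid if the dataset folder is moved.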
See also
muspy.Dataset - Base class for MusPy datasets.
-
classmethod
citation() Print the citation information.
-
classmethod
info() Return the dataset information.
-
save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
-
split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Split the dataset and return the sample indices of each split.
Parameters: - filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
-
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: torch.utils.data.Dataset or dict of torch.utils.data.Dataset
-
to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
Returns: Converted TensorFlow dataset(s).
Return type: tensorflow.data.Dataset or dict of tensorflow.data.Dataset
-
class
muspy.ABCFolderDataset(root: Union[str, pathlib.Path], convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None) Class for datasets storing ABC files in a folder.
See also
muspy.FolderDataset - Class for datasets storing files in a folder.
-
classmethod
citation() Print the citation information.
-
convert(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType Convert and save the Music objects.
Each converted file is named by its index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.
Parameters: - kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
Returns: Object itself.
-
converted_dir Path to the root directory of the converted dataset.
-
converted_exists() → bool Return True if the converted dataset exists, otherwise False.
-
exists() → bool Return True if the dataset exists, otherwise False.
-
get_converted_filenames() Return a list of converted filenames.
-
get_raw_filenames() Return a list of raw filenames.
-
classmethod
info() Return the dataset information.
-
load(filename: Union[str, pathlib.Path]) → muspy.music.Music Load a file into a Music object.
-
on_the_fly() → FolderDatasetType Enable on-the-fly mode and convert the data on the fly.
Returns: Object itself.
-
read(filename: Tuple[str, Tuple[int, int]]) → muspy.music.Music Read a file into a Music object.
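The tuple type in this signature is not explained here. One plausible reading, stated as an assumption, is a file path paired with a (start, end) line range that selects a single tune from an ABC file holding several. The sketch below shows only that slicing step; read_tune_text is a hypothetical helper, and parsing the text into a Music object is left out.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def read_tune_text(entry: Tuple[str, Tuple[int, int]]) -> str:
    """Return lines start..end-1 (0-based) of the file, joined as one string."""
    path, (start, end) = entry
    lines = []
    with open(path, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            if idx >= end:
                break  # past the requested tune
            if idx >= start:
                lines.append(line)
    return "".join(lines)


# Two toy tunes in one file; each starts with an X: reference number field.
ABC_TEXT = "X:1\nT:First\nK:C\nCDEF|\nX:2\nT:Second\nK:G\nGABc|\n"

with tempfile.TemporaryDirectory() as tmp:
    abc_path = Path(tmp) / "tunes.abc"
    abc_path.write_text(ABC_TEXT, encoding="utf-8")
    second_tune = read_tune_text((str(abc_path), (4, 8)))

print(second_tune)
```

Under this reading, each dataset entry would point at one tune rather than one file, which would explain why the filename argument is a tuple rather than a plain path.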
-
save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to muspy.save().
-
split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Split the dataset and return the sample indices of each split.
Parameters: - filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
-
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: torch.utils.data.Dataset or dict of torch.utils.data.Dataset
-
to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
- split_filename (str or Path, optional) – Path to the split file. If given and the file exists, the split is read from it. Otherwise, a new split is created and, if a path is given, saved to it.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it is used directly to create the splits.
Returns: Converted TensorFlow dataset(s).
Return type: tensorflow.data.Dataset or dict of tensorflow.data.Dataset
-
use_converted() → FolderDatasetType Disable on-the-fly mode and use converted data.
Returns: Object itself.