Remote Dataset Classes¶
Here are the classes for remote datasets.
-
class
muspy.
RemoteFolderDataset
(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None, verbose: bool = True)[source] Base class for remote datasets storing files in a folder.
Parameters: - download_and_extract (bool, default: False) – Whether to download and extract the dataset.
- cleanup (bool, default: False) – Whether to remove the source archive(s).
- convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, will check if converted data exists. If so, disable on-the-fly mode. If not, enable on-the-fly mode and warns.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- use_converted (bool, optional) – Force to disable on-the-fly mode and use converted data. Defaults to True if converted data exist, otherwise False.
See also
muspy.FolderDataset
- Class for datasets storing files in a folder.
muspy.RemoteDataset
- Base class for remote MusPy datasets.
-
classmethod
citation
() Print the citation infomation.
-
convert
(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType Convert and save the Music objects.
The converted files will be named by its index and saved to
root/_converted
. The original filenames can be found in thefilenames
attribute. For example, the file atfilenames[i]
will be converted and saved to{i}.json
.Parameters: - kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
Returns: Return type: Object itself.
-
converted_dir
Path to the root directory of the converted dataset.
-
converted_exists
() → bool Return True if the saved dataset exists, otherwise False.
-
download
(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType Download the dataset source(s).
Parameters: Returns: Return type: Object itself.
-
download_and_extract
(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType Download source datasets and extract the downloaded archives.
Parameters: Returns: Return type: Object itself.
-
exists
() → bool Return True if the dataset exists, otherwise False.
-
extract
(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType Extract the downloaded archive(s).
Parameters: Returns: Return type: Object itself.
-
get_converted_filenames
() Return a list of converted filenames.
-
get_raw_filenames
() Return a list of raw filenames.
-
classmethod
info
() Return the dataset infomation.
-
load
(filename: Union[str, pathlib.Path]) → muspy.music.Music Load a file into a Music object.
-
on_the_fly
() → FolderDatasetType Enable on-the-fly mode and convert the data on the fly.
Returns: Return type: Object itself.
-
read
(filename: str) → muspy.music.Music[source] Read a file into a Music object.
-
save
(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
-
source_exists
() → bool Return True if all the sources exist, otherwise False.
-
split
(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Return the dataset as a PyTorch dataset.
Parameters: - filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
-
to_pytorch_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: class:torch.utils.data.Dataset` or Dict of :class:torch.utils.data.Dataset`
-
to_tensorflow_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: - class:tensorflow.data.Dataset` or Dict of
- class:tensorflow.data.dataset` – Converted TensorFlow dataset(s).
-
use_converted
() → FolderDatasetType Disable on-the-fly mode and use converted data.
Returns: Return type: Object itself.
-
class
muspy.
RemoteMusicDataset
(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, kind: str = None, verbose: bool = True)[source] Base class for remote datasets of MusPy JSON/YAML files.
Parameters: - root (str or Path) – Root directory of the dataset.
- download_and_extract (bool, default: False) – Whether to download and extract the dataset.
- overwrite (bool, default: False) – Whether to overwrite existing file(s).
- cleanup (bool, default: False) – Whether to remove the source archive(s).
- kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to include both JSON and YAML files.
- verbose (bool. default: True) – Whether to be verbose.
-
root
¶ Root directory of the dataset.
Type: Path
-
filenames
¶ Path to the files, relative to root.
Type: list of Path
See also
muspy.MusicDataset
- Class for datasets of MusPy JSON/YAML files.
muspy.RemoteDataset
- Base class for remote MusPy datasets.
-
classmethod
citation
() Print the citation infomation.
-
download
(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType Download the dataset source(s).
Parameters: Returns: Return type: Object itself.
-
download_and_extract
(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType Download source datasets and extract the downloaded archives.
Parameters: Returns: Return type: Object itself.
-
exists
() → bool Return True if the dataset exists, otherwise False.
-
extract
(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType Extract the downloaded archive(s).
Parameters: Returns: Return type: Object itself.
-
classmethod
info
() Return the dataset infomation.
-
save
(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
-
source_exists
() → bool Return True if all the sources exist, otherwise False.
-
split
(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Return the dataset as a PyTorch dataset.
Parameters: - filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
-
to_pytorch_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: class:torch.utils.data.Dataset` or Dict of :class:torch.utils.data.Dataset`
-
to_tensorflow_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: - class:tensorflow.data.Dataset` or Dict of
- class:tensorflow.data.dataset` – Converted TensorFlow dataset(s).
-
class
muspy.
RemoteABCFolderDataset
(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None, verbose: bool = True)[source] Base class for remote datasets storing ABC files in a folder.
See also
muspy.ABCFolderDataset
- Class for datasets storing ABC files in a folder.
muspy.RemoteDataset
- Base class for remote MusPy datasets.
-
classmethod
citation
() Print the citation infomation.
-
convert
(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType Convert and save the Music objects.
The converted files will be named by its index and saved to
root/_converted
. The original filenames can be found in thefilenames
attribute. For example, the file atfilenames[i]
will be converted and saved to{i}.json
.Parameters: - kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
Returns: Return type: Object itself.
-
converted_dir
Path to the root directory of the converted dataset.
-
converted_exists
() → bool Return True if the saved dataset exists, otherwise False.
-
download
(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType Download the dataset source(s).
Parameters: Returns: Return type: Object itself.
-
download_and_extract
(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType Download source datasets and extract the downloaded archives.
Parameters: Returns: Return type: Object itself.
-
exists
() → bool Return True if the dataset exists, otherwise False.
-
extract
(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType Extract the downloaded archive(s).
Parameters: Returns: Return type: Object itself.
-
get_converted_filenames
() Return a list of converted filenames.
-
get_raw_filenames
() Return a list of raw filenames.
-
classmethod
info
() Return the dataset infomation.
-
load
(filename: Union[str, pathlib.Path]) → muspy.music.Music Load a file into a Music object.
-
on_the_fly
() → FolderDatasetType Enable on-the-fly mode and convert the data on the fly.
Returns: Return type: Object itself.
-
read
(filename: Tuple[str, Tuple[int, int]]) → muspy.music.Music Read a file into a Music object.
-
save
(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) Save all the music objects to a directory.
Parameters: - root (str or Path) – Root directory to save the data.
- kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
- n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
- ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
- verbose (bool, default: True) – Whether to be verbose.
- **kwargs – Keyword arguments to pass to
muspy.save()
.
-
source_exists
() → bool Return True if all the sources exist, otherwise False.
-
split
(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]] Return the dataset as a PyTorch dataset.
Parameters: - filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
-
to_pytorch_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]] Return the dataset as a PyTorch dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: Converted PyTorch dataset(s).
Return type: class:torch.utils.data.Dataset` or Dict of :class:torch.utils.data.Dataset`
-
to_tensorflow_dataset
(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]] Return the dataset as a TensorFlow dataset.
Parameters: - factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
- representation (str, optional) – Target representation. See
muspy.to_representation()
for available representation. - split_filename (str or Path, optional) – If given and exists, path to the file to read the split from. If None or not exists, path to save the split.
- splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
- random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or
array_like, the value is passed to
numpy.random.RandomState
, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns: - class:tensorflow.data.Dataset` or Dict of
- class:tensorflow.data.dataset` – Converted TensorFlow dataset(s).
-
use_converted
() → FolderDatasetType Disable on-the-fly mode and use converted data.
Returns: Return type: Object itself.