Remote Dataset Classes

Here are the classes for remote datasets.

class muspy.RemoteFolderDataset(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None, verbose: bool = True)

Base class for remote datasets storing files in a folder.

root

Root directory of the dataset.

Type: str or Path
Parameters:
  • download_and_extract (bool, default: False) – Whether to download and extract the dataset.
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • cleanup (bool, default: False) – Whether to remove the source archive(s).
  • convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exists: if so, disable on-the-fly mode; if not, enable on-the-fly mode and issue a warning.
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
  • use_converted (bool, optional) – Whether to force-disable on-the-fly mode and use converted data. Defaults to True if converted data exists, otherwise False.
  • verbose (bool, default: True) – Whether to be verbose.
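
The interplay between convert and use_converted described above can be summarized as a small decision function. This is a minimal sketch of the documented behavior, not MusPy's implementation: the helper name resolve_on_the_fly and its arguments are hypothetical, and it assumes that a requested conversion leaves converted data available.

```python
import warnings
from typing import Optional


def resolve_on_the_fly(
    convert: bool, converted_exists: bool, use_converted: Optional[bool] = None
) -> bool:
    """Return True if on-the-fly mode should be enabled (hypothetical helper)."""
    if use_converted is not None:
        # An explicit use_converted overrides the automatic choice.
        return not use_converted
    if convert or converted_exists:
        # Converted data is (or will be) available, so read it directly.
        return False
    # Nothing converted yet: fall back to converting files as they are read.
    warnings.warn("Converted data not found; enabling on-the-fly mode.")
    return True
```

With the defaults (convert=False, use_converted=None), the outcome depends only on whether converted data is already present under the dataset root.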

See also

muspy.FolderDataset
Class for datasets storing files in a folder.
muspy.RemoteDataset
Base class for remote MusPy datasets.
classmethod citation()

Print the citation information.

convert(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType

Convert and save the Music objects.

Each converted file is named by its index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] is converted and saved as {i}.json.
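
The index-based naming can be illustrated with pathlib alone. The sketch below only mirrors the naming rule stated above; converted_paths is a hypothetical helper, and the root and filenames are made-up examples.

```python
from pathlib import Path
from typing import Dict, List


def converted_paths(
    root: str, filenames: List[str], kind: str = "json"
) -> Dict[str, Path]:
    """Map each raw filename to the path of its converted file (illustrative)."""
    converted_dir = Path(root) / "_converted"
    # The file at filenames[i] is stored as {i}.{kind} inside root/_converted.
    return {
        name: converted_dir / f"{i}.{kind}" for i, name in enumerate(filenames)
    }


mapping = converted_paths("data/my_dataset", ["a/song.mid", "b/tune.mid"])
print(mapping["b/tune.mid"].as_posix())  # data/my_dataset/_converted/1.json
```

Because the converted name carries only the index, the filenames attribute is what links a converted file back to its source.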

Parameters:
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
  • verbose (bool, default: True) – Whether to be verbose.
  • **kwargs – Keyword arguments to pass to muspy.save().
Returns:

Object itself.

converted_dir

Path to the root directory of the converted dataset.

converted_exists() → bool

Return True if the saved dataset exists, otherwise False.

download(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType

Download the dataset source(s).

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

download_and_extract(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType

Download source datasets and extract the downloaded archives.

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • cleanup (bool, default: False) – Whether to remove the source archive(s).
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

exists() → bool

Return True if the dataset exists, otherwise False.

extract(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType

Extract the downloaded archive(s).

Parameters:
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.
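
The effect of the cleanup flag can be shown with the standard library. The sketch below is not how MusPy extracts archives: extract_archive is a hypothetical helper that assumes a ZIP archive and deletes it after extraction when cleanup is True.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive: Path, root: Path, cleanup: bool = False) -> None:
    """Extract a ZIP archive into root, optionally deleting the archive."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)
    if cleanup:
        # Remove the source archive once its contents are on disk.
        archive.unlink()


# Build a tiny archive in a temporary directory to demonstrate the behavior.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    archive = root / "source.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("tune.abc", "X:1\nK:C\nCDEF|\n")
    extract_archive(archive, root, cleanup=True)
    extracted = (root / "tune.abc").exists()
    archive_kept = archive.exists()

print(extracted, archive_kept)  # True False
```

Keeping cleanup at its default of False preserves the archive, which avoids a second download if the extracted files need to be restored.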

get_converted_filenames()

Return a list of converted filenames.

get_raw_filenames()

Return a list of raw filenames.

classmethod info()

Return the dataset information.

load(filename: Union[str, pathlib.Path]) → muspy.music.Music

Load a file into a Music object.

on_the_fly() → FolderDatasetType

Enable on-the-fly mode and convert the data on the fly.

Returns:

Object itself.

read(filename: str) → muspy.music.Music

Read a file into a Music object.

save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)

Save all the music objects to a directory.

Parameters:
  • root (str or Path) – Root directory to save the data.
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
  • verbose (bool, default: True) – Whether to be verbose.
  • **kwargs – Keyword arguments to pass to muspy.save().
source_exists() → bool

Return True if all the sources exist, otherwise False.

split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]]

Split the dataset and return the sample indices of each split.

Parameters:
  • filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
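
The ratio handling described for splits can be sketched as follows. This is an illustration under stated assumptions rather than MusPy's code: make_split is a hypothetical helper, it uses the standard random module instead of numpy.random.RandomState, it treats a single float as the train ratio, and the "full" key for the unsplit case is invented for the example.

```python
import random
from typing import Dict, List, Optional, Sequence, Union


def make_split(
    n_samples: int,
    splits: Union[float, Sequence[float], None] = None,
    seed: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Partition range(n_samples) into named index lists (illustrative)."""
    indices = list(range(n_samples))
    if splits is None:
        # Assumption: without ratios, the whole dataset is one group.
        return {"full": indices}
    if isinstance(splits, float):
        # Assumption: a single float is the train ratio; the rest is test.
        splits = [splits, 1.0 - splits]
    if len(splits) not in (2, 3):
        raise ValueError("splits must contain two or three ratios")
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(indices)
    names = ["train", "test", "validation"][: len(splits)]
    result: Dict[str, List[int]] = {}
    start = 0
    for i, (name, ratio) in enumerate(zip(names, splits)):
        # The last split takes the remainder so every index is used once.
        is_last = i == len(splits) - 1
        stop = n_samples if is_last else start + int(ratio * n_samples)
        result[name] = sorted(indices[start:stop])
        start = stop
    return result


split = make_split(10, [0.8, 0.1, 0.1], seed=0)
print({name: len(idx) for name, idx in split.items()})
# {'train': 8, 'test': 1, 'validation': 1}
```

Reusing the same seed yields the same index lists, which is the role random_state plays in the documented method.
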
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]

Return the dataset as a PyTorch dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
  • split_filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns:

Converted PyTorch dataset(s).

Return type:

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]

Return the dataset as a TensorFlow dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
  • split_filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns:

Converted TensorFlow dataset(s).

Return type:

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

use_converted() → FolderDatasetType

Disable on-the-fly mode and use converted data.

Returns:

Object itself.

class muspy.RemoteMusicDataset(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, kind: str = None, verbose: bool = True)

Base class for remote datasets of MusPy JSON/YAML files.

Parameters:
  • root (str or Path) – Root directory of the dataset.
  • download_and_extract (bool, default: False) – Whether to download and extract the dataset.
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • cleanup (bool, default: False) – Whether to remove the source archive(s).
  • kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to including both JSON and YAML files.
  • verbose (bool, default: True) – Whether to be verbose.
root

Root directory of the dataset.

Type: Path
filenames

Path to the files, relative to root.

Type: list of Path
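
The kind filter and the root-relative filenames attribute can be pictured with a short pathlib sketch. It is not MusPy's implementation: list_music_files is a hypothetical helper, and it assumes that formats are recognized purely by the .json and .yaml file extensions.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def list_music_files(root: Path, kind: Optional[str] = None) -> List[Path]:
    """Return paths relative to root for files of the requested kind."""
    # Assumption: formats are recognized by file extension only.
    suffixes = {".json", ".yaml"} if kind is None else {f".{kind}"}
    return sorted(
        p.relative_to(root)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


# Create a throwaway dataset folder with mixed file types.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a.json", "b.yaml", "notes.txt"):
        (root / name).write_text("{}")
    both = [p.name for p in list_music_files(root)]
    only_json = [p.name for p in list_music_files(root, kind="json")]

print(both, only_json)  # ['a.json', 'b.yaml'] ['a.json']
```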

See also

muspy.MusicDataset
Class for datasets of MusPy JSON/YAML files.
muspy.RemoteDataset
Base class for remote MusPy datasets.
classmethod citation()

Print the citation information.

download(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType

Download the dataset source(s).

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

download_and_extract(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType

Download source datasets and extract the downloaded archives.

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • cleanup (bool, default: False) – Whether to remove the source archive(s).
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

exists() → bool

Return True if the dataset exists, otherwise False.

extract(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType

Extract the downloaded archive(s).

Parameters:
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

classmethod info()

Return the dataset information.

save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)

Save all the music objects to a directory.

Parameters:
  • root (str or Path) – Root directory to save the data.
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
  • verbose (bool, default: True) – Whether to be verbose.
  • **kwargs – Keyword arguments to pass to muspy.save().
source_exists() → bool

Return True if all the sources exist, otherwise False.

split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]]

Split the dataset and return the sample indices of each split.

Parameters:
  • filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]

Return the dataset as a PyTorch dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
  • split_filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns:

Converted PyTorch dataset(s).

Return type:

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]

Return the dataset as a TensorFlow dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
  • split_filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns:

Converted TensorFlow dataset(s).

Return type:

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

class muspy.RemoteABCFolderDataset(root: Union[str, pathlib.Path], download_and_extract: bool = False, overwrite: bool = False, cleanup: bool = False, convert: bool = False, kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, use_converted: bool = None, verbose: bool = True)

Base class for remote datasets storing ABC files in a folder.

See also

muspy.ABCFolderDataset
Class for datasets storing ABC files in a folder.
muspy.RemoteDataset
Base class for remote MusPy datasets.
classmethod citation()

Print the citation information.

convert(kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs) → FolderDatasetType

Convert and save the Music objects.

Each converted file is named by its index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] is converted and saved as {i}.json.

Parameters:
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
  • verbose (bool, default: True) – Whether to be verbose.
  • **kwargs – Keyword arguments to pass to muspy.save().
Returns:

Object itself.

converted_dir

Path to the root directory of the converted dataset.

converted_exists() → bool

Return True if the saved dataset exists, otherwise False.

download(overwrite: bool = False, verbose: bool = True) → RemoteDatasetType

Download the dataset source(s).

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

download_and_extract(overwrite: bool = False, cleanup: bool = False, verbose: bool = True) → RemoteDatasetType

Download source datasets and extract the downloaded archives.

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).
  • cleanup (bool, default: False) – Whether to remove the source archive(s).
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

exists() → bool

Return True if the dataset exists, otherwise False.

extract(cleanup: bool = False, verbose: bool = True) → RemoteDatasetType

Extract the downloaded archive(s).

Parameters:
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.
  • verbose (bool, default: True) – Whether to be verbose.
Returns:

Object itself.

get_converted_filenames()

Return a list of converted filenames.

get_raw_filenames()

Return a list of raw filenames.

classmethod info()

Return the dataset information.

load(filename: Union[str, pathlib.Path]) → muspy.music.Music

Load a file into a Music object.

on_the_fly() → FolderDatasetType

Enable on-the-fly mode and convert the data on the fly.

Returns:

Object itself.

read(filename: Tuple[str, Tuple[int, int]]) → muspy.music.Music

Read a file into a Music object.

save(root: Union[str, pathlib.Path], kind: str = 'json', n_jobs: int = 1, ignore_exceptions: bool = True, verbose: bool = True, **kwargs)

Save all the music objects to a directory.

Parameters:
  • root (str or Path) – Root directory to save the data.
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
  • verbose (bool, default: True) – Whether to be verbose.
  • **kwargs – Keyword arguments to pass to muspy.save().
source_exists() → bool

Return True if all the sources exist, otherwise False.

split(filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None) → Dict[str, List[int]]

Split the dataset and return the sample indices of each split.

Parameters:
  • filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
to_pytorch_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TorchDataset, Dict[str, TorchDataset]]

Return the dataset as a PyTorch dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
  • split_filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns:

Converted PyTorch dataset(s).

Return type:

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory: Callable = None, representation: str = None, split_filename: Union[str, pathlib.Path] = None, splits: Sequence[float] = None, random_state: Any = None, **kwargs) → Union[TFDataset, Dict[str, TFDataset]]

Return the dataset as a TensorFlow dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
  • split_filename (str or Path, optional) – Path to the split file. If the file exists, the split is read from it; otherwise, the newly created split is saved to that path.
  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
Returns:

Converted TensorFlow dataset(s).

Return type:

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

use_converted() → FolderDatasetType

Disable on-the-fly mode and use converted data.

Returns:

Object itself.