gammagl.datasets.ZINC

class ZINC(root: str | None = None, subset=False, split='train', transform=None, pre_transform=None, pre_filter=None, force_reload: bool = False)

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models, see, e.g., the “Junction Tree Variational Autoencoder for Molecular Graph Generation” and “Grammar Variational Autoencoder” papers.
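As a minimal sketch of the target formula above, the following helper combines three precomputed descriptor values into the penalized logP. The function name and the input numbers are hypothetical and chosen only for illustration; they are not taken from the dataset, and computing logP, SAS, or ring counts from a molecule is outside the scope of this sketch.

```python
def penalized_logp(logp: float, sas: float, cycles: int) -> float:
    """Return y = logP - SAS - cycles for already-computed descriptors.

    logp   -- water-octanol partition coefficient
    sas    -- synthetic accessibility score
    cycles -- number of cycles with more than six atoms
    """
    return logp - sas - cycles


# Hypothetical descriptor values, used only to show the arithmetic.
example_target = penalized_logp(logp=2.5, sas=3.0, cycles=1)
```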

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved.

  • subset (bool, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)

  • split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")

  • transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a gammagl.data.Graph object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
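The parameters above can be combined as in the following sketch, which builds one dataset object per split. It assumes GammaGL is installed and that the constructor accepts the keyword arguments shown in the signature; the helper name load_zinc_splits and the choice of subset=True as its default are hypothetical. The import is placed inside the function so that defining the helper does not itself require GammaGL.

```python
SPLITS = ("train", "val", "test")


def load_zinc_splits(root, subset=True):
    """Build one ZINC dataset per split; data is downloaded on first use."""
    # Lazy import: GammaGL is only needed when the helper is actually called.
    from gammagl.datasets import ZINC

    # subset=True selects the 12,000-graph benchmark subset described above.
    return {
        split: ZINC(root=root, subset=subset, split=split)
        for split in SPLITS
    }
```

Calling load_zinc_splits with a writable directory of your choice should return a dictionary keyed by "train", "val", and "test". Passing force_reload=True to the constructor instead would re-process the dataset rather than reuse the files already on disk.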

url = 'https://www.dropbox.com/s/feo9qle74kg48gy/molecules.zip?dl=1'
split_url = 'https://raw.githubusercontent.com/graphdeeplearning/benchmarking-gnns/master/data/molecules/{}.index'
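The split_url attribute is a format template with a single {} placeholder. As a sketch, and assuming the placeholder is filled with the split names accepted by the split parameter (the page does not state this explicitly), the index-file URLs could be built as follows. The constant is copied from the attribute above; the helper name is hypothetical.

```python
SPLIT_URL = (
    "https://raw.githubusercontent.com/graphdeeplearning/"
    "benchmarking-gnns/master/data/molecules/{}.index"
)


def split_index_urls(splits=("train", "val", "test")):
    """Map each split name to the URL obtained by filling the template."""
    return {split: SPLIT_URL.format(split) for split in splits}


urls = split_index_urls()
```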
property raw_file_names

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_dir
property processed_file_names

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.