gammagl.datasets.TUDataset

class TUDataset(root: str | None = None, name: str = 'MUTAG', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False, force_reload: bool = False)

A variety of graph kernel benchmark datasets, e.g., “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected by TU Dortmund University. In addition, this dataset wrapper provides cleaned versions of the datasets, as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, which contain only non-isomorphic graphs.
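To make "non-isomorphic" concrete, here is a brute-force sketch that keeps only the first graph of every isomorphism class. It is not GammaGL's code (the cleaned versions are presumably downloaded already cleaned from cleaned_url); the helper names canonical_form and drop_isomorphic are hypothetical, the sketch compares only unlabeled, undirected topology with each edge listed once, and it is feasible only for tiny graphs because it tries all n! relabelings.

```python
from itertools import permutations
from typing import List, Tuple

Edge = Tuple[int, int]

def canonical_form(num_nodes: int, edges: List[Edge]) -> tuple:
    """Smallest sorted edge list over all node relabelings (brute force)."""
    best = None
    for perm in permutations(range(num_nodes)):
        # Relabel each undirected edge and normalize it to (min, max).
        relabeled = tuple(sorted(
            (min(perm[u], perm[v]), max(perm[u], perm[v])) for u, v in edges
        ))
        if best is None or relabeled < best:
            best = relabeled
    # Include the node count so graphs of different sizes never collide.
    return (num_nodes, best)

def drop_isomorphic(graphs: List[Tuple[int, List[Edge]]]):
    """Keep the first graph of every isomorphism class, in input order."""
    seen = set()
    kept = []
    for num_nodes, edges in graphs:
        key = canonical_form(num_nodes, edges)
        if key not in seen:
            seen.add(key)
            kept.append((num_nodes, edges))
    return kept

graphs = [
    (3, [(0, 1), (1, 2)]),          # path centered on node 1
    (3, [(0, 2), (2, 1)]),          # the same path, centered on node 2
    (3, [(0, 1), (1, 2), (0, 2)]),  # triangle
]
print(len(drop_isomorphic(graphs)))  # prints 2
```

Real cleaning pipelines use far faster isomorphism tests; the brute-force canonical form is used here only because it is short and obviously correct.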

Note

Some datasets do not come with any node labels. In that case, you can either use the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as gammagl.transforms.Constant or gammagl.transforms.OneHotDegree.
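The idea behind degree-based synthetic features can be sketched in plain Python: every node receives a one-hot vector of its degree, while a constant feature gives every node the same value. This illustrates the technique only and is not the implementation of gammagl.transforms.OneHotDegree or gammagl.transforms.Constant; the function names, the edge-list input (each undirected edge listed once), and clipping degrees above max_degree are assumptions of this sketch.

```python
from typing import List, Tuple

def one_hot_degree(num_nodes: int, edges: List[Tuple[int, int]],
                   max_degree: int) -> List[List[float]]:
    """Return a num_nodes x (max_degree + 1) one-hot degree matrix."""
    degree = [0] * num_nodes
    for u, v in edges:  # each undirected edge is listed once
        degree[u] += 1
        degree[v] += 1
    features = []
    for d in degree:
        row = [0.0] * (max_degree + 1)
        row[min(d, max_degree)] = 1.0  # sketch choice: clip large degrees
        features.append(row)
    return features

def constant_features(num_nodes: int, value: float = 1.0) -> List[List[float]]:
    """Give every node the same single feature."""
    return [[value] for _ in range(num_nodes)]

# Star graph: node 0 has degree 3, the leaves have degree 1.
x = one_hot_degree(4, [(0, 1), (0, 2), (0, 3)], max_degree=3)
print(x[0], x[1])  # [0.0, 0.0, 0.0, 1.0] [0.0, 1.0, 0.0, 0.0]
```

Degree features let a model distinguish structurally different nodes, whereas constant features leave all structural information to message passing.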

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved. (default: None)

  • name (str, optional) – The name of the dataset, e.g., "MUTAG" or "PROTEINS". (default: "MUTAG")

  • transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a gammagl.data.Graph object and returns a boolean value indicating whether the data object should be included in the final dataset. (default: None)

  • use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)

  • use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)

  • cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
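The three hooks in the parameter list run at different times: pre_filter and pre_transform once while the dataset is processed, transform on every access. The toy class below mimics that ordering in plain Python. ToyDataset and the dictionaries standing in for gammagl.data.Graph objects are hypothetical, and filtering before pre-transforming is an assumption based on the usual InMemoryDataset-style pipeline.

```python
from typing import Callable, Optional

class ToyDataset:
    """Hypothetical stand-in that mirrors when each hook is applied."""

    def __init__(self, raw_graphs, transform: Optional[Callable] = None,
                 pre_transform: Optional[Callable] = None,
                 pre_filter: Optional[Callable] = None):
        graphs = list(raw_graphs)
        # Processing step: runs once, before the result would be saved to disk.
        if pre_filter is not None:
            graphs = [g for g in graphs if pre_filter(g)]
        if pre_transform is not None:
            graphs = [pre_transform(g) for g in graphs]
        self._graphs = graphs
        self.transform = transform

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        graph = self._graphs[idx]
        # Access step: runs on every read; the stored graph is left untouched.
        return graph if self.transform is None else self.transform(graph)

ds = ToyDataset(
    [{"num_nodes": 3}, {"num_nodes": 30}],
    pre_filter=lambda g: g["num_nodes"] < 10,       # drop large graphs once
    pre_transform=lambda g: {**g, "scaled": True},  # result is stored
    transform=lambda g: {**g, "accessed": True},    # applied on each read
)
print(len(ds), ds[0])
```

Expensive, deterministic preprocessing belongs in pre_transform so it is paid once; random augmentations belong in transform so each access sees a fresh result.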

Tip

Statistics for a selection of the available datasets (#nodes and #edges are per-graph averages):

Name            #graphs   #nodes   #edges     #features   #classes
MUTAG           188       ~17.9    ~39.6      7           2
ENZYMES         600       ~32.6    ~124.3     3           6
PROTEINS        1,113     ~39.1    ~145.6     3           2
COLLAB          5,000     ~74.5    ~4914.4    0           3
IMDB-BINARY     1,000     ~19.8    ~193.1     0           2
REDDIT-BINARY   2,000     ~429.6   ~995.5     0           2
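The values marked with "~" are averages over all graphs in a dataset. A minimal sketch of how such averages could be computed, assuming each graph is given as (num_nodes, edge_index) with edge_index a pair of source and target lists; the helper name average_sizes is hypothetical, and it counts stored edge entries, so an undirected edge stored in both directions counts twice.

```python
from typing import List, Sequence, Tuple

Graph = Tuple[int, Tuple[Sequence[int], Sequence[int]]]

def average_sizes(graphs: List[Graph]) -> Tuple[float, float]:
    """Return (average #nodes, average #edges) over a list of graphs."""
    if not graphs:
        raise ValueError("need at least one graph")
    total_nodes = sum(num_nodes for num_nodes, _ in graphs)
    total_edges = sum(len(edge_index[0]) for _, edge_index in graphs)
    return total_nodes / len(graphs), total_edges / len(graphs)

graphs = [
    (3, ([0, 1], [1, 2])),              # 3 nodes, 2 stored edges
    (5, ([0, 1, 2, 3], [1, 2, 3, 4])),  # 5 nodes, 4 stored edges
]
print(average_sizes(graphs))  # (4.0, 3.0)
```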

url = 'https://www.chrsmrrs.com/graphkerneldatasets'
cleaned_url = 'https://raw.githubusercontent.com/nd7141/graph_datasets/master/datasets'
property raw_dir: str
property processed_dir: str
property num_node_labels: int
property num_node_attributes: int
property num_edge_labels: int
property num_edge_attributes: int
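These counts describe how the feature matrices are laid out. The sketch below assumes the layout used by PyTorch Geometric's TU loader, which GammaGL's wrapper appears to mirror: discrete labels are one-hot encoded and appended after any continuous attributes, so num_node_labels is the width of the one-hot block and num_node_attributes is the width of the rest (edges would work analogously). The function name build_node_features and the attributes-first column order are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

def build_node_features(
    node_labels: Sequence[int],
    node_attrs: Optional[Sequence[Sequence[float]]] = None,
) -> Tuple[List[List[float]], int, int]:
    """Concatenate continuous attributes with one-hot encoded discrete labels.

    Returns (features, num_node_attributes, num_node_labels).
    """
    offset = min(node_labels)  # raw labels may start at 0 or at 1
    num_labels = max(node_labels) - offset + 1
    num_attrs = len(node_attrs[0]) if node_attrs else 0
    features = []
    for i, label in enumerate(node_labels):
        one_hot = [0.0] * num_labels
        one_hot[label - offset] = 1.0
        attrs = list(node_attrs[i]) if node_attrs else []
        features.append(attrs + one_hot)  # attributes first, then labels
    return features, num_attrs, num_labels

x, n_attr, n_lab = build_node_features([1, 2, 1], [[0.5], [1.5], [2.5]])
print(x, n_attr, n_lab)  # [[0.5, 1.0, 0.0], [1.5, 0.0, 1.0], [2.5, 1.0, 0.0]] 1 2
```

Under this layout, use_node_attr=False would presumably keep only the one-hot label columns.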
property raw_file_names: List[str]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
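In the TU benchmark format, each dataset is a zip archive of text files prefixed with the dataset name; the edge list (DS_A.txt) and the node-to-graph assignment (DS_graph_indicator.txt) are the mandatory ones. The sketch below assumes GammaGL follows PyTorch Geometric here, i.e. it requires exactly those two files and fetches <url>/<name>.zip, or the cleaned_url variant when cleaned=True. Both helper names are hypothetical.

```python
from typing import List

URL = "https://www.chrsmrrs.com/graphkerneldatasets"
CLEANED_URL = "https://raw.githubusercontent.com/nd7141/graph_datasets/master/datasets"

def raw_file_names(name: str) -> List[str]:
    # Assumption: only the edge list and the graph indicator are mandatory.
    return [f"{name}_{part}.txt" for part in ("A", "graph_indicator")]

def archive_url(name: str, cleaned: bool = False) -> str:
    # Assumption: one zip archive per dataset directly under the base URL.
    base = CLEANED_URL if cleaned else URL
    return f"{base}/{name}.zip"

print(raw_file_names("MUTAG"))  # ['MUTAG_A.txt', 'MUTAG_graph_indicator.txt']
print(archive_url("MUTAG"))     # https://www.chrsmrrs.com/graphkerneldatasets/MUTAG.zip
```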

property processed_file_names: str

The name of the file in the self.processed_dir folder that must be present in order to skip processing.
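Both properties drive the same check: a step is skipped when every listed file already exists in its folder, and force_reload=True presumably forces re-processing regardless. A minimal sketch of that check with the standard library; files_exist and needs_processing are hypothetical helpers, not GammaGL functions, and treating an empty name list as "missing" is a choice borrowed from similar dataset loaders.

```python
import os
import tempfile

def files_exist(folder, names) -> bool:
    """True if every file in `names` exists inside `folder`."""
    if isinstance(names, str):  # processed_file_names may be a single str
        names = [names]
    # An empty list counts as missing, so the step always runs.
    return len(names) > 0 and all(
        os.path.exists(os.path.join(folder, n)) for n in names
    )

def needs_processing(processed_dir, processed_names, force_reload=False) -> bool:
    return force_reload or not files_exist(processed_dir, processed_names)

# Example with a hypothetical processed file name.
with tempfile.TemporaryDirectory() as folder:
    print(needs_processing(folder, "example.bin"))        # True
    open(os.path.join(folder, "example.bin"), "w").close()
    print(needs_processing(folder, "example.bin"))        # False
    print(needs_processing(folder, "example.bin", True))  # True
```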

download()

Downloads the dataset to the self.raw_dir folder.
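Downloading presumably means fetching the dataset's zip archive and unpacking it into self.raw_dir. The unpacking half can be sketched with the standard library; the network fetch is left out so the example runs offline. extract_archive is a hypothetical helper, and the <name>/<name>_A.txt layout of the toy archive is an assumption about how the TU archives are organized.

```python
import io
import os
import tempfile
import zipfile
from typing import List

def extract_archive(zip_bytes: bytes, folder: str) -> List[str]:
    """Unpack a zip archive given as bytes into `folder`; return member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(folder)
        return archive.namelist()

# Offline example: build a tiny archive in memory instead of downloading one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("TOY/TOY_A.txt", "1, 2\n2, 1\n")

with tempfile.TemporaryDirectory() as raw_dir:
    names = extract_archive(buffer.getvalue(), raw_dir)
    print(names, os.path.exists(os.path.join(raw_dir, "TOY", "TOY_A.txt")))
```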

process()

Processes the raw files and saves the processed dataset to the self.processed_dir folder.
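At its core, processing TU-format files means splitting one global edge list into per-graph pieces. In that format, DS_A.txt lists one edge per line as "row, col" with 1-based node ids that run across the whole dataset, and line i of DS_graph_indicator.txt gives the 1-based graph id of node i. The sketch below parses both from strings and returns per-graph edge lists with 0-based local node ids. It is a simplified illustration (no labels, attributes, or file I/O), not GammaGL's process() implementation; split_tu_graphs is a hypothetical name, and it assumes the nodes of each graph are numbered contiguously.

```python
from typing import List, Tuple

def split_tu_graphs(a_txt: str, indicator_txt: str) -> List[Tuple[int, List[Tuple[int, int]]]]:
    """Split a TU-format global edge list into per-graph (num_nodes, edges)."""
    # graph_of[node] is the 0-based graph id of each 0-based global node id.
    graph_of = [int(tok) - 1 for tok in indicator_txt.split()]
    num_graphs = max(graph_of) + 1
    # Assumption: each graph's nodes are contiguous, so the first global id
    # of a graph is the offset for converting to local ids.
    first_node = {}
    num_nodes = [0] * num_graphs
    for node, g in enumerate(graph_of):
        first_node.setdefault(g, node)
        num_nodes[g] += 1
    edges: List[List[Tuple[int, int]]] = [[] for _ in range(num_graphs)]
    for line in a_txt.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row, col = (int(x) - 1 for x in line.split(","))
        g = graph_of[row]
        edges[g].append((row - first_node[g], col - first_node[g]))
    return [(num_nodes[g], edges[g]) for g in range(num_graphs)]

# Two graphs: a 3-node path (nodes 1-3) and a single edge (nodes 4-5),
# with every undirected edge stored in both directions.
a_txt = "1, 2\n2, 1\n2, 3\n3, 2\n4, 5\n5, 4\n"
indicator_txt = "1\n1\n1\n2\n2\n"
print(split_tu_graphs(a_txt, indicator_txt))
```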