gammagl.datasets.TUDataset
- class TUDataset(root: str | None = None, name: str = 'MUTAG', transform: Callable | None = None, pre_transform: Callable | None = None, pre_filter: Callable | None = None, use_node_attr: bool = False, use_edge_attr: bool = False, cleaned: bool = False, force_reload: bool = False)
A variety of graph kernel benchmark datasets, e.g. “IMDB-BINARY”, “REDDIT-BINARY” or “PROTEINS”, collected by TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions, as motivated by the “Understanding Isomorphism Bias in Graph Data Sets” paper, that contain only non-isomorphic graphs.
Note
Some datasets may not come with any node labels. You can then either use the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as gammagl.transforms.Constant or gammagl.transforms.OneHotDegree.
Parameters:
root (str, optional) – Root directory where the dataset should be saved.
name (str, optional) – The name of the dataset. (default: 'MUTAG')
transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a gammagl.data.Graph object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
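As a rough illustration of the pre_filter parameter, the sketch below defines a predicate that keeps only small graphs. It assumes the graph object exposes a num_nodes attribute; the threshold MAX_NODES and the helper load_small_mutag are hypothetical names introduced here, and the stand-in objects only mimic that one attribute.

```python
from types import SimpleNamespace

MAX_NODES = 30  # hypothetical threshold, chosen only for illustration


def keep_small_graphs(graph):
    """pre_filter callable: return True to keep the graph, False to drop it."""
    # Assumption: the graph object exposes a num_nodes attribute.
    return graph.num_nodes <= MAX_NODES


def load_small_mutag(root):
    """Build the dataset with the filter applied (needs gammagl and network access)."""
    # Imported lazily so the predicate above can be tried without gammagl installed.
    from gammagl.datasets import TUDataset
    return TUDataset(root=root, name='MUTAG', pre_filter=keep_small_graphs)


# Stand-in objects that exercise the predicate without downloading anything.
small = SimpleNamespace(num_nodes=12)
large = SimpleNamespace(num_nodes=75)
kept = [g.num_nodes for g in (small, large) if keep_small_graphs(g)]
```

Because pre_filter is documented as deciding which graphs enter the final dataset, changing the predicate on an already processed dataset would presumably need force_reload=True to take effect.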
Tip
Name           #graphs  #nodes  #edges   #features  #classes
MUTAG          188      ~17.9   ~39.6    7          2
ENZYMES        600      ~32.6   ~124.3   3          6
PROTEINS       1,113    ~39.1   ~145.6   3          2
COLLAB         5,000    ~74.5   ~4914.4  0          3
IMDB-BINARY    1,000    ~19.8   ~193.1   0          2
REDDIT-BINARY  2,000    ~429.6  ~995.5   0          2
…
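Datasets listed with 0 features, such as COLLAB or IMDB-BINARY, need synthetic node features before most models can use them. The standalone sketch below shows the idea behind a one-hot degree encoding in plain Python. It is not gammagl's implementation of gammagl.transforms.OneHotDegree; the function name, the (sources, targets) edge layout, and the assumption that each undirected edge is stored in both directions are all illustrative.

```python
def one_hot_degree(edge_index, num_nodes, max_degree):
    """Return a num_nodes x (max_degree + 1) list of one-hot degree rows.

    edge_index is a pair (sources, targets) of equal-length lists. Each
    undirected edge is assumed to appear in both directions, so counting
    the source entries gives the node degree.
    """
    sources, _ = edge_index
    degree = [0] * num_nodes
    for s in sources:
        degree[s] += 1

    features = []
    for d in degree:
        row = [0] * (max_degree + 1)
        row[min(d, max_degree)] = 1  # clamp so unexpectedly high degrees still fit
        features.append(row)
    return features


# Path graph 0 - 1 - 2, stored with both edge directions.
edge_index = ([0, 1, 1, 2], [1, 0, 2, 1])
x = one_hot_degree(edge_index, num_nodes=3, max_degree=2)
```

Here the two end nodes have degree 1 and the middle node has degree 2, so each row of x has a single 1 in the column matching that degree.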
- url = 'https://www.chrsmrrs.com/graphkerneldatasets'
- cleaned_url = 'https://raw.githubusercontent.com/nd7141/graph_datasets/master/datasets'
- property raw_file_names: List[str]
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
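The skip-download check described for raw_file_names can be pictured with the sketch below. It only illustrates the idea and is not gammagl's internal code: the helper raw_files_present and the file names are hypothetical, and a temporary directory stands in for self.raw_dir.

```python
import os
import tempfile


def raw_files_present(raw_dir, raw_file_names):
    """Return True only if every expected raw file already exists in raw_dir."""
    return all(os.path.isfile(os.path.join(raw_dir, name)) for name in raw_file_names)


# Hypothetical file names, used only to exercise the check.
expected = ['MUTAG_A.txt', 'MUTAG_graph_indicator.txt']

with tempfile.TemporaryDirectory() as raw_dir:
    before = raw_files_present(raw_dir, expected)  # nothing on disk yet
    for name in expected:
        open(os.path.join(raw_dir, name), 'w').close()  # create empty placeholder files
    after = raw_files_present(raw_dir, expected)  # all expected files now exist
```

A download would be needed in the first case and could be skipped in the second.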