gammagl.datasets.Planetoid

class Planetoid(root: str | None = None, name: str = 'cora', split: str = 'public', num_train_per_class: int = 20, num_val: int = 500, num_test: int = 1000, transform: Callable | None = None, pre_transform: Callable | None = None, force_reload: bool = False)

The citation network datasets “Cora”, “CiteSeer” and “PubMed” from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved. (default: None)

  • name (str, optional) – The name of the dataset ("Cora", "CiteSeer", "PubMed"). (default: "cora")

  • split (str, optional) –

    The type of dataset split ("public", "full", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")

  • num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)

  • num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)

  • num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)

  • transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a gammagl.data.Graph object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
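
The "random" and "full" split rules described above can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names random_split_masks and full_split_train_mask are hypothetical, the labels are synthetic, and plain Python lists stand in for the mask tensors a real graph object would carry.

```python
import random
from collections import defaultdict


def random_split_masks(labels, num_train_per_class=20, num_val=500,
                       num_test=1000, seed=0):
    """Hypothetical helper: build boolean masks in the spirit of split="random"."""
    rng = random.Random(seed)
    n = len(labels)

    # Group node indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Pick num_train_per_class training nodes from every class.
    train_mask = [False] * n
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices[:num_train_per_class]:
            train_mask[idx] = True

    # Draw validation and test nodes from the nodes not used for training.
    remaining = [idx for idx in range(n) if not train_mask[idx]]
    rng.shuffle(remaining)
    val_mask = [False] * n
    test_mask = [False] * n
    for idx in remaining[:num_val]:
        val_mask[idx] = True
    for idx in remaining[num_val:num_val + num_test]:
        test_mask[idx] = True
    return train_mask, val_mask, test_mask


def full_split_train_mask(val_mask, test_mask):
    """Hypothetical helper: split="full" trains on every node outside val/test."""
    return [not (v or t) for v, t in zip(val_mask, test_mask)]


# Synthetic labels with a Cora-like shape (2,708 nodes, 7 classes).
labels = [i % 7 for i in range(2708)]
train_mask, val_mask, test_mask = random_split_masks(labels)
full_train_mask = full_split_train_mask(val_mask, test_mask)
print(sum(train_mask), sum(val_mask), sum(test_mask), sum(full_train_mask))
```

With the default arguments, the random split selects 7 × 20 = 140 training nodes, and the "full" variant marks the 2,708 − 1,500 = 1,208 nodes outside the validation and test sets as training nodes.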

Tip

Name      #nodes  #edges  #features  #classes
Cora       2,708  10,556      1,433         7
CiteSeer   3,327   9,104      3,703         6
PubMed    19,717  88,648        500         3

url = 'https://github.com/kimiyoung/planetoid/raw/master/data'
property raw_dir: str
property processed_dir: str
property raw_file_names: List[str]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The names of the files in the self.processed_dir folder that must be present in order to skip processing.
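
The "must be present in order to skip" condition for raw_file_names and processed_file_names can be sketched as a simple existence check. This is a minimal illustration and an assumption about the logic, not the library's code: files_exist is a hypothetical helper, and the file names in the demo are placeholders rather than the actual Planetoid file names.

```python
import os
import tempfile


def files_exist(folder, file_names):
    """Hypothetical helper: True only if every listed file exists in folder."""
    # Accept a single name (str) as well as a list of names.
    if isinstance(file_names, str):
        file_names = [file_names]
    return len(file_names) > 0 and all(
        os.path.exists(os.path.join(folder, name)) for name in file_names
    )


# Demo with placeholder file names in a temporary directory.
with tempfile.TemporaryDirectory() as raw_dir:
    names = ["part_a.bin", "part_b.bin"]
    print(files_exist(raw_dir, names))  # nothing written yet, so a download is needed
    for name in names:
        open(os.path.join(raw_dir, name), "w").close()
    print(files_exist(raw_dir, names))  # all files present, so the step can be skipped
```

Accepting either a string or a list mirrors the two annotations above, where raw_file_names is a List[str] and processed_file_names is a str.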

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.