gammagl.layers.conv.HardGATConv

class HardGATConv(in_channels, out_channels, k=8, heads=1, concat=True, negative_slope=0.2, dropout_rate=0.0, add_bias=True)

The graph hard attentional operator from the “Graph Representation Learning via Hard and Channel-Wise Attention Networks” paper:

\[\begin{split}\begin{aligned} &y=\frac{\left|X^{\top} p\right|}{\|p\|}\\ &\text{for } i=1,2,\ldots,N \text{ do}\\ &\quad\quad \mathrm{idx}_i=\operatorname{Ranking}_k\left(A_{:,i} \circ y\right) \quad \in \mathbb{R}^k\\ &\quad\quad\hat{X}_i=X\left(:, \mathrm{idx}_i\right) \quad \in \mathbb{R}^{d \times k}\\ &\quad\quad\tilde{y}_i=\operatorname{sigmoid}\left(y\left(\mathrm{idx}_i\right)\right) \quad \in \mathbb{R}^k\\ &\quad\quad\tilde{X}_i=\hat{X}_i \operatorname{diag}\left(\tilde{y}_i\right) \quad \in \mathbb{R}^{d \times k}\\ &\quad\quad z_i=\operatorname{attn}\left(x_i, \tilde{X}_i, \tilde{X}_i\right) \quad \in \mathbb{R}^d\\ &Z=\left[z_1, z_2, \ldots, z_N\right]\in \mathbb{R}^{d \times N} \end{aligned}\end{split}\]
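The selection steps of the algorithm above (scoring, \(\operatorname{Ranking}_k\), and sigmoid gating) can be sketched in plain Python as follows. This is an illustrative, non-authoritative sketch, not the GammaGL implementation: the function name `hard_select` is hypothetical, `p` is passed in as a fixed vector rather than learned, and candidates are restricted to actual neighbors of node \(i\) (a node with fewer than \(k\) neighbors simply keeps all of them).

```python
import math


def hard_select(X, A, p, k):
    """Hypothetical sketch of hard neighbor selection and gating.

    X: d x N feature matrix as a list of d rows (column j holds node j).
    A: N x N adjacency matrix as nested lists (A[j][i] != 0 means j is a neighbor of i).
    p: projection vector of length d (trainable in the paper, fixed here).
    Returns, for each node i, a pair (idx_i, X_tilde_i).
    """
    d, n = len(X), len(X[0])
    norm_p = math.sqrt(sum(v * v for v in p))
    # y_j = |x_j . p| / ||p||: one scalar score per node
    y = [abs(sum(X[r][j] * p[r] for r in range(d))) / norm_p for j in range(n)]
    result = []
    for i in range(n):
        # A_{:,i} o y: scores of node i's neighbors only (assumption: non-neighbors excluded)
        masked = [(A[j][i] * y[j], j) for j in range(n) if A[j][i] != 0]
        # Ranking_k: indices of the k largest masked scores
        idx = [j for _, j in sorted(masked, reverse=True)[:k]]
        # y_tilde_i = sigmoid(y(idx_i)) gates each selected column
        gates = [1.0 / (1.0 + math.exp(-y[j])) for j in idx]
        # X_tilde_i = X(:, idx_i) diag(y_tilde_i), shape d x len(idx)
        X_tilde = [[X[r][j] * g for j, g in zip(idx, gates)] for r in range(d)]
        result.append((idx, X_tilde))
    return result
```

The returned `X_tilde_i` would then serve as keys and values for the attention step \(\operatorname{attn}(x_i, \tilde{X}_i, \tilde{X}_i)\), with \(x_i\) as the query.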

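Here \(X \in \mathbb{R}^{d \times N}\) stores the \(N\) node feature vectors as columns, \(A\) is the adjacency matrix, \(p\) is a trainable projection vector, and \(\circ\) denotes element-wise multiplication. The attn operation is the same attention as in GAT: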

\[\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},\]

where the attention coefficients \(\alpha_{i,j}\) are computed as

\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j] \right)\right)} {\sum_{k \in \mathcal{N}(i) \cup \{ i \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k] \right)\right)}.\]
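For a single node, the coefficients \(\alpha_{i,j}\) and the update \(\mathbf{x}^{\prime}_i\) above can be sketched as below, assuming the transformed features \(\mathbf{\Theta}\mathbf{x}_j\) have already been computed. The helper names are hypothetical. The softmax subtracts the maximum score before exponentiating for numerical stability, which leaves the result unchanged.

```python
import math


def leaky_relu(v, negative_slope=0.2):
    return v if v >= 0 else negative_slope * v


def gat_coefficients(h, a, i, neighbors, negative_slope=0.2):
    """alpha_{i,j} for j in N(i) and {i}, following the formula above.

    h: list of transformed features Theta x_j (each a list of length F).
    a: attention vector of length 2F.
    Returns a dict mapping j -> alpha_{i,j}.
    """
    candidates = [i] + [j for j in neighbors if j != i]
    scores = {}
    for j in candidates:
        concat = h[i] + h[j]  # [Theta x_i || Theta x_j]
        e = sum(w * v for w, v in zip(a, concat))
        scores[j] = leaky_relu(e, negative_slope)
    # Softmax over N(i) and {i}, shifted by the max score for stability
    m = max(scores.values())
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}


def gat_update(h, alpha):
    """x'_i = sum over j of alpha_{i,j} Theta x_j (the self term is j = i)."""
    f = len(h[0])
    return [sum(alpha[j] * h[j][c] for j in alpha) for c in range(f)]
```

Because the self term \(j = i\) is included among the softmax candidates, the first term \(\alpha_{i,i}\mathbf{\Theta}\mathbf{x}_i\) of the update falls out of the same sum.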

Parameters:

in_channels (int, tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.
out_channels (int) – Size of each output sample.
k (int, optional) – Number of top-ranked neighbors each node attends to. (default: 8)
heads (int, optional) – Number of attention heads. (default: 1)
concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)
negative_slope (float, optional) – LeakyReLU angle of the negative slope. (default: 0.2)
dropout_rate (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)
add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)
add_bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)
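The effect of `concat` on a single node's output can be illustrated with a hypothetical helper `combine_heads`. Concatenation yields `heads * out_channels` values per node, while averaging yields `out_channels`. This sketch mirrors the documented behavior and is not the layer's own code.

```python
def combine_heads(head_outputs, concat=True):
    """Hypothetical sketch: merge per-head outputs for one node.

    head_outputs: list of length `heads`, each a list of out_channels floats.
    concat=True  -> concatenation, length heads * out_channels
    concat=False -> element-wise mean, length out_channels
    """
    if concat:
        return [v for head in head_outputs for v in head]
    heads = len(head_outputs)
    return [sum(col) / heads for col in zip(*head_outputs)]
```

For example, two heads with `out_channels=2` give four values when concatenated and two when averaged.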

select_topk(edge_index, value)
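`select_topk` has no description here. Judging only from its name and arguments, it likely keeps the highest-scoring edges per node. The sketch below shows one way to do that, under these assumptions: scores are ranked per destination node, `k` is passed explicitly (the documented method takes only `edge_index` and `value` and presumably reads `k` from the layer), and `edge_index` is a pair of Python lists. The name `select_topk_edges` is hypothetical.

```python
from collections import defaultdict


def select_topk_edges(edge_index, value, k):
    """Hypothetical sketch: keep the k highest-scoring incoming edges per node.

    edge_index: [src, dst] lists; value: one score per edge.
    Returns the filtered [src, dst] lists in original edge order.
    """
    src, dst = edge_index
    incoming = defaultdict(list)
    for e, d in enumerate(dst):
        incoming[d].append(e)
    keep = set()
    for edges in incoming.values():
        # Rank this node's incoming edges by score, highest first
        edges.sort(key=lambda e: value[e], reverse=True)
        keep.update(edges[:k])
    kept = sorted(keep)
    return [[src[e] for e in kept], [dst[e] for e in kept]]
```

Nodes with at most `k` incoming edges keep all of them, matching the "select among neighbors" reading of \(\operatorname{Ranking}_k\).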

message(x, edge_index, edge_weight=None, num_nodes=None)

Constructs the messages sent from source nodes to destination nodes.

Parameters:

x (tensor) – Input node features.
edge_index (tensor) – Edges, from source to destination nodes.
edge_weight (tensor, optional) – Weight of each edge.

Returns:

tensor – The message matrix, of shape [num_edges, message_dim].
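As a generic illustration of the documented return shape, the sketch below builds one message per edge by gathering the source node's features and optionally scaling them by `edge_weight`. The helper `build_messages` is hypothetical, and plain lists stand in for tensors. The actual method presumably also computes attention coefficients, which this sketch omits.

```python
def build_messages(x, edge_index, edge_weight=None):
    """Hypothetical sketch: one message per edge, x[src] scaled by edge_weight.

    x: list of node feature vectors; edge_index: [src, dst] lists.
    Returns a num_edges x message_dim nested list.
    """
    src, _ = edge_index
    msgs = []
    for e, s in enumerate(src):
        w = 1.0 if edge_weight is None else edge_weight[e]
        msgs.append([w * v for v in x[s]])
    return msgs
```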

forward(x, edge_index, num_nodes)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
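In message-passing layers generally, per-edge messages are then aggregated into their destination nodes. The sketch below shows sum aggregation, which also illustrates why a `num_nodes` argument is useful: nodes without incoming edges still receive an (all-zero) output row. Whether HardGATConv's `forward` aggregates exactly this way is an assumption, and `aggregate_sum` is a hypothetical helper.

```python
def aggregate_sum(messages, dst, num_nodes):
    """Hypothetical sketch: sum each edge's message into its destination node.

    messages: num_edges x message_dim nested list; dst: destination index per edge.
    Returns a num_nodes x message_dim nested list.
    """
    dim = len(messages[0]) if messages else 0
    out = [[0.0] * dim for _ in range(num_nodes)]
    for msg, d in zip(messages, dst):
        for c, v in enumerate(msg):
            out[d][c] += v
    return out
```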