Graph Attention Networks

Graph Attention Networks (GAT)

https://arxiv.org/pdf/1710.10903

Graph networks are used for domains that do not have a grid-like structure and instead rely on irregular data structures. This is common in social networks, biological networks, and brain connectomes, where the data are naturally represented as graphs.

The researchers introduce an attention-based architecture to perform node classification on graph-structured data. The idea is to compute hidden representations of each node by attending over its neighbours, following a self-attention strategy.

2. GAT Architecture

In this section, the authors explain how to design the building-block layer used to construct graph attention networks, and outline its theoretical and practical benefits and limitations.

2.1 Graph Attention layer

  1. Input Node Features:

    • The input is a set of node features denoted as $h = \{ \tilde{h}_1, \tilde{h}_2, \dots, \tilde{h}_N \}$.
    • Each node feature $\tilde{h}_i$ belongs to $\mathbb{R}^F$, where:
      • $N$: Number of nodes.
      • $F$: Number of features per node.
  2. Output Node Features:

    • The layer produces a new set of node features denoted as $h' = \{ \tilde{h}'_1, \tilde{h}'_2, \dots, \tilde{h}'_N \}$.
    • Each new node feature $\tilde{h}'_i$ belongs to $\mathbb{R}^{F'}$, where:
      • $F'$: Number of features per node in the output (which may differ from $F$).
  3. Node-to-Node Mapping:

    • The number of nodes $N$ remains the same between input and output.
    • The transformation occurs at the feature level, converting features of cardinality $F$ into features of cardinality $F'$.
  4. Feature Transformation:

    • The layer applies a transformation to the node features, resulting in potentially different cardinality for the output features (a minimal shape sketch follows this list).
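
As a rough illustration of these shapes, the node features can be stored as an $N \times F$ matrix that the layer maps to an $N \times F'$ matrix. A minimal NumPy sketch with toy sizes (all values illustrative, not from the paper):

```python
import numpy as np

N, F, F_out = 5, 8, 16        # toy sizes: 5 nodes, 8 input features, F' = 16 output features

h = np.random.randn(N, F)     # input node features: one F-dimensional row per node

# The graph attention layer keeps the node count N fixed and only changes the
# feature dimensionality, so its output will have shape (N, F').
print(h.shape)                # (5, 8); the layer's output would have shape (5, 16)
```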

  1. Need for Expressive Power:

    • To transform input features into higher-level features, at least one learnable linear transformation is necessary.
  2. Initial Step – Shared Linear Transformation:

    • A shared linear transformation is applied to every node.
    • This transformation is parametrized by a weight matrix $W \in \mathbb{R}^{F' \times F}$, where:
      • $F$: Number of input features per node.
      • $F'$: Number of output features per node.
  3. Self-Attention Mechanism:

    • Self-attention is performed on the nodes.
    • A shared attentional mechanism $a: \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$ is used to compute attention coefficients.
  4. Attention Coefficients:

    • Attention coefficients are computed as:
      $e_{ij} = a(W \tilde{h}_i, W \tilde{h}_j)$,
      where:
      • $W \tilde{h}_i$: Transformed features of node $i$.
      • $W \tilde{h}_j$: Transformed features of node $j$.
      (A minimal code sketch of this step follows this list.)
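
A minimal NumPy sketch of this step. The notes above leave the attention mechanism $a$ abstract; the paper instantiates it as a single-layer feedforward network with a LeakyReLU nonlinearity, and that choice is assumed here, with `a_vec`, `W`, and all sizes as illustrative placeholders:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU nonlinearity (the slope for negative inputs is an assumed value)
    return np.where(x > 0, x, slope * x)

N, F, F_out = 5, 8, 16
h = np.random.randn(N, F)            # input node features, shape (N, F)

W = np.random.randn(F_out, F)        # shared linear transformation, W in R^{F' x F}
a_vec = np.random.randn(2 * F_out)   # parameters of the attention mechanism a (assumed form)

Wh = h @ W.T                         # transformed features W h_i, shape (N, F')

# e[i, j] = a(W h_i, W h_j): importance of node j's features to node i.
e = np.empty((N, N))
for i in range(N):
    for j in range(N):
        concat = np.concatenate([Wh[i], Wh[j]])   # [W h_i || W h_j]
        e[i, j] = leaky_relu(a_vec @ concat)      # single-layer attention score
```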



  1. Importance of Node $j$ to Node $i$:

    • The attention mechanism determines the importance of node $j$'s features to node $i$.
  2. General Formulation:

    • In its most general form, the model allows every node to attend to every other node, disregarding the graph's structural information.
  3. Masked Attention:

    • To preserve graph structure, masked attention is used.
    • Attention coefficients $e_{ij}$ are computed only for nodes $j \in N_i$, where:
      • $N_i$: Neighborhood of node $i$ in the graph.
  4. Neighborhood Definition:

    • For all experiments, $N_i$ includes the first-order neighbors of node $i$, including $i$ itself.
  5. Normalization with Softmax:

    • Attention coefficients are normalized across all choices of $j$ using the softmax function:

      $\alpha_{ij} = \text{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}$

  1. Purpose of Normalization:
    • Normalization ensures that attention coefficients $\alpha_{ij}$ are easily comparable across different nodes; a sketch of this masked, normalized attention follows.
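
A minimal continuation of the sketch above, showing masked attention and softmax normalization over each node's neighborhood. The adjacency matrix `adj` and the random stand-in scores `e` are purely illustrative; self-loops are added so that $N_i$ includes node $i$ itself:

```python
import numpy as np

# Toy adjacency matrix for 5 nodes; adj[i, j] = 1 if j is a neighbor of i.
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 0, 1, 0],
                [1, 0, 0, 0, 1],
                [0, 1, 0, 0, 1],
                [0, 0, 1, 1, 0]], dtype=float)
adj = adj + np.eye(adj.shape[0])      # include node i in its own neighborhood N_i

e = np.random.randn(*adj.shape)       # stands in for the raw scores e_ij computed earlier

# Masked attention: keep e_ij only for j in N_i; every other entry is set to
# -inf so it contributes nothing after the softmax.
e_masked = np.where(adj > 0, e, -np.inf)

# alpha_ij = exp(e_ij) / sum_{k in N_i} exp(e_ik)
e_shifted = e_masked - e_masked.max(axis=1, keepdims=True)   # subtract row max for stability
exp_e = np.exp(e_shifted)
alpha = exp_e / exp_e.sum(axis=1, keepdims=True)

# Each row of alpha sums to 1 over the node's neighborhood, which is what makes
# the coefficients comparable across different nodes.
print(alpha.sum(axis=1))              # array of ones
```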


Unread Papers

  • Graph Neural Networks (GNNs) were introduced in Gori et al. (2005) and Scarselli et al. (2009)

  • Later refinements of GNNs propose to use gated recurrent units (Cho et al., 2014) in the propagation step

  • Hamilton et al. (2017) introduced GraphSAGE, a method for computing node representations in an inductive manner

  • Attention mechanisms have become almost a de facto standard in many sequence-based tasks (Bahdanau et al., 2015; Gehring et al., 2016).

  • Other related approaches include locally linear embedding (LLE) (Roweis & Saul, 2000) and memory networks (Weston et al., 2014).