Attention Models

  • $a^{<t'>} = (a^{<t'>}_r, a^{<t'>}_l)$, the features for input word $t'$: the activations from reading the sentence to the right and to the left (a bidirectional RNN), so each word's features carry context from both sides.

  • Context: $c^{<t>} = \sum_{t'} \alpha^{<t,t'>} a^{<t'>}$

  • $\alpha^{<1,t'>}$ are the attention weights that define the context for the first output word.

  • $\sum_{t'} \alpha^{<t,t'>} = 1$ for each output step $t$.

  • $\alpha^{<t,t'>}$ is the amount of attention $y^{<t>}$ should pay to $a^{<t'>}$.

  • $\alpha^{<t,t'>} = \frac{\exp(e^{<t,t'>})}{\sum_{t''=1}^{T_x} \exp(e^{<t,t''>})}$ (a softmax over the energies).

  • These ominous $e^{<t,t'>}$ come from a simple feed-forward NN that takes $s^{<t-1>}$ and $a^{<t'>}$ as input and is trained jointly with the rest of the network (sketched after this list).

  • Runs in quadratic time: each of the $T_y$ output steps attends over all $T_x$ input positions, so the cost is $O(T_x T_y)$.
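
Putting the pieces together, here is a minimal NumPy sketch of one attention step, assuming made-up sizes and randomly initialized placeholder parameters (`Tx`, `n_a`, `W1`, `w2`, etc. are illustrative, not from any reference implementation): a small feed-forward net scores $e^{<t,t'>}$ from $s^{<t-1>}$ and each $a^{<t'>}$, a softmax turns the scores into $\alpha^{<t,t'>}$, and the weighted sum gives $c^{<t>}$.

```python
import numpy as np

def softmax(e):
    """Normalize the energies so the attention weights sum to 1 over t'."""
    e = e - e.max()                      # shift for numerical stability
    exp_e = np.exp(e)
    return exp_e / exp_e.sum()

# Hypothetical sizes: Tx input words, n_a features per a^<t'>, n_s decoder state,
# n_h hidden units in the scoring net. All parameters below are random stand-ins;
# in practice they are learned end to end with the rest of the model.
Tx, n_a, n_s, n_h = 7, 8, 10, 16
rng = np.random.default_rng(0)

a = rng.standard_normal((Tx, n_a))       # a^<t'>: bidirectional encoder features
s_prev = rng.standard_normal(n_s)        # s^<t-1>: previous decoder state

W1 = rng.standard_normal((n_h, n_s + n_a)) * 0.1   # scoring-net weights
b1 = np.zeros(n_h)
w2 = rng.standard_normal(n_h) * 0.1

# e^<t,t'>: score every input position t' against the same s^<t-1>.
x = np.concatenate([np.tile(s_prev, (Tx, 1)), a], axis=1)   # (Tx, n_s + n_a)
e = np.tanh(x @ W1.T + b1) @ w2                             # (Tx,)

alpha = softmax(e)                       # alpha^<t,t'>, sums to 1 over t'
c = alpha @ a                            # c^<t> = sum_{t'} alpha^<t,t'> a^<t'>
print(alpha.sum(), c.shape)              # -> 1.0, context vector of size n_a
```

In a full decoder this runs once per output step $t$ with the updated state $s^{<t-1>}$, which is exactly where the quadratic $O(T_x T_y)$ cost comes from.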

References/Further Reading

  • Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473.
  • Xu, K., et al. (2015). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv:1502.03044.