Attention Models
- $a^{\langle t' \rangle} = (a^{\langle t' \rangle}_r, a^{\langle t' \rangle}_l)$ are features that indicate attention to a word on the right and on the left.
- Context: $c^{\langle t \rangle} = \sum_{t'} \alpha^{\langle t, t' \rangle} a^{\langle t' \rangle}$
- $\alpha^{\langle 1, t' \rangle}$ are the attention weights that define the context $c^{\langle 1 \rangle}$ for the first output word.
- $\sum_{t'} \alpha^{\langle t, t' \rangle} = 1$
- $\alpha^{\langle t, t' \rangle}$ = amount of attention $y^{\langle t \rangle}$ should pay to $a^{\langle t' \rangle}$.
- $\alpha^{\langle t, t' \rangle} = \frac{\exp(e^{\langle t, t' \rangle})}{\sum_{t'=1}^{T_x} \exp(e^{\langle t, t' \rangle})}$, i.e. a softmax over the energies $e^{\langle t, t' \rangle}$.
- These ominous $e^{\langle t, t' \rangle}$ are computed by a simple feed-forward NN that takes $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ as input and is trained jointly with the rest of the network.
- Runs in quadratic time: there are $T_x \cdot T_y$ attention weights to compute (see the sketch below).
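
As a rough illustration of the formulas above, here is a minimal NumPy sketch of a single attention step. The additive form of the energy network, $e = v^\top \tanh(W_s s^{\langle t-1 \rangle} + W_a a^{\langle t' \rangle})$, and the parameter names `W_a`, `W_s`, `v` are assumptions made for the example; the notes only say that a simple feed-forward NN takes $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ as input.

```python
import numpy as np

def attention_step(a, s_prev, W_a, W_s, v):
    """One attention step: context c^<t> and weights alpha^<t, t'>.

    a      -- (T_x, n_a) encoder features a^<t'> (e.g. concatenated
              right/left activations of a bidirectional RNN)
    s_prev -- (n_s,) previous decoder state s^<t-1>
    W_a    -- (n_h, n_a), W_s -- (n_h, n_s), v -- (n_h,): parameters of
              the small feed-forward net (illustrative names, not from
              the notes)
    """
    # Energies e^<t, t'>: one scalar per input position t'.
    e = np.tanh(a @ W_a.T + s_prev @ W_s.T) @ v      # shape (T_x,)
    # Softmax over t', so that sum_{t'} alpha^<t, t'> = 1.
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    # Context c^<t> = sum_{t'} alpha^<t, t'> a^<t'>.
    return alpha @ a, alpha

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
T_x, n_a, n_s, n_h = 6, 8, 10, 16
a = rng.normal(size=(T_x, n_a))
s_prev = rng.normal(size=n_s)
W_a, W_s, v = (rng.normal(size=s) for s in [(n_h, n_a), (n_h, n_s), n_h])
c, alpha = attention_step(a, s_prev, W_a, W_s, v)
assert np.isclose(alpha.sum(), 1.0)   # attention weights sum to 1
```

Calling `attention_step` once per output position $t$ is exactly what makes the method quadratic: $T_y$ steps, each touching all $T_x$ input positions.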
References/Further Reading
- {bibs:bahdanau2014neural}
- {bibs:xu2015show}