Attention Is All You Need (Neural Networks)

arxiv.org

8 points by idibidiartists 9 years ago · 3 comments

rerx 9 years ago

This is super interesting. I believe the general expectation was that convolutional neural networks would soon surpass recurrent neural networks in machine translation tasks, but this is an entirely novel approach.

  • visarga 9 years ago

    This, together with graph-based neural nets, is very different from CNNs and LSTMs. These models learn to split a scene into objects and then learn how the objects interact. That way, a lot of the variation in the input is factorized out and only the relations between compatible types of objects are learned, which leads to stronger generalization.

    If you think about it: when we are going to do full reasoning, how should the data be represented? Embeddings and flat lists/matrices are not appropriate for the way objects interrelate. It has to be some kind of graph. Here they used multi-head attention instead, which works in a similar way to a graph, with attention heads acting like links between objects.

    Once we have data represented as graphs we can also do simulation: apply the rules of each object iteratively on the graph. The graph can be seen as an automaton, where each object updates its state by integrating information from its neighbors. Such automata can be Turing complete, so they can represent and simulate arbitrary computation. With simulation we can search for optimal solutions. That opens a lot of doors for AI.

    My money is on simulation and graphs for the next level of AI.
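The object-update rule described in the comment above can be sketched as a synchronous graph automaton: every node integrates its neighbors' states each step. The node names and the simple averaging rule here are illustrative assumptions, not anything from the paper.

```python
def step(states, adjacency):
    """One synchronous update: each node replaces its state with the
    average of its own state and its neighbors' states."""
    new_states = {}
    for node, state in states.items():
        neighbors = adjacency.get(node, [])
        total = state + sum(states[n] for n in neighbors)
        new_states[node] = total / (len(neighbors) + 1)
    return new_states

# A triangle graph where node "a" starts with all of the "mass":
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
states = {"a": 1.0, "b": 0.0, "c": 0.0}
for _ in range(3):
    states = step(states, adjacency)
```

Iterating the rule diffuses information across the graph; richer per-object update rules (learned, rather than a fixed average) are what graph neural nets add on top of this skeleton.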

    • gmitscha 9 years ago

      I do not think graphs are where we're heading. I think flat vectors are fine, and I would argue multi-head attention is not THAT different from gated RNNs like LSTM: the multiplication with weights that come out of a softmaxed dot product is similar to the input gate of an LSTM.
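The "softmaxed dot-product" weighting mentioned above is single-head scaled dot-product attention from the paper; a minimal sketch in plain Python, with toy query/key/value vectors chosen purely for illustration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by a softmaxed, scaled dot-product score --
    the multiplicative gating the comment compares to an LSTM input gate."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is the weights-blended sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key, so the first value dominates the output:
out = attention([1.0, 0.0],
                [[1.0, 0.0], [0.0, 1.0]],
                [[10.0, 0.0], [0.0, 10.0]])
```

Because the weights sum to one, the output is a convex combination of the values, which is the sense in which it resembles a soft, data-dependent gate rather than a hard graph edge.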
