Universal Sentence Encoder (arxiv.org)
Interesting. There's a big need for better vector representations of things in between words (for which Word2Vec/GloVe/FastText work well) and documents (which to me seems impossible. Yes, I know about Doc2Vec etc., but really... it works OK for paragraphs).
Facebook's InferSent[1] has worked reasonably well for me on a variety of sentence-level tasks, but I can't point to anything showing that it is really substantially better than averaging word embeddings.
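For reference, that averaging baseline is about as simple as it sounds. A rough sketch (assuming word_vectors is a dict of pretrained GloVe/FastText vectors you've already loaded; the names here are illustrative, not from any particular library):

    # Sentence embedding as the plain average of its word embeddings.
    # Assumes word_vectors maps lowercase tokens to fixed-dimension numpy
    # arrays, e.g. loaded from a GloVe or FastText text file.
    import numpy as np

    def average_embedding(sentence, word_vectors, dim=300):
        vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)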
More options are good.
(Also, is Kurzweil part of Google Brain or separate? He doesn't really have any background in NLP, does he?)
"Also, is Kurzweil part of Google Brain or separate. He doesn't really have nay background in NLP does he?"
From Wikipedia: "Raymond "Ray" Kurzweil (/ˈkɜːrzwaɪl/ KURZ-wyl; born February 12, 1948) is an American author, computer scientist, inventor and futurist. Aside from futurism, he is involved in fields such as optical character recognition (OCR), text-to-speech synthesis, speech recognition technology, and electronic keyboard instruments.... Kurzweil was the principal inventor of... the first print-to-speech reading machine for the blind,[3] the first commercial text-to-speech synthesizer,[4]... and the first commercially marketed large-vocabulary speech recognition."
He's been in the general space of NLP for quite a while.
For the record, good old-fashioned bag-of-words representations (tf-idf, LDA, LSA) still provide useful representations for documents. Obviously we hope to do better, but recently people act like there's no way of turning a document into a vector.
Bag-of-words representations work fine for some applications.
The reason people want better representations is for the applications where they don't. For example, bag of words doesn't capture agreement or disagreement well, whereas better representations can.
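To make that concrete, here's a rough sketch (my own toy example with scikit-learn, nothing from the paper) of where bag of words falls over: two sentences that disagree look almost identical, while a paraphrase looks unrelated, because only token overlap is counted.

    # Bag-of-words/tf-idf can't capture agreement vs. disagreement.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "the critics agreed with the review",
        "the critics disagreed with the review",
        "reviewers concurred with the assessment",
    ]
    X = TfidfVectorizer().fit_transform(docs)
    print(cosine_similarity(X[0], X[1]))  # relatively high: near-total token overlap
    print(cosine_similarity(X[0], X[2]))  # relatively low: same gist, few shared tokens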
1. This is more Technical Report worthy than paper worthy...
2. "by Ray Kurzweil's Team", although accurate I find that fetishization of certain stars to pretty insulting to the other authors, we already have a convention and it's "Cer et al. (2018)"
At least Ray has the decency to be listed last on the author list!
Personally I think the idea of this paper is pretty good, but the evaluation is weak.
> At least Ray has the decency to be listed last on the author list!
Just do it like in mathematics: Authors in alphabetical order.
Usually the actual lead author is first, the assistant authors follow, and the advisor is listed last.
At least that’s how it is in (psychology and other?) PhD programs.
So Ray may only be supervising or contributing a small portion and is likely listed on all papers his team publishes.
Same in Biology
One senior physicist I worked with advocated alphabetical order whenever he would come first in it!
Physics, too, which causes another interesting side effect: https://www.thetimes.co.uk/article/to-get-ahead-in-physics-y...
In psychology the senior author comes first. Here we have mixed paradigms in authorship. Putting Kurzweil last is definitely intentional.
“We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.”
Awesome. Now what does all that mean in English?
They made a way to take any sentence, and output a small array of numbers that represent its essence. You can use their model to find the essence of your own sentences. And then use it either directly (e.g. compare the essence of two sentences to see if they're saying roughly the same thing) or use it as a starting point for the model you need (e.g. if you're building a system to convert English sentences into French, your neural network might generate the essence of the English sentence as part of its work. By using the pre-trained model, you have a better starting point for that part of the network than just random numbers, so your training time will be greatly reduced).
What do you mean by "its essence"? Is this a semantic essence?
The array of numbers represents some opaque statistical property of the sentence with respect to the others in the corpus the model was trained from. The hope is that this property will correlate with what we believe to be the sentence's meaning.
Yes
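If you want to poke at that "essence" yourself, here's a rough sketch of loading the pre-trained model from TF Hub and comparing sentence vectors (based on the hub.Module API of the tensorflow-hub package; treat the details as an assumption and check their own examples):

    # Embed a few sentences with the pre-trained encoder and compare them.
    # Assumes tensorflow and tensorflow-hub are installed and the module URL
    # from the paper resolves; the exact API may differ.
    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/1")
    sentences = ["How old are you?", "What is your age?", "The cat sat on the mat."]
    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        vectors = sess.run(embed(sentences))  # one 512-dim vector per sentence

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(vectors[0], vectors[1]))  # paraphrases: should score high
    print(cosine(vectors[0], vectors[2]))  # unrelated: should score low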
> Awesome. Now what does all that mean in English?
Well, simply put:
[ccebb 677ce 28f77 86558 2d7cc d67b4 e8f31 8c393 ae867 13593 aa869 3c265], [c0021 72510 cee7a 31580 554d3 d49a6 306b9 c1f2c 60c1a 1157c f44c8 31273], [682f2 6a4df dc970 3c106 2107c 3dfd5 1506a 6f1b5 af428 829f8 11d06 797dc], [d6f84 25e73 76558 6feb0 c67d4 fcc73 b5c8d af4db 2f647 82247 852e7 fc010], [f08a8 2ed8f c71bb 12043 5f0f9 190c8 f2ae8 7b30a 4a574 269d0 03be0 a363c], [b38c2 10031 37ada 504a8 f2919 3b82b 258fc 5673f c939c a0ef1 46be5 a50d6], [93fcd e19f7 0558f e01a6 8beb1 d54b9 9ad20 d6185 adf9b 876a1 a1a94 c9197], [92b49 ed290 7a072 fdf1d a61a8 65124 a2025 27153 afa71 a27db 29a2a e5b47], [2793f 7171f b18c9 e1945 d31d5 edb66 a1ee0 d9982 e8442 7795d bd4e4 30b41]
They have an algorithm that takes sentences in textual form and produces a different representation of each sentence that (they claim) is easier for certain language-oriented machine learning tasks to work with. Previous work focused on producing that different representation at the word level, but theirs works on complete sentences.
I had been under the impression that you could just feed text into neural nets, and then ... magic!
But, no. As it turns out, the very first problem you encounter when trying to implement ML on text is that you need to transform the text into some set of numbers (the "vectors"), with the elements in the set matching the number of nodes in your input layer.
This is a tricky thing to do. You're essentially trying to "hash" the text in a way which uniquely represents the text you're working with and also gives the neural net something it can operate on. Which is to say, you can't just use a common hashing algorithm, because the neural net won't be able to learn anything from the random output of the hashing algorithm.
There are several different approaches being used for this. One of them, mentioned elsethread, is "bag-of-words", where you build a big dictionary of word-to-number associations and then do some variety of transformations on that. Another is "feature extraction", where you might try to input a value representing properties like the length of the sentence, the number of words, the vocabulary level of the words, and so on. (This would probably be a bad approach for most ML goals on long text.)
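As a toy illustration of those two approaches (my own sketch, not from the paper), a bag-of-words vectorizer and a crude feature extractor might look like this:

    # Two naive ways to turn a sentence into a fixed-length numeric vector.
    from collections import Counter

    def build_vocab(corpus):
        # Assign each distinct word an index into the vector.
        vocab = {}
        for sentence in corpus:
            for word in sentence.lower().split():
                vocab.setdefault(word, len(vocab))
        return vocab

    def bow_vector(sentence, vocab):
        # Bag of words: count how often each vocabulary word occurs.
        counts = Counter(sentence.lower().split())
        return [counts.get(word, 0) for word in vocab]

    def feature_vector(sentence):
        # Hand-crafted features: character count, word count, mean word length.
        words = sentence.split()
        return [len(sentence), len(words), sum(len(w) for w in words) / max(len(words), 1)]

    corpus = ["the cat sat on the mat", "the dog sat on the log"]
    vocab = build_vocab(corpus)
    print(bow_vector("the cat sat", vocab))  # [1, 1, 1, 0, 0, 0, 0]
    print(feature_vector("the cat sat"))     # [11, 3, 3.0]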
This paper presents another approach.
> Awesome. Now what does all that mean in English?
Singularity any day now
>Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
What is TF Hub? I assume it stands for TensorFlow Hub, but what is that?
It looks like an internal site; this is the link it's referring to: https://tfhub.dev/google/universal-sentence-encoder/1
It looks like there is a link to a Colab notebook (Google's hosted Jupyter notebook environment): https://colab.research.google.com/github/tensorflow/hub/blob...
So does this work? Am I getting redirected back to that page when I click the link because they're checking my user agent? I don't have TF installed on this machine to check, but does getting the model through the TF API work?
404?
Did you miss “internal”?
lol, although I might have to take some blame by putting a link in my comment to begin with.
Note: Keep in mind that some folks publish on Arxiv because it is far easier than going through a traditional publication process. As such, you sometimes get not-as-polished works like this, although they might update the article to fix some of those references.
Seems to be a model zoo with tight TF integration.
Seems to be getting announced at the TF Dev Summit this afternoon: https://www.tensorflow.org/dev-summit/schedule/
The pip/GitHub links aren't activated yet: https://pypi.python.org/pypi/tensorflow-hub/0.1.0
As someone who has done an ML course and implemented a primitive Word2Vec, but doesn't really follow the field all that closely: how important is this, and how does it compare to what came before?
"..transfer learning to other NLP tasks" – NLP as in neuro-linguistic programming?
If so, can someone explain how this project is related to NLP? Thanks!
Natural language processing/parsing