Introduction to Tree-LSTMs

  • 1.

    Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, by Kai Sheng Tai, Richard Socher, Christopher D. Manning. Presented by Daniel Perez (tuvistavie), CTO @ Claude Tech, M2 @ The University of Tokyo. October 2, 2017

  • 2.

    Distributed representation of words. Idea: Encode each word as a vector in R^d, such that words with similar meanings are close in the vector space.
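
As a toy illustration of this idea, the sketch below uses made-up 3-dimensional vectors (all values hypothetical; real embeddings typically have d in the hundreds) and cosine similarity to show what "close in the vector space" means:

```python
import numpy as np

# Hypothetical 3-dimensional embeddings with made-up values.
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.2]),
    "dog":   np.array([0.8, 0.2, 0.3]),
    "paris": np.array([0.1, 0.9, 0.7]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: close to 1 for similar words."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))    # high: related meanings
print(cosine_similarity(embeddings["cat"], embeddings["paris"]))  # lower: unrelated words
```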

  • 3.
  • 4.
  • 5.

    Basic RNN cell. In a plain RNN, h_t is computed as follows: h_t = tanh(W x_t + U h_{t−1} + b), with g(x_t, h_{t−1}) = W x_t + U h_{t−1} + b.

  • 6.

    Basic RNN cell. In a plain RNN, h_t is computed as follows: h_t = tanh(W x_t + U h_{t−1} + b), with g(x_t, h_{t−1}) = W x_t + U h_{t−1} + b. Issue: Because of vanishing gradients, gradients do not propagate well through the network, making it impossible to learn long-term dependencies.
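
A minimal NumPy sketch of this recurrence, with arbitrary sizes and random placeholder weights, just to make the update h_t = tanh(W x_t + U h_{t−1} + b) concrete:

```python
import numpy as np

d_in, d_h = 4, 3                      # input and hidden sizes (arbitrary)
rng = np.random.default_rng(0)
W = rng.normal(size=(d_h, d_in))      # input-to-hidden weights
U = rng.normal(size=(d_h, d_h))       # hidden-to-hidden weights
b = np.zeros(d_h)                     # bias

def rnn_cell(x_t, h_prev):
    """Plain RNN update: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return np.tanh(W @ x_t + U @ h_prev + b)

# Unroll the cell over a short random input sequence.
h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):
    h = rnn_cell(x_t, h)
```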

  • 7.

    Long short-term memory (LSTM). Goal: Improve the RNN architecture so it can learn long-term dependencies. Main ideas:
    • Add a memory cell which does not suffer from vanishing gradients
    • Use gating to control how information propagates
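
The sketch below spells out one standard formulation of the LSTM cell (random placeholder weights), making the additive memory cell c_t and the input/forget/output gates explicit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
# One (W, U, b) triple per gate: input i, forget f, output o, candidate update u.
P = {g: (rng.normal(size=(d_h, d_in)),
         rng.normal(size=(d_h, d_h)),
         np.zeros(d_h)) for g in "ifou"}

def lstm_cell(x_t, h_prev, c_prev):
    gate = lambda g: P[g][0] @ x_t + P[g][1] @ h_prev + P[g][2]
    i = sigmoid(gate("i"))            # input gate
    f = sigmoid(gate("f"))            # forget gate
    o = sigmoid(gate("o"))            # output gate
    u = np.tanh(gate("u"))            # candidate update
    c = f * c_prev + i * u            # memory cell: additive path eases gradient flow
    h = o * np.tanh(c)                # hidden state exposed to the rest of the network
    return h, c
```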

  • 8.
  • 9.
  • 10.
  • 11.
  • 12.

    Tree-structured LSTMs. Goal: Improve the encoding of sentences by using their structure. Models:
    • Child-sum tree LSTM: sums over all the children of a node, so it can be used with any number of children
    • N-ary tree LSTM: uses different parameters for each child position, giving better granularity, but the maximum number of children per node must be fixed

  • 13.

    Child-sum tree LSTM. The children's outputs and memory cells are summed. [Figure: Child-sum tree LSTM at node j with children k1 and k2]

  • 14.

    Child-sum tree LSTM. Properties:
    • Does not take the order of the children into account
    • Works with a variable number of children
    • Shares gate weights (including the forget gate) between children
    Application: Dependency Tree-LSTM, where the number of dependents per node is variable
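
A minimal sketch of the Child-Sum Tree-LSTM update at a single node, following the equations of Tai et al. (random placeholder weights). Each child gets its own forget gate, but all forget gates share the same weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
# One (W, U, b) triple per gate: input i, forget f, output o, candidate update u.
P = {g: (rng.normal(size=(d_h, d_in)),
         rng.normal(size=(d_h, d_h)),
         np.zeros(d_h)) for g in "ifou"}

def child_sum_node(x_j, children):
    """children: list of (h_k, c_k) pairs; the list may have any length."""
    gate = lambda g, h: P[g][0] @ x_j + P[g][1] @ h + P[g][2]
    h_tilde = sum((h_k for h_k, _ in children), np.zeros(d_h))   # sum of children outputs
    i = sigmoid(gate("i", h_tilde))
    o = sigmoid(gate("o", h_tilde))
    u = np.tanh(gate("u", h_tilde))
    # One forget gate per child k, all sharing the same weights W^(f), U^(f).
    f = [sigmoid(gate("f", h_k)) for h_k, _ in children]
    c = i * u + sum((f_k * c_k for f_k, (_, c_k) in zip(f, children)), np.zeros(d_h))
    h = o * np.tanh(c)
    return h, c
```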

  • 15.

    N-ary tree LSTM. Gates are computed as g_k^(n)(x_j, h_{j1}, …, h_{jN}) = W^(n) x_j + Σ_{l=1}^{N} U_{kl}^(n) h_{jl} + b^(n). [Figure: Binary tree LSTM at node j with children k1 and k2]

  • 16.

    N-ary tree LSTM. Properties:
    • Each node must have at most N children
    • Fine-grained control over how information propagates
    • The forget gate can be parameterized so that siblings affect each other
    Application: Constituency Tree-LSTM, using a binary tree LSTM
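
A corresponding sketch for a binary (N = 2) Tree-LSTM node: each child position l has its own U_l matrix per gate, which is what gives the finer-grained control mentioned above. For brevity the forget gate below only looks at its own child's state; the paper's full parameterization (U^(f)_{kl}) also lets siblings influence each other's forget gates. All weights are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h, N = 4, 3, 2                 # binary Tree-LSTM: N = 2 children per node
rng = np.random.default_rng(0)
# Per gate: one W, one separate U_l for every child position l, and one bias.
P = {g: (rng.normal(size=(d_h, d_in)),
         [rng.normal(size=(d_h, d_h)) for _ in range(N)],
         np.zeros(d_h)) for g in "ifou"}

def binary_tree_node(x_j, children):
    """children: exactly N (h_l, c_l) pairs, one per child position."""
    assert len(children) == N
    gate = lambda g: (P[g][0] @ x_j
                      + sum(U_l @ h_l for U_l, (h_l, _) in zip(P[g][1], children))
                      + P[g][2])
    i = sigmoid(gate("i"))
    o = sigmoid(gate("o"))
    u = np.tanh(gate("u"))
    # Simplified per-child forget gates (each driven by its own child only).
    f = [sigmoid(P["f"][0] @ x_j + P["f"][1][l] @ h_l + P["f"][2])
         for l, (h_l, _) in enumerate(children)]
    c = i * u + sum(f_l * c_l for f_l, (_, c_l) in zip(f, children))
    h = o * np.tanh(c)
    return h, c
```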

  • 17.

    Sentiment classification. Task: Predict the sentiment ŷ_j of node j. Sub-tasks:
    • Binary classification
    • Fine-grained classification over 5 classes
    Method:
    • Annotation at node level
    • Uses the negative log-likelihood error
    p̂_θ(y | {x}_j) = softmax(W^(s) h_j + b^(s))
    ŷ_j = argmax_y p̂_θ(y | {x}_j)
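
A minimal sketch of this classification head on top of a node representation h_j (random placeholder weights; the 5 fine-grained sentiment classes are assumed):

```python
import numpy as np

d_h, n_classes = 3, 5                  # hidden size and the 5 fine-grained classes
rng = np.random.default_rng(0)
W_s = rng.normal(size=(n_classes, d_h))
b_s = np.zeros(n_classes)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(h_j):
    """p_hat(y | {x}_j) = softmax(W^(s) h_j + b^(s)); y_hat = argmax_y p_hat."""
    p_hat = softmax(W_s @ h_j + b_s)
    return p_hat, int(np.argmax(p_hat))

def nll_loss(p_hat, y_true):
    """Negative log-likelihood of the label annotated at this node."""
    return -np.log(p_hat[y_true])
```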

  • 18.

    Sentiment classification results. The Constituency Tree-LSTM performs best on the fine-grained sub-task (accuracy, %):

    Method                              Fine-grained   Binary
    CNN-multichannel                    47.4           88.1
    LSTM                                46.4           84.9
    Bidirectional LSTM                  49.1           87.5
    2-layer Bidirectional LSTM          48.5           87.2
    Dependency Tree-LSTM                48.4           85.7
    Constituency Tree-LSTM
      - randomly initialized vectors    43.9           82.0
      - Glove vectors, fixed            49.7           87.5
      - Glove vectors, tuned            51.0           88.0

  • 19.

    Semantic relatedness. Task: Predict a similarity score in [1, K] between two sentences. Method: the similarity between sentences L and R is annotated with a score ∈ [1, 5].
    • Produce representations h_L and h_R
    • Compute the distance h_+ and angle h_× between h_L and h_R
    • Compute the score using a fully connected NN:
      h_s = σ(W^(×) h_× + W^(+) h_+ + b^(h))
      p̂_θ = softmax(W^(p) h_s + b^(p))
      ŷ = r^T p̂_θ, where r = [1, 2, 3, 4, 5]
    • The error is computed using the KL-divergence
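
A sketch of this scoring pipeline, assuming the two sentence representations h_L and h_R have already been produced by a Tree-LSTM. Following the paper, the "angle" feature is the element-wise product h_× = h_L ⊙ h_R and the "distance" feature is h_+ = |h_L − h_R|; all weights below are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_h, d_s, K = 3, 8, 5                  # hidden size, comparison-layer size, scores in [1, K]
rng = np.random.default_rng(0)
W_times = rng.normal(size=(d_s, d_h))  # weights for the "angle" feature h_times
W_plus = rng.normal(size=(d_s, d_h))   # weights for the "distance" feature h_plus
b_h = np.zeros(d_s)
W_p = rng.normal(size=(K, d_s))
b_p = np.zeros(K)
r = np.arange(1, K + 1)                # r = [1, 2, 3, 4, 5]

def relatedness_score(h_L, h_R):
    h_times = h_L * h_R                # element-wise product ("angle" between h_L and h_R)
    h_plus = np.abs(h_L - h_R)         # absolute difference ("distance")
    h_s = sigmoid(W_times @ h_times + W_plus @ h_plus + b_h)
    p_hat = softmax(W_p @ h_s + b_p)
    return float(r @ p_hat), p_hat     # expected score y_hat = r^T p_hat, plus the distribution

def kl_loss(p_target, p_hat):
    """KL(p_target || p_hat); p_target encodes the gold score as a distribution over [1, K]."""
    mask = p_target > 0
    return float(np.sum(p_target[mask] * np.log(p_target[mask] / p_hat[mask])))
```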

  • 20.

    Semantic relatedness results. The Dependency Tree-LSTM performs best on all measures:

    Method                          Pearson's r   MSE
    LSTM                            0.8528        0.2831
    Bidirectional LSTM              0.8567        0.2736
    2-layer Bidirectional LSTM      0.8558        0.2762
    Constituency Tree-LSTM          0.8582        0.2734
    Dependency Tree-LSTM            0.8676        0.2532

  • 21.

    Summary
    • Tree-LSTMs make it possible to encode tree topologies
    • They can be used to encode sentence parse trees
    • They can capture longer and more fine-grained word dependencies

  • 22.

    References
    Christopher Olah. Understanding LSTM Networks, 2015.
    Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 2015.