Introduction to Tree-LSTMs

  • 1.

    Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, by Kai Sheng Tai, Richard Socher, Christopher D. Manning. Presented by Daniel Perez (tuvistavie), CTO @ Claude Tech, M2 @ The University of Tokyo. October 2, 2017

  • 2.

    Distributed representation of words
    Idea: Encode each word using a vector in R^d, such that words with similar meanings are close in the vector space.
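The "close in vector space" idea can be made concrete with cosine similarity. A minimal numpy sketch with a toy three-word vocabulary (the embedding values are made up for illustration, not trained vectors):

```python
import numpy as np

# Toy d = 3 dimensional embeddings (illustrative values, not trained):
# semantically close words get nearby vectors.
embeddings = {
    "good":  np.array([0.9, 0.8, 0.1]),
    "great": np.array([0.85, 0.75, 0.2]),
    "cat":   np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: close to 1 for vectors pointing the same way."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_good_great = cosine(embeddings["good"], embeddings["great"])
sim_good_cat = cosine(embeddings["good"], embeddings["cat"])
```

With these toy vectors, "good" and "great" come out far more similar to each other than either is to "cat".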

  • 3.
  • 4.
  • 5.

    Basic RNN cell
    In a plain RNN, h_t is computed as follows:
    h_t = tanh(W x_t + U h_{t−1} + b)
    given g(x_t, h_{t−1}) = W x_t + U h_{t−1} + b
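The recurrence h_t = tanh(W x_t + U h_{t−1} + b) is a few lines of numpy. A sketch with illustrative sizes and random weights (no training):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 4, 5  # illustrative input and hidden sizes

# Parameters of the plain RNN cell: h_t = tanh(W x_t + U h_{t-1} + b)
W = rng.normal(scale=0.1, size=(d_hid, d_in))
U = rng.normal(scale=0.1, size=(d_hid, d_hid))
b = np.zeros(d_hid)

def rnn_cell(x_t, h_prev):
    return np.tanh(W @ x_t + U @ h_prev + b)

# Unroll the same cell over a short random input sequence.
h = np.zeros(d_hid)
for x_t in rng.normal(size=(3, d_in)):
    h = rnn_cell(x_t, h)
```

Note that the same W and U are reused at every time step; the repeated multiplication by U is exactly what causes the vanishing-gradient issue mentioned on the next slide.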

  • 6.

    Basic RNN cell
    In a plain RNN, h_t is computed as follows:
    h_t = tanh(W x_t + U h_{t−1} + b)
    given g(x_t, h_{t−1}) = W x_t + U h_{t−1} + b
    Issue: Because of vanishing gradients, gradients do not propagate well through the network, making it impossible to learn long-term dependencies.

  • 7.

    Long short-term memory (LSTM)
    Goal: Improve the RNN architecture to learn long-term dependencies.
    Main ideas:
    • Add a memory cell which does not suffer from vanishing gradients
    • Use gating to control how information propagates
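The two ideas (memory cell plus gating) can be sketched as a standard LSTM cell in numpy. Sizes and random weights are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d_in, d_hid = 4, 5  # illustrative sizes

# One (W, U, b) triple per gate / candidate: input i, forget f, output o,
# and cell candidate u.
params = {g: (rng.normal(scale=0.1, size=(d_hid, d_in)),
              rng.normal(scale=0.1, size=(d_hid, d_hid)),
              np.zeros(d_hid)) for g in "ifou"}

def lstm_cell(x_t, h_prev, c_prev):
    gate = lambda g, act: act(params[g][0] @ x_t + params[g][1] @ h_prev + params[g][2])
    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    u = gate("u", np.tanh)
    c = f * c_prev + i * u  # memory cell: additive update, so gradients flow
    h = o * np.tanh(c)      # output gated by o
    return h, c

h, c = lstm_cell(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid))
```

The key line is the cell update c = f * c_prev + i * u: the previous memory is scaled, not squashed through a nonlinearity, which is what lets gradients survive over long spans.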

  • 8.
  • 9.
  • 10.
  • 11.
  • 12.

    Tree-structured LSTMs
    Goal: Improve the encoding of sentences by using their structure.
    Models:
    • Child-sum tree LSTM: sums over all the children of a node; can be used for any number of children.
    • N-ary tree LSTM: uses different parameters for each child position; better granularity, but the maximum number of children per node must be fixed.

  • 13.

    Child-sum tree LSTM
    Children outputs and memory cells are summed.
    [Figure: child-sum tree LSTM at node j with children k1 and k2]
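A minimal numpy sketch of one child-sum tree-LSTM node, following the paper's recipe (sum the child outputs for the input/output/candidate gates, one forget gate per child with shared weights); sizes and random weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
d_in, d = 4, 5  # illustrative input and hidden sizes

# One (W, U, b) triple per gate (input i, forget f, output o, candidate u);
# the same forget-gate weights are shared by every child.
P = {g: (rng.normal(scale=0.1, size=(d, d_in)),
         rng.normal(scale=0.1, size=(d, d)),
         np.zeros(d)) for g in "ifou"}

def child_sum_node(x_j, children):
    """children: a list of (h_k, c_k) pairs; any number, order does not matter."""
    h_tilde = sum((h for h, _ in children), np.zeros(d))  # sum of child outputs
    lin = lambda g, h: P[g][0] @ x_j + P[g][1] @ h + P[g][2]
    i = sigmoid(lin("i", h_tilde))
    o = sigmoid(lin("o", h_tilde))
    u = np.tanh(lin("u", h_tilde))
    # One forget gate per child, computed from that child's own output.
    f = [sigmoid(lin("f", h_k)) for h_k, _ in children]
    c = i * u + sum((f_k * c_k for f_k, (_, c_k) in zip(f, children)), np.zeros(d))
    h = o * np.tanh(c)
    return h, c

# Leaves have no children; an internal node sums over its two leaves.
h1, c1 = child_sum_node(rng.normal(size=d_in), [])
h2, c2 = child_sum_node(rng.normal(size=d_in), [])
x_root = rng.normal(size=d_in)
h_root, c_root = child_sum_node(x_root, [(h1, c1), (h2, c2)])
```

Because everything is a sum over children, swapping the two children leaves the node's output unchanged, which is exactly the order-insensitivity property listed on the next slide.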

  • 14.

    Child-sum tree LSTM
    Properties:
    • Does not take children order into account
    • Works with a variable number of children
    • Shares gate weights (including the forget gate) between children
    Application: Dependency Tree-LSTM, where the number of dependents is variable.

  • 15.

    N-ary tree LSTM
    Given g_k^(n)(x_t, h_{l1}, · · · , h_{lN}) = W^(n) x_t + Σ_{l=1}^{N} U_{kl}^(n) h_{jl} + b^(n)
    [Figure: binary tree LSTM at node j with children k1 and k2]

  • 16.

    N-ary tree LSTM
    Properties:
    • Each node must have at most N children
    • Fine-grained control over how information propagates
    • The forget gate can be parameterized so that siblings affect each other
    Application: Constituency Tree-LSTM, using a binary tree LSTM.
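A numpy sketch of one binary (N = 2) tree-LSTM node with per-position parameters. This is a simplification: each child's forget gate uses only that position's weights, whereas the paper's full parameterization also lets siblings influence each other's forget gates via off-diagonal U_kl terms. Sizes and random weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
d_in, d, N = 4, 5, 2  # binary tree LSTM: exactly N = 2 child slots

# Each gate gets one U matrix per child position (shape (N, d, d)), so the
# left and right children are treated with different parameters.
P = {g: (rng.normal(scale=0.1, size=(d, d_in)),
         rng.normal(scale=0.1, size=(N, d, d)),
         np.zeros(d)) for g in "ifou"}

def nary_node(x_j, children):
    """children: exactly N (h_l, c_l) pairs; position matters."""
    assert len(children) == N
    def lin(g):
        W, U, b = P[g]
        return W @ x_j + sum(U[l] @ h for l, (h, _) in enumerate(children)) + b
    i, o = sigmoid(lin("i")), sigmoid(lin("o"))
    u = np.tanh(lin("u"))
    # Simplified forget gates: per-position weights only; the paper's
    # sibling-interaction terms (U_kl with k != l) are omitted here.
    f = [sigmoid(P["f"][0] @ x_j + P["f"][1][l] @ h + P["f"][2])
         for l, (h, _) in enumerate(children)]
    c = i * u + sum(f_l * c_l for f_l, (_, c_l) in zip(f, children))
    h = o * np.tanh(c)
    return h, c

# Leaves are padded with zero child states to keep the arity fixed.
zero = (np.zeros(d), np.zeros(d))
left = nary_node(rng.normal(size=d_in), [zero, zero])
right = nary_node(rng.normal(size=d_in), [zero, zero])
x_root = rng.normal(size=d_in)
h_lr, _ = nary_node(x_root, [left, right])
h_rl, _ = nary_node(x_root, [right, left])
```

Unlike the child-sum variant, swapping the left and right children changes the output, since each position has its own U matrices.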

  • 17.

    Sentiment classification
    Task: Predict the sentiment ŷ_j of node j.
    Sub-tasks:
    • Binary classification
    • Fine-grained classification over 5 classes
    Method:
    • Annotation at the node level
    • Uses the negative log-likelihood error
    p̂_θ(y | {x}_j) = softmax(W^(s) h_j + b^(s))
    ŷ_j = argmax_y p̂_θ(y | {x}_j)
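The classification head is just a softmax layer on the node representation h_j. A numpy sketch with illustrative sizes and random (untrained) weights:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(4)
d, n_classes = 5, 5  # fine-grained sub-task: 5 sentiment classes

W_s = rng.normal(scale=0.1, size=(n_classes, d))  # W^(s)
b_s = np.zeros(n_classes)                         # b^(s)

def predict(h_j):
    """p_hat = softmax(W^(s) h_j + b^(s)); y_hat = argmax over classes."""
    p_hat = softmax(W_s @ h_j + b_s)
    return p_hat, int(np.argmax(p_hat))

def nll_loss(p_hat, y):
    """Negative log-likelihood of the annotated class y at this node."""
    return -np.log(p_hat[y])

p_hat, y_hat = predict(rng.normal(size=d))
```

Because annotation is at the node level, this loss is summed over every annotated node of the tree during training, not just the root.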

  • 18.

    Sentiment classification results
    Constituency Tree-LSTM performs best on the fine-grained sub-task.

    Method                                   Fine-grained  Binary
    CNN-multichannel                         47.4          88.1
    LSTM                                     46.4          84.9
    Bidirectional LSTM                       49.1          87.5
    2-layer Bidirectional LSTM               48.5          87.2
    Dependency Tree-LSTM                     48.4          85.7
    Constituency Tree-LSTM
      - randomly initialized vectors         43.9          82.0
      - Glove vectors, fixed                 49.7          87.5
      - Glove vectors, tuned                 51.0          88.0

  • 19.

    Semantic relatedness
    Task: Predict a similarity score in [1, K] between two sentences.
    Method: Similarity between sentences L and R is annotated with a score ∈ [1, 5].
    • Produce representations h_L and h_R
    • Compute the distance h_+ and angle h_× between h_L and h_R
    • Compute the score using a fully connected NN:
    h_s = σ(W^(×) h_× + W^(+) h_+ + b^(h))
    p̂_θ = softmax(W^(p) h_s + b^(p))
    ŷ = r^T p̂_θ, with r = [1, 2, 3, 4, 5]
    • The error is computed using the KL-divergence
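A numpy sketch of this similarity head, taking h_× as the elementwise product and h_+ as the absolute difference of the two representations, as in the paper; sizes and random weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(5)
d, d_s, K = 5, 8, 5  # representation size, hidden size, scores in [1, K]

W_ang = rng.normal(scale=0.1, size=(d_s, d))   # W^(x), angle features
W_dist = rng.normal(scale=0.1, size=(d_s, d))  # W^(+), distance features
b_h = np.zeros(d_s)
W_out = rng.normal(scale=0.1, size=(K, d_s))   # W^(p)
b_out = np.zeros(K)
r = np.arange(1, K + 1)                        # r = [1, 2, 3, 4, 5]

def relatedness_score(h_L, h_R):
    h_angle = h_L * h_R           # elementwise product ("angle" features)
    h_dist = np.abs(h_L - h_R)    # absolute difference ("distance" features)
    h_s = sigmoid(W_ang @ h_angle + W_dist @ h_dist + b_h)
    p_hat = softmax(W_out @ h_s + b_out)
    return float(r @ p_hat)       # expected score under p_hat, in [1, K]

score = relatedness_score(rng.normal(size=d), rng.normal(size=d))
```

Predicting a distribution over the K integer scores and taking the expectation r^T p̂ yields a real-valued score, which is why the training error is a KL-divergence to the target distribution rather than a plain regression loss.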

  • 20.

    Semantic relatedness results
    Dependency Tree-LSTM performs best on all measures.

    Method                        Pearson's r  MSE
    LSTM                          0.8528       0.2831
    Bidirectional LSTM            0.8567       0.2736
    2-layer Bidirectional LSTM    0.8558       0.2762
    Constituency Tree-LSTM        0.8582       0.2734
    Dependency Tree-LSTM          0.8676       0.2532

  • 21.

    Summary
    • Tree-LSTMs allow encoding tree topologies
    • They can be used to encode sentence parse trees
    • They can capture longer and more fine-grained word dependencies

  • 22.

    References
    Christopher Olah. Understanding LSTM Networks, 2015.
    Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 2015.