Introduction to Tree-LSTMs
- 1.
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
by Kai Sheng Tai, Richard Socher, Christopher D. Manning

Daniel Perez (tuvistavie)
CTO @ Claude Tech, M2 @ The University of Tokyo
October 2, 2017
- 2.
Distributed representation of words

Idea: Encode each word using a vector in R^d, such that words with similar meanings are close in the vector space.
- 3.
- 4.
- 5.
Basic RNN cell

In a plain RNN, h_t is computed as follows:

h_t = tanh(W x_t + U h_{t-1} + b)

where the pre-activation is g(x_t, h_{t-1}) = W x_t + U h_{t-1} + b.
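A minimal NumPy sketch of this update rule, with toy dimensions and random placeholder weights (W, U, b are not trained, just illustrative):

```python
import numpy as np

def rnn_cell(x_t, h_prev, W, U, b):
    """One step of a plain RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return np.tanh(W @ x_t + U @ h_prev + b)

# Toy dimensions: input size 3, hidden size 2
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))
U = rng.standard_normal((2, 2))
b = np.zeros(2)

h = np.zeros(2)                          # initial hidden state
for x_t in rng.standard_normal((4, 3)):  # a sequence of 4 inputs
    h = rnn_cell(x_t, h, W, U, b)
```

The same W, U, b are reused at every time step; only the hidden state carries information forward.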
- 6.
Basic RNN cell

Issue: Because of vanishing gradients, gradients do not propagate well through the network: it is impossible to learn long-term dependencies.
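A toy illustration of this issue (the dimensions and weight scale are arbitrary assumptions): the Jacobian dh_T/dh_0 of a tanh RNN is a product of diag(1 - h_t^2) @ U factors, one per step, so its norm typically shrinks geometrically with sequence length.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))
U = 0.05 * rng.standard_normal((d, d))  # recurrent weights with small norm
h = np.zeros(d)
J = np.eye(d)                           # accumulated Jacobian dh_t/dh_0
norms = []
for t in range(30):
    h = np.tanh(W @ rng.standard_normal(d) + U @ h)
    J = np.diag(1.0 - h**2) @ U @ J     # chain rule through one step
    norms.append(np.linalg.norm(J))
# norms decays roughly geometrically: early inputs barely influence late states
```

With recurrent weights of large norm the product can instead explode, which is the mirror-image problem.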
- 7.
Long short-term memory (LSTM)

Goal: Improve the RNN architecture to learn long-term dependencies.

Main ideas
• Add a memory cell which does not suffer from vanishing gradients
• Use gating to control how information propagates
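A sketch of one LSTM step under the standard formulation (toy dimensions, random placeholder weights). The additive memory-cell update is what lets gradients survive over long spans:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM step: gates control what the memory cell keeps, adds, and emits."""
    Wi, Ui, bi, Wf, Uf, bf, Wo, Uo, bo, Wu, Uu, bu = params
    i = sigmoid(Wi @ x_t + Ui @ h_prev + bi)   # input gate
    f = sigmoid(Wf @ x_t + Uf @ h_prev + bf)   # forget gate
    o = sigmoid(Wo @ x_t + Uo @ h_prev + bo)   # output gate
    u = np.tanh(Wu @ x_t + Uu @ h_prev + bu)   # candidate update
    c = f * c_prev + i * u                     # additive cell update: no repeated squashing
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 2
# 12 parameter arrays in (W, U, b) order for the i, f, o, u gates
params = [rng.standard_normal(s) for s in [(d_h, d_in), (d_h, d_h), (d_h,)] * 4]
h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.standard_normal((4, d_in)):
    h, c = lstm_cell(x_t, h, c, params)
```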
- 8.
- 9.
- 10.
- 11.
- 12.
Tree-structured LSTMs

Goal: Improve the encoding of sentences by using their structure.

Models
• Child-Sum Tree-LSTM: sums over all the children of a node; can be used with any number of children
• N-ary Tree-LSTM: uses different parameters for each child position; finer granularity, but the maximum number of children per node must be fixed
- 13.
Child-Sum Tree-LSTM

Children's outputs and memory cells are summed.

(Figure: Child-Sum Tree-LSTM at node j with children k1 and k2)
- 14.
Child-Sum Tree-LSTM

Properties
• Does not take children order into account
• Works with a variable number of children
• Shares gate weights (including the forget gate) between children

Application: Dependency Tree-LSTM, where the number of dependents is variable.
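A minimal sketch of the Child-Sum Tree-LSTM node update from Tai et al. (2015), with toy dimensions and random placeholder weights. Note the two properties above in the code: children are summed (order-independent, any arity), and all weights, including the forget gate's, are shared across children:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_tree_lstm(x_j, children, P):
    """Child-Sum Tree-LSTM update at one node.

    children: list of (h_k, c_k) pairs, any length (variable arity).
    P: dict of shared weights; one forget gate is computed per child."""
    h_tilde = sum(h for h, _ in children) if children else np.zeros_like(P["bi"])
    i = sigmoid(P["Wi"] @ x_j + P["Ui"] @ h_tilde + P["bi"])   # input gate
    o = sigmoid(P["Wo"] @ x_j + P["Uo"] @ h_tilde + P["bo"])   # output gate
    u = np.tanh(P["Wu"] @ x_j + P["Uu"] @ h_tilde + P["bu"])   # candidate
    c = i * u
    for h_k, c_k in children:                                  # shared forget-gate weights
        f_k = sigmoid(P["Wf"] @ x_j + P["Uf"] @ h_k + P["bf"])
        c = c + f_k * c_k
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 2
P = {f"W{g}": rng.standard_normal((d_h, d_in)) for g in "ifou"}
P |= {f"U{g}": rng.standard_normal((d_h, d_h)) for g in "ifou"}
P |= {f"b{g}": np.zeros(d_h) for g in "ifou"}

leaf = child_sum_tree_lstm(rng.standard_normal(d_in), [], P)
root_h, root_c = child_sum_tree_lstm(rng.standard_normal(d_in), [leaf, leaf], P)
```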
- 15.
N-ary Tree-LSTM

Given

g_k^(n)(x_j, h_{j1}, ..., h_{jN}) = W^(n) x_j + Σ_{l=1}^{N} U_{kl}^(n) h_{jl} + b^(n)

(Figure: Binary Tree-LSTM at node j with children k1 and k2)
- 16.
N-ary Tree-LSTM

Properties
• Each node must have at most N children
• Fine-grained control over how information propagates
• The forget gate can be parameterized so that siblings affect each other

Application: Constituency Tree-LSTM, using a binary Tree-LSTM.
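A sketch of the binary (N = 2) case with toy dimensions and random placeholder weights. Unlike the Child-Sum variant, each child position gets its own U matrix, and each child's forget gate can look at both siblings' hidden states:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_tree_lstm(x_j, left, right, P):
    """N-ary Tree-LSTM with N=2: separate U matrices per child position.

    left, right: (h, c) pairs for the two children.
    P["U"][g] is a list [U_1, U_2], one matrix per child slot."""
    hs = [left[0], right[0]]
    def pre(g):  # g^(n) = W^(n) x_j + sum_l U_l^(n) h_l + b^(n)
        return P["W"][g] @ x_j + sum(U @ h for U, h in zip(P["U"][g], hs)) + P["b"][g]
    i = sigmoid(pre("i"))
    o = sigmoid(pre("o"))
    u = np.tanh(pre("u"))
    # One forget gate per child; each attends to both siblings' hidden states.
    f1 = sigmoid(P["W"]["f"] @ x_j + sum(U @ h for U, h in zip(P["Uf1"], hs)) + P["b"]["f"])
    f2 = sigmoid(P["W"]["f"] @ x_j + sum(U @ h for U, h in zip(P["Uf2"], hs)) + P["b"]["f"])
    c = i * u + f1 * left[1] + f2 * right[1]
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 2
P = {"W": {g: rng.standard_normal((d_h, d_in)) for g in "iouf"},
     "U": {g: [rng.standard_normal((d_h, d_h)) for _ in range(2)] for g in "iou"},
     "b": {g: np.zeros(d_h) for g in "iouf"},
     "Uf1": [rng.standard_normal((d_h, d_h)) for _ in range(2)],
     "Uf2": [rng.standard_normal((d_h, d_h)) for _ in range(2)]}

child = (np.zeros(d_h), np.zeros(d_h))
h, c = binary_tree_lstm(rng.standard_normal(d_in), child, child, P)
```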
- 17.
Sentiment classification

Task: Predict the sentiment ŷ_j of node j.

Sub-tasks
• Binary classification
• Fine-grained classification over 5 classes

Method
• Annotation at node level
• Uses the negative log-likelihood error

p̂_θ(y | {x}_j) = softmax(W^(s) h_j + b^(s))
ŷ_j = argmax_y p̂_θ(y | {x}_j)
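The classification head above is just a softmax layer on the node's hidden state; a minimal sketch (toy dimensions, random placeholder weights):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def sentiment_head(h_j, Ws, bs):
    """p̂ = softmax(W^(s) h_j + b^(s)); ŷ = argmax_y p̂(y)."""
    p = softmax(Ws @ h_j + bs)
    return p, int(np.argmax(p))

rng = np.random.default_rng(0)
d_h, n_classes = 2, 5                  # 5 classes for the fine-grained sub-task
Ws = rng.standard_normal((n_classes, d_h))
bs = np.zeros(n_classes)

p, y_hat = sentiment_head(rng.standard_normal(d_h), Ws, bs)
nll = -np.log(p[3])                    # negative log-likelihood if the true class were 3
```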
- 18.
Sentiment classification results

Constituency Tree-LSTM performs best on the fine-grained sub-task.

Method                              Fine-grained   Binary
CNN-multichannel                    47.4           88.1
LSTM                                46.4           84.9
Bidirectional LSTM                  49.1           87.5
2-layer Bidirectional LSTM          48.5           87.2
Dependency Tree-LSTM                48.4           85.7
Constituency Tree-LSTM
  - randomly initialized vectors    43.9           82.0
  - Glove vectors, fixed            49.7           87.5
  - Glove vectors, tuned            51.0           88.0
- 19.
Semantic relatedness

Task: Predict a similarity score in [1, K] between two sentences.

Method: Similarity between sentences L and R is annotated with a score ∈ [1, 5].
• Produce representations h_L and h_R
• Compute the distance h_+ and angle h_× between h_L and h_R
• Compute the score using a fully connected NN:

h_s = σ(W^(×) h_× + W^(+) h_+ + b^(h))
p̂_θ = softmax(W^(p) h_s + b^(p))
ŷ = r^T p̂_θ, with r = [1, 2, 3, 4, 5]

• The error is computed using the KL-divergence
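A sketch of this similarity head (toy dimensions, random placeholder weights), assuming the feature definitions from the Tai et al. paper: h_× is the elementwise product (angle) and h_+ the elementwise absolute difference (distance):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def relatedness_score(hL, hR, W_times, W_plus, b_h, W_p, b_p, K=5):
    """Distance/angle features -> hidden layer -> expected score ŷ = r^T p̂ in [1, K]."""
    h_times = hL * hR                               # "angle" feature
    h_plus = np.abs(hL - hR)                        # "distance" feature
    hs = sigmoid(W_times @ h_times + W_plus @ h_plus + b_h)
    p = softmax(W_p @ hs + b_p)                     # distribution over scores 1..K
    r = np.arange(1, K + 1)
    return float(r @ p)                             # expectation of the score

rng = np.random.default_rng(0)
d_h, d_s, K = 4, 3, 5
score = relatedness_score(rng.standard_normal(d_h), rng.standard_normal(d_h),
                          rng.standard_normal((d_s, d_h)), rng.standard_normal((d_s, d_h)),
                          np.zeros(d_s), rng.standard_normal((K, d_s)), np.zeros(K))
```

Because ŷ is an expectation over {1, ..., K}, the predicted score is continuous, which is why training compares p̂ to a target distribution with KL-divergence rather than using a hard label.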
- 20.
Semantic relatedness results

Dependency Tree-LSTM performs best on all measures.

Method                       Pearson's r   MSE
LSTM                         0.8528        0.2831
Bidirectional LSTM           0.8567        0.2736
2-layer Bidirectional LSTM   0.8558        0.2762
Constituency Tree-LSTM       0.8582        0.2734
Dependency Tree-LSTM         0.8676        0.2532
- 21.
Summary
• Tree-LSTMs make it possible to encode tree topologies
• They can be used to encode sentence parse trees
• They can capture longer and more fine-grained word dependencies
- 22.
References

Christopher Olah. Understanding LSTM Networks, 2015.
Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 2015.