Introduction to Tree-LSTMs
- 1.
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, by Kai Sheng Tai, Richard Socher, Christopher D. Manning. Daniel Perez (tuvistavie), CTO @ Claude Tech, M2 @ The University of Tokyo. October 2, 2017
- 2.
Distributed representation of words
Idea: encode each word as a vector in R^d, such that words with similar meanings are close in the vector space.
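The toy snippet below is not from the slides; it just illustrates the idea with made-up 3-dimensional vectors and cosine similarity.

```python
import torch

# Toy illustration: words become vectors, and words with related meanings end up
# close to each other under a similarity measure such as cosine similarity.
# The 3-d vectors below are made up for the example.
emb = {
    "cat": torch.tensor([0.9, 0.1, 0.0]),
    "dog": torch.tensor([0.8, 0.2, 0.1]),
    "car": torch.tensor([0.0, 0.1, 0.9]),
}
cos = torch.nn.functional.cosine_similarity
print(cos(emb["cat"], emb["dog"], dim=0))  # high: related meanings
print(cos(emb["cat"], emb["car"], dim=0))  # lower: unrelated meanings
```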
- 3.
- 4.
- 5.
Basic RNN cell
In a plain RNN, h_t is computed as follows:
h_t = tanh(W x_t + U h_{t-1} + b)
where g(x_t, h_{t-1}) = W x_t + U h_{t-1} + b
- 6.
Basic RNN cell
In a plain RNN, h_t is computed as follows:
h_t = tanh(W x_t + U h_{t-1} + b)
where g(x_t, h_{t-1}) = W x_t + U h_{t-1} + b
Issue: because of vanishing gradients, gradients do not propagate well through the network, making it impossible to learn long-term dependencies.
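A minimal sketch of this update in PyTorch; the dimensions and random parameters are illustrative stand-ins for learned weights.

```python
import torch

# Plain RNN update: h_t = tanh(W x_t + U h_{t-1} + b).
# Dimensions are illustrative; W, U, b would normally be learned.
d_in, d_hid = 4, 8
W = torch.randn(d_hid, d_in) * 0.1
U = torch.randn(d_hid, d_hid) * 0.1
b = torch.zeros(d_hid)

def rnn_step(x_t, h_prev):
    """One recurrence step: tanh(g(x_t, h_{t-1})) with g = W x_t + U h_{t-1} + b."""
    return torch.tanh(W @ x_t + U @ h_prev + b)

h = torch.zeros(d_hid)
for x_t in torch.randn(20, d_in):  # a length-20 input sequence
    h = rnn_step(x_t, h)           # gradients through many such steps tend to vanish
```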
- 7.
Long short-term memory (LSTM)
Goal: improve the RNN architecture to learn long-term dependencies.
Main ideas
• Add a memory cell that does not suffer from vanishing gradients
• Use gating to control how information propagates
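A minimal sketch of a standard LSTM step, assuming the usual input/forget/output gating; shapes and initialization are illustrative, not the paper's setup.

```python
import torch

# Sketch of a standard LSTM cell: a memory cell c_t updated additively, and
# input/forget/output gates that control how information flows. Shapes are illustrative.
d_in, d_hid = 4, 8

def lstm_step(x_t, h_prev, c_prev, params):
    W, U, b = params                      # W: (4*d_hid, d_in), U: (4*d_hid, d_hid)
    z = W @ x_t + U @ h_prev + b          # all four pre-activations at once
    i, f, o, u = z.chunk(4)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # gates in (0, 1)
    u = torch.tanh(u)                     # candidate update
    c_t = f * c_prev + i * u              # memory cell: additive path eases gradient flow
    h_t = o * torch.tanh(c_t)             # hidden state exposed to the rest of the network
    return h_t, c_t

params = (torch.randn(4 * d_hid, d_in) * 0.1,
          torch.randn(4 * d_hid, d_hid) * 0.1,
          torch.zeros(4 * d_hid))
h = c = torch.zeros(d_hid)
for x_t in torch.randn(20, d_in):
    h, c = lstm_step(x_t, h, c, params)
```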
- 8.
- 9.
- 10.
- 11.
- 12.
Tree-structured LSTMs
Goal: improve the encoding of sentences by using their structure.
Models
• Child-sum tree LSTM: sums over all the children of a node; can be used with any number of children
• N-ary tree LSTM: uses different parameters for each child position; finer-grained control, but the maximum number of children per node must be fixed
- 13.
Child-sum tree LSTM
Children's hidden states and memory cells are summed.
[Figure: child-sum tree LSTM at node j with children k1 and k2]
- 14.
Child-sum tree LSTM
Properties
• Does not take the order of children into account
• Works with a variable number of children
• Shares gate weights (including the forget gate) across children
Application: Dependency Tree-LSTM, since the number of dependents is variable
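A sketch of a Child-Sum Tree-LSTM node update along the lines of Tai et al.; the class name, dimensions, and weight layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-Sum Tree-LSTM node: children's hidden states are summed, and each
    child gets its own forget gate computed from its own hidden state."""

    def __init__(self, d_in, d_hid):
        super().__init__()
        self.W = nn.Linear(d_in, 4 * d_hid)               # input projections for i, o, u, f
        self.U_iou = nn.Linear(d_hid, 3 * d_hid, bias=False)
        self.U_f = nn.Linear(d_hid, d_hid, bias=False)
        self.d_hid = d_hid

    def forward(self, x_j, child_h, child_c):
        # child_h, child_c: (num_children, d_hid); any number of children works
        h_tilde = child_h.sum(dim=0)                      # sum of children's hidden states
        wx = self.W(x_j)
        w_iou, w_f = wx[:3 * self.d_hid], wx[3 * self.d_hid:]
        i, o, u = (w_iou + self.U_iou(h_tilde)).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # one forget gate per child: f_jk = sigmoid(W^(f) x_j + U^(f) h_k + b^(f))
        f = torch.sigmoid(w_f + self.U_f(child_h))        # (num_children, d_hid)
        c_j = i * u + (f * child_c).sum(dim=0)            # gated sum of children's memory cells
        h_j = o * torch.tanh(c_j)
        return h_j, c_j
```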
- 15.
N-ary tree LSTM
Given g_k^{(n)}(x_t, h_1, ..., h_N) = W^{(n)} x_t + Σ_{l=1}^{N} U_{kl}^{(n)} h_l + b^{(n)}
[Figure: binary tree LSTM at node j with children k1 and k2]
- 16.
N-ary tree LSTM
Properties
• Each node must have at most N children
• Fine-grained control over how information propagates
• The forget gate can be parameterized so that siblings affect each other
Application: Constituency Tree-LSTM, using a binary tree LSTM
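A sketch of a binary (N = 2) Tree-LSTM node in the same spirit: each child position gets its own U matrices, and the forget-gate parameterization lets siblings influence each other's forget gates, following the equation above. Names and shapes are assumptions.

```python
import torch
import torch.nn as nn

class BinaryTreeLSTMCell(nn.Module):
    """N-ary Tree-LSTM node with N = 2, as used for binarized constituency trees."""

    def __init__(self, d_in, d_hid, n_children=2):
        super().__init__()
        self.N, self.d_hid = n_children, d_hid
        self.W = nn.Linear(d_in, 4 * d_hid)               # shared input projection
        # one U per child position for the i, o, u gates
        self.U_iou = nn.ModuleList(nn.Linear(d_hid, 3 * d_hid, bias=False)
                                   for _ in range(n_children))
        # forget gates: U^(f)_{kl} lets child l influence the forget gate of child k
        self.U_f = nn.ModuleList(nn.Linear(d_hid, n_children * d_hid, bias=False)
                                 for _ in range(n_children))

    def forward(self, x_j, child_h, child_c):
        # child_h, child_c: lists of length N holding (d_hid,) tensors
        wx = self.W(x_j)
        iou = wx[:3 * self.d_hid] + sum(U(h) for U, h in zip(self.U_iou, child_h))
        i, o, u = iou.chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f_pre = wx[3 * self.d_hid:].repeat(self.N) + sum(U(h) for U, h in zip(self.U_f, child_h))
        f = torch.sigmoid(f_pre).view(self.N, self.d_hid)  # one forget gate per child
        c_j = i * u + sum(f[k] * child_c[k] for k in range(self.N))
        h_j = o * torch.tanh(c_j)
        return h_j, c_j
```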
- 17.
Sentiment classification
Task: predict the sentiment ŷ_j of node j
Sub-tasks
• Binary classification
• Fine-grained classification over 5 classes
Method
• Annotation at the node level
• Negative log-likelihood loss
p̂_θ(y | {x}_j) = softmax(W^{(s)} h_j + b^{(s)})
ŷ_j = argmax_y p̂_θ(y | {x}_j)
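A sketch of this classification head, assuming a 150-dimensional hidden state and the 5-class fine-grained setting; the node hidden state and label below are placeholders.

```python
import torch
import torch.nn as nn

# Softmax classifier over the Tree-LSTM hidden state h_j at an annotated node,
# trained with negative log-likelihood. Sizes are illustrative assumptions.
d_hid, n_classes = 150, 5
W_s = nn.Linear(d_hid, n_classes)            # W^(s), b^(s)
loss_fn = nn.NLLLoss()

h_j = torch.randn(1, d_hid)                  # hidden state of node j (placeholder)
y_j = torch.tensor([3])                      # gold sentiment label of node j (placeholder)

log_p = torch.log_softmax(W_s(h_j), dim=-1)  # log p_theta(y | {x}_j)
loss = loss_fn(log_p, y_j)                   # negative log-likelihood at this node
y_hat = log_p.argmax(dim=-1)                 # predicted class: argmax_y p_theta(y | {x}_j)
```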
- 18.
Sentiment classification results
Constituency Tree-LSTM performs best on the fine-grained sub-task.
Method                             Fine-grained  Binary
CNN-multichannel                   47.4          88.1
LSTM                               46.4          84.9
Bidirectional LSTM                 49.1          87.5
2-layer Bidirectional LSTM         48.5          87.2
Dependency Tree-LSTM               48.4          85.7
Constituency Tree-LSTM
  - randomly initialized vectors   43.9          82.0
  - Glove vectors, fixed           49.7          87.5
  - Glove vectors, tuned           51.0          88.0
- 19.
Semantic relatedness
Task: predict a similarity score in [1, K] between two sentences
Method: similarity between sentences L and R, annotated with a score in [1, 5]
• Produce representations h_L and h_R
• Compute a distance h_+ and an angle h_× between h_L and h_R
• Compute the score using a fully connected NN:
  h_s = σ(W^{(×)} h_× + W^{(+)} h_+ + b^{(h)})
  p̂_θ = softmax(W^{(p)} h_s + b^{(p)})
  ŷ = r^T p̂_θ, with r = [1, 2, 3, 4, 5]
• The error is computed using the KL-divergence
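A sketch of this similarity head; following the paper, h_× is taken as the elementwise product and h_+ as the absolute difference of h_L and h_R. All sizes, and the example target distribution, are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Similarity head on top of two sentence encodings h_L and h_R.
d_hid, d_sim, K = 150, 50, 5
W_x = nn.Linear(d_hid, d_sim, bias=False)          # W^(×)
W_plus = nn.Linear(d_hid, d_sim)                   # W^(+), b^(h)
W_p = nn.Linear(d_sim, K)                          # W^(p), b^(p)
r = torch.arange(1, K + 1, dtype=torch.float)      # r = [1, 2, 3, 4, 5]

h_L, h_R = torch.randn(d_hid), torch.randn(d_hid)  # sentence encodings (placeholders)
h_x, h_plus = h_L * h_R, (h_L - h_R).abs()         # "angle" and "distance" features
h_s = torch.sigmoid(W_x(h_x) + W_plus(h_plus))
log_p = torch.log_softmax(W_p(h_s), dim=-1)        # log p_theta over the K score classes
y_hat = (r * log_p.exp()).sum()                    # predicted score in [1, K]

# Training compares p_theta to a target distribution with KL-divergence;
# this example target encodes a gold score of 3.7 spread over classes 3 and 4.
p_target = torch.tensor([0.0, 0.0, 0.3, 0.7, 0.0])
loss = torch.nn.functional.kl_div(log_p, p_target, reduction="sum")
```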
- 20.
Semantic relatedness results
Dependency Tree-LSTM performs best on all measures.
Method                        Pearson's r  MSE
LSTM                          0.8528       0.2831
Bidirectional LSTM            0.8567       0.2736
2-layer Bidirectional LSTM    0.8558       0.2762
Constituency Tree-LSTM        0.8582       0.2734
Dependency Tree-LSTM          0.8676       0.2532
- 21.
Summary
• Tree-LSTMs allow encoding tree topologies
• They can be used to encode sentence parse trees
• They can capture longer and more fine-grained word dependencies
- 22.
References
Christopher Olah. Understanding LSTM Networks, 2015.
Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 2015.