The ICLR Blog Track

Deep Neural Nets: 33 years ago and 33 years from now (Invited Post)

26 Mar 2022 | historical computer-vision reproducibility Karpathy, Andrej

The Yann LeCun et al. (1989) paper Backpropagation Applied to Handwritten Zip Code Recognition is I believe of some historical significance because it is, to my knowledge, the earliest real-world application of a neural net trained end-to-end with backpropagation. Except for the tiny dataset (7291 16x16 grayscale images of digits) and the tiny neural network used (only 1,000 neurons), this paper reads remarkably modern today, 33 years later - it lays out a dataset, describes the neural net architecture, loss function, optimization, and reports the experimental classification error rates over training and test sets. It’s all very recognizable and type checks as a modern deep learning paper, except it is from 33 years ago. So I set out to reproduce the paper 1) for fun, but 2) to use the exercise as a case study on the nature of progress in deep learning.


A Deeper Look at Zero-Cost Proxies for Lightweight NAS

25 Mar 2022 | deep-learning automated-machine-learning architecture-search White, Colin; Khodak, Mikhail; Tu, Renbo; Shah, Shital; Bubeck, Sébastien; Dey, Debadeepta

Imagine you have a brand new dataset, and you are trying to find a neural network that achieves high validation accuracy on this dataset. You choose a neural network, but after 3 hours of training, you find that the validation accuracy is only 85%. After more choices of neural networks — and many GPU-hours — you finally find one that has an accuracy of 93%. Is there an even better neural network? And can this whole process become faster?
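Zero-cost proxies aim to answer that last question by scoring a candidate network without training it at all. As a purely illustrative sketch (the function and toy numbers are ours, not the post's actual proxies), a SNIP-style saliency score sums |w · ∂L/∂w| over all weights after a single forward/backward pass at initialization:

```python
def snip_score(weights, grads):
    """SNIP-style saliency: sum of |w * dL/dw| over all parameters.

    `weights` and `grads` are flat lists of floats; in practice they
    would come from one forward/backward pass at initialization.
    """
    return sum(abs(w * g) for w, g in zip(weights, grads))

# Toy comparison of two hypothetical architectures on the same minibatch.
net_a = snip_score(weights=[0.5, -0.2, 0.1], grads=[0.4, 0.3, -0.6])
net_b = snip_score(weights=[0.1, 0.05, -0.02], grads=[0.2, 0.1, 0.1])
```

The idea is that ranking many candidates by such a score costs seconds per network instead of GPU-hours of training.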


Normalization is dead, long live normalization!

25 Mar 2022 | normalization skip-connections residual-networks deep-learning Hoedt, Pieter-Jan; Hochreiter, Sepp; Klambauer, Günter

Since the advent of Batch Normalization (BN), almost every state-of-the-art (SOTA) method uses some form of normalization. After all, normalization generally speeds up learning and leads to models that generalize better than their unnormalized counterparts. This turns out to be especially useful when using some form of skip connections, which are prominent in Residual Networks (ResNets), for example. However, Brock et al. (2021a) suggest that SOTA performance can also be achieved using ResNets without normalization!
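For readers unfamiliar with the mechanics, batch normalization standardizes each feature over the minibatch and then re-scales it with learnable parameters. A minimal pure-Python sketch of the per-feature computation (running statistics and the batched tensor handling of a real implementation are simplified away here):

```python
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a batch of scalar activations to zero mean and unit
    variance, then apply the learnable affine transform (gamma, beta)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [gamma * (xi - mean) / (var + eps) ** 0.5 + beta for xi in x]

out = batch_norm([1.0, 2.0, 3.0])
# The normalized batch has (approximately) zero mean and unit variance.
```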


Understanding Few-Shot Multi-Task Representation Learning Theory

25 Mar 2022 | multi-task-learning few-shot-learning learning-theory Bouniot, Quentin; Redko, Ievgen

Multi-Task Representation Learning (MTR) is a popular paradigm for learning shared representations from multiple related tasks. It has proven effective on problems ranging from machine translation in natural language processing to object detection in computer vision. Few-Shot Learning, on the other hand, is a more recent problem that seeks to mimic the human ability to quickly learn a target task with little supervision. Here, researchers have turned to meta-learning, which learns to learn new tasks by training a model on many small tasks. As meta-learning still lacks a theoretical understanding of its success on few-shot tasks, an intuitively appealing approach is to bridge the gap with multi-task learning and use the results established for the latter to better understand the former. In this post, we dive into a recent ICLR 2021 paper by S. Du, W. Hu, S. Kakade, J. Lee, and Q. Lei that demonstrated novel learning bounds for multi-task learning in the few-shot setting, and we go beyond it by establishing connections that shed light on the inner workings of meta-learning algorithms as well.


Representation Change in Model-Agnostic Meta-Learning

25 Mar 2022 | meta-learning representation-learning domain-adaptation cross-domain Goerttler, Thomas (TU Berlin); Müller, Luis (TU Berlin); Obermayer, Klaus (TU Berlin)

Last year, an exciting adaptation of one of the most popular optimization-based meta-learning approaches, model-agnostic meta-learning (MAML) [Finn et al., 2017], was proposed in

   ▶  Jaehoon Oh, Hyungjun Yoo, ChangHwan Kim, Se-Young Yun (ICLR, 2021) BOIL: Towards Representation Change for Few-shot Learning

The authors adapt MAML by freezing the last layer to force body-only inner learning (BOIL). Interestingly, this is complementary to ANIL (almost no inner loop) proposed in

   ▶  Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals (ICLR, 2020) Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

Both papers attempt to understand the success of MAML and improve it. Oh et al. [2021] compare BOIL, ANIL, and MAML and show that both variants improve on MAML's performance. However, BOIL outperforms ANIL, especially when the task distribution varies between training and testing.
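Mechanically, the difference between the two variants is only which parameter groups receive inner-loop gradient steps. A hypothetical sketch (the names and structure are ours, not the papers' code):

```python
def inner_update(params, grads, lr, frozen):
    """One inner-loop gradient step that skips frozen parameter groups.

    BOIL freezes the head (classifier), so only the body adapts;
    ANIL freezes the body, so only the head adapts.
    """
    return {
        name: p if name in frozen
        else [w - lr * g for w, g in zip(p, grads[name])]
        for name, p in params.items()
    }

params = {"body": [0.5, -0.3], "head": [1.0]}
grads = {"body": [0.1, 0.2], "head": [0.4]}

boil = inner_update(params, grads, lr=0.1, frozen={"head"})  # head untouched
anil = inner_update(params, grads, lr=0.1, frozen={"body"})  # body untouched
```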


The 37 Implementation Details of Proximal Policy Optimization

25 Mar 2022 | proximal-policy-optimization reproducibility reinforcement-learning implementation-details tutorial Huang, Shengyi; Dossa, Rousslan Fernand Julien; Raffin, Antonin; Kanervisto, Anssi; Wang, Weixun

Jon is a first-year master’s student who is interested in reinforcement learning (RL). In his eyes, RL seemed fascinating because he could use RL libraries such as Stable-Baselines3 (SB3) to train agents to play all kinds of games. He quickly recognized Proximal Policy Optimization (PPO) as a fast and versatile algorithm and wanted to implement PPO himself as a learning experience. Upon reading the paper, Jon thought to himself, “huh, this is pretty straightforward.” He then opened a code editor and started writing PPO. CartPole-v1 from Gym was his chosen simulation environment, and before long, Jon made PPO work with CartPole-v1. He had a great time and felt motivated to make his PPO work with more interesting environments, such as the Atari games and MuJoCo robotics tasks. “How cool would that be?” he thought.


Euclidean geometry meets graph, a geometric deep learning perspective

25 Mar 2022 | graphs geometric-deep-learning Wang, Zichen, Amazon Web Services; Shi, Yunzhi, Amazon Web Services; Chen, Xin, Amazon Web Services

Graph neural networks (GNNs) are an active area of machine learning research for tackling various problems in graph data. A graph is a powerful way of representing relationships among entities as nodes connected by edges. Sometimes nodes and edges carry spatial features, such as 3D coordinates of nodes and directions along edges. How do we reason over the topology of graphs while also accounting for those geometric features? In this post, we discuss Learning from Protein Structure with Geometric Vector Perceptrons (Jing et al. 2021), published at ICLR 2021.


Symbolic Binding in Neural Networks through Factorized Memory Systems

25 Mar 2022 | symbolic memory binding Daigavane, Ameya; Khurana, Ansh; Bhardwaj, Shweta; Aggarwal, Gaurav

In this blog post, we describe the paper 'Emergent Symbols through Binding in External Memory', which introduces the Emergent Symbol Binding Network (ESBN), a recurrent neural network with an augmented factorized memory system, to solve several logical reasoning tasks. We describe the design choices of the ESBN in great detail to compare with other sequence models, and perform a qualitative validation of the generalization claims from the original paper.


Does Adam Converge and When?

25 Mar 2022 | adam optimization deep-learning Zhang, Yushun; Chen, Congliang; Luo, Zhi-Quan

In this blog post, we revisit the (non-)convergence behavior of Adam. In particular, we briefly review the non-convergence results of Reddi et al. [14] and the convergence results of Shi et al. [17]. Do these two results contradict each other? If not, does the convergence analysis of Shi et al. [17] match the practical setting of Adam? How large is the gap between theory and practice? We discuss these questions from multiple perspectives, show that the gap is actually non-negligible, and argue that the discussion on the convergence of Adam is far from concluded.
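For reference, here is a single Adam step with bias correction written out in plain Python — a minimal sketch of the standard update rule, not of the analyses discussed above (the toy quadratic below does not stress the pathological gradient sequences those works study):

```python
def adam_step(theta, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), bias-corrected using the step counter t >= 1."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

The convergence questions above hinge on how the effective step size lr / sqrt(v_hat) evolves with the gradient sequence over time.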


Recent Advances in Deep Learning for Routing Problems

25 Mar 2022 | graphs deep-learning combinatorial-optimization travelling-salesperson-problem Joshi, Chaitanya K.; Anand, Rishabh

TL;DR Developing neural network-driven solvers for combinatorial optimization problems such as the Travelling Salesperson Problem has seen a surge of academic interest recently. This blogpost presents a Neural Combinatorial Optimization pipeline that unifies several recently proposed model architectures and learning paradigms into a single framework. Through the lens of this pipeline, we analyze recent advances in deep learning for routing problems and propose new directions to stimulate future research towards practical impact.


PPLM Revisited: Steering and Beaming a Lumbering Mammoth to Control Text Generation

25 Mar 2022 | natural-language-generation reproducibility generalization Nguyen, Van Bach; Trienes, Jan; Nauta, Meike; Pathak, Shreyasi; Youssef, Paul; Imangaliyev, Sultan; Schlötterer, Jörg; Seifert, Christin

In this blogpost, we examine to what extent PPLM can control language models by investigating reproducibility, the impact of the prompt vs. the bag-of-words (BoW), the effect of weighted BoW, and style control. Want a summary only? Check our TL;DR.