Abstract: The results of training a neural network are heavily dependent on the architecture chosen, and even a small modification of its size typically requires restarting the training process. In contrast, we begin training with a small architecture, increase its capacity only as the problem requires, and avoid interfering with previous optimization while doing so. To this end, we introduce a natural gradient based approach that intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks with full connectivity and convolutions in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
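The abstract does not spell out the expansion criterion. As a minimal sketch, assuming the score is the squared natural-gradient norm eta = g^T (F + lambda*I)^{-1} g, where g is the mean gradient and F the empirical Fisher (mean outer product of per-example gradients), one can compare eta before and after appending a candidate neuron's parameters. Under a quadratic model of the loss with curvature F, a full natural-gradient step reduces the loss by eta/2, so a larger eta suggests more room for improvement. The helper names `natural_grad_score` and `should_expand`, the damping lambda and the relative threshold are hypothetical illustration choices, not the authors' method; the candidate is assumed to be initialized so it does not change the network's output (e.g. zero outgoing weights).

```python
# Hypothetical sketch (not the paper's exact method): score an expansion by the
# squared natural-gradient norm eta = g^T (F + damping*I)^{-1} g, using an
# empirical Fisher built from per-example gradients. Pure Python, no deps.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def natural_grad_score(per_example_grads, damping=1e-3):
    """Return eta = g^T (F + damping*I)^{-1} g for a list of gradient vectors."""
    n = len(per_example_grads)
    d = len(per_example_grads[0])
    g = [sum(v[j] for v in per_example_grads) / n for j in range(d)]
    fisher = [[sum(v[i] * v[j] for v in per_example_grads) / n
               + (damping if i == j else 0.0) for j in range(d)]
              for i in range(d)]
    x = solve(fisher, g)
    return sum(gi * xi for gi, xi in zip(g, x))


def should_expand(eta_current, eta_expanded, threshold=0.1):
    """Hypothetical rule: accept if eta grows by more than a relative margin."""
    return eta_expanded > (1.0 + threshold) * eta_current


# Per-example gradients for the current (single) parameter.
current = [[2.0], [0.0]]
# Same examples, with the gradient of one hypothetical candidate parameter appended.
expanded = [[2.0, 1.0], [0.0, -1.0]]
eta_cur = natural_grad_score(current, damping=0.0)
eta_exp = natural_grad_score(expanded, damping=0.0)
print(eta_cur, eta_exp, should_expand(eta_cur, eta_exp))  # prints: 0.5 1.0 True
```

Because F + lambda*I is positive definite, appending parameters can never decrease eta. In the example the candidate parameter has zero mean gradient, yet eta doubles because its per-example gradients are correlated with those of the existing parameter.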
Submission history
From: Rupert Mitchell
[v1] Mon, 10 Jul 2023 12:49:59 UTC (7,625 KB)
[v2] Tue, 11 Jul 2023 14:47:54 UTC (7,625 KB)
[v3] Fri, 9 Feb 2024 14:02:28 UTC (24,369 KB)