Main
Neurons receive synaptic inputs from tens of other neurons in the roundworm Caenorhabditis elegans1, hundreds to thousands in the fruit fly2 and thousands to tens of thousands in mice, monkeys and humans3. As the number of inputs grows, the space of possible computations explodes exponentially. To tame this complexity, simplified models have long assumed that the output of a neuron depends directly on individual inputs, without interactions between inputs4,5,6,7,8,9. While this picture forms the foundation for our understanding of neural computation, both in the brain and artificial networks, it remains unclear whether direct dependencies alone can explain the activity of real neurons.
Across different species and neural systems, it is now possible to make long, stable recordings of large contiguous populations of neurons10,11,12,13,14. This means that, for each neuron, we may have access to its output and all of its inputs simultaneously. But how can we determine whether the activity of each neuron arises from simple dependencies on individual inputs or, instead, requires complex interactions between inputs? To answer this question, we need a framework to quantify the minimal consequences of direct dependencies.
Minimal consequences of direct dependencies
Neurons in the brain receive inputs at their dendrites and then execute a binary function: they either remain silent (y = 0) or generate a discrete impulse (y = 1) known as an action potential or spike15,16 (Fig. 1a). This mapping from inputs to output involves intricate details of membrane potential dynamics and cell morphology, which vary between neurons, brain regions and species17,18,19. Yet, for every neuron, these details culminate in a table of firing probabilities P(y = 1∣x1, …, xn), which define the function that the cell performs on its n inputs x1, …, xn. This description, while general enough to capture every neuron, is also hopelessly complex; it requires specifying a different firing probability for each combination of inputs, a number that grows exponentially with n (Fig. 1b). To understand the functions performed by real neurons, we thus need simplifying hypotheses.
a, Activity (dots) of an output neuron y and n inputs x1, …, xn. Within a window of width Δt, each neuron binarizes into active (y = 1) or silent (y = 0)15,16. To study activity on the fastest available timescale, Δt is defined by the sampling interval of a given experiment. b, Top: the full input–output dependence is defined by the probability of activity in response to each of the \(2^n\) combinations of inputs. Bottom: the simplest dependencies reflect the direct responses to each input individually, the number of which grows linearly with n. c, The minimal model, which has maximum entropy consistent with these direct dependencies20,21, is equivalent to a logistic artificial neuron7. d, Hierarchy of entropies, where the difference Stot − Sdir quantifies the amount of variability captured by direct dependencies, Sdir − Strue the variability due to higher-order and time-delayed dependencies, and Strue the latent variability that cannot be explained by the inputs33. e, Logical function with error probability ϵ and binary inputs that are drawn independently at random. f,g, Hierarchy of entropies versus error rate for the AND and OR functions (f) and the XOR function (g). For AND and OR (f), the functions are almost exactly captured by direct dependencies (Sdir ≈ Strue), while for XOR (g), direct dependencies explain none of the output variability (Sdir = Stot).
The simplest relationships between the output y and inputs xi are contained in the direct dependencies P(y∣xi); these capture everything about the activity that does not involve interactions between inputs. But how can we tell whether these simple dependencies are enough to describe a real neuron? Even if we measure all the direct dependencies P(y∣x1), …, P(y∣xn), there are still an infinite number of possible functions P(y∣x1, …, xn) consistent with these constraints. Our problem, therefore, is to find the model that matches these simple dependencies, but is maximally random with regard to higher-order dependencies involving two, three or more inputs. We show that this minimal model is the one with maximum entropy consistent with the average activity of y and its correlations with the inputs xi (Methods)20,21. This maximum entropy model is known to take the form
$$P(y=1| {x}_{1},\ldots ,{x}_{n})=\sigma \left(b+\mathop{\sum }\limits_{i=1}^{n}{w}_{i}{x}_{i}\right),$$
(1)
where σ(⋅) is the logistic function22,23. The parameters b and wi must be computed so that the model matches the measured direct dependencies, and we provide an algorithm that converges efficiently, even for large n (Methods).
Logistic models (and other generalized linear models) have been used extensively to study the statistical dependencies of neural activity on other neurons as well as latent variables such as stimuli, behaviour and arousal24,25,26,27. In fact, these models have been shown to capture key phenomenological features of neuronal spiking and firing rates28,29,30,31. The maximum entropy principle adds to this context a concrete mathematical connection between biological and artificial neurons: a real neuron with purely direct dependencies is equivalent to an artificial neuron with bias b, linear weights wi and logistic activation function7 (Fig. 1c). Consequently, all other models—for example, with different activation functions, dependencies on latent variables, temporal dependencies or regularized parameters—must involve sources of order beyond direct dependencies.
We are now prepared to study the amount of variability captured by direct dependencies. The total variability of a neuron is quantified by the entropy Stot of its output with no knowledge of the inputs32. By contrast, the true entropy of a neuron Strue quantifies the latent variability in its activity that cannot be explained by the inputs21 (Methods). With knowledge of only the direct dependencies, the entropy Sdir of the minimal model in equation (1) sits between these two extremes (Methods), yielding a hierarchy Stot ≥ Sdir ≥ Strue ≥ 0. In this way, the difference Stot − Sdir defines the amount of variability captured only by direct dependencies. The remaining variability Sdir can arise from any other dependence on the inputs (including higher-order and time-delayed dependencies) or latent variables that are not included in the inputs33 (Fig. 1d). Importantly, if the direct entropy Sdir becomes small, then so too does the true entropy Strue. In this limit, the model (and, thus, the neuron itself) becomes equivalent to a McCulloch–Pitts neuron4, or a perceptron34, and all the variability is explained by direct dependencies (Methods).
To gain intuition, consider an output y that performs a logical function on two binary inputs x1 and x2 with error rate ϵ (Fig. 1e). For AND and OR gates, as ϵ increases, the output becomes more stochastic, leading to higher variability (Fig. 1f). Across all error rates ϵ, we find that the direct entropy Sdir lies close to the true entropy Strue, such that the model provides a tight approximation to the true computation. As errors vanish and the output becomes deterministic (ϵ = 0), the minimal model becomes exact (Sdir = Strue) and all the variability in y is explained by direct dependencies (Sdir = 0). This reflects the fact that AND and OR are linearly separable functions and thus are exactly described by a perceptron, which (as discussed above) is defined purely by direct dependencies35. For comparison, consider the XOR function, the classic example of a higher-order dependence that relies irreducibly on the specific combination of inputs x1 and x2 (Fig. 1g). We find that direct dependencies provide no information about the function (Sdir = Stot), such that all the variability arises from either higher-order dependencies or latent stochasticity. Together, these results demonstrate how stochastic functions can be decomposed into their constituent parts.
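As a rough numerical check of this example, the sketch below computes the three entropies for a noisy gate with two independent, uniformly random binary inputs. The function names (`entropy_hierarchy`, `h2`), the learning rate and the step count are illustrative choices rather than part of the original analysis, and the fit uses plain gradient descent on the exact four-state input distribution.

```python
import math
from itertools import product


def h2(p):
    """Binary entropy of a Bernoulli(p) variable, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def entropy_hierarchy(gate, eps, steps=20000, lr=1.0):
    """Return (S_tot, S_dir, S_true) in bits for a gate with error rate eps."""
    states = list(product([0, 1], repeat=2))  # four equally likely input pairs
    p_true = {x: (1.0 - eps) if gate(*x) else eps for x in states}

    # Constraints of the minimal model: <y> and <y x_i>.
    mean_y = sum(p_true.values()) / 4.0
    corr = [sum(p_true[x] * x[i] for x in states) / 4.0 for i in range(2)]

    # Gradient descent on the bias b and weights w (hypothetical settings).
    b, w = 0.0, [0.0, 0.0]
    for _ in range(steps):
        p = {x: sigmoid(b + w[0] * x[0] + w[1] * x[1]) for x in states}
        grad_b = sum(p.values()) / 4.0 - mean_y
        grad_w = [sum(p[x] * x[i] for x in states) / 4.0 - corr[i] for i in range(2)]
        b -= lr * grad_b
        w = [w[i] - lr * grad_w[i] for i in range(2)]

    # Entropies: marginal output, fitted minimal model, true noisy gate.
    p = {x: sigmoid(b + w[0] * x[0] + w[1] * x[1]) for x in states}
    s_tot = h2(mean_y)
    s_dir = sum(h2(p[x]) for x in states) / 4.0
    s_true = sum(h2(p_true[x]) for x in states) / 4.0
    return s_tot, s_dir, s_true


and_tot, and_dir, and_true = entropy_hierarchy(lambda a, b: a and b, eps=0.1)
xor_tot, xor_dir, xor_true = entropy_hierarchy(lambda a, b: a != b, eps=0.1)
print(f"AND: S_tot={and_tot:.3f}  S_dir={and_dir:.3f}  S_true={and_true:.3f}")
print(f"XOR: S_tot={xor_tot:.3f}  S_dir={xor_dir:.3f}  S_true={xor_true:.3f}")
```

For XOR, the output is uncorrelated with each input taken alone, so the fitted weights stay at zero and the model entropy equals the total entropy. For AND, the fitted model should land between the true and total entropies.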
Identifying optimal inputs
To study real neurons, we must first specify their inputs. In large-scale recordings, while synaptic connectivity is rarely known, we can infer the optimal inputs that provide the best description of a neuron’s output. In a population of N neurons, for a given output neuron, we would like to select the n < N inputs that, once included in the maximum entropy model (equation (1)), reduce our uncertainty about the output, quantified by Sdir, as much as possible (Methods). This minimax entropy problem is generally intractable36,37,38; however, we provide an efficient algorithm that greedily identifies the locally optimal inputs at each step (Methods). The result is a set of n inputs that, by minimizing the model uncertainty Sdir, also maximize the amount of variability captured by direct dependencies Stot − Sdir.
Consider a large population of neurons in the mouse hippocampus11 (Methods). These cells play key roles in encoding the animal’s location, mapping features in its environment and storing memories of past events39,40,41,42; yet, it remains unclear whether these functions arise from simple input–output dependencies. For a given output neuron, we infer the optimal n inputs and the corresponding maximum entropy model (Fig. 2a). We consider only inputs that co-activate with the output at least once during a recording, which guarantees that the direct dependencies are well defined (Methods). As the number of inputs increases, the minimal model quickly becomes expressive, making increasingly accurate predictions for the output activity (Fig. 2b). In fact, despite being maximally random with respect to interactions between inputs and time-delayed dependencies, the direct entropy Sdir drops exponentially with the number of inputs (Fig. 2c). This means that, with only a relatively small number of inputs, direct dependencies capture an exponentially large amount of variability Stot − Sdir.
a, Population of N = 1,485 neurons in the mouse hippocampus recorded as the animal runs along a virtual track11,42 (Methods). For a randomly selected output neuron (circle), we illustrate the n = 100 optimal input neurons (blue) and an equal number of random inputs (red). b, Within a randomly selected five-minute window, we plot the activity of the output neuron (top) and activity probabilities predicted by the maximum entropy model (equation (1)) for increasing numbers of optimal inputs (left) and random inputs (right). c, Direct entropy Sdir versus the number of inputs n for optimal inputs (blue) and random inputs (red). d, Co-activity rates between the output and all other neurons versus those predicted by the independent model with no inputs. e, With n* = 350 optimal inputs, the model correctly predicts all remaining co-activity rates, and thus all direct dependencies on other neurons in the population (Methods). f, With the same number of random inputs, the model fails to predict many correlations. In d–f, lines indicate equality, and shaded regions indicate experimental errors (two standard deviations).
By contrast, with random inputs, the maximum entropy model fails dramatically, explaining effectively none of the variability until almost all the inputs are included (Fig. 2c). This suggests that, by including all possible inputs, we risk overfitting the output, thus leading to an artificially low entropy Sdir. To avoid overfitting, we introduce the following regularization: we select the minimal number of inputs n* needed to predict all other direct dependencies in the population (Methods; Supplementary Fig. 1). This ensures that we do not fit any input–output dependencies that the model already predicts. The result is a combined framework for inferring the minimal model of direct dependencies (equation (1)) with the minimal set of inputs.
For the neuron in Fig. 2, among its correlations with other neurons, 66% are significant, meaning that 34% can be predicted with no inputs at all (Fig. 2d). We identify a minimal set of n* = 350 inputs that are sufficient to predict all of the direct dependencies on other neurons (Fig. 2e). With only n*/(N − 1) = 24% of the possible inputs, these direct dependencies alone explain (Stot − Sdir)/Stot = 89% of the neuron’s variability. For comparison, with the same number of randomly selected inputs, many of the correlations with other neurons remain unexplained (Fig. 2f). These findings establish that the vast majority of variability in activity, at least for one cell, is captured by a relatively small number of direct dependencies.
Direct dependencies across systems and species
We repeat this calculation for many neurons spanning different brain regions and species. In all cases, we study large recordings of spatially contiguous populations with recurrent connectivity, such that each neuron may receive synaptic inputs from the others. Across the N = 1,485 neurons in the hippocampal recording11,42 (Fig. 2a), we confirm that the direct entropy Sdir drops exponentially with the number of inputs (Fig. 3a). This decrease is so sharp that, for the median neuron, the first input explains 17% of the variability, and only 15 inputs are needed to explain 50% of the variability. With n* = 214 inputs (only 14% of the population), the maximum entropy model correctly predicts the direct dependencies on all other neurons. These ‘complete’ models, which capture all of a neuron’s direct dependencies, explain over 90% of a neuron’s entropy Stot. This leaves less than 10% of the variability for higher-order dependencies, time-delayed dependencies and latent stochasticity combined.
a, Direct entropy Sdir normalized by total entropy Stot for n inputs chosen optimally (blue) or randomly (red) across all N = 1,485 hippocampal neurons in Fig. 2 (refs. 11,42). The dashed line indicates the minimal number of inputs n* needed to capture all the direct dependencies for the median neuron; note that this number varies between neurons. b–d, Normalized model entropy Sdir/Stot versus number of inputs n for 100 random output neurons within a population of N = 11,445 cells in the mouse visual cortex during responses to natural images (nat. im.) (b) and spontaneous (spont.) activity (c)10, and for N = 128 neurons in the brain of C. elegans (d)12. See Methods for experimental details. In a–d, lines and shaded regions represent medians and interquartile ranges across output neurons. e,f, For the complete models in a–d, we compare the minimal number of inputs n* needed to capture all direct dependencies (e) and the normalized entropies Sdir/Stot (f). Points represent individual output neurons.
In an even larger population of N > \(10^4\) cells in the mouse visual cortex10, for each neuron, one might expect that more inputs are needed to capture all of the direct dependencies. However, when responding to natural images, the median neuron requires only n* = 108 inputs (less than 1% of the entire population) to predict the remaining 99% of its direct dependencies (Fig. 3b). Moreover, just as in the hippocampus, these complete models explain 91% of each neuron’s variability. For spontaneous activity in the same population, we observe nearly identical results (Fig. 3c). Thus, across the hippocampus and visual cortex, neurons are consistently described by input–output functions that (1) involve purely direct dependencies and (2) are nearly deterministic; in other words, they behave as perceptrons.
Finally, in the roundworm C. elegans, one can record from the entire brain. This means that, for each cell, we have access to nearly all its synaptic and extrasynaptic inputs12,43. For the median neuron, a minimal set of n* = 5 inputs is sufficient to predict all other direct dependencies in the brain. These remarkably simple computations explain 62% of each neuron’s variability Stot (Fig. 3d). Together, these findings (summarized in Fig. 3e,f) comprise our main result: that neuronal activity, spanning multiple systems and species, is explained by simple direct dependencies on only a small number of inputs. We confirm that these results hold for the average neuron (rather than median), are robust to downsampling the neural activity and remain consistent across time (Supplementary Figs. 2–4).
Higher-order and time-delayed dependencies
Complex functions require networks of artificial neurons7,34,35. In real neurons, however, interactions between dendrites and extrasynaptic signals can lead to higher-order dependencies that are responsible for gating and linearly non-separable functions such as XOR44,45,46,47,48,49,50,51,52 (Fig. 1g). Similarly, neurons can integrate inputs over time to execute important temporal computations and produce complex dynamics53,54,55,56. However, the above results suggest that, with knowledge of only the direct, equal-time dependencies on individual inputs, one should be able to predict the higher-order dependencies on combinations of inputs as well as the time-delayed dependencies on past inputs. If true, this would paint a surprisingly simple picture in which higher-order and time-delayed dependencies arise naturally from direct, instantaneous dependencies.
To explain the kth-order dependence P(y∣x1, …, xk), it is sufficient to predict the correlations between the output y and all subsets of the k inputs (Methods). For each second-order dependence P(y∣xi, xj), because our complete models capture all of the direct dependencies (by either fitting or prediction), all that remains is the triplet correlation between y, xi and xj. In the hippocampus, the complete models predict 99.85% of the triplet correlations (within experimental errors), leaving only 0.15% of the second-order dependencies unexplained by simpler direct dependencies (Fig. 4a). For comparison, with the same numbers of random inputs, direct dependencies fail to explain many of the second-order dependencies (Fig. 4a, inset). Returning to optimal inputs, the accuracy of the complete models increases as we study dependencies of even higher order. Direct dependencies fail to predict only 0.11% of the quadruplet correlations (Fig. 4b), and this fraction drops to 0.08% for quintuplet correlations (Fig. 4c). These results in the mouse hippocampus are recapitulated in the mouse cortex and C. elegans (Fig. 4d–g). We therefore find that the vast majority of higher-order dependencies can be understood as arising from simple direct dependencies, without relying on interactions between inputs.
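To make this comparison concrete: under the model average defined in Methods, the predicted triplet co-activity rate is the empirical average of xi xj P(y = 1∣x). The sketch below computes predicted and measured rates for one pair of inputs. The function names and the toy data are hypothetical, and the comparison against experimental error bars is omitted.

```python
import math


def firing_prob(b, w, x):
    """Equation (1): probability that y = 1 given the binary input vector x."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))


def predicted_triplet(b, w, X, i, j):
    """Model prediction for <y x_i x_j>, averaged over the observed inputs X."""
    return sum(x[i] * x[j] * firing_prob(b, w, x) for x in X) / len(X)


def measured_triplet(y, X, i, j):
    """Empirical co-activity rate <y x_i x_j>."""
    return sum(yl * x[i] * x[j] for yl, x in zip(y, X)) / len(X)


# Toy data (illustrative only): four samples of two inputs and one output.
X = [(1, 1), (1, 0), (0, 1), (0, 0)]
y = [1, 0, 0, 0]
pred = predicted_triplet(b=0.0, w=[0.0, 0.0], X=X, i=0, j=1)
meas = measured_triplet(y, X, i=0, j=1)
print(pred, meas)
```

The same pattern extends to quadruplet and quintuplet rates by multiplying in further input states.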
a, For the hippocampal population11,42, we plot triplet co-activity rates predicted by the complete models versus those measured in data. The inset shows the co-activity rates predicted by maximum entropy models with the same numbers of random inputs. b,c, Quadruplet (b) and quintuplet (c) co-activity rates predicted by models with optimal inputs and random inputs (inset). In a–c, for each output neuron, we consider 100 correlations with randomly selected groups of neurons. Lines indicate equality, shaded regions indicate experimental errors (two standard deviations), dark points are correct (within errors) and light points are incorrect. d–g, Fractions of correlations not predicted by direct dependencies in the mouse hippocampus (d)11, the mouse visual cortex during responses to natural images (e) and spontaneous activity (f)10, and the brain of C. elegans (g)12. See Supplementary Fig. 5 for details. h–j, Comparison of time-delayed co-activity rates predicted by the complete models versus measured in data for delays of 0.1 s (h), 1 s (i) and 10 s (j) in the mouse hippocampus. Insets show predictions of maximum entropy models with the same numbers of random inputs. Lines indicate equality, shaded regions indicate experimental errors (two standard deviations), dark points are correct (within errors) and light points are incorrect. k–n, Fractions of time-delayed correlations not predicted by direct dependencies in the hippocampus (k)11, the visual cortex responding to natural images (l) and during spontaneous activity (m), and C. elegans (n).
Thus far, we have focused on the instantaneous dependencies between neurons within the same window of time (Fig. 1a). Yet, the activity of each neuron y at time t may depend on the states of other neurons xi at previous times \({t}^{{\prime} } < t\). To predict these time-delayed dependencies \(P(y(t)| {x}_{i}({t}^{{\prime} }))\) using our complete models, it is sufficient to predict the time-delayed correlations between y and xi (Methods). In the hippocampal recording, time is discretized into windows of length Δt = 0.03 s (refs. 11,42). Despite being maximally random with regard to correlations longer than Δt, the complete models still predict 99.6% of the correlations with time delay 0.1 s (Fig. 4h), 99.1% with delay 1 s (Fig. 4i) and 99.0% with delay 10 s (Fig. 4j). We observe similarly high accuracy in the visual cortex and C. elegans (Fig. 4k–n). By contrast, with random inputs, direct dependencies fail to predict orders of magnitude more of the time-delayed correlations (Fig. 4k–n). Thus, we find that most time-delayed dependencies are explained by simple instantaneous dependencies, with no information about the neural dynamics.
Inferred neural network and robustness
Given their equivalence with logistic artificial neurons, we can study the structure of the inferred maximum entropy models in the context of neural computation (Fig. 1c). In the hippocampal population, all cells have negative biases b, leading them to favour silence over activity (Fig. 5a). Meanwhile, because the weights wi induce correlations between model neurons (Fig. 5b), it is tempting to interpret them as synaptic connection strengths. Fundamentally, however, these weights are defined to match the direct dependencies P(y∣xi) measured in data, which may arise from extrasynaptic signals or shared dependencies on latent variables (such as unobserved neurons)11,12,42,43,50,57. Yet, despite profound difficulties in deriving connectivity from activity58, we find that the inferred weights wi exhibit four key features of synaptic connections. First, as discussed above, the weights are sparse, with only a small number of inputs n* needed to explain all of a neuron’s direct dependencies (Fig. 3e). Second, the distribution of magnitudes is heavy-tailed (specifically log-normal), with some rare weights that are orders of magnitude stronger than average (Fig. 5c). Third, the weights are evenly split between positive and negative, suggesting a delicate balance between excitatory and inhibitory interactions (Fig. 5c). Finally, unlike most existing maximum entropy models16,36,42, the weights are highly directed, with the weight from neuron i to neuron j differing substantially from its inverse. These sparse, heavy-tailed, balanced and directed weights are universal features of synaptic connectivity observed across brain regions and species2,3,59,60,61,62.
a, Firing rate versus inferred bias b for each neuron in the hippocampal population and distribution of inferred biases (inset)11. b, Probability density of the correlation coefficient and corresponding input weight wi over all input–output pairs. c, Distribution of inferred weights wi over all input–output pairs in the hippocampal population. d, Direct information Idir versus number of inputs n* across all neurons; the dashed line indicates linear fit. e, Information Idir versus total entropy Stot across all neurons; the dashed line indicates linear fit. f, Illustration of robustness analysis. We remove (or ablate) neurons from the population by marginalizing over their states and study the predicted activity for the remaining neurons (Methods). g, Information Idir as a function of the fraction of inputs remaining for each neuron; the dashed line illustrates the value for the full models with all inputs. h, Prediction error for the complete models with different fractions of the inputs removed; the dashed lines indicate the values for the original models with all inputs (bottom) and independent models with no inputs (top). In g and h, values are averaged over neurons and 100 repeats of simulated ablations (Methods). See Supplementary Fig. 6 for analyses of the mouse visual cortex and C. elegans.
The connections between neurons enable the flow of information; for each neuron, the mutual information between inputs and output is equal to the drop in entropy Itrue = Stot − Strue (refs. 21,32). While this information is impossible to estimate directly from data, the maximum entropy models provide a tractable lower bound, equal to the amount of variability explained by direct dependencies Idir = Stot − Sdir ≤ Itrue (Methods). Moreover, because direct dependencies capture nearly all of the variability in activity (Fig. 3f), we know that this lower bound is tight, with 0.9Itrue ≲ Idir ≤ Itrue for the mouse hippocampus and visual cortex. Across neurons, we find that this direct information increases linearly with the number of inputs n*, with each input communicating 0.01 bits per second to the output on average (Fig. 5d).
By symmetry, Idir also defines the amount of information that each neuron encodes about the rest of the population. For each bit generated by a neuron, we find that a consistent 0.87 bits encode information about its inputs (Fig. 5e). This large proportion of information concentrated on a small number of inputs indicates a highly redundant neural code. As in a Hopfield network, specifying the states of a small number of cells should be sufficient to predict the rest5,9,16. To test this hypothesis, we can artificially remove, or ablate, some cells within a population by marginalizing over their states63 (Methods). For each of the remaining neurons, we then investigate the impact on the complete model (Fig. 5f). As each neuron loses more of its inputs, the flow of information from inputs to output undergoes a sharp transition (Fig. 5g). Above this transition, neurons can lose nearly 90% of their inputs without impacting the flow of information, while below the transition, almost no information is communicated. Similarly, we can remove most of the inputs to a neuron before our model fails to accurately predict its activity (Fig. 5h). These findings demonstrate that the inferred neural network is strikingly robust, with each neuron maintaining nearly the same output activity even after losing the vast majority of its inputs.
Discussion
Despite intricate morphologies and biophysical dynamics17,18,19,44,45,46,47,48,49,51,52, neurons have long been studied using models of simple dependencies4,5,6,7,9,34. Here, we develop a framework to study whether neuronal activity arises from the simplest possible dependencies: those that capture the responses to individual inputs, but contain no information about interactions between inputs. Across the mouse hippocampus and visual cortex10,11,42, these direct dependencies explain over 90% of the variability in neuronal activity (Fig. 3), leaving less than 10% for interactions between inputs, time-delayed dependencies and latent variables (Fig. 1). Moreover, the inferred models, which are equivalent to artificial neurons, predict the higher-order dependencies on combinations of inputs and the time-delayed dependencies on past inputs (Fig. 4) and recover salient features of synaptic connectivity (Fig. 5).
These results raise future questions about the nature of dependencies between neurons. As experiments advance to record from larger populations across species, neural systems and imaging modalities10,11,12,13,14,64, does neuronal activity consistently arise from direct dependencies? Of particular interest are electrophysiological recordings, which have sufficient temporal resolution to resolve individual spikes65,66 (Supplementary Fig. 7). However, current large-scale recordings (for example, using Neuropixels64,67) probe spatially elongated or discontiguous populations, potentially limiting the study of direct dependencies between neurons. In addition, while our results suggest that most time-delayed dependencies are explained by instantaneous dependencies (Fig. 4), one can immediately generalize our framework to include those that are not24 (Supplementary Fig. 8). What do these significant time-delayed dependencies reveal about neural dynamics? Finally, as discussed above, the inferred neural network reflects not only causal interactions, but also functional correlations due to latent variables11,12,42,43,50,57,58. If the underlying population is defined by an Ising model—equivalent to a stochastic Hopfield network5 or Boltzmann machine68—we show that the inferred weights recover the true underlying interactions (Supplementary Information). As experiments mapping the wiring between neurons continue to advance1,2,3,69,70, how does the inferred functional connectivity relate to underlying synaptic connectivity? The framework presented here provides the tools to begin answering these questions.
Methods
Maximum entropy model
Consider a binary output y ∈ {0, 1} and a set of n binary inputs x = {x1, …, xn} ∈ \(\{0,1\}^{n}\). From experiments, we have L samples of activity y(ℓ) and x(ℓ), where ℓ = 1, …, L. From these data, we can estimate the direct dependencies P(y∣xi) for all inputs i = 1, …, n. We want to derive the model P(y∣x) that is consistent with these direct dependencies and has maximum entropy
$$S(P)=-{\left\langle \mathop{\sum }\limits_{y}P(y| {\bf{x}})\log P(y| {\bf{x}})\right\rangle }_{{\bf{x}}},$$
(2)
where \({\langle f({\bf{x}})\rangle }_{{\bf{x}}}=\frac{1}{L}{\sum }_{\ell }f({\bf{x}}(\ell ))\) denotes an empirical average over the inputs, and (unless otherwise specified) we use log base two such that entropy is measured in bits20,21. Each direct dependence P(y∣xi) is uniquely defined by the averages \(\langle y\rangle =\frac{1}{L}{\sum }_{\ell }y(\ell )\) and \(\langle {x}_{i}\rangle =\frac{1}{L}{\sum }_{\ell }{x}_{i}(\ell )\) and the pairwise correlation \(\langle y{x}_{i}\rangle =\frac{1}{L}{\sum }_{\ell }y(\ell ){x}_{i}(\ell )\). For every model P(y∣x), we have \({\langle {x}_{i}\rangle }_{P}=\langle {x}_{i}\rangle\), where \({\langle f(y,{\bf{x}})\rangle }_{P}={\langle {\sum }_{y}\,\,f(y,{\bf{x}})P(y| {\bf{x}})\rangle }_{{\bf{x}}}\) denotes a model average. Thus, one only needs to constrain the average firing rate 〈y〉 and the correlations 〈yxi〉 for all inputs. This maximum entropy model is known to take the logistic form
$$P(y| {\bf{x}})=\frac{1}{Z({\bf{x}})}{{\rm{e}}}^{y\left(b+\mathop{\sum }\limits_{i}{w}_{i}{x}_{i}\right)},$$
(3)
where \(Z({\bf{x}})=1+{{\rm{e}}}^{b+{\sum }_{i}{w}_{i}{x}_{i}}\) ensures normalization22,23. The bias b and weights wi are Lagrange multipliers that force the constraints 〈y〉P = 〈y〉 and \({\langle y{x}_{i}\rangle }_{P}=\langle y{x}_{i}\rangle\). As discussed above, these maximum entropy models are equivalent to logistic models, which are a special case of generalized linear models that have provided key insights into neural dynamics24,25,26,27,28,29,30,31.
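Two short identities, written here as a sketch in the notation above, connect these definitions. First, setting y = 1 in equation (3) recovers the logistic form of equation (1):

$$P(y=1| {\bf{x}})=\frac{{{\rm{e}}}^{b+{\sum }_{i}{w}_{i}{x}_{i}}}{1+{{\rm{e}}}^{b+{\sum }_{i}{w}_{i}{x}_{i}}}=\sigma \left(b+\mathop{\sum }\limits_{i}{w}_{i}{x}_{i}\right),\qquad \sigma (z)=\frac{1}{1+{{\rm{e}}}^{-z}}.$$

Second, each direct dependence follows from the three averages, assuming \(0 < \langle {x}_{i}\rangle < 1\) so that both conditionals are defined:

$$P(y=1| {x}_{i}=1)=\frac{\langle y{x}_{i}\rangle }{\langle {x}_{i}\rangle },\qquad P(y=1| {x}_{i}=0)=\frac{\langle y\rangle -\langle y{x}_{i}\rangle }{1-\langle {x}_{i}\rangle }.$$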
Computing model parameters
Even with the functional form for the model, one must still compute the bias b and weights wi so that the model matches the experimental average 〈y〉 and correlations 〈yxi〉 for all inputs i. To do so, we minimize the Kullback–Leibler (KL) divergence DKL(Q∣∣P) between the model and the empirical distribution Q(y∣x), which is equivalent to maximum likelihood estimation21. Specifically, we perform gradient descent in the KL divergence, with gradients given by ∇bDKL(Q∣∣P) = 〈y〉P − 〈y〉 and \({\nabla }_{{w}_{i}}{D}_{{\rm{KL}}}(Q| | P)={\langle y{x}_{i}\rangle }_{P}-\langle y{x}_{i}\rangle\). This algorithm converges efficiently, even for very large n.
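The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used for the results: the names `fit_maxent` and `constraint_gap`, the fixed learning rate, the step count and the synthetic data are all hypothetical choices.

```python
import math
import random


def fit_maxent(y, X, lr=2.0, steps=1000):
    """Fit bias b and weights w so the model matches <y> and <y x_i>."""
    L, n = len(y), len(X[0])
    mean_y = sum(y) / L
    corr = [sum(y[l] * X[l][i] for l in range(L)) / L for i in range(n)]
    b, w = 0.0, [0.0] * n
    for _ in range(steps):
        # Model firing probabilities P(y = 1 | x) for every sample.
        p = [1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
             for x in X]
        # Gradients of the KL divergence: model averages minus data averages.
        grad_b = sum(p) / L - mean_y
        grad_w = [sum(p[l] * X[l][i] for l in range(L)) / L - corr[i]
                  for i in range(n)]
        b -= lr * grad_b
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return b, w


def constraint_gap(y, X, b, w):
    """Largest mismatch between model and data averages after fitting."""
    L, n = len(y), len(X[0])
    p = [1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
         for x in X]
    gaps = [abs(sum(p) / L - sum(y) / L)]
    for i in range(n):
        model = sum(p[l] * X[l][i] for l in range(L)) / L
        data = sum(y[l] * X[l][i] for l in range(L)) / L
        gaps.append(abs(model - data))
    return max(gaps)


# Synthetic data (illustrative only): three inputs, logistic ground truth.
random.seed(0)
X = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(200)]
true_b, true_w = -1.0, (1.5, -1.0, 0.5)
y = [int(random.random() < 1.0 / (1.0 + math.exp(
        -(true_b + sum(wi * xi for wi, xi in zip(true_w, x)))))) for x in X]

b_hat, w_hat = fit_maxent(y, X)
print(b_hat, w_hat, constraint_gap(y, X, b_hat, w_hat))
```

Because the objective is convex in b and wi, the residual returned by `constraint_gap` serves as a simple convergence check. For large n, a vectorized implementation would be the natural choice.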
Information in direct dependencies
The true entropy of the neuron Strue defines the latent variability that cannot be explained by dependencies on the inputs; however, unless the number of inputs n is small, Strue cannot be estimated directly from data. The total variability of the neuron with no knowledge of the inputs is defined by the entropy Stot of the marginal P(y) (ref. 32). Between these two extremes, with knowledge of only the direct dependencies P(y∣xi), the entropy of the maximum entropy model (equation (3)), expressed in nats (divide by ln 2 to convert to bits), is given by
$${S}_{\mathrm{dir}}={\langle \log Z({\bf{x}})\rangle }_{{\bf{x}}}-b\langle y\rangle -\mathop{\sum }\limits_{i}{w}_{i}\langle y{x}_{i}\rangle .$$
(4)
These entropies form a hierarchy Stot ≥ Sdir ≥ Strue ≥ 0. The difference Itrue = Stot − Strue is the true mutual information between the inputs and the output21. The difference Idir = Stot − Sdir, which lower-bounds Itrue, is the mutual information between inputs and output in the model (equation (1)). Finally, due to the maximum entropy form of the model, the KL divergence with the true firing probabilities Ptrue(y∣x) also simplifies to a difference in entropies DKL(Ptrue∣∣P) = Sdir − Strue (ref. 21). Thus, in the limit that the model entropy Sdir becomes small, we know that DKL(Ptrue∣∣P) also becomes small, and the model is exact. Moreover, in this limit, the output becomes a deterministic function of the inputs \(P(y=1| {\bf{x}})=\varTheta \left(b+{\sum }_{i}{w}_{i}{x}_{i}\right)\), where Θ(⋅) is the step function. Together, these observations reveal that, if Sdir is small, then we have Sdir ≈ Strue ≈ 0, and the neuron itself becomes equivalent to a McCulloch–Pitts neuron or perceptron4,7,34.
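As an illustration of these quantities, the sketch below evaluates Stot, Sdir (through equation (4)) and Idir for a fitted model. The function names are hypothetical, and the conversion from nats to bits by dividing by ln 2 is made explicit because equation (4) uses natural exponentials. Equation (4) relies on the constraints being satisfied, so the result is meaningful only when b and w are the fitted parameters.

```python
import numpy as np

def binary_entropy_bits(p):
    """Entropy (bits) of a binary variable with P(1) = p."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def direct_entropies(X, y, b, w):
    """Return (S_tot, S_dir, I_dir) in bits for fitted parameters b, w.

    X : (L, n) binary inputs, y : (L,) binary output.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = len(y)
    S_tot = float(binary_entropy_bits(y.mean()))        # entropy of the marginal P(y)
    log_Z = np.logaddexp(0.0, b + X @ w)                # log(1 + e^{b + w.x}), stable
    # Equation (4), in nats: <log Z>_x - b<y> - sum_i w_i <y x_i>
    S_dir_nats = log_Z.mean() - b * y.mean() - w @ (X.T @ y / L)
    S_dir = float(S_dir_nats / np.log(2))               # convert nats to bits
    return S_tot, S_dir, S_tot - S_dir
```

For a single input the fitted model reproduces the empirical conditional probabilities exactly, so Sdir reduces to the usual conditional entropy of y given that input, which gives a simple check on the formula.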
Optimal inputs
For a given output neuron, we seek the n inputs that produce the most accurate model (equation (1)). As discussed above, the KL divergence between the model and the true firing probabilities reduces to a difference in entropies DKL(Ptrue∣∣P) = Sdir − Strue. Thus, the optimal inputs, which give the most accurate predictions for the output, are the ones that produce the maximum entropy model P(y∣x) with minimum entropy Sdir. This is an instance of the minimax entropy principle, which provides a general strategy for selecting optimal constraints in maximum entropy models36,37,38.
Greedy algorithm
Searching exhaustively for the optimal n inputs among the N − 1 other neurons requires comparing \(\binom{N-1}{n}\) candidate sets, which is generally infeasible. Instead, we propose a greedy algorithm for growing a locally optimal set of inputs. We begin with the independent model P(y), which has no inputs. We then fit a different model P(y∣xi) for each of the N − 1 possible inputs; among these, the optimal input is the one that produces the model with minimum entropy Sdir. Repeating this process, we greedily select the optimal input (which minimizes the entropy Sdir) at each step until we reach the desired number of inputs n.
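The greedy procedure can be sketched as follows. This is an illustrative implementation of the exact (unapproximated) search, refitting one model per candidate at every step. The names `fit_and_entropy` and `greedy_inputs`, the fixed number of gradient steps and the learning rate are assumptions made for the example. Here Sdir is evaluated as the average entropy of P(y∣x) over the samples, which coincides with equation (4) once the fit has converged.

```python
import numpy as np

def fit_and_entropy(X, y, lr=1.0, n_steps=2000):
    """Fit the logistic model by gradient descent and return S_dir in bits."""
    L, n = X.shape
    b, w = 0.0, np.zeros(n)
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(b + X @ w)))
        b -= lr * (p.mean() - y.mean())        # <y>_P - <y>
        w -= lr * (X.T @ (p - y) / L)          # <y x_i>_P - <y x_i>
    p = np.clip(1.0 / (1.0 + np.exp(-(b + X @ w))), 1e-12, 1 - 1e-12)
    # Average entropy of P(y|x) over the observed input patterns.
    return float(np.mean(-(p * np.log2(p) + (1 - p) * np.log2(1 - p))))

def greedy_inputs(activity, out, n_inputs):
    """Greedily choose n_inputs columns of `activity` as inputs to neuron `out`.

    activity : (L, N) binary array; returns the chosen indices and the
    entropy S_dir (bits) after each addition.
    """
    y = activity[:, out].astype(float)
    candidates = [j for j in range(activity.shape[1]) if j != out]
    chosen, entropies = [], []
    for _ in range(n_inputs):
        best_j, best_S = None, np.inf
        for j in candidates:
            # Refit with the current inputs plus one candidate.
            S = fit_and_entropy(activity[:, chosen + [j]].astype(float), y)
            if S < best_S:
                best_j, best_S = j, S
        chosen.append(best_j)
        candidates.remove(best_j)
        entropies.append(best_S)
    return chosen, entropies
```

Because each step adds a constraint to the previous model, the returned entropies cannot increase from one step to the next (up to the numerical precision of the fit).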
Approximate change in entropy
The above algorithm involves fitting O(nN) separate models: one for each of the O(N) possible new inputs during each of the n steps. To improve efficiency, rather than fitting a different model for each possible input, we can approximate the drop in entropy ΔSdir analytically. Using perturbation theory, for a candidate neuron i we expand the change in entropy in the limit of small prediction errors \(\langle y{x}_{i}\rangle -{\langle y{x}_{i}\rangle }_{P}\), yielding an analytic approximation for ΔSdir (Supplementary Information). Using this approximation to select the optimal input at each step, the greedy algorithm only requires fitting O(n) models.
Neural data
Our framework can be used to investigate any binarized recordings of neuronal activity across different neural systems and species. Because we are interested in understanding the mapping from inputs to output, we focus on large recordings of spatially contiguous populations, where, for each neuron, we may have access to some or most of its inputs. Such recordings are made possible by calcium imaging, wherein animals are genetically modified so that their neurons fluoresce in response to changes in calcium concentration, which in turn follows the electrical activity of the cells. This fluorescence is recorded using an optical microscope with sample period Δt. To study activity on the fastest available timescale, we use the sample period Δt to binarize each neuron into active (xi = 1) or silent (xi = 0; Fig. 1a).
We study four recordings of neuronal activity, each measured in previous experiments: one in the mouse hippocampus, two in the mouse visual cortex and one in the brain of the roundworm C. elegans. In the hippocampus, we study N = 1,485 neurons in the CA1 region as the mouse runs along a virtual track (Fig. 2a); activity is recorded with scanning period Δt = 1/30 s (ref. 11). In the visual cortex, we study N = 11,445 neurons recorded with scanning period Δt = 2/3 s as the mouse is exposed to two separate visual stimuli: natural images (Fig. 3b) or a grey screen to measure spontaneous activity10 (Fig. 3c). In C. elegans, we study N = 128 neurons comprising nearly the entire brain recorded as the animal moves freely with period Δt = 1/1.7 s (Fig. 3d)12. In the hippocampus and C. elegans, we construct models for all neurons, while in the visual cortex, we study 100 randomly selected output neurons.
Due to the sizes of the populations, some neurons never co-fire during the length of a given recording, leading to vanishing correlations 〈yxi〉 = 0. To avoid overfitting and divergences in the model parameters, for each output neuron y, we consider only inputs xi that co-fired with the output at least once, thus yielding positive correlations 〈yxi〉 > 0.
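A minimal sketch of this filtering step, assuming the recording is stored as an L × N binary array (the function name `cofiring_candidates` is hypothetical):

```python
import numpy as np

def cofiring_candidates(activity, out):
    """Indices of neurons that were active together with neuron `out`
    in at least one time bin, i.e. those with <y x_i> > 0.

    activity : (L, N) binary array, out : index of the output neuron.
    """
    activity = np.asarray(activity)
    y = activity[:, out]
    cofire_counts = activity.T @ y     # equals L * <y x_i> for every neuron i
    mask = cofire_counts > 0
    mask[out] = False                  # the output is not its own input
    return np.flatnonzero(mask)
```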
Minimal set of inputs
Given an output y and a desired number of inputs n, the greedy algorithm identifies the locally optimal set of inputs x = {x1, …, xn}; however, we still need a principled method for choosing n. At each stage of the greedy algorithm, we have a model P(y∣x) with n inputs. We use this model to predict the correlations \({\langle y{x}_{i}\rangle }_{P}\) with the other N − n − 1 neurons in the population. If all of these predictions are correct—that is, if they match the true correlations 〈yxi〉 within experimental errors—then including another input amounts to fitting statistical noise. Thus, for each output neuron, we continue selecting inputs greedily until we reach a number n* for which the model predicts all other correlations. Specifically, we terminate the greedy algorithm when \(| \langle y{x}_{i}\rangle -{\langle y{x}_{i}\rangle }_{P}| \le 2\sqrt{\langle y{x}_{i}\rangle/L }\) for all neurons i with positive correlations 〈yxi〉 > 0, where \(\sqrt{\langle y{x}_{i}\rangle /L}\) is the standard error of 〈yxi〉 (assuming Poisson statistics). In this way, n* defines the minimal number of inputs needed for the model to match all of the (positive) direct dependencies, by either fitting or prediction. We confirm that this avoids overfitting (Supplementary Fig. 1).
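The stopping rule can be sketched as below. The function name and argument layout are assumptions for illustration. The predicted correlation with a neuron that is not yet an input is taken to be the sample average of xi(ℓ)P(y = 1∣x(ℓ)), which is how we read the model average defined above when it is extended to neurons outside the input set.

```python
import numpy as np

def all_correlations_predicted(p_model, y, others, n_sigma=2.0):
    """Check whether the model predicts every positive correlation <y x_i>
    with the non-input neurons to within n_sigma standard errors.

    p_model : (L,) model probabilities P(y=1 | x(l)) for each sample
    y       : (L,) binary output
    others  : (L, M) binary activity of the neurons not used as inputs
    """
    p_model = np.asarray(p_model, dtype=float)
    y = np.asarray(y, dtype=float)
    others = np.asarray(others, dtype=float)
    L = len(y)
    corr_data = others.T @ y / L            # <y x_i>
    corr_model = others.T @ p_model / L     # <y x_i>_P
    positive = corr_data > 0                # only neurons that co-fired with y
    stderr = np.sqrt(corr_data[positive] / L)   # Poisson standard error
    errors = np.abs(corr_data[positive] - corr_model[positive])
    return bool(np.all(errors <= n_sigma * stderr))
```

In a greedy loop, one would call this check after each new input is added and stop at the first n for which it returns True; that n is n*.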
Higher-order and time-delayed dependencies
The statistical dependencies between inputs and output are encoded in correlations. As discussed above, the direct dependence P(y∣xi) is uniquely defined by the averages 〈y〉 and 〈xi〉 and the correlation 〈yxi〉. Similarly, assuming stationarity, the time-delayed dependence \(P(y(t)| {x}_{i}({t}^{{\prime} }))\), where \({t}^{{\prime} } < t\), is defined by 〈y〉, 〈xi〉 and the time-delayed correlation \(\langle y(t){x}_{i}({t}^{{\prime} })\rangle\). The second-order dependence P(y∣xi, xj) is defined by the direct dependencies P(y∣xi) and P(y∣xj) plus the triplet correlation 〈yxi xj〉. Thus, given a model that matches the direct dependencies, predicting the second-order dependencies amounts to predicting the triplet correlations, as in Fig. 4a. More generally, given a model that matches all of the (k − 1)th-order dependencies, predicting kth-order dependencies amounts to predicting the corresponding (k + 1)th-order correlations (Fig. 4b,c).
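For concreteness, the empirical time-delayed and triplet correlations can be computed as in the sketch below (function names are hypothetical, and the delay is measured in units of the sample period Δt). The corresponding model predictions follow by replacing y(ℓ) with P(y = 1∣x(ℓ)) in the same averages.

```python
import numpy as np

def delayed_correlation(y, x, lag):
    """Empirical <y(t) x(t - lag)> for an integer lag >= 1 (in time bins)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Pair each output sample with the input sample `lag` bins earlier.
    return float(np.mean(y[lag:] * x[:-lag]))

def triplet_correlations(y, X):
    """Matrix T with T[i, j] = <y x_i x_j>; the diagonal holds <y x_i>."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    return (X * y[:, None]).T @ X / len(y)
```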
Ablation robustness
To study the robustness of the inferred models, we artificially remove (or ablate) inputs and study how this impacts the predicted output63 (Fig. 5f). Given a model P(y∣x), we remove an input i by marginalizing over its activity, yielding a new model
$$\widetilde{P}(y| {\bf{x}})=\frac{{\sum }_{{x}_{i}}P(y| {\bf{x}})Q({\bf{x}})}{{\sum }_{{x}_{i}}Q({\bf{x}})},$$
(5)
where Q(x) is the empirical distribution over inputs. Note that we do not refit the model parameters; we simply marginalize the original model over the ablated inputs. After removing a given fraction of inputs, we compute the mutual information \({S}_{\mathrm{tot}}-S(\widetilde{P})\) between the output and the remaining inputs (Fig. 5g) as well as the prediction error \(\frac{1}{L}{\sum }_{\ell }(1-\widetilde{P}(y(\ell )| {\bf{x}}(\ell )))\) (Fig. 5h). In practice, we marginalize over a specified fraction of the population and repeat the above analysis for each remaining neuron as the output. We then average over all of the output neurons and 100 random realizations of this marginalization process.
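Equation (5) can be evaluated directly from the samples: for a given pattern of the remaining inputs, the ratio of sums equals the average of P(y∣x) over all samples that share that pattern. The sketch below uses this observation. It is an illustration under stated assumptions, not the repository code: the function names are hypothetical, at least one input is assumed to remain after ablation, and grouping identical patterns is practical only when patterns recur in the data.

```python
import numpy as np

def ablate_input(X, p_model, i):
    """Marginalize the model over input i using the empirical Q(x) (equation (5)).

    X       : (L, n) binary inputs, with n >= 2
    p_model : (L,) model probabilities P(y=1 | x(l))
    Returns the ablated probabilities P~(y=1 | x(l)) for each sample.
    """
    X = np.asarray(X)
    p_model = np.asarray(p_model, dtype=float)
    remaining = np.delete(X, i, axis=1)
    # Label each sample by its pattern of remaining inputs.
    _, group = np.unique(remaining, axis=0, return_inverse=True)
    group = group.ravel()
    sums = np.bincount(group, weights=p_model)
    counts = np.bincount(group)
    return (sums / counts)[group]

def prediction_error(p1, y):
    """(1/L) sum_l [1 - P~(y(l) | x(l))] given P~(y=1 | x(l)) = p1."""
    p1 = np.asarray(p1, dtype=float)
    y = np.asarray(y)
    p_observed = np.where(y == 1, p1, 1.0 - p1)
    return float(np.mean(1.0 - p_observed))
```

Ablating several inputs corresponds to deleting several columns at once before grouping; the parameters b and w are never refitted.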
Data availability
The data analysed in this article are openly available via GitHub at https://github.com/ChrisWLynn/Minimal_computation.
Code availability
The code used to perform the analyses in this article is openly available via GitHub at https://github.com/ChrisWLynn/Minimal_computation.
References
White, J. G., Southgate, E., Thomson, J. N. & Brenner, S. The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond. B 314, 1–340 (1986).
Lin, A. et al. Network statistics of the whole-brain connectome of Drosophila. Nature 634, 153–165 (2024).
Loomba, S. et al. Connectomic comparison of mouse and human cortex. Science 377, eabo0924 (2022).
McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. USA 79, 2554–2558 (1982).
Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain (Spartan Books, 1962).
Hertz, J., Krogh, A. & Palmer, R. G. Introduction to the Theory of Neural Computation (Addison–Wesley, 1991).
Hopfield, J. J. & Tank, D. W. Computing with neural circuits: a model. Science 233, 625–633 (1986).
Amit, D. J., Gutfreund, H. & Sompolinsky, H. Spin-glass models of neural networks. Phys. Rev. A 32, 1007 (1985).
Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365 (2019).
Gauthier, J. L. & Tank, D. W. A dedicated population for reward coding in the hippocampus. Neuron 99, 179–193 (2018).
Dag, U. et al. Dissecting the functional organization of the C. elegans serotonergic system at whole-brain scale. Cell 186, 2574–2592 (2023).
Demas, J. et al. High-speed, cortex-wide volumetric recording of neuroactivity at cellular resolution using light beads microscopy. Nat. Methods 18, 1103–1111 (2021).
Urai, A. E., Doiron, B., Leifer, A. M. & Churchland, A. K. Large-scale neural recordings call for new insights to link brain and behavior. Nat. Neurosci. 25, 11–19 (2022).
Rieke, F., Warland, D., Van Steveninck, R. d. R. & Bialek, W. Spikes: Exploring the Neural Code (MIT Press, 1999).
Schneidman, E., Berry II, M. J., Segev, R. & Bialek, W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007–1012 (2006).
Jan, Y.-N. & Jan, L. Y. Branching out: Mechanisms of dendritic arborization. Nat. Rev. Neurosci. 11, 316–328 (2010).
Poirazi, P. & Papoutsi, A. Illuminating dendritic function with computational models. Nat. Rev. Neurosci. 21, 303–321 (2020).
Petersen, C. C. Whole-cell recording of neuronal membrane potential during behavior. Neuron 95, 1266–1281 (2017).
Jaynes, E. T. Information theory and statistical mechanics. Phys. Rev. 106, 620 (1957).
Cover, T. M. & Thomas, J. A. Elements of Information Theory (John Wiley & Sons, 2012).
Berger, A., Della Pietra, S. A. & Della Pietra, V. J. A maximum entropy approach to natural language processing. Comput. Linguist. 22, 39–71 (1996).
Huang, F.-L., Hsieh, C.-J., Chang, K.-W. & Lin, C.-J. Iterative scaling and coordinate descent methods for maximum entropy models. J. Mach. Learn. Res. 11, 815–848 (2010).
Pillow, J. W. et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454, 995–999 (2008).
Goris, R. L., Movshon, J. A. & Simoncelli, E. P. Partitioning neuronal variability. Nat. Neurosci. 17, 858–865 (2014).
Stevenson, I. H., Rebesco, J. M., Miller, L. E. & Körding, K. P. Inferring functional connections between neurons. Curr. Opin. Neurobiol. 18, 582–588 (2008).
Weber, A. I. & Pillow, J. W. Capturing the dynamical repertoire of single neurons with generalized linear models. Neural Comput. 29, 3260–3289 (2017).
Ostojic, S. & Brunel, N. From spiking neuron models to linear-nonlinear models. PLoS Comput. Biol. 7, e1001056 (2011).
Mensi, S., Naud, R. & Gerstner, W. From stochastic nonlinear integrate-and-fire to generalized linear models. NeurIPS 24, 1926–1934 (2011).
Churchland, M. M. et al. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nat. Neurosci. 13, 369–378 (2010).
Priebe, N. J. & Ferster, D. Mechanisms of neuronal computation in mammalian visual cortex. Neuron 75, 194–208 (2012).
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R. & Bialek, W. Entropy and information in neural spike trains. Phys. Rev. Lett. 80, 197 (1998).
Schneidman, E., Still, S., Berry II, M. J. & Bialek, W. Network information and connected correlations. Phys. Rev. Lett. 91, 238701 (2003).
Block, H.-D. The perceptron: a model for brain functioning. I. Rev. Mod. Phys. 34, 123 (1962).
Muroga, S. Threshold Logic and Its Applications (John Wiley & Sons, 1972).
Lynn, C. W., Yu, Q., Pang, R., Bialek, W. & Palmer, S. E. Exactly solvable statistical physics models for large neuronal populations. Phys. Rev. Res. 7, L022039 (2025).
Lynn, C. W., Yu, Q., Pang, R., Palmer, S. E. & Bialek, W. Exact minimax entropy models of large-scale neuronal activity. Phys. Rev. E 111, 054411 (2025).
Carcamo, D. P., Weaver, N. J., Dixit, P. D. & Lynn, C. W. Minimax entropy: the statistical physics of optimal models. Phys. Rev. E 112, 061001 (2025).
Moser, E. I., Kropff, E. & Moser, M.-B. Place cells, grid cells, and the brain’s spatial representation system. Annu. Rev. Neurosci. 31, 69–89 (2008).
O’Keefe, J. & Conway, D. H. Hippocampal place units in the freely moving rat: why they fire where they fire. Exp. Brain Res. 31, 573–590 (1978).
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B. & Moser, E. I. Microstructure of a spatial map in the entorhinal cortex. Nature 436, 801–806 (2005).
Meshulam, L., Gauthier, J. L., Brody, C. D., Tank, D. W. & Bialek, W. Collective behavior of place and non-place neurons in the hippocampal network. Neuron 96, 1178–1191 (2017).
Randi, F., Sharma, A. K., Dvali, S. & Leifer, A. M. Neural signal propagation atlas of Caenorhabditis elegans. Nature 623, 406–414 (2023).
Beniaguev, D., Segev, I. & London, M. Single cortical neurons as deep artificial neural networks. Neuron 109, 2727–2739 (2021).
Gidon, A. et al. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science 367, 83–87 (2020).
Losonczy, A. & Magee, J. C. Integrative properties of radial oblique dendrites in hippocampal CA1 pyramidal neurons. Neuron 50, 291–307 (2006).
Polsky, A., Mel, B. W. & Schiller, J. Computational subunits in thin dendrites of pyramidal cells. Nat. Neurosci. 7, 621–627 (2004).
Takahashi, N. et al. Locally synchronized synaptic inputs. Science 335, 353–356 (2012).
London, M. & Häusser, M. Dendritic computation. Annu. Rev. Neurosci. 28, 503–532 (2005).
Sykova, E. Extrasynaptic volume transmission and diffusion parameters of the extracellular space. Neurosci. 129, 861–876 (2004).
Poirazi, P., Brannon, T. & Mel, B. W. Pyramidal neuron as two-layer neural network. Neuron 37, 989–999 (2003).
Park, P. et al. Dendritic excitations govern back-propagation via a spike-rate accelerometer. Nat. Commun. 16, 1333 (2025).
Adelson, E. H. & Bergen, J. R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284–299 (1985).
Shadlen, M. N. & Newsome, W. T. Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J. Neurophys. 86, 1916–1936 (2001).
Loewenstein, Y. & Sompolinsky, H. Temporal integration by calcium dynamics in a model neuron. Nat. Neurosci. 6, 961–967 (2003).
Shi, Y.-L., Zeraati, R., International Brain Laboratory, Levina, A. & Engel, T. A. Brain-wide organization of intrinsic timescales at single-neuron resolution. Preprint at bioRxiv https://doi.org/10.1101/2025.08.30.673281 (2025).
Morrell, M. C., Sederberg, A. J. & Nemenman, I. Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Phys. Rev. Lett. 126, 118302 (2021).
Das, A. & Fiete, I. R. Systematic errors in connectivity inferred from activity in strongly recurrent networks. Nat. Neurosci. 23, 1286–1296 (2020).
Lynn, C. W., Holmes, C. M. & Palmer, S. E. Heavy-tailed neuronal connectivity arises from Hebbian self-organization. Nat. Phys. 20, 484–491 (2024).
Liu, G. Local structural balance and functional interaction of excitatory and inhibitory synapses in hippocampal dendrites. Nat. Neurosci. 7, 373–379 (2004).
Van Vreeswijk, C. & Sompolinsky, H. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science 274, 1724–1726 (1996).
Lynn, C. W. & Bassett, D. S. The physics of brain network structure, function and control. Nat. Rev. Phys. 1, 318 (2019).
Meyes, R., Lu, M., de Puiseau, C. W. & Meisen, T. Ablation studies in artificial neural networks. Preprint at https://arxiv.org/abs/1901.08644 (2019).
Steinmetz, N. A. et al. Neuropixels 2.0: a miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588 (2021).
Buzsáki, G. Large-scale recording of neuronal ensembles. Nat. Neurosci. 7, 446–451 (2004).
Wei, Z. et al. A comparison of neuronal population dynamics measured with calcium imaging and electrophysiology. PLoS Comput. Biol. 16, e1008198 (2020).
Steinmetz, N. A., Koch, C., Harris, K. D. & Carandini, M. Challenges and opportunities for large-scale electrophysiology with Neuropixels probes. Curr. Opin. Neurobiol. 50, 92–100 (2018).
Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for Boltzmann machines. Cog. Sci. 9, 147–169 (1985).
Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Nature 634, 124–138 (2024).
Shapson-Coe, A. et al. A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution. Science 384, eadk4858 (2024).
Acknowledgements
We thank P. Dixit, B. Machta, D. Clark, T. Geiller, M. Leighton, F. Mignacco, D. Carcamo, N. Weaver and Q. Yu for enlightening discussions and comments on earlier versions of the article. We also acknowledge support from the National Institutes of Health (NIH/NIGMS R35GM160188) and the Department of Physics, Quantitative Biology Institute and Wu Tsai Institute at Yale University.
Ethics declarations
Competing interests
The author declares no competing interests.
Peer review
Peer review information
Nature Physics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lynn, C.W. Simple input–output dependencies explain neuronal activity. Nat. Phys. (2026). https://doi.org/10.1038/s41567-026-03306-3