1196 Discussion Thread | Erdős Problems


PROVED (LEAN) This has been solved in the affirmative and the proof verified in Lean.

Is it true that, for any $x$, if $A\subset [x,\infty)$ is a primitive set of integers (so that no distinct elements of $A$ divide each other) then\[\sum_{a\in A}\frac{1}{a\log a}< 1+o(1),\]where the $o(1)$ term $\to 0$ as $x\to \infty$?

A conjecture of Erdős, Sárközy, and Szemerédi. Lichtman [Li23] has proved that\[\sum_{a\in A}\frac{1}{a\log a}< e^{\gamma}\frac{\pi}{4}+o(1)\approx 1.399+o(1).\]This was solved by GPT-5.4 Pro (prompted by Price), which proved that for any primitive set $A\subset \mathbb{N}$\[\sum_{\substack{a\in A\\ a>x}}\frac{1}{a\log a}\leq 1+O\left(\frac{1}{\log x}\right).\]See the comment section for further refinements and discussion.

Lichtman [Li20] proved that if $A$ is the set of all integers with exactly $k$ prime factors (so that $A\subset [2^k,\infty)$ and $A$ is a primitive set) then\[\sum_{a\in A}\frac{1}{a\log a}\geq 1+O(k^{-1/2+o(1)}),\]and suggested that the true rate of decay may be $O(2^{-k})$. Gorodetsky, Lichtman, and Wong [GLW24] have proved that\[\sum_{a\in A}\frac{1}{a\log a}= 1-(c+o(1))k^22^{-k}\]where $c\approx 0.0656$ is an explicit constant.

See also [164] for the case $x=1$.


This page was last edited 15 April 2026.

External data from the database:
Formalised statement? Yes
Related OEIS sequences: Possible


  • The anticipated paper on this problem by Alexeev, Barreto, Li, Lichtman, Price, Shah, Tang, and Tao is now on arXiv!

  • A feature of this solution that seems worth emphasizing is that it is not merely a Markov-chain trick, but a dual certificate for the divisibility poset.

    The cleanest way I now understand the proof is as follows. Put a weighted directed edge\[
    nq\to n,\qquad
    w(nq,n)=\frac{\Lambda(q)}{nq\log^2(nq)}.
    \]Then the von Mangoldt identity gives the exact outflow\[
    \operatorname{Out}(n)
    =
    \frac{1}{n\log n},
    \]while Mertens' theorem gives the asymptotic inflow\[
    \operatorname{In}(n)
    =
    \sum_q\frac{\Lambda(q)}{nq\log^2(nq)}
    =
    \frac{1}{n\log n}
    +
    O\!\left(\frac{1}{n\log^2 n}\right).
    \]Thus the flow is almost divergence-free on the tail \(n\ge x\), with total divergence\[
    \sum_{n\ge x}|\operatorname{div}(n)|
    \ll
    \frac1{\log x}.
    \]Now take\[
    \Omega=\{d\ge x:\ d\mid a \text{ for some } a\in A\}.
    The primitivity of \(A\) is used in exactly one decisive place: every edge entering \(a\in A\) comes from outside \(\Omega\). Indeed, if \(aq\in\Omega\), then \(aq\mid b\) for some \(b\in A\), so \(a\mid b\), and primitivity forces \(a=b\); but then \(aq\mid a\), which is impossible for \(q>1\). Hence the inflow into \(\Omega\) sees\[
    \sum_{a\in A}\frac1{a\log a}
    \]up to the \(O(1/\log x)\) error. On the other hand, every edge leaving \(\Omega\) must cross below \(x\), and the same Mertens computation gives boundary outflow\[
    1+O\!\left(\frac1{\log x}\right).
    \]The discrete divergence theorem then immediately gives\[
    \sum_{a\in A}\frac1{a\log a}
    \le
    1+O\!\left(\frac1{\log x}\right).
    \]This formulation makes the role of \(\Lambda\) especially transparent. It is doing two independent jobs at once: locally, it gives the exact divisor identity\[
    \sum_{q\mid n}\Lambda(q)=\log n,
    \]and globally, it gives the correct Mertens normalization at the boundary. That combination is precisely what removes the extra constants appearing in earlier approaches.

    In this form, the proof feels like a weighted LYM/Sperner certificate for the divisibility lattice: construct a nearly divergence-free flow whose chains are divisibility chains, observe that an antichain can be crossed only once, and compute the boundary mass. It would be very interesting to know how far this flow-certificate viewpoint extends, especially to restricted prime supports, \(k\)-almost-prime layers, and Banks--Martin type extremal questions.
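    A quick numerical sanity check of this flow is easy to run. The sketch below (plain Python; the truncation point $Q$ and the integral estimate for the truncated inflow tail are my own choices, not part of the argument above) confirms that the outflow identity is exact while the inflow matches $\frac{1}{n\log n}$ only up to an $O(1/\log n)$ relative error.

```python
import math

def mangoldt_table(Q):
    """Lambda(q) for q <= Q via a smallest-prime-factor sieve."""
    spf = list(range(Q + 1))
    for p in range(2, int(Q ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, Q + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (Q + 1)
    for q in range(2, Q + 1):
        p, m = spf[q], q
        while m % p == 0:
            m //= p
        if m == 1:  # q is a prime power p^k, so Lambda(q) = log p
            lam[q] = math.log(p)
    return lam

Q = 100_000
lam = mangoldt_table(Q)

def outflow(n):
    """sum_{q | n} Lambda(q) / (n log^2 n); equals 1/(n log n) exactly,
    by the identity sum_{q | n} Lambda(q) = log n."""
    return sum(lam[q] for q in range(2, n + 1) if n % q == 0) / (n * math.log(n) ** 2)

def inflow(n):
    """sum_q Lambda(q) / (n q log^2(nq)), truncated at q <= Q; the q > Q
    tail is approximated by its integral, 1/(n log(nQ))."""
    s = sum(lam[q] / (n * q * math.log(n * q) ** 2) for q in range(2, Q + 1))
    return s + 1 / (n * math.log(n * Q))

for n in (100, 1000):
    target = 1 / (n * math.log(n))            # the exact outflow
    assert abs(outflow(n) - target) < 1e-9 * target
    assert abs(inflow(n) - target) < 2 * target / math.log(n)
```

    The outflow assertion holds to machine precision because $\sum_{q\mid n}\Lambda(q)=\log n$ is an identity; the inflow assertion has a loose tolerance because the Mertens-type error really is only $O(1/\log n)$.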

  • We have built what one might call the von Mangoldt downward process $n \mapsto n/q$ (with transition probability $\Lambda(q)/\log n$), the von Mangoldt measure $\nu$, and the von Mangoldt upward process $n \mapsto qn$ (with transition probability $\Lambda(q) \nu(nq) / (\nu(n) \log (nq))$), all powered by the basic identity $\sum_{q|n} \Lambda(q) = \log n$. But this process can "jump" over primes due to the fact that the support of $\Lambda$ contains prime powers in addition to just primes.

    But suppose we take the fundamental theorem of arithmetic identity $\sum_{p|n} v_p(n) \log p = \log n$ instead, where $v_p(n)$ is the number of times $p$ divides $n$. This then gives a "downward prime process" in which $n$ transitions to $n/p$ with probability $v_p(n) \log p / \log n$, and should then give a "prime measure" $\tilde \nu$ that is invariant under this process and asymptotic to $1/n \log n$, as well as an "upward prime process" in which $n$ transitions to $np$ with probability $v_p(np) \log p\, \tilde \nu(np) / (\log(np)\, \tilde \nu(n))$. This should be very similar to the von Mangoldt processes, but now the inequality $\sum_{a \in A} \tilde \nu(a) \leq 1$ should be an identity in the case where $A$ is the set of products of $k$ primes (possibly repeated), because this process does not jump over primes. (This is similar to the modification of $\nu$ proposed by Will, but has a slight difference, in that when repeated primes occur in the factorization, Will often factors them out consecutively in the downward process, whereas in this process they will be factored out at random times.)

    Given how nice the formula for $\nu$ was - basically the Laplace transform of $1/\zeta$ - perhaps one can hope that there is a similarly clean formula for $\tilde \nu$? Among other things, this should lead to an improvement of the $1/\log x$ type error term (indeed, one might now hope to not just improve the constant, but get better asymptotic decay of the error term than just one power of the logarithm).

    A variant question: suppose we restrict the von Mangoldt downward process $n \mapsto n/q$ to squarefree numbers (note that the property of being squarefree is preserved by the flow). What is the new canonical measure $\nu_{sf}$ attached to this restricted process? My initial guess is that it will have to do with the Laplace transform of $\zeta(2s)/\zeta(s)$.
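    For concreteness, the downward prime process described above can be simulated in a few lines. This is a sketch of my own (not code from the thread); it only illustrates the definition: each step removes a single prime from the factorization, chosen with probability $v_p(n)\log p/\log n$, so the trajectory is a divisibility chain of length exactly $\Omega(n)+1$.

```python
import math
import random
from collections import Counter

def factorize(n):
    """Prime factorization {p: v_p(n)} by trial division."""
    f = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            f[d] += 1
            n //= d
        d += 1
    if n > 1:
        f[n] += 1
    return f

def downward_prime_step(n, rng=random):
    """One step of the downward prime process: n -> n/p with probability
    v_p(n) log p / log n (the weights sum to log n, by the fundamental
    theorem of arithmetic identity)."""
    f = factorize(n)
    primes = list(f)
    weights = [f[p] * math.log(p) for p in primes]
    return n // rng.choices(primes, weights=weights)[0]

def divisibility_chain(n, rng=random):
    """Run the process from n down to 1; each step removes one prime,
    so the chain has Omega(n) + 1 elements."""
    out = [n]
    while n > 1:
        n = downward_prime_step(n, rng)
        out.append(n)
    return out

random.seed(1)
c = divisibility_chain(360)        # 360 = 2^3 * 3^2 * 5, so Omega = 6
assert len(c) == 7 and c[-1] == 1
assert all(a % b == 0 for a, b in zip(c, c[1:]))  # a divisibility chain
```

    In particular, any one trajectory can meet a primitive set at most once, which is the mechanism behind inequalities of the shape $\sum_{a\in A}\tilde\nu(a)\leq 1$.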

  • I got Gemini to do a literature review on the GPT-generated proofs provided by Arb Research, and connections to existing probabilistic constructions of prime factorizations, particularly focusing on the work of Arratia. I summarize some of the points of that review here.

    First, we recall the zeta distribution $Z_s$, which for any $s>1$ is a probability distribution on the natural numbers with law

    $$ {\bf P}(Z_s = n) = \frac{1}{\zeta(s) n^s}.$$

    Crucially, because of the Euler identity $\zeta(s) = \prod_p (1-p^{-s})^{-1}$ (as well as the fundamental theorem of arithmetic identity $\frac{1}{n^s} = \prod_p p^{-v_p(n) s}$), $Z_s$ has a very clean prime factorization
    $$ Z_s = \prod_p p^{e_{p,s}}$$
    where the $e_{p,s}$ are independent geometric random variables with mean $1/(p^s-1)$, thus
    $$ {\bf P}(e_{p,s} = k) = p^{-ks} (1 - p^{-s}).$$

    For $s$ close to one, $Z_s$ tends to obey the size law $\log Z_s = O(\frac{1}{s-1})$. So we expect $Z_s$ to go to infinity as $s$ tends to $1$. But it turns out we can make this intuition far more precise and useful, coupling all these individual zeta random variables together into a single "zeta process" with nice properties.

    A geometric random variable of mean $1/(p^s-1)$ can also be thought of as the length of the initial run of successes in a sequence of independent trials, each having a success probability of $p^{-s}$. One example of such a trial is the event that an exponential variable of rate $\log p$ exceeds $s$. Thus, if for each prime $p$ we create an infinite sequence $E_{p,1}, E_{p,2}, \dots$ of exponential clocks of rate $\log p$, i.e., independent random variables with the exponential distribution
    $$ {\bf P}(E_{p,k} \geq s) = p^{-s}$$
    then we can couple all the zeta variables $Z_s$ together by defining the $e_{p,s}$ to be the last time $E_{p,k}$ stays consistently above $s$,
    $$ e_{p,s} = \max \{ k: E_{p,1},\dots, E_{p,k} \geq s \}$$
    or equivalently the first time $E_{p,k}$ dips below $s$, minus $1$,
    $$ e_{p,s} = \min \{ k: E_{p,k} < s \} - 1,$$
    and then setting $Z_s := \prod_p p^{e_{p,s}}$. By doing so, the $e_{p,s}$ are now non-increasing in $s$, and hence the zeta variables have formed a continuous divisibility chain: $Z_{s_1} \mid Z_{s_2}$ when $1 < s_2 < s_1$. Thus, as $s$ decreases down to $1$, $Z_s$ will mostly stay constant, but experience upward "jumps" every so often when it gets multiplied by a prime power. Or: as $s$ increases from $1$, $Z_s$ is mostly constant, but occasionally "jumps" downward to a factor of itself, caused by one of the $e_{p,s}$ dropping to some lower value.
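    The clock construction is concrete enough to simulate directly. The following sketch is my own; the restriction to the first ten primes and to $40$ clocks per prime is a truncation for illustration, harmless for moderate $s$. It samples one coupled family of clocks and checks the divisibility-chain property of the resulting $Z_s$.

```python
import math
import random

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # truncation: first ten primes

def sample_clocks(kmax=40, rng=random):
    """E_{p,k}: independent exponential clocks of rate log p, so that
    P(E_{p,k} >= s) = p^{-s}."""
    return {p: [rng.expovariate(math.log(p)) for _ in range(kmax)]
            for p in PRIMES}

def Z(s, clocks):
    """Z_s = prod_p p^{e_{p,s}}, where e_{p,s} is the length of the
    initial run of clocks at p staying >= s."""
    n = 1
    for p, es in clocks.items():
        e = 0
        while e < len(es) and es[e] >= s:
            e += 1
        n *= p ** e
    return n

random.seed(2)
clocks = sample_clocks()
# The e_{p,s} are non-increasing in s, so the Z_s at a larger s divides
# the Z_s at a smaller s: one sample of clocks gives a divisibility chain.
vals = [Z(s, clocks) for s in (3.0, 2.0, 1.5, 1.2)]
for z_large_s, z_small_s in zip(vals, vals[1:]):
    assert z_small_s % z_large_s == 0
```

    The divisibility assertion holds for every sample, not just this seed, since it is a deterministic consequence of the coupling.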

    How often does this drop occur? Let $s>1$, and $ds$ be infinitesimal. If $Z_s = n$ for some natural number $n$, then there will be a drop between $Z_s$ and $Z_{s+ds}$ if, for some prime $p$, one of the exponential random variables $E_{p,i}$, $1 \leq i \leq e_{p,s}$ is not just greater than or equal to $s$, but also less than $s+ds$. Conditionally on the first statement, the second statement occurs with probability $- \frac{\frac{d}{ds} p^{-s}\ ds}{p^{-s}} = \log p\ ds$ up to higher order terms, for a given $p$ and $i$. Thus, up to higher order terms, the probability of a jump is
    $$ \sum_p \sum_{i=1}^{e_{p,s}} \log p\ ds = \log n\ ds.$$
    Thus, if one increases $s$ from a starting point with $Z_s = n$, there is an exponential clock with rate $\log n$ timing when a drop occurs. The same analysis shows that the value $Z_{s'}$ that $Z_s$ drops to is determined by the downwards von Mangoldt process, i.e., it will drop to $n/q$ with probability $\Lambda(q)/\log n$. So this process is a "Poissonization" of the von Mangoldt process, where we have reindexed using continuous time with exponential clock spacings, rather than indexed by the natural numbers. (AI search suggests that this type of Poissonization trick goes back to a 1968 paper of Athreya and Karlin.) The upwards jump process can also be analyzed in a similar fashion (comparing $Z_s$ to $Z_{s-ds}$), but the formulae are a little bit messier (in particular the transition probabilities now depend on $s$ in an arithmetic fashion).

    So we have now generated an infinite divisibility chain process. How often is any given natural number $n$ hit? Any such hit will create a (unique) transition point where $Z_s=n$ and $Z_{s+ds} \neq n$ for some infinitesimal $ds$. By the above analysis, the probability that this occurs for a given $s, ds$ is approximately $\frac{1}{\zeta(s) n^s} \log n\ ds$, and so the hitting probability is
    $$ \int_1^\infty \frac{1}{\zeta(s) n^s} \log n\ ds$$
    which is one of our formulae for $\nu(n)$! So, because any divisibility chain hits a primitive set $A$ at most once, we again get a proof of
    $$ \sum_{n \in A} \nu(n) \leq 1$$
    which when combined with routine asymptotics for $\nu$ gives a solution to #1196.

    At each $s>1$, the drop in $Z_s$ should occur in time of the order of $\frac{1}{\log Z_s}$, which is in turn of the order of $s-1$. So we should expect about one such drop for every dyadic shell $[1+e^{-m-1}, 1+e^{-m}]$ of $s>1$ on the average. This is broadly consistent with the Hardy-Ramanujan law $\omega(n) \sim \log\log n$ given the heuristic $\log Z_s \sim \frac{1}{s-1}$. It should also "explain" the Erdős–Kac law via some sort of martingale central limit theorem, but I haven't worked out the details of this.

    The work of Arratia, Barbour, and others analyzed individual zeta distributions $Z_s$ for fixed choices of $s$ to achieve various couplings between the prime factorization of large random natural numbers at fixed scales, and the Poisson-Dirichlet process, giving yet another proof of Billingsley's theorem; I would imagine that this approach would also recover other basic results in the anatomy of integers such as Dickman's theorem on the distribution of smooth numbers. The novelty here uncovered by the AI proofs (together with subsequent human analysis) is that these zeta distributions $Z_s$ can be coupled together in a natural way to generate an infinite divisibility process, which turns out to be particularly pleasant to work with when Poissonized into a continuous process rather than a discrete one.

    The description of this "zeta process" in terms of the independent clock variables $E_{p,k}$ makes it quite tractable for all sorts of probabilistic computations. It may be worthwhile exploring other results or problems in the anatomy of integers in terms of this process, and specifically in terms of these clock variables.

    EDIT: Here is a quiver diagram reflecting my current mental "map" between all these concepts.

  • I've managed to rearrange the proof to not explicitly mention any Markov chains at all (although they are very much beneath the surface), and instead rely on a discrete version of the divergence theorem, namely:

    Discrete divergence theorem Let $G$ be a directed weighted graph, with each edge $e$ having a "flow" $w_e \geq 0$, and only finitely many outgoing edges from any vertex. Define the "divergence" $\mathrm{div}_v$ at a vertex to be the sum of the flows out of $v$ minus the flows into $v$ (this is also known as the (negative) excess at $v$). Then, for any finite set $\Omega$ of vertices, the sum $\sum_{v \in \Omega} \mathrm{div}_v$ of the divergences in $\Omega$ is equal to the outflow of $\Omega$ (the sum of all the flows from a vertex in $\Omega$ to a vertex outside of $\Omega$) minus the inflow (the sum of all the flows from a vertex outside of $\Omega$ to one inside $\Omega$).

    Proof. Sum up all the outflows minus inflows from the vertices in $\Omega$: the contributions of any flow from one vertex in $\Omega$ to another cancels out. Infinite numbers of inflows are not a problem as all such contributions have the same sign. $\Box$
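    As a sanity check, the discrete divergence theorem can be verified mechanically on a toy graph (a sketch of my own, with arbitrary made-up weights):

```python
# A small weighted digraph: (u, v, w) means flow w along the edge u -> v.
edges = [("a", "b", 2.0), ("b", "c", 1.5), ("c", "a", 0.5),
         ("a", "d", 1.0), ("d", "b", 0.25)]

def divergence(v):
    """Outflow minus inflow at vertex v."""
    return (sum(w for u, _, w in edges if u == v)
            - sum(w for _, x, w in edges if x == v))

def boundary_flow(omega):
    """Flow out of omega minus flow into omega, across the boundary."""
    out_f = sum(w for u, v, w in edges if u in omega and v not in omega)
    in_f = sum(w for u, v, w in edges if u not in omega and v in omega)
    return out_f - in_f

# Discrete divergence theorem: the divergences inside omega sum to the
# net boundary flow, for every choice of omega.
for omega in ({"a"}, {"a", "b"}, {"a", "b", "c"}, {"b", "d"}):
    assert abs(sum(divergence(v) for v in omega) - boundary_flow(omega)) < 1e-9
```

    Internal edges cancel in the sum, exactly as in the one-line proof above.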

    Proof of #1196. Consider the graph whose vertices are the natural numbers, and for which there is an edge from $nq$ to $n$ with flow $\frac{\Lambda(q)}{nq \log^2(nq)}$ for any $n \geq 1$ and $q>1$. The fundamental identity $\sum_{q|n} \Lambda(q) = \log n$ tells us that the outflow from any $n>1$ is $\frac{1}{n \log n}$. A Mertens theorem calculation shows that the inflow $\sum_q \frac{\Lambda(q)}{nq \log^2(nq)}$ is $\frac{1}{n \log n} + O(\frac{1}{n\log^2 n})$. So the divergence at $n$ is $O(1/n \log^2 n)$. In particular the total $\ell^1$ norm of the divergence on vertices $n \geq x$ is $O(1/\log x)$. By the divergence theorem, we conclude that for any finite set $\Omega$ of numbers greater than $x$, the difference between the outflow and inflow of $\Omega$ is $O(1/\log x)$.

    Now let $A$ be a primitive set consisting of elements exceeding $x$, which we can assume (by the usual limiting argument) to be finite. We set $\Omega$ to be all the factors of elements of $A$ that exceed $x$; this is a finite set that contains $A$ (we consider every element to be a factor of itself). Every element $n$ of $A$ contributes an inflow of $\sum_q \frac{\Lambda(q)}{nq \log^2(nq)} = \frac{1}{n \log n} + O(\frac{1}{n \log^2 n})$, since every $nq$ lies outside of $\Omega$ thanks to the primitivity of $A$. So the total inflow is at least $\sum_{n \in A} \frac{1}{n \log n} - O(1/\log x)$. On the other hand, the total outflow is at most $\sum_{q,n: n < x \leq nq} \frac{\Lambda(q)}{nq \log^2(nq)}$, which by another application of Mertens theorem is $1 + O(1/\log x)$. Thus, by the divergence theorem, we conclude $\sum_{n \in A} \frac{1}{n \log n} \leq 1 + O(1/\log x)$. $\Box$
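    The boundary outflow $\sum_{q,n:\ n < x \leq nq} \frac{\Lambda(q)}{nq \log^2(nq)} = 1 + O(1/\log x)$ is the one purely computational step above, and it can be checked numerically. The sketch below is my own; the truncation point $Q$ and the integral estimate $\frac{1}{n\log(nQ)}$ for each truncated $q$-tail are choices made for the experiment, not part of the proof.

```python
import math

def mangoldt_table(Q):
    """Lambda(q) for q <= Q via a smallest-prime-factor sieve."""
    spf = list(range(Q + 1))
    for p in range(2, int(Q ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, Q + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0.0] * (Q + 1)
    for q in range(2, Q + 1):
        p, m = spf[q], q
        while m % p == 0:
            m //= p
        if m == 1:  # q is a prime power p^k, so Lambda(q) = log p
            lam[q] = math.log(p)
    return lam

x, Q = 100, 100_000
lam = mangoldt_table(Q)

total = 0.0
for n in range(1, x):
    qmin = max(2, -(-x // n))                 # smallest q with nq >= x
    total += sum(lam[q] / (n * q * math.log(n * q) ** 2)
                 for q in range(qmin, Q + 1))
    total += 1 / (n * math.log(n * Q))        # integral estimate of the q > Q tail

# Should be 1 + O(1/log x); for x = 100 the error term is not tiny,
# so only a loose check is meaningful.
assert 0.8 < total < 1.4
```

    The tolerance is deliberately loose: the error term is only $O(1/\log x)$, and $1/\log 100 \approx 0.217$.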

    Remark 1 One could, if desired, tweak the weights from $\frac{\Lambda(q)}{nq \log^2(nq)}$ to $\frac{\Lambda(q)}{\log(nq)} \nu(nq)$, to make the flow perfectly divergence-free (except at $n=1$ where it is a sink of inflow $\nu(1)=1$), with an inflow and outflow of exactly $\nu(n)$ at every $n > 1$, and remove the requirement that the elements of $\Omega$ exceed $x$; this then gives the exact inequality $\sum_{n \in A} \nu(n) \leq 1$.

    Remark 2 Every flow graph generates an outflow Markov chain and inflow Markov chain (except at vertices where the outflow or inflow vanishes). For the graph under consideration here, these correspond to the upward and downward Markov chains in previous arguments. But it seems it is really the flow graph which is the fundamental object, not the two Markov chains.

    Remark 3 (added later) Perhaps with some tinkering with these flow weights, one can improve on the current best lower bound of $1 - \frac{\gamma}{\log x} + O(\frac{1}{\log^2 x})$ for $\sum_{n \in A} \frac{1}{n \log n}$.

    Remark 4 (added later) This may end up being one of those math problems where the a posteriori difficulty of the problem is actually significantly lower than the a priori difficulty. (Other examples of such problems include the finite field Kakeya conjecture and the sensitivity conjecture. Within the field of anatomy of integers, the Hardy-Ramanujan law is of this form; the first proofs consisted of quite lengthy calculations, but the modern proofs are basically a routine application of the second moment method and Mertens' theorem.)

    Remark 5 (added later) I had previously stated the opinion that the AI-generated proof had inadvertently highlighted a tighter connection between the anatomy of integers and the theory of Markov chains than had previously been explicitly noted in the literature. Based on further developments, I would like to update that opinion to the following: the AI-generated proof artefact, when combined with subsequent (and mostly human-generated) analysis, has revealed a tight connection between the anatomy of integers and flow network theory that does not, to my knowledge, have any explicit precursor in the literature (although related uses of Markov chains in adjacent settings do appear in that literature). With this reframing, the role of Markov chains is secondary: any flow network canonically generates two useful Markov chains, one for inflows and one for outflows, but it is possible to use flow network theory without direct mention of these chains, by using other tools in the theory such as the divergence theorem or MAXFLOW-MINCUT.

    (Workflow: Given the analogy between this problem and Sperner's theorem / the LYM inequality, I started thinking about proofs of the latter. It felt like flows were playing a key role in our arguments, so I experimented first with the MAXFLOW-MINCUT theorem without much success, then hit upon using the divergence theorem instead, taking a cue from how vector fields (which are continuous analogues of flows) are utilized in PDE or in geometric inequalities. I will leave it as an exercise to the reader how to prove the LYM inequality (and hence Sperner's theorem) via the discrete divergence theorem. EDIT: My one use of AI tooling in this process was to help filter out the unviable MAXFLOW-MINCUT approach: I asked ChatGPT to try to reprove Sperner's theorem using this result, and it instead gave me a proof using the Hall marriage theorem which was not appropriate for what I had in mind, causing me to abandon this direction.)

All comments are the responsibility of the user. Comments appearing on this page are not verified for correctness. Please keep posts mathematical and on topic.