On cybersecurity and LLMs ⤵️

On a podcast last week, we covered the question of cybersecurity and LLMs through the lens of the work I did before joining OpenAI on applying LLMs to fuzzing: github.com/spolu/gym_fuzz… github.com/spolu/fuzz1ng

(Fuzzing is the process of running an evolution strategy loop on program inputs to maximize the "binary" coverage of the discovered inputs, with the goal of finding inputs that trigger a crash. Crashing inputs are interesting as they generally point to potential exploit vectors.) (See this: github.com/google/AFL)

I think this work (done in collaboration with a good friend) was ill-inspired because LLMs were way too slow to fuzz productively (despite being smarter than random mutations). My bet is that they are still too slow today (but this would need to be verified through experimentation, of course). Typical fuzzers evaluate millions of mutations per second.

That being said, I'm quite convinced there is a blue ocean to explore in the world of fuzzing, LLMs, and vulnerability discovery. Most projects I see are somewhat boring agents that replace basic pen-testers (e.g. for SOC-2 Type II compliance). The AIxCC grand challenge didn't get nearly enough traction. And last time I caught up with OpenAI's head of security, he seemed genuinely excited by that space.

Considering open source targets, I'm excited by the idea of providing fuzzing capabilities to an agent, giving it the opportunity to introspect fuzzing results (program inputs) and code (letting it navigate the code to understand how an input came into existence), and letting it manufacture (maybe with code execution) new inputs to explore. This would effectively materialize a hybrid loop alternating conventional fuzzing with neural fuzzing, guided by source-code exploration (a signal conventional fuzzers can't leverage), which is mainly what professionals are doing today.
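
To make the hybrid loop above concrete, here is a minimal toy sketch, not the code from the repos linked above. Everything in it is an assumption made for illustration: `toy_target` stands in for an instrumented binary that reports which branches it hit, `mutate` is a crude stand-in for a conventional fuzzer's mutations, and `stub_agent` is a hard-coded placeholder for an LLM agent that has read the target's source (no model is called). The loop alternates a cheap random phase with a slow "smart" phase and keeps any input that reaches new coverage.

```python
import random
from typing import Callable, List, Optional, Set, Tuple

Coverage = Set[int]
Proposer = Callable[[List[bytes]], List[bytes]]


class Crash(Exception):
    """Raised by the toy target when an input triggers the 'bug'."""


def toy_target(data: bytes) -> Coverage:
    # Hypothetical instrumented target: returns the ids of the branches hit
    # and crashes on any input starting with b"FUZZ".
    cov = {0}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add(3)
                if len(data) > 3 and data[3] == ord("Z"):
                    raise Crash("crashing input reached")
    return cov


def mutate(data: bytes, rng: random.Random) -> bytes:
    # Crude random mutation: overwrite, insert, or delete one byte.
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def stub_agent(corpus: List[bytes]) -> List[bytes]:
    # Placeholder for an agent: pretend it read the source, spotted the magic
    # constant, and extends the most promising corpus entry by one byte.
    magic = b"FUZZ"
    best = max(corpus, key=lambda d: common_prefix_len(d, magic))
    return [magic[: common_prefix_len(best, magic) + 1]]


def hybrid_fuzz(
    target: Callable[[bytes], Coverage],
    seeds: List[bytes],
    proposer: Proposer,
    rounds: int,
    budget_per_round: int,
    rng: random.Random,
) -> Tuple[List[bytes], Optional[bytes]]:
    corpus = list(seeds)
    seen: Coverage = set()
    for seed in corpus:
        seen |= target(seed)  # assumes the seeds themselves do not crash

    def run_one(candidate: bytes) -> bool:
        """Run one candidate; return True if it crashed the target."""
        try:
            cov = target(candidate)
        except Crash:
            return True
        if not cov <= seen:  # new coverage: keep the input as a future seed
            seen.update(cov)
            corpus.append(candidate)
        return False

    for _ in range(rounds):
        # Phase 1: conventional fuzzing, many cheap random mutations.
        for _ in range(budget_per_round):
            candidate = mutate(rng.choice(corpus), rng)
            if run_one(candidate):
                return corpus, candidate
        # Phase 2: "neural" fuzzing, a few expensive source-guided proposals.
        for candidate in proposer(list(corpus)):
            if run_one(candidate):
                return corpus, candidate
    return corpus, None


if __name__ == "__main__":
    corpus, crash = hybrid_fuzz(
        toy_target, [b""], stub_agent, rounds=4, budget_per_round=50,
        rng=random.Random(0),
    )
    print(len(corpus), crash)
```

Note that the corpus is the only state carried between rounds, so in this sketch the work done by either phase survives purely as the inputs it discovered.
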

Getting this agentic loop right will require solving a lot of interesting technical problems, but there is no obvious reason to believe that it is not doable today, at scale.

Why is it exciting? Because we can scale the test-time compute as far as we f**** want. We can fork agents, run agents for long periods of time, and run fuzzers for equally long periods of time. Plus, that computation is easily stored in the form of the inputs that best explore the binary. All useful compute is effectively stored in concise data, seeds of future computations; there is no waste of compute.

This is the dream test-time compute scaling setup and, cherry on top, the societal impact is massive: finding 0-day vulnerabilities in broadly used open source software such as libpng before the bad guys do.

Whichever direction I look in, it seems like no one is exploring this seriously, despite the massive potential?