On-the-fly code generation won’t fly


Gaurav Singh

It's safe to say that everyone, at least everyone living in the AI world, has heard of OpenClaw (previously named MoltBot and ClawDBot) by now. It has taken the world by storm. Many people use it as a personal assistant that handles all kinds of tasks through simple text messages, almost like talking to a friend. You can ask it to save images, translate text from those images, search for restaurants, make reservations, and many other things.

All from a chat interface.

Even the founder of OpenClaw admitted he was surprised when it accomplished tasks he didn’t expect it to handle.

The Threat to SaaS?

Today, there are hundreds of thousands of specialized apps built to perform these exact tasks.

Take fitness apps as an example. They:

  • Create personalized workout routines
  • Remind you what to eat and when
  • Schedule exercises
  • Track progress

Now there’s an obvious threat to such apps — which make up a large portion of the SaaS world — from software like OpenClaw.

In one interview, the OpenClaw founder even said that products like OpenClaw would reduce the need for many existing apps. Why download and pay $1.99/month for a fitness app when you can simply ask OpenClaw:

“Create a schedule based on my fitness goals and remind me throughout the day.”

OpenClaw can generate the schedule, set reminders, and execute the plan — all without a dedicated app.

Functionally, it’s clear: it’s easier to talk to an LLM than to install and manage multiple SaaS tools.

But we might be missing something subtle.

What’s Actually Happening Under the Hood?

When you talk to an LLM to perform a task, it executes a sequence of instructions, similar to what a deterministic SaaS app would do.

The difference?

  • In a traditional app, the code is written once and reused.
  • With an LLM, the “code” is generated on the fly and executed immediately.

It’s almost as if instead of using an app, you now have a programmer who creates a minimal app for your task every time you ask for it.

This is incredibly flexible. But it’s not very efficient.

Every time you ask the LLM to do something — even for the 100th time — it regenerates the solution from scratch and executes it.

A traditional app, by contrast:

  1. Writes the code once (possibly with LLM assistance).
  2. Reuses that code every time.
  3. Avoids repeated expensive LLM calls.

From a systems perspective, that’s far more efficient.
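The difference between the two execution models can be sketched in a few lines of Python. Here `expensive_llm_generate` is a hypothetical stand-in for a model call; it just counts invocations so we can compare the two paths:

```python
# Sketch of the two execution models. `expensive_llm_generate` is a
# hypothetical stand-in for an LLM call; here it only counts invocations.

calls = {"count": 0}

def expensive_llm_generate(task: str) -> str:
    """Pretend LLM call that 'writes' code for a task."""
    calls["count"] += 1
    return f"# solution for {task!r}"

def on_the_fly(task: str) -> str:
    # LLM-style execution: regenerate the solution on every request.
    return expensive_llm_generate(task)

_app_cache: dict[str, str] = {}

def traditional_app(task: str) -> str:
    # App-style execution: generate once (possibly with LLM help), then reuse.
    if task not in _app_cache:
        _app_cache[task] = expensive_llm_generate(task)
    return _app_cache[task]

for _ in range(100):
    on_the_fly("fitness schedule")
on_the_fly_calls = calls["count"]  # one generation per request

calls["count"] = 0
for _ in range(100):
    traditional_app("fitness schedule")
app_calls = calls["count"]  # one generation, then 99 cache hits

print(on_the_fly_calls, app_calls)
```

The same 100 requests cost 100 generations in one model and a single generation in the other; that gap is the whole systems argument.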

The Economics of LLMs vs Apps

Another important aspect of the app economy: most revenue comes from loyal power users.

Apps typically make little to no money from occasional users. Casual users don’t want to pay for something they use rarely.

Now compare this with LLM-based execution.

Every LLM call has a base cost — electricity, hardware, infrastructure. Even if hardware gets cheaper, electricity imposes a hard floor on costs.

Estimating conservatively:

  • 1M input tokens ≈ $0.50
  • 1M output tokens ≈ $4.00

(Output tokens are significantly more expensive due to model architecture.)

Even if token prices drop, electricity costs won’t vanish. Eventually, after the hype fades, margins matter again. Companies must optimize costs to stay competitive.

And in that world, repeatedly regenerating code on the fly becomes expensive compared to pre-generated, reusable software.
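A back-of-the-envelope calculation using the token prices above makes this concrete. The request sizes here are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope cost comparison using the token prices quoted above.
# The per-request token counts are illustrative assumptions.

INPUT_COST_PER_M = 0.50   # $ per 1M input tokens
OUTPUT_COST_PER_M = 4.00  # $ per 1M output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return ((input_tokens / 1e6) * INPUT_COST_PER_M
            + (output_tokens / 1e6) * OUTPUT_COST_PER_M)

# Regenerating a small "app" on the fly: say ~2k input and ~4k output tokens.
per_request = call_cost(2_000, 4_000)

# A user making 10 such requests a day for a month:
monthly_on_the_fly = per_request * 10 * 30

print(f"${per_request:.4f} per request, ${monthly_on_the_fly:.2f}/month")
```

At these illustrative sizes, regeneration already costs more per month than the $1.99 fitness app from the earlier example, before any margin is added on top.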

Which suggests something important:

On-the-fly code execution will likely be replaced by pre-generated code — just integrated differently.

Two Ways Apps Could Evolve

There are two main paths forward.

1. LLM-Assisted App Creation

Humans use tools like Claude Code or Cursor to build SaaS products with heavy LLM assistance. These apps are then deployed and reused repeatedly.

This is still SaaS, but built faster and potentially offered at lower prices due to competition and reduced development costs.

2. Memory-Enabled Agents

Instead of regenerating code each time, agents could have memory.

If an agent has executed a task before, it doesn’t need to regenerate the solution. It can:

  • Retrieve past generated code
  • Reuse it
  • Output a pointer instead of rewriting everything

Since output tokens are expensive, this dramatically reduces cost and latency.
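A minimal sketch of such a memory layer might look like this. `generate_code` is a hypothetical stand-in for the expensive LLM call, and the cache key is a hash of the normalized task text (a real system would use embeddings for semantic matching):

```python
import hashlib

# Minimal sketch of a code-reuse memory layer. `generate_code` is a
# hypothetical stand-in for an LLM call that emits code for a task.

def generate_code(task: str) -> str:
    return f"# generated solution for: {task}"

class AgentMemory:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def _key(self, task: str) -> str:
        # Normalize then hash, so trivially different spellings of the
        # same text map to one entry. A real system would use embeddings
        # to match semantically similar tasks, not just identical ones.
        return hashlib.sha256(task.strip().lower().encode()).hexdigest()

    def solve(self, task: str) -> tuple[str, bool]:
        """Return (code, from_cache). A cache hit skips regeneration,
        so no expensive output tokens are spent rewriting the solution."""
        key = self._key(task)
        if key in self._store:
            return self._store[key], True
        code = generate_code(task)  # expensive path, taken once per task
        self._store[key] = code
        return code, False

memory = AgentMemory()
_, first_hit = memory.solve("Create my fitness schedule")
_, second_hit = memory.solve("create my fitness schedule  ")
print(first_hit, second_hit)
```

The second request returns the stored code instead of regenerating it, which is exactly the "output a pointer" behavior described above.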

To eliminate LLM calls entirely, smaller models could classify and trigger stored interaction sequences directly. This would sacrifice some flexibility but could be ideal in ultra-low-latency or cost-sensitive environments.

Why Apps Won’t Disappear

The concept of an app isn’t going away.

It’s simply inefficient to rebuild something from scratch every time, once it has already been built.

Generating a calendar app every time you want to store an invite doesn’t make sense, even if generation is cheaper than before.

The real question is not:

Will apps disappear?

But rather:

Will apps live inside the memory and learning layer of agents — or remain as separately deployed systems?

When to Use What?

We believe:

  • For instruction sets that are not highly complex and not long-horizon, a strong memory and learning layer is enough.
  • For highly complex, heterogeneous, long-horizon tasks, deployed apps are better.

Imagine trying to generate Salesforce on the fly for each use case. It’s impractical.

In such cases, the agent should call a deployed tool — not recreate it dynamically.

What We’re Building

We are two ML PhDs from University College London building exactly this kind of memory and learning layer at versanovatech.com.

Our system enables agents to:

  • Remember experiences
  • Share knowledge
  • Learn over time
  • Improve continuously

The goal is simple: agents that get better every day at their job.

We would love to hear your thoughts! 😀