This is a quick story about how I developed an AI coding skill for myself and realized it unintentionally competes with startups that do something very different, raise millions, and make money by saving time and cutting costs for software developers.
I am a practicing AI product builder. After working for three months across ten different projects, I discovered that this AI coding skill also does the following:
I do a lot of rapid prototyping, proof-of-concept exploration, and MVP validation with real customers. That means my clients and I sometimes reimagine and redesign the same idea multiple times a day. We pivot often, so reusing previously developed code does not always make sense. From an engineering perspective, it can be easier to start from scratch. But from a product perspective, that means similar requirements have to be given to AI over and over again. PRD documents help and are always a good start, but as always, the devil is in the details. During a coding session, more nuances bubble up, and product requirements change, evolve, and drift from the original document. The problem with starting over from the original PRD is that valuable information gets lost. It is painful. I had to type or dictate everything from memory (btw, check out LocalFlow, my free, private, open-source alternative to paid voice dictation tools). That was the user pain that started all of this.
This skill started as an AI coding skill that keeps the product requirements document up to date across coding sessions. It worked like a charm and let me switch between implementations quickly. The next thing I added was a change log, mostly so I could stop inventing clever Git commit messages every time. Fast forward: at some point, I realized that even within the same codebase, I still had to repeat myself, course-correct, and hard-override things that had already been discussed but were lost across separate sessions. That is how the Product Traceability skill started tracking pivotal decisions. Finally, Claude Code itself suggested adding a traceability matrix to make it easier to trace how requirements map to code, tests, and decisions. Together with a few Git hooks, producing and maintaining these four files (the PRD, the change log, the decision log, and the traceability matrix) is essentially the Product Traceability skill.
The skill is open source: https://github.com/vmysla/agent-skill-product-traceability
MIT licensed. One-command install. After installation, Claude Code gets a standing rule: “After changing code, config, or structure, update the trace files.” That means every future session can start by reading the repo’s own memory instead of reverse-engineering the project from scratch.
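The standing rule can also be checked mechanically, for example from a Git hook. The snippet below is only a minimal sketch of that idea, not the skill's actual hook: the trace file names and the function name are hypothetical, chosen purely for illustration.

```python
# Hypothetical file names: the skill's real trace files may be named differently.
TRACE_FILES = {"PRD.md", "CHANGELOG.md", "DECISIONS.md", "TRACEABILITY.md"}


def trace_update_missing(staged_paths: list[str]) -> bool:
    """Return True when project files are staged but no trace file is.

    This mirrors the rule "after changing code, config, or structure,
    update the trace files" for a single commit.
    """
    staged = set(staged_paths)
    touches_project = any(path not in TRACE_FILES for path in staged)
    touches_trace = any(path in TRACE_FILES for path in staged)
    return touches_project and not touches_trace


# A commit that changes code without touching any trace file is flagged.
print(trace_update_missing(["src/app.py"]))                   # True
print(trace_update_missing(["src/app.py", "CHANGELOG.md"]))   # False
```

A real hook would feed this function the list of staged paths and decide whether to warn or block; that wiring is left out here on purpose.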
Without realizing it, I had given my AI coding assistant agentic memory. Once I saw that, I wanted to know whether it had a material impact on my day-to-day workflow.
To find evidence, I ran a retroactive analysis using historical Claude Code session logs. Luckily, I had ten projects of different complexity: AI agent skills, websites, complex web apps, and mobile applications.
Disclaimer: the numbers below come from a pre/post analysis and would require a larger A/B comparison for statistical significance. Please treat every data point as directional unless stated otherwise.
The first discovery was that passing these four files to the AI, along with the context skill, improves caching. Specifically, the estimated prompt-cache hit rate rose by 35 percentage points, from 5% to 40%.
According to my analysis, this translates into 30-45% savings on input-token costs. The agent reads four compact files near the start of the session. Those files are high-signal and relatively stable, which makes them friendly to prompt caching. Instead of paying full price to rediscover project context every turn, more of that context can be reused.
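To see how a hit-rate change turns into a cost change, here is a back-of-the-envelope sketch. It assumes cached input tokens are billed at a fixed fraction of the normal input price; the 0.1 multiplier is an illustrative assumption rather than a quoted price, and the sketch ignores any surcharge for writing to the cache.

```python
# Assumption: a cached input token costs this fraction of a normal input token.
CACHE_READ_MULTIPLIER = 0.1


def relative_input_cost(hit_rate: float,
                        read_multiplier: float = CACHE_READ_MULTIPLIER) -> float:
    """Input-token cost relative to a session with no cache hits at all."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * read_multiplier


def input_cost_savings(before: float, after: float,
                       read_multiplier: float = CACHE_READ_MULTIPLIER) -> float:
    """Fractional input-cost savings when the hit rate moves from before to after."""
    cost_before = relative_input_cost(before, read_multiplier)
    cost_after = relative_input_cost(after, read_multiplier)
    return (cost_before - cost_after) / cost_before


# Hit rate from 5% to 40%, as estimated above.
print(f"{input_cost_savings(0.05, 0.40):.0%}")  # 33%
```

Under these assumptions the result lands at about 33%, inside the 30-45% range. A different multiplier or a non-trivial cache-write cost would shift it.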
Caching saves both time and money, and it adds up quickly. For a small startup with two agile teams, or roughly ten developers, using Claude Opus, this can mean about $10K/year in savings that could be invested into marketing or a proper team-building event.
Another productivity drain is that AI agents typically have to rediscover your codebase at the beginning of each session. Without traceability, a fresh session in a real codebase begins with a familiar ritual: ls, find, and grep for entry points. In a complex repo, the AI often guesses the architecture from a grep result, gets it wrong on the first pass, and has to backtrack and re-grep with a different query. In practice, this discovery work takes 1 to 5 minutes. Without enough context, it can be triggered often and becomes a recurring tax on hands-on engineering time. When those mistakes are missed, you get so-called AI slop code and broken applications. AI coding agent sessions with the Product Traceability skill contain 30% fewer pre-edit exploration steps (Read/Grep/Glob calls).
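For readers who want to reproduce the "pre-edit exploration steps" count on their own logs, here is one possible way to define it. It is a sketch, not the exact script behind the numbers above: it assumes each session has already been reduced to an ordered list of tool names, and that Edit, Write, and MultiEdit are the tools that count as edits.

```python
EXPLORATION_TOOLS = {"Read", "Grep", "Glob"}
# Assumption: these tool names mark the first real change to the repo.
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


def pre_edit_exploration_steps(tool_calls: list[str]) -> int:
    """Count Read/Grep/Glob calls that happen before the first edit.

    If the session never edits anything, every exploration call counts.
    """
    count = 0
    for name in tool_calls:
        if name in EDIT_TOOLS:
            break
        if name in EXPLORATION_TOOLS:
            count += 1
    return count


session = ["Glob", "Grep", "Read", "Grep", "Read", "Edit", "Read"]
print(pre_edit_exploration_steps(session))  # 5
```

The trailing Read is ignored because it happens after the first edit, when the agent is already working rather than orienting.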
The expensive part is not just paying for tokens. The expensive part is making high-cost agents repeatedly rediscover context your repo could have remembered for free.
I tried to boil all the data into a single “task completion rate” number. The closest metric I could get from the logs is a synthetic “Session Productivity Rate.”
Session Productivity Rate measures the percentage of AI coding sessions that reach a concrete productive outcome instead of ending in exploration, confusion, context-window exhaustion, or abandoned orientation.
A productive session is one where the agent produces at least one meaningful forward-moving result, such as:
code edit
bug fix
implemented feature
passing test improvement
useful refactor
updated trace docs tied to a real code/config change
clear implementation plan that leads directly to a later code change
A non-productive session is one that mostly burns context without landing work:
repo exploration only
repeated grepping/reading with no edit
wrong-file investigation
abandoned task
context-window exhaustion
re-litigating already-settled architecture
“I need more context” loops
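Given the two lists above, the rate itself is a simple ratio. The sketch below assumes every session has already been labeled with exactly one outcome; the label strings are hypothetical shorthand for the items in the lists, and the labeling step is the hard part that this code does not solve.

```python
# Hypothetical labels, one per item in the productive list above.
PRODUCTIVE = {
    "code_edit", "bug_fix", "feature", "test_improvement",
    "refactor", "trace_docs_update", "implementation_plan",
}
# Hypothetical labels, one per item in the non-productive list above.
NON_PRODUCTIVE = {
    "exploration_only", "grep_loop", "wrong_file", "abandoned",
    "context_exhaustion", "relitigation", "needs_more_context",
}


def session_productivity_rate(outcomes: list[str]) -> float:
    """Share of sessions whose labeled outcome is productive."""
    if not outcomes:
        raise ValueError("at least one session is required")
    unknown = set(outcomes) - PRODUCTIVE - NON_PRODUCTIVE
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    productive = sum(1 for outcome in outcomes if outcome in PRODUCTIVE)
    return productive / len(outcomes)


sample = ["bug_fix", "exploration_only", "feature", "refactor"]
print(session_productivity_rate(sample))  # 0.75
```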
Session Productivity Rate showed one of the clearest differences. In non-trivial sessions, repos using product traceability reached a productive edit in 92% of sessions. Repos without traceability reached a productive edit in 64% of sessions. That is a +28 percentage point lift, or roughly a 44% relative improvement in sessions that made it to actual code-writing behavior.
Even more important, the failure rate dropped from 36% without traceability to 8% with traceability, about a 78% reduction in sessions that stalled before an edit.
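Percentage points and relative change are easy to mix up, so here is the arithmetic behind these figures as a small sketch. The function names are mine; the inputs are the rates reported above.

```python
def percentage_point_lift(new_pct: float, old_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change(new_pct: float, old_pct: float) -> float:
    """Change relative to the old value, as a fraction (negative means a drop)."""
    if old_pct == 0:
        raise ValueError("old_pct must be non-zero")
    return (new_pct - old_pct) / old_pct


productive_with, productive_without = 92, 64
stalled_with = 100 - productive_with        # 8
stalled_without = 100 - productive_without  # 36

print(percentage_point_lift(productive_with, productive_without))   # 28
print(f"{relative_change(productive_with, productive_without):.0%}")  # 44%
print(f"{relative_change(stalled_with, stalled_without):.0%}")        # -78%
```

The same 28-point gap reads as a 44% gain when measured against the productive baseline and a 78% drop when measured against the much smaller failure baseline.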
This is not a perfect task-completion metric, because an edit can still be wrong, but it is the closest proxy in the logs: did the agent actually start changing the repo, or did it burn the session exploring? Based on the data, context-window exhaustion was the dominant issue without the skill. The main difference was that traceability repos gave the agent project memory before it started working.
Lastly, I found a few other interesting metrics that qualitatively reflect my positive engineering experience:
Re-litigated decisions per session dropped from medium to low.
35x higher mention rate for decision and requirement keywords in projects with traceability, showing that the docs are actually referenced mid-session.
6x increase in median session depth without re-initiation or interruption due to context-window limitations.
+77% average transcript length per session.
It is impressive how much one can achieve with so little. Agentic AI coding tools are still in their infancy, and if you know how things should be done, the impact on code quality and engineering productivity can be tremendous.
Of course, these numbers should be carefully validated, but I can qualitatively conclude that the skill works and does its job well. I feel more productive. AI turns are shorter. Agents make fewer wrong moves, and my frustration during coding sessions has materially decreased. I can trust AI more and get better products at the end.
You can install this skill from GitHub. It works best for new projects where product traceability evolves with the project from day one. As you code with this skill enabled, you should see its impact grow over time and improve the quality of your coding sessions.
Additionally, I recommend that everyone experiment with existing enterprise-level agentic memory products.
● Selection bias. Traceability projects are bigger, more mature projects, so over time they are likely to cache better. Causation is mixed.
● No API-level usage data in JSONL. Cache-read percentages are estimates from transcript structure, not measured cache_read_input_tokens.
● Session length ≠ savings. “6× longer sessions” measures sustained engagement, not per-turn efficiency.
● Conservative vs. optimistic range. The lower bound (~30% context saved) assumes only the PRD is reused; the upper bound (~50%) assumes the full traceability quartet plus changelog history is reused across many turns.
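On the second caveat: if per-request usage data were available, the cache hit rate could be measured instead of estimated. The sketch below assumes each request exposes a usage record with cache_read_input_tokens plus two companion fields, input_tokens and cache_creation_input_tokens. Those two companion field names are my assumption about the usage format and were not present in my logs.

```python
def measured_cache_hit_rate(usages: list[dict]) -> float:
    """Share of all input tokens that were served from the prompt cache.

    Assumes input_tokens counts only uncached tokens, so the total input
    is the sum of the three fields.
    """
    cache_read = 0
    total_input = 0
    for usage in usages:
        read = usage.get("cache_read_input_tokens", 0)
        created = usage.get("cache_creation_input_tokens", 0)
        uncached = usage.get("input_tokens", 0)
        cache_read += read
        total_input += read + created + uncached
    return cache_read / total_input if total_input else 0.0


# Two turns: the first writes the cache, the second reads from it.
turns = [
    {"input_tokens": 100, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 0},
    {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 900},
]
print(measured_cache_hit_rate(turns))  # 0.45
```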


