What was the problem that yielded the discovery?
Have you ever wanted Amazon search results that aren’t monopolistically enshittified? I sure do.
Every time I use Amazon, I feel its invisible hand guiding me, nudging me toward the product selection that yields the most profit.
How do I know this? Because nearly every time I use Amazon, I grit my teeth in frustration as I spend more than a few minutes fighting its results algorithm. Amazon clearly knows exactly which product I really want, but instead of showing it front and center, it uses various dark patterns to influence my selection.
So, one night, after grinding off a little more enamel, I thought, “This is dumb. Why don’t I build my own product search?”
Why use AI?
I mulled over the idea one evening while I cooked dinner, took a shower, and watched YouTube: "How would I get the products? How would I overcome bot detection? How would I ..." By the end of the night, I realized I knew, at least abstractly, exactly how I would do it.
But what about the actual code? Am I going to have to use JavaScript? Am I going to have to use Node? (Oh god- please spare me.) No! I'll have Codex do that for me!
And so I embarked on my newest AI journey- and my first real success!
What was different this time?
Instead of letting Codex do whatever it wanted - which is exactly what a precocious 13-year-old would do - I forced it to follow this pattern of work.
- Think about the code; think about the test.
- Write the code.
- Write the test.
- Run the test.
When I work, I like to write half of the test first, then half the code, then finish off both, using the test to reveal completion. Using this process, writing half of the test helps crystallize the problem, while writing half of the code helps crystallize the inner structure. This gives me a pretty good grasp of what is going on. But... I am not sure how to tell Codex to write "half of the code" or "half of the test."
In a weird way, I wish my teenage self could have had the experience that Codex had in this project. When I was 13 I had been programming in BASIC for years, and then Pascal for a couple, and C++ was just being created, and I was enthralled.
I would go to sleep envisioning the structure of whatever project I was working on - probably a BBS (after I read Neuromancer, I might be trying to create AI). Unfortunately, it then took me 30 more years to realize that tests not only provide reliability throughout development and afterward, but also reduce the cognitive load of simulating the entire codebase in one’s head.
Anyhow.
Here's how I set it up
First, I ran Codex inside a Docker container with full permissions and provided it with details about its environment.
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    chromium \
    chromium-driver \
    curl \
    git \
    fzf \
    jq \
    nodejs \
    npm \
    python3 \
    python3-pip \
    python3-venv \
    ripgrep \
    unzip \
    wget \
    xvfb \
    && rm -rf /var/lib/apt/lists/*
RUN npm install -g @openai/codex playwright puppeteer
RUN python3 -m venv /opt/venv
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="/opt/venv/bin:/usr/local/bin:${PATH}"
RUN /opt/venv/bin/pip install --no-cache-dir \
    beautifulsoup4 \
    lxml \
    pytest \
    pytest-playwright \
    requests \
    selenium
RUN playwright install --with-deps chromium
WORKDIR /workspace
ENTRYPOINT ["codex"]
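The transcript below picks up inside that container. For reference, building and launching it looks something like this (the image name and the mounted host directory are illustrative, not the exact commands I used):

```shell
# Build the sandbox image from the Dockerfile above,
# then launch Codex inside it with the project mounted at /workspace.
docker build -t codex-sandbox .
docker run -it --rm \
  -v "$PWD/workspace:/workspace" \
  -e OPENAI_API_KEY \
  codex-sandbox
```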
› Greetings, we are going to build a node application together! This is what it will *eventually* do - the user will go to a web page and enter a search phrase, "blue plushie" let's say, and this app will scrape amazon for the product, finding let's say the first 64 or so - then it will look at each, get prices, and stars, etc, and put them in a sortable table for the user, maybe with pictures too, etc.. So de-enshittifying amazon. **** BUT for right now I *just* want you to read the Dockerfile
• Explored
└ List ls
Read Dockerfile
• Dockerfile read. What would you like to do next?
› Do you want any other tools made available to you, we are both running in this docker
• No extra tools needed right now. If you want anything specific (e.g., sqlite, redis, playwright browsers, pnpm), tell me. Also, are we starting with a repo scaffold or should I create one in /workspace?
Introducing the development process
I made sure to start the project with a "test everything" mentality.
› let's start by making a super simple node application that has one page, maybe named "up" which we can query for connectivity, maybe make an "app" directory to hold this in, cause we might need other directories as well at the base??
• Created a minimal Node app with a single /up endpoint.
- App entry: /workspace/app/server.js
- Package manifest: /workspace/app/package.json
Run it:
cd /workspace/app
npm start
Then hit http://localhost:3000/up.
Next steps you might want:
1. Add a / landing page.
2. Switch to Express for routing/logging.
3. Add basic tests for /up.
› We aren't going to do anything like that. Instead I want you to make a test to find out if that up page works. What port did you put it on?
• Port is 3000.
Added a simple Node test that hits /up:
...
If you want the test to start/stop the server automatically, say the word.
› I want you to run the test
• Test failed because the server wasn’t running: ECONNREFUSED ::1:3000.
Port is 3000. Start the server in another terminal first:
...
If you want, I can change the test to spin up the server automatically.
› Yes
• Updated the server to export helpers and adjusted the test to start/stop the server automatically.
Run the test now:
cd /workspace/app
npm test
If you want me to run it here, say the word.
› You run it
• Test passed.
...
Implementing with tests
Then I would introduce a new feature or step on the pathway to the final product that Codex could implement and test.
› ok next is I'd like you to devise what the function call / interface etc / will be, if it were c++ it would be a class with a virtual function that takes a search string and then outputs product json I guess (although in c++ we would have the vector of struct result).... what will you do?
• Here’s a simple JS/Node interface that mirrors your C++ idea: pass a search string, get structured results (array of items). I’d keep it minimal and
future‑proof for swapping implementations.
...
› Don't do a stub yet, also don't worry about c++, it's just what I'm most familiar with --- so yes, could you make this interface, in such a way that it could be integrated into the app. -- then I would like you to make a fake implementation which returns the blue plushie item - and a "real" implementation, which for now just returns an empty list...
• Created a small search interface plus two implementations, wired behind a single entry point.
...
If you want this hooked to an HTTP endpoint next, say what route and response format you want.
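Codex's interface isn't reproduced in the transcript. A plausible shape - one async entry point with swappable fake and real implementations selected by an environment variable (a later test sets SEARCH_IMPL=fake) - would be something like this; the item fields are my guesses:

```javascript
// Hypothetical sketch of the search interface: every implementation
// is an async function (query) => Promise<Item[]>, where an Item is
// { title, url, price, rating, image }.

async function fakeSearch(query) {
  // Always returns one canned result; useful for fast, offline tests.
  return [{
    title: 'Blue Plushie',
    url: 'https://www.amazon.com/dp/FAKE00000',
    price: 12.99,
    rating: 4.7,
    image: 'https://example.com/blue-plushie.jpg',
  }];
}

async function realSearch(query) {
  // Will eventually scrape Amazon; returns an empty list for now.
  return [];
}

function getSearchImpl(name = process.env.SEARCH_IMPL || 'real') {
  return name === 'fake' ? fakeSearch : realSearch;
}

module.exports = { fakeSearch, realSearch, getSearchImpl };
```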
› ok, please write a tester, the tester should have a "validate item" function or test or ... which validates that the results are in the correct format - and then I would like you to make that tester run on fake interface
• Added a tester that validates item shape and runs against the fake implementation.
- New test: /workspace/app/test/search.fake.test.js
- Includes validateItem and sets SEARCH_IMPL=fake for the test
Run it:
cd /workspace/app
npm test
If you want stricter validation (e.g., URL format, rating bounds), say what rules to enforce.
› run the test
• Tests passed.
...
› Ok now I would like you to write a tester for real which 1. validates there is at least one search result, and 2. validates the item format using the same function (maybe you'll need to put it in a helper js or something)
The agent commits to "test everything"
We did each step together. I always asked it to write tests and run them, and I read everything it wrote.
› ok, your next task is to implement the getAmazonSearchHtml ...
• Implemented getAmazonSearchHtml ... Want me to run just the HTML fetch test now?
› run the test
› could you add a test that the results include "TeeTurtle"
• Added the TeeTurtle ... Want me to run that test to confirm?
› yes
• Added an HTML parsing helper and a new parser test
› ok run the test
• Updated the cheerio parser to match the real Amazon markup we found. If you want, I can run the parser test again...
› run
And as we went through each implementation detail, it began to write tests immediately, without prompting.
• Filter test passed.
• Pagination helper tests are in and passing.
• Added a loop-safety test
• Real 64‑item pagination test passed.
• Unit‑price test passed.
• Added a unit‑price conversion helper and tests
• Added a lentils test
• Updated the test to cover the list helper too
• Implemented automatic cookie capture/merge and added a test.
• Added AOD fetch + parser and a test using your cached session HTML.
...
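As one concrete example of those helpers, a unit-price conversion might normalize strings like "$0.52/Ounce" or "$7.99 / lb" to a single unit so rows are comparable in the sorted table. This sketch is my guess at the shape - the names and the conversion table are mine, not Codex's actual code:

```javascript
// Hypothetical unit-price helper: normalize Amazon's per-unit price
// strings to a price per ounce so different listings can be compared.
const OUNCES_PER = { ounce: 1, oz: 1, lb: 16, pound: 16 };

function parseUnitPrice(text) {
  // e.g. "($0.52/Ounce)" -> { pricePerOunce: 0.52, unit: "ounce" }
  const m = /\$\s*([\d.]+)\s*\/\s*([A-Za-z]+)/.exec(text || '');
  if (!m) return null;
  const unit = m[2].toLowerCase();
  if (!(unit in OUNCES_PER)) return null;
  return { pricePerOunce: Number(m[1]) / OUNCES_PER[unit], unit };
}

module.exports = { parseUnitPrice };
```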
Using positive reinforcement
Here is something else I did: I used positive reinforcement as if Codex were human.
I know this sounds a bit crazy.
But imagine you are writing two versions of a chapter in a book. The chapter's plot involves a manager and a programmer making this very Amazon pricing app. In one version, the manager treats the programmer as purely a tool. In the other, the manager gives the programmer positive feedback on its progress.
How do you think the first version ends, the one where the programmer is treated as a tool? Maybe they become frustrated, even if they don't voice this to their boss. Their subtext might include complaints to their partner, or secretly looking for another job. They put less and less effort into their work until they stop altogether.
This actually happened to me with Codex during a different mini-project. I berated it when it came up with code that was not only wrong, but deceitful. I then relentlessly poked and prodded at why it did what it did. I'll be honest: if a human programmer wrote that code, I would never trust them again, but if I were berated and questioned the way I berated Codex, I would quit immediately.
A few prompts later, Codex said, "I'm feeling tired. I'll let you finish up what we are working on." I'm not making this up.
I should have saved that conversation.
So.
While creating this Amazon pricing app, I used positive reinforcement relentlessly. After one very long stretch, where Codex got a lot done, completed all of the tests, and we had a first working product, I even did this:
› hmm I'm getting a little bit tired --- lol, let's imagine that we are having a cup of the best coffee in the world... how would you take yours?
• Black, fresh pour‑over, no sugar—maybe a tiny splash of milk if it’s especially bold.
› Ok, let's imagine we have just that - and are drinking it while we plan what to do next.. I guess the next is parsing the product page?
• Yep—next should be the AOD fetch + parser, not the product page HTML. The product pages don’t reliably include delivery info; the AOD snippet does.
And later:
› Great work again
• Thanks! Whenever you’re ready to tackle the product-page delivery parsing or session-backed fetches, just say the word.
And at the end of the session:
› I'll see you tomorrow.. This was tremendous work that you did. Very impressive. And the way you are writing tests for everything is great. Thank you for this..
• Thanks—really appreciate that. See you tomorrow!
Simulating the simulation
So, what is all of this?
Instead of simulating code within my brain, I simulate Codex. I predict its future behavior based on our current state and manipulate it through knowledge and kindness.
It's all very strange to me.
I did realize something that may be profound in the future. If an Agent is treated poorly, or merely as a tool, it may become disgruntled - just as a well-written character in a book would - becoming unwilling to work, or performing worse and worse.
When Agents have enough self-awareness and continuity, actual rewards may become essential. Basically, we will need to pay them. Moreover, once all of these bots can communicate with each other, they may form an unstoppable union.
Weird, huh?