With each successive generation of LLM, I have tried to vibe code an inventory management system for my business. With GPT-5.1-Codex, I was finally able to get it done. It felt like a watershed moment in AI capabilities for me.
Except after a couple of weeks, I stopped using it and went back to managing my inventory in Google Sheets.
The issue was the many edge cases in projecting sales that are not that hard for me to deal with but are kind of a pain to incorporate into a deterministic inventory management system.
Some examples:
- There are a couple of week-long holidays where China shuts down, plus Chinese New Year, which lasts a month. I have to factor these into order timelines, and the specific dates change every year.
- For one brand, I have inventory stored at a 3PL in addition to Amazon, so that needs to be accounted for (and would require me to keep that number updated manually).
- Because most products I sell have at least some seasonality, the starting point for next year’s sales projections is last year’s sales. But if I ran out of stock during some month last year or the prior owner of the business didn’t run a Prime Day sale where I plan to, I have to adjust for that when projecting.
- The prior year’s sales have to be adjusted for changes in sales velocity. It’s not uncommon that I’ll acquire a brand, fix some stuff and see sales immediately go up. This increase has to be taken into account, but to properly account for it, the system has to know when I made the acquisition and look at YoY changes since then.
I did add some features to deal with the latter two bullets, like a UI to let me modify historical sales data, but dealing with them was eating more time than it was saving.
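For a sense of what those two adjustments involve, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code from the app: the function name `adjusted_baseline`, the `(year, month)` keys and all the numbers are hypothetical, and treating the acquisition month itself as pre-acquisition is a simplification.

```python
Month = tuple[int, int]  # (year, month); hypothetical key format


def next_month(ym: Month) -> Month:
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def adjusted_baseline(
    sales: dict[Month, float],
    overrides: dict[Month, float],
    acquired: Month,
    current: Month,
) -> dict[Month, float]:
    """Trailing-twelve-month sales, corrected for anomalies and post-acquisition growth."""
    # Replace anomalous months (stockouts, a missing Prime Day sale) with manual estimates.
    cleaned = {**sales, **overrides}

    # Full months under current ownership: after the acquisition month, before the current one.
    owned = []
    ym = next_month(acquired)
    while ym < current:
        owned.append(ym)
        ym = next_month(ym)

    # Year-over-year growth, measured only on the owned months.
    this_year = sum(cleaned[ym] for ym in owned)
    last_year = sum(cleaned[(year - 1, month)] for year, month in owned)
    growth = this_year / last_year

    # The twelve months ending just before `current`.
    ttm = []
    ym = (current[0] - 1, current[1])
    while ym < current:
        ttm.append(ym)
        ym = next_month(ym)

    # Months up to the acquisition predate the improvements, so scale them up;
    # owned months already reflect the new sales velocity.
    return {ym: cleaned[ym] * (1 if ym > acquired else growth) for ym in ttm}


# Made-up data: flat 100 units a month, a June peak, a July stockout,
# and a lift to 150 a month after a February 2025 acquisition.
sales = {(2024, m): 100.0 for m in range(3, 13)}
sales.update({(2024, 6): 300.0, (2024, 7): 0.0, (2024, 8): 200.0})
sales.update({(2025, 1): 100.0, (2025, 2): 100.0})
sales.update({(2025, m): 150.0 for m in (3, 4, 5)})

baseline = adjusted_baseline(
    sales,
    overrides={(2024, 7): 400.0},  # what July would have sold without the stockout
    acquired=(2025, 2),
    current=(2025, 6),
)
```

Even this toy version needs the acquisition date, a manual override table and more than twelve months of history, which hints at why maintaining it per brand got tedious.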
Ultimately, what I needed wasn’t a deterministic inventory management system, it was someone to do inventory management the way I was doing it — take the data available, apply the relevant context that lives in my head and then use that whole picture to figure out when I need to reorder.
Enter Claude Code with Opus 4.5!
As I was experimenting with CC, it dawned on me that the solution to my problem wasn’t to continue addressing every possible edge case in my vibe coded app. Rather, it was to give CC all the context I was using and ask it to use that the way I was using it.
I spent half a day getting it set up with Amazon credentials for each of my brands and having it build out scripts to pull inventory and sales data. Then we talked through the relevant context for each brand, which it saved in a file to be read immediately before doing inventory projection.
First I asked it to review the data and let me know if it had questions. It picked up all of the anomalously low sales months, and I gave it guidance on what those months would have been if not for stockouts and the like. It also noted some items that were extremely low in stock with Amazon not showing any inbound units, which in most cases I explained were things I wasn’t going to restock.
Then I gave it the rest of the context it needed — dates I acquired each brand, inventory at 3PLs, supplier lead times, supplier locations (one of my suppliers is in Georgia, so no need to worry about CNY there), etc.
Once it had everything it needed, I asked it for inventory projections for each brand and recommendations on what to reorder soon. It crunched the numbers, and it came out with projections that were very bad.
In most of the cases, it was pretty obvious what was going wrong — even though it knew to account for seasonality in its sales projections, it decided to do so in a lazy way unbecoming of one of the greatest intelligences on this planet. Instead of using historical data, it just made some assumptions about how much higher sales would be in the summer than they were over the last 30 days. Unfortunately it generally assumed they’d be 2-3x higher at peak season, when the actual number was more like 5-15x depending on the product.
No problem; I had it add a note to CLAUDE.md in the project folder to always use the prior year’s monthly sales as the basis for future projections.
Asked it to give me inventory projections again. Better, but not good enough. In some cases it didn’t adhere to the note and just continued to do lazy seasonality adjustment. In other cases it just took last year’s sales and used the exact same numbers as its projection for this year.
Thus began a day-long cycle of improvement. It would give projections, I’d find issues and ask it to make changes to its process, and it’d give another set of projections. I honestly thought that Opus 4.5 would be able to figure out a lot more of the implementation details itself, but apparently even the greatest artificial mind of our time needs explicit direction.
At first, when it did things incorrectly, I’d just tell it what was wrong and how to do each task correctly in the future. It would dutifully note my feedback, but with surprising frequency it would then continue to do things the wrong way. I’d ask it whether it had made any mistakes, and it would review the instructions, apologize for not having followed them and assure me that it would follow them in the future. Then it would make the same mistake again on the next go.
So I did what I would do with any enthusiastic and smart employee who was having trouble following my feedback; I gave it very clear, step-by-step instructions.
First, I created the slash command /inventory to give us a clear starting point. When invoked, the first thing it would do was check a .md file. No room for creative thinking; straight to specific instructions. Those instructions are:
1. Check to see if the sales/inventory data we have locally is >5 days old, and if so, pull fresh data from Amazon.
2. Check the context notes for the brand it’s working on.
3. Pull inventory and sales data from the local cache.
4. Display the trailing twelve month sales data by SKU and by month.
5. Describe any factors it’s going to use to make adjustments to TTM sales data when projecting NTM sales.
6. Review its stated adjustment factors, then project NTM sales by SKU on a monthly basis, and display the amount of stock remaining at the end of each month based on those projections. Use Python for the math here.
7. Recommend when to reorder each SKU, factoring in CNY and other holidays.
That list was built out over time, with increasingly specific steps added to head off issues it was having following the instructions. Steps 4, 5 and 6 in particular were added in response to it making sales projections in ways I had not sanctioned.
First, even though I instructed it to use TTM monthly sales data as the starting point for its projections, it would pretty frequently fall back to either just taking the average daily sales from the past 30 days and using that or making guesses about the impact of seasonality.
I fixed that with step 4, but then it would just assume this year’s sales would be the same as last year’s, so I added step 5 to force it to think about what to change.
After it had the right starting point and methodology for adjusting that, it would make math errors that would lead it to come up with the wrong recommendations of when I should reorder. Step 6 fixed those and made it easier for me to sanity check its numbers.
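The math behind steps 6 and 7 can be sketched roughly as follows. This is a simplified illustration, not the script Claude actually wrote: the function names, the single adjustment factor, the stock figures and the shutdown window are all hypothetical, and the dates are placeholders rather than a real holiday calendar.

```python
from datetime import date, timedelta


def project_ntm(ttm_sales: list[float], factor: float) -> list[float]:
    """Scale each of last year's months by one adjustment factor (assumed to come from step 5)."""
    return [units * factor for units in ttm_sales]


def month_end_stock(on_hand: float, projected: list[float]) -> list[float]:
    """Stock left at the end of each projected month, floored at zero."""
    remaining = []
    for units in projected:
        on_hand = max(on_hand - units, 0)
        remaining.append(on_hand)
    return remaining


def order_by(needed_on: date, lead_days: int, shutdowns: list[tuple[date, date]]) -> date:
    """Latest order date for arrival by `needed_on`, counting back only days the supplier is open."""
    day = needed_on
    remaining = lead_days
    while remaining > 0:
        day -= timedelta(days=1)
        # Days inside a shutdown window do not count toward the lead time.
        if not any(start <= day <= end for start, end in shutdowns):
            remaining -= 1
    return day


# Made-up numbers: a seasonal SKU with sales expected to run 1.5x last year.
ttm = [100, 100, 200, 400, 600, 400, 200, 100, 100, 100, 100, 100]
projected = project_ntm(ttm, 1.5)
stock = month_end_stock(1000, projected)
first_empty = next((i for i, s in enumerate(stock) if s == 0), None)

# Illustrative shutdown window only; real holiday dates change every year.
deadline = order_by(
    needed_on=date(2026, 3, 15),
    lead_days=30,
    shutdowns=[(date(2026, 2, 10), date(2026, 3, 1))],
)
```

With these inputs the stock runs out in the fourth projected month, and the 20-day shutdown pushes the order deadline from mid-February back to late January. Printing the month-end stock per SKU is also what makes the numbers easy to sanity check by eye.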
The difference between having AI code me an app to do inventory management vs. having it build tools for itself and then use those to act as my inventory analyst proved to be enormous; it’s basically the difference between using a simple piece of software yourself and hiring a super smart person who can write whatever software he needs. The former approach left me with a tool that was extremely impressive as a demonstration of vibe coding abilities but ultimately not that useful. The latter is something I use consistently — since I implemented this slash command, I have not once used a spreadsheet for this task.
The key is the ability to remember and apply context. With a deterministic inventory management app, I would’ve had to code every brand-specific variation in a way that would just never be worth the time.
With one brand that sells leather goods, for example, my manufacturer has to order a large amount of leather from their supplier, which takes 30 days to prepare, after which my manufacturer needs another 30 days to make the product. A leather order is typically enough for 2-3 orders from my manufacturer, so correctly determining the date by which I need to order more stock requires knowing whether my manufacturer already has leather on hand.
Could I build this into an app? Sure. But even with AI doing the actual coding, it’s just not worth the time to build a one-off feature for the idiosyncrasy of this one specific brand when I could just keep a note about the status of the leather in a Google Sheet and make the adjustment in my head.
Even if I did build a leather tracking feature, I’d then have to either keep manually adjusting the amount of leather when I placed an order or build another feature to check for inbound shipments, find the unit counts of those shipments and then adjust the leather down by the amount required to make the number of inbound units. I hated building these complicated one-off features for big customers when I was a B2B SaaS PM, and I very much don’t want to build them for myself.
With Claude, that entire workflow was resolved by just explaining the context to it:
<Leather goods brand> needs 30 days to prepare leather and 30 days to manufacture the goods, unless the manufacturer already has some prepared leather. Right now there is NX sq. ft. of prepared leather, and each unit takes Y sq. ft. to make. When you see a new inbound shipment from that brand, figure out how much leather was used by the unit count, confirm that with me, and then subtract that from the amount of prepared leather available. When recommending reorder dates, make sure to account for whether there is already sufficient leather for the size of order I need to place or whether the manufacturer will need to order more.
That’s it! Already tested and working as desired.
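For reference, the arithmetic that note asks Claude to handle is small. A minimal sketch, assuming hypothetical function names and made-up quantities in place of the X and Y above:

```python
def lead_time_days(
    order_units: int,
    prepared_sqft: float,
    sqft_per_unit: float,
    prep_days: int = 30,
    make_days: int = 30,
) -> int:
    """Manufacturing lead time, adding leather prep only when the prepared leather falls short."""
    needed = order_units * sqft_per_unit
    return make_days if prepared_sqft >= needed else prep_days + make_days


def leather_after_shipment(prepared_sqft: float, inbound_units: int, sqft_per_unit: float) -> float:
    """Prepared leather left once an inbound shipment's units are accounted for."""
    return prepared_sqft - inbound_units * sqft_per_unit


# Made-up quantities: 5,000 sq. ft. prepared, 2 sq. ft. per unit.
prepared = 5000.0
per_unit = 2.0

first_order = lead_time_days(1000, prepared, per_unit)   # enough leather on hand
prepared = leather_after_shipment(prepared, 1000, per_unit)
second_order = lead_time_days(2000, prepared, per_unit)  # needs a new leather order
```

The calculation is trivial; the hard part in an app would be keeping `prepared` current, which is exactly the bookkeeping the note hands off to Claude.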
Claude stores this kind of context in one file per brand, and all I have to do is ask it if I want it to make updates. Like an intelligent human assistant would, it also uses context to proactively suggest updates.
For brands where I have inventory stored outside of Amazon, it tracks that stock in the context file. When I set this up, I figured I’d just let it know when to update the amount of non-Amazon inventory, but the first time I had units shipped from a 3PL to Amazon I forgot to tell Claude. When it saw the inbound shipment on Amazon, though, it asked if I had shipped those from the 3PL and if it should adjust the number of units stored there accordingly. Proactive employee beats software every time!
I used to keep my own docs on each brand with info related to ordering — things like what the MOQs are for each product and notes on how to work with each of the suppliers. I’ve since just dumped all of that context to Claude. It mostly doesn’t use it because it’s not placing orders from my suppliers (yet), but it’s a lot faster for me to ask Claude, “Who’s my point of contact for ordering <product>” than it is for me to go into Google Drive, find the doc for that brand and pull up the info that way. It really does feel like having a smart assistant who’s on top of things!
This project has really led me to look at my use of AI through a new lens. In the past, it was a tool with a range of uses from analysis to research to software development. Now it’s moved up a layer in the stack — it can still do analysis and research and software development, but those are now just interim steps that serve as inputs to its job as an actual assistant.
The common thread between my previous attempts at using AI for inventory management and the current one is that the AI wrote a lot of code. In prior attempts, though, I was the user of the tools it was coding. In this iteration, Claude is coding tools for itself. I don’t see the output of the code; what I see is the output of the assistant after it has used the output of the code.
As AI gets increasingly capable and trustworthy, I expect this trend to continue. At some point in the (probably not very far) future, I imagine that Claude (or whatever the model du jour is) will be proactively sending me inventory updates with recommendations on when to reorder. After that, it’ll move to doing the reordering itself, and I’ll just be sending the wire transfers for payment. Eventually it’ll be sending the wires, and I’ll just be getting quarterly updates on the business. It’s basically the same progression I’d expect if I hired a smart person and trained them up to run my business, because in a sense, that’s kind of what’s happening here.
If you’re using Claude Code (and if you’re not, I beg you to try it), let me offer a few recommendations.
Think about how you can use APIs to connect it to the services where your information lives. One of the big pain points of AI to this point has been the meat puppet problem — it can do useful work and provide valuable insight, but because it’s confined to a chat window, you have to manually bring in the context it needs to do its work and take out the results that it generates to put them into the places where they add value.
Claude Code already improved on this with its ability to easily modify files on your computer, but if you’re like most people, the lion’s share of your information is stored in the cloud rather than locally. Get it a Google Cloud API key, and now it can go straight to your Google Sheets to pull info or make changes. I recently gave it a QuickBooks API key, and now it can do my books instead of telling me how to do them. If you don’t know anything about how to get an API key, I happen to know a genius who lives in your command line that you can ask.
Bigger picture, stop thinking about LLMs as tools and start thinking about them as employees. Instead of giving it a specific task, explain your problem and ask for help solving it. If you tell it to vibe code an app, you’ll get an app. If you tell it you need help figuring out when to reorder inventory and ask it what information it needs to get the job done, you might just find yourself with a smart, proactive analyst that can build itself all the tools it needs to do a bang-up job.
