Introduction
At Corpogames, our stack is built on a Ruby on Rails monolith. Over the years, we had accumulated a mix of Bootstrap 5 and Tabler.io for our CSS. We wanted a cleaner and more maintainable solution, so we decided to migrate to Tailwind CSS.
This task had been sitting in our backlog for months, with an initial estimation of roughly two weeks of work.
In this blog post, I’ll explain how we managed to get Claude Code to rewrite the code and evaluate its own work by capturing before-and-after screenshots and comparing them visually.
We significantly boosted the success rate of these comparisons by telling Claude to ask GPT-4o for a second opinion.
This allowed Claude Code to iteratively fix differences until both Opus and GPT-4o agreed the pages matched.
Initial Observations
I had already noticed that Claude Code excels when it can evaluate its own work.
- We use unit tests and integration tests as our primary feedback mechanism. Claude is instructed to run these tests at the end of each task.
- In many situations, we also had good results instructing Claude to run the server and interact with the application using curl commands and shell scripts.
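To give an idea, those checks boil down to something like the sketch below (written in Ruby rather than shell to keep the examples in this post consistent; the routes and port are placeholders, not our actual app):

```ruby
# smoke_check.rb - rough equivalent of the curl-style checks we ask Claude to run.
# The routes and port below are placeholders; adapt them to your own app.
require "net/http"

BASE_URL = "http://localhost:3000"
PATHS = ["/", "/login", "/dashboard"]

PATHS.each do |path|
  response = Net::HTTP.get_response(URI("#{BASE_URL}#{path}"))
  puts "#{response.code} #{path}"
  abort "Unexpected status for #{path}" unless %w[200 302].include?(response.code)
end

puts "All pages responded as expected."
```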
However, the process was much more tedious for pure front-end tasks, particularly those involving CSS:
- Claude performs the initial migration.
- I test the results locally on my machine.
- I take screenshots and paste these images back into Claude, explicitly pointing out the CSS issues.
For a project involving 16,000 lines of code across dozens of screens, this manual approach was clearly not going to be enjoyable.
We needed a better solution.
Inspiration and Experimentation
Around the same time, I watched an excellent presentation from the Claude Code team: https://www.youtube.com/watch?v=gv0WHhKelSE&ab_channel=Anthropic
This inspired me to experiment with providing Claude tools to automate the process of capturing screenshots of its work — both before and after migrating a page — and comparing them visually.
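A minimal version of such a capture tool, assuming the Ferrum gem for headless Chrome (Puppeteer or Playwright would work just as well), could look roughly like this:

```ruby
# screenshot.rb - minimal sketch of a capture tool, assuming the Ferrum gem
# (headless Chrome). Usage: ruby screenshot.rb <url> <output_path>
require "ferrum"

url, output_path = ARGV
abort "Usage: ruby screenshot.rb <url> <output_path>" unless url && output_path

browser = Ferrum::Browser.new(window_size: [1440, 900])
begin
  browser.goto(url)
  browser.network.wait_for_idle   # let CSS, fonts and images finish loading
  browser.screenshot(path: output_path, full: true)
  puts "Saved #{output_path}"
ensure
  browser.quit
end
```

Claude can call this once before touching a page and once after, which gives it the two screenshots it needs to compare.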
It turned out to be effective.
After a few initial experiments with this tool, my conclusion was that Claude Opus was decent at interpreting screenshots, but it had limitations. Often, I still had to explicitly describe visual differences between two screenshots to help Claude fix the issues.
To address this, I decided to compare Claude’s performance with GPT-4o. Although I don’t have formal benchmarks, I quickly noticed that each model typically identified different issues.
Therefore, I wrote a second CLI tool allowing Claude to call GPT-4o and clearly describe the visual differences between screenshots.
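For illustration, the core of such a second-opinion tool, assuming the ruby-openai gem (the prompt wording and CLI interface here are simplified placeholders), could look like this:

```ruby
# compare_screenshots.rb - sketch of the "second opinion" tool, assuming the
# ruby-openai gem. Usage: ruby compare_screenshots.rb <before.png> <after.png>
require "openai"
require "base64"

before_path, after_path = ARGV
abort "Usage: ruby compare_screenshots.rb <before.png> <after.png>" unless before_path && after_path

# Embed each screenshot as a base64 data URL so it can be sent inline.
def data_url(path)
  "data:image/png;base64,#{Base64.strict_encode64(File.binread(path))}"
end

client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))

response = client.chat(
  parameters: {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text",
            text: "These are before/after screenshots of the same page. " \
                  "List every visual difference (layout, spacing, colors, fonts). " \
                  "If they look identical, say so explicitly." },
          { type: "image_url", image_url: { url: data_url(before_path) } },
          { type: "image_url", image_url: { url: data_url(after_path) } }
        ]
      }
    ]
  }
)

puts response.dig("choices", 0, "message", "content")
```

Because the output is plain text on stdout, Claude can read the description and turn it into concrete CSS fixes.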
This significantly improved the quality of the final migration.
The Final Process
The complete, refined workflow we developed looked like this:
- Claude launches the server and captures a reference screenshot of the page.
- Claude performs the HTML and CSS migration.
- Claude captures a new screenshot of the updated page.
- Claude compares the two screenshots and iteratively fixes any differences until it considers them identical.
- Both screenshots are then sent to GPT-4o for a second opinion :)
- This iterative process continues until both Claude and GPT-4o confirm there are no significant visual differences.
- I manually debug any remaining minor issues.
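Purely for illustration, the compare-and-retry part of this loop, wired around the two sketches above, would look something like the following; in practice Claude Code drives it by invoking the CLI tools itself and making the CSS fixes between rounds:

```ruby
# migration_loop.rb - illustrative only: the compare-and-retry part of the loop.
# `screenshot.rb` and `compare_screenshots.rb` refer to the sketches above;
# Claude Code orchestrates this itself and edits the HTML/CSS in between.
require "open3"

URL = "http://localhost:3000/some/page"   # placeholder for the page being migrated
MAX_ROUNDS = 5

def capture(url, path)
  system("ruby", "screenshot.rb", url, path) || abort("screenshot failed")
end

def describe_differences(before, after)
  output, status = Open3.capture2("ruby", "compare_screenshots.rb", before, after)
  abort("comparison failed") unless status.success?
  output
end

capture(URL, "before.png")   # reference screenshot, taken before the migration

MAX_ROUNDS.times do |round|
  # (Claude applies or adjusts the migration here, then re-checks its work.)
  capture(URL, "after.png")
  report = describe_differences("before.png", "after.png")
  puts "Round #{round + 1}:\n#{report}"
  break if report.match?(/identical/i)   # naive stop condition, just for the sketch
end
```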
Sample result:
Before screenshot:
After screenshot:
Conclusion
Using this automated approach, simple pages, such as those containing basic forms, were migrated without any manual intervention.
Complex pages with custom designs did require some additional work.
Overall, we managed to complete the migration and deliver all screens to our staging environment in about two days: a dramatic improvement over the original estimate of two weeks!
And most importantly, we transformed a painfully boring task into a process that was actually quite fun :)