Claude vs. OpenAI GPT-4 generated content, side-by-side comparison
- OpenAI GPT-4 https://gist.github.com/adaboese/12e3c3d28783bc831c202ad1e55d932b
- Claude 3 (Opus) https://gist.github.com/adaboese/d0b7397381726a7d394920e6a82ee39c
Both of these are outputs of the AIMD app. They are not made using a single prompt, but rather using RAG with over a dozen instructions. This allows testing a fairly broad range of expectations, such as adherence to instructions, error rate, and speed. Since the two model APIs are mostly compatible, I've decided to compare them side by side.
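To illustrate the compatibility point, here is a minimal sketch of the two call shapes side by side. This is illustrative only, not the actual AIMD code; the model names, prompt, and client setup are placeholders.

```python
# Sketch: near-identical request/response shapes for the two APIs.
# Model names and the prompt are placeholders, not the AIMD internals.
from openai import OpenAI
from anthropic import Anthropic

prompt = "Rewrite this section for brevity: ..."

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
gpt_response = openai_client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": prompt}],
)
gpt_text = gpt_response.choices[0].message.content

anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
claude_response = anthropic_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=4096,  # required by the Messages API
    messages=[{"role": "user", "content": prompt}],
)
claude_text = claude_response.content[0].text
```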
A few interesting observations:
- Claude followed instructions a lot more closely than OpenAI. The outline provided in the initial instructions is pretty close to the final article structure, despite multiple revisions.
- Claude's output scored better in terms of using a broader set of data formats (tables, lists, quotes).
- Contrary to many tweets, Claude's output is not excessively verbose. Worth mentioning that part of the RAG instructions is to rewrite content for brevity.
- Claude took 5 minutes to execute 52 prompts; OpenAI took 7 minutes. Forgot to mention, Claude appears to be a lot more rate-limited than OpenAI. I hit quite a few concurrency rate limits, but as long as you have auto-retry (sketched below), it is a non-issue. Image inputs (prompts) are generated with the respective models, but the actual images are generated using DALL·E.
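By "auto-retry" I just mean catching the rate-limit error and backing off before resending the same prompt. A minimal sketch, assuming the Anthropic Python SDK; this is not the actual AIMD implementation, and the retry count and backoff values are arbitrary:

```python
# Sketch: retry a Claude request when a concurrency/rate limit is hit.
import time
import anthropic

client = anthropic.Anthropic()

def call_with_retry(messages, retries=5):
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-3-opus-20240229",
                max_tokens=4096,
                messages=messages,
            )
        except anthropic.RateLimitError:
            # Back off exponentially, then retry the same prompt.
            time.sleep(2 ** attempt)
    raise RuntimeError("rate limited on every attempt")
```

With that wrapper in place, the occasional 429 just adds a few seconds of wall-clock time rather than failing the run.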