Maybe ChatGPT has some pre-frontal cortex problems
solresol.substack.com
This is a really odd way to test the capabilities of an LLM. First, most photos of clocks show 10:10, since the watches in the training data are usually set to 10:10 (the standard pose in ads, in order to better sell watches).
Second, I don't think the image generation side of ChatGPT is being marketed or presented as a problem-solving AI.
I like the part where the AI couldn’t be trusted to draw a clock, so we trusted it to psychoanalyze the incorrect clock.
I administered the CDT to ChatGPT and got Claude to diagnose what was wrong with the "patient" based on the results.
There are signs of pre-frontal cortex damage or early stage dementia.
But does the patient get better or worse with each update?
Here's the thing (which you probably knew going in): generative AI is well known to be terrible at drawing specific times on clock faces.
This is down to the training data. It has been trained on a huge amount of images.
That includes advertising. For whatever reason, wrist watch manufacturers have a tendency to set watches to 10:10 in ads, almost without exception. Perhaps it's just a nice-looking time, or it's good for comparison purposes.
Simply Google "wrist watch" and you'll see.
So, these generative models have a huge bias towards 10:10 on clock faces, because that's what all the clocks they've been trained on look like.
FWIW, Claude 3.5 Sonnet got the SVG right on the first try: https://claude.site/artifacts/8dedf16e-b861-4497-96e2-872773...
Prompt was just "create an svg of a clockface with the time being 10 past 11"
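For anyone who wants to check a generated clock by hand, the geometry is simple enough to write down. Here is a minimal Python sketch of my own (not the code Claude produced; `hand_angles` and `clock_svg` are names I made up for illustration). It assumes angles are measured clockwise from 12 o'clock and that the hour hand drifts continuously with the minutes.

```python
import math


def hand_angles(hour, minute):
    """Return (hour_angle, minute_angle) in degrees, clockwise from 12 o'clock."""
    minute_angle = minute * 6.0                     # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5  # 360 / 12, plus drift within the hour
    return hour_angle, minute_angle


def _tip(cx, cy, length, angle_deg):
    """Point at `length` from the centre, at `angle_deg` clockwise from vertical."""
    a = math.radians(angle_deg)
    # SVG's y axis points down, so "up" means subtracting from cy.
    return cx + length * math.sin(a), cy - length * math.cos(a)


def clock_svg(hour, minute, size=200):
    """Build a bare-bones analog clock face as an SVG string (illustrative only)."""
    cx = cy = size / 2
    r = size * 0.45
    hour_angle, minute_angle = hand_angles(hour, minute)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">',
        f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="white" '
        f'stroke="black" stroke-width="3"/>',
    ]

    # Twelve hour ticks, one every 30 degrees.
    for i in range(12):
        x1, y1 = _tip(cx, cy, r * 0.88, i * 30)
        x2, y2 = _tip(cx, cy, r, i * 30)
        parts.append(
            f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
            f'stroke="black" stroke-width="2"/>'
        )

    # Short thick hour hand, then longer thin minute hand.
    for length, angle, width in ((r * 0.5, hour_angle, 5), (r * 0.8, minute_angle, 3)):
        x, y = _tip(cx, cy, length, angle)
        parts.append(
            f'<line x1="{cx}" y1="{cy}" x2="{x:.1f}" y2="{y:.1f}" '
            f'stroke="black" stroke-width="{width}" stroke-linecap="round"/>'
        )

    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(clock_svg(11, 10))  # "10 past 11"
```

For 10 past 11 this puts the minute hand at 60 degrees (pointing at the 2) and the hour hand at 335 degrees (just past the 11). The advertising pose, 10:10, would be 305 and 60 degrees, so the two are easy to tell apart if you inspect the output.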
I love the concept of the article where one LLM can't draw a simple clock but the other one can accurately diagnose medical conditions from a hypothetical drawn image.
It has sentience problems...