A First Look at Fable

Anthropic's latest model was released today. Here's a first look at how it performs and what it feels like in Plotly Studio. We'll have a more detailed post once we put it through the full set of benchmarks, but for now here are our field notes.

We ran through five real-world, personal, and relatable use cases that allow for a wide range of analytical depth.

1. Housing Rent or Buy Analysis with FRED Data

2. SF Mayor Effectiveness via 311 Data

3. Workout Impact Analysis from Apple Health - See our previous deep dive on this analysis.

4. SF Water Temp Buoy Dashboard App - Examining the water temp in San Francisco for open water swimming

5. Iran Conflict Economic Impact via FRED

A screenshot of the Plotly Studio home screen showing completed sessions with chart previews: Iran conflict economic impact via FRED, SF mayor effectiveness via 311 data, workout impact analysis from Apple Health, and an SF water temp buoy dashboard app.
A few of the sessions we ran through Fable tonight in Plotly Studio

Remarkable World Knowledge

Others have commented on how this model "feels big" in its world knowledge.

We noticed this right away: every step of the analysis presents much more detailed contextual information about the data.

For example, it presents station information about each buoy with more detailed location information.

A screenshot of Plotly Studio listing NOAA buoys and tide stations near San Francisco with their water temperatures and locations, noting that there is no water temp sensor at Aquatic Park / Fort Point itself.
Fable presents much more detailed "world knowledge" about the data that it analyses.

I've written previously about LLMs' curious and remarkable knowledge of public datasets, and this release is no different. Zillow Research data is a dataset that I haven't seen any model surface before for this type of analysis.

A screenshot of Plotly Studio asking which source to use for current home prices and rents per metro, recommending Zillow Research ZHVI and ZORI data alongside the FRED indices.

Pauses Appropriately

Fable is positioned as a highly autonomous model that can work for hours or days at a time. From its knowledge card (emphasis mine):

Claude Fable 5 [...] It is suited for long-running, complex, and asynchronous tasks that previously required frequent human check-ins.

It is particularly strong at end-to-end work that would otherwise take a person hours, days, or weeks - taking on problems that are long-running, ambiguous, or highly multi-step. It executes well-scoped tasks with few mistakes, automatically self-correcting through verification loops, and ships with robust safeguards.

In data work, we want our agentic loop to recognize ambiguity and, when that ambiguity is consequential, pause and ask the operator for clarification rather than just plough ahead. See more in our essay about designing agentic analytic benchmarks. I had been concerned that the over-emphasis on autonomy and long-running tasks would end up in conflict with this behavior.

However, Fable handles ambiguity with grace and curiosity within Plotly Studio's agentic loop. It raises good questions and seeks clarification with appropriate context under the scenarios we presented.

A screenshot of Plotly Studio asking what the 102 workouts logged as 'Other' were, since naming that big chunk of training time makes the story clearer.
A question that Plotly Studio raised while running the Fable model when analyzing the Apple Health data. Great example of the model seeking clarification and providing good context.

Autonomy and Long Horizon Tasks

By default, the model still appears to work up to about 10 steps (about 15 minutes) for any given data analytics task. This is a reasonable behavior as a default as it prevents cost overruns or unnecessary depth of analysis.

It can be steered to work for longer horizon analytics tasks if you tell it to work longer in open exploration or if you give it a more detailed specification of the analytics task.

However, I find that it is still difficult for it to really go into open exploration on its own where it might come up with new questions and ideas as it works or follow different rabbit holes - as is common with exploratory data work. I suspect that this is RL trained behavior to prevent the loops from going "off the rails".

A screenshot of an 11-step plan in Plotly Studio to download all ~6M rows of 311 data and analyze it with DuckDB, in response to a prompt asking it to work for a long time on its own.

Solid visualizations with room for improvement

I was delighted to see it build out these physiological subplots similar to what we designed by hand when working through the Apple Health data.

It does a better job at handling labels than I've seen previously as well.

Subplots of physiological metrics from Apple Health data, with HR recovery, HRV, resting HR, and VO2 max as 28-day rolling means sharing one x-axis.
Subplots with shared x-axis

But some of the charts it creates are pretty dense by default, and difficult to interpret at a glance:

A screenshot of Plotly Studio showing a dense stacked bar chart of weekly training volume by activity, with a regimes table below.

But it takes direction well. I asked it to update the chart to display rolling averages and subplots instead of fixed aggregations and stacked bars and it had no problem:

A screenshot of Plotly Studio showing the same training volume redrawn as small multiples, one panel per activity, with 4-week rolling means.

It defaults to clean line and bar charts, but if you ask it for a wider range of charts it does well across the Plotly visualization stack:

A screenshot of Plotly Studio showing a treemap of 311 complaints colored by year-over-year change, with a heatmap of every category indexed to its 2023 baseline below.
A screenshot of Plotly Studio showing a slope chart of 311 case volume shifts by category on a log scale.
A screenshot of Plotly Studio showing a dumbbell chart of 90th percentile days to close for each 311 category, comparing the year before vs under Lurie.
A hotspot map of 311 reports across San Francisco with a toggle to switch between the before period and the Lurie period.

The Dash apps are very nice and clean as well:

A screenshot of a Dash app in Plotly Studio with a metric explorer showing VO2 max as a 28-day rolling mean with the training regime periods shaded by activity.

The first shot of graphs, reports, and apps is remarkably good across the board. That said, it's not uncommon to see a few visual quirks here and there that need a follow-up, like large-number formatting issues:

A screenshot of a data story app titled 'Judging a Mayor by His Inbox' where the large records-analyzed headline number wraps awkwardly onto a second line.

Stronger Analysis

The analysis was notably stronger across the board. It handled time series and lagging correlations better. In the Apple Health analysis, it correctly identified the different regimes of training and the hiking-to-VO2 max correlation that we found when steering the sessions in Plotly Studio more manually.

A screenshot of a Plotly Studio report that splits the Apple Health workout data into 11 training regimes, with a weekly training volume chart by activity.
A screenshot of a Plotly Studio report describing how VO2 max climbed to its peak squarely during the 2024 hiking block, with 28-day rolling line charts of the physiological metrics.
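
To make "lagging correlations" concrete, here is a small dependency-free sketch of the idea: correlate a driver series at week t with a response series at week t + lag, and see which lag fits best. The numbers are synthetic, with a 3-week lag built in on purpose, and the function names are hypothetical; this is not the analysis Studio ran.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]  # drop the tail that has no partner
        y = response[lag:]               # shift the response forward by `lag`
        result[lag] = pearson(x, y)
    return result


# Synthetic weekly hiking hours (illustrative only).
hiking_hours = [0, 0, 1, 3, 5, 6, 4, 2, 1, 0, 0, 2, 4, 6, 5, 3, 1, 0, 0, 0]

# Synthetic response: a baseline plus an effect from hiking 3 weeks earlier.
TRUE_LAG = 3
vo2_max = [40.0] * TRUE_LAG + [40.0 + 0.5 * h for h in hiking_hours[:-TRUE_LAG]]

corrs = lagged_correlations(hiking_hours, vo2_max, max_lag=6)
best_lag = max(corrs, key=corrs.get)
```

On real data the peak is never this clean, and a strong lagged correlation is still only a correlation, so it is worth reading alongside the training regimes rather than as proof of cause.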

It does a better job at highlighting some of the core assumptions in the financial models as well:

A screenshot of Plotly Studio asking to confirm the ownership cost assumptions in the rent-vs-buy model: maintenance, insurance, and buyer and seller closing costs.

And the approach is strong as well, handling considerations like data size and server-side aggregation and having solid plans.

A screenshot of a Plotly Studio plan for the SF mayor 311 analysis, reasoning that the full dataset is millions of rows so the Socrata API's server-side aggregation matters.
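
As a rough illustration of why server-side aggregation matters, the sketch below builds a Socrata (SoQL) query URL that asks the server for monthly counts per category instead of downloading millions of raw rows. The domain, the `xxxx-xxxx` dataset ID placeholder, and the column names are assumptions for illustration, and the helper function is hypothetical; it only constructs the URL and makes no network call.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_monthly_counts_url(domain, dataset_id, date_col, category_col, limit=50000):
    """Build a SoQL URL that returns one row per (month, category) with a count."""
    month_expr = f"date_trunc_ym({date_col})"
    params = {
        "$select": f"{month_expr} AS month, {category_col}, count(*) AS cases",
        "$group": f"{month_expr}, {category_col}",
        "$order": month_expr,
        "$limit": limit,
    }
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Placeholder dataset ID and assumed column names; substitute the real ones.
url = build_monthly_counts_url(
    domain="data.sfgov.org",
    dataset_id="xxxx-xxxx",
    date_col="requested_datetime",
    category_col="service_name",
)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query)
```

With grouping done on the server, the response is a few thousand aggregated rows, which is small enough to chart directly.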

It demonstrates the capability to identify and investigate data quality issues autonomously during its analysis:

A screenshot of a category renaming check in Plotly Studio, discovering that the 311 'Graffiti' category was split into 'Graffiti Public' and 'Graffiti Private' in June 2024 and that 22 categories appear or disappear between the comparison windows.
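
A check like this can be sketched in a few lines: compare the set of categories in each window, then map renamed categories back to a common name before comparing counts. The mapping and sample records below are made up for illustration (only the Graffiti split comes from the screenshot), and the helper names are hypothetical.

```python
from collections import Counter

# Assumed mapping from post-split category names back to the pre-split name.
HARMONIZE = {
    "Graffiti Public": "Graffiti",
    "Graffiti Private": "Graffiti",
}


def harmonized_counts(categories):
    """Count cases per category after applying the renaming map."""
    return Counter(HARMONIZE.get(name, name) for name in categories)


def changed_categories(before, after):
    """Categories that appear in only one of the two comparison windows."""
    return set(before) ^ set(after)


# Tiny made-up samples for the two windows.
before = ["Graffiti", "Graffiti", "Street Cleaning"]
after = ["Graffiti Public", "Graffiti Private", "Graffiti Public", "Street Cleaning"]

# Without harmonizing, the split looks like one category vanishing and two appearing.
raw_diff = changed_categories(before, after)

# After harmonizing, the two windows line up again.
fixed_diff = changed_categories(harmonized_counts(before), harmonized_counts(after))
```

Running the set difference first is a cheap way to spot renames before any before-and-after comparison is trusted.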

It also demonstrates the ability to self-correct its analysis as it works:

A screenshot of Plotly Studio re-running the affected analyses on corrected time windows, with harmonized category counts, closure outcome mix, and spike-day exclusion effects.

Nice backtesting:

A screenshot of a 1996-2026 rent-vs-buy backtest in Plotly Studio showing the realized buy-minus-rent gap by metro.

Strong, neutral analysis:

A screenshot of a report section titled 'Inflation Pricing: A Spike, Not a Regime Change' with a chart of inflation expectations and real yield.

Thorough deep diving into data quality issues that can really distort a story. Rather than trying to make a story out of bad numbers, Studio discovered the data quality issues and then investigated and surfaced them in its report.

A screenshot of a data story chapter titled 'Faster - but look closely at why' explaining that improved 311 resolution times come with caveats: a partly self-inflicted 2024 baseline and bulk closures that flatter the averages.
A screenshot of a chart of 311 cases closed per day with the Feb 2026 backlog purge spike annotated.
A screenshot of a data story chapter titled 'The quality of closed' noting that closing a case is not the same as fixing a problem, with a bar chart of the closure outcome mix before vs under Lurie.

Tone

In Plotly Studio, we steer the tone of the writing a fair amount in order to keep the analysis focused on the data and remove some of the enthusiastic or sycophantic behavior. This model does a pretty good job of adhering to this, but we still see a fair amount of AI's "hard hitting journalistic" tone and tropes ("Here's the honest answer") in the reports that it creates, especially if you ask it to editorialize.

As an aside - Plotly Studio lets you control how much editorializing a write-up should provide, so you can create reports that just show the numbers.

A screenshot of a rent-vs-buy report section titled 'Sensitivity: The Verdict Hinges Almost Entirely on the Investment Return' with a chart of year-30 advantage vs assumed investment return.
A screenshot of a report section titled 'Equities and Credit: A Shallow, Fast Round-Trip' with a chart of equity indexes through the conflict window.

Methodology

Surfacing the "Methodology" is a first-class feature in Plotly Studio, and this model does a fine and thorough job of reporting it.

A screenshot of the methodology section of the rent-vs-buy report, listing the columns and inputs, approach, constants, data sources, and limitations.
A screenshot of the methodology section of the economic impact report, listing the FRED series used, data sources, and limitations like attribution by event window rather than causal identification.

Cost

A normal, reasonable session costs about $0.50-$2. Tonight's exploration cost about $80 in tokens. It was 5 sessions in parallel over about 2 hours, and I was definitely steering the analysis to go deep to see how far I could get it to work in a long-running autonomous mode. These sessions also generated many reports and full-fledged Dash apps - about 10-15 in all.

Stay tuned

We'll be exploring Fable more over the coming weeks as we tune our agentic loop to work better with the model and compare it more rigorously across our benchmarks.