Fabric Is Just Plain Unreliable, and Microsoft's Hiding It. - Brent Ozar Unlimited®

Last Updated July 21, 2025

Update 2025/07/21: Microsoft heard the complaints, and ~~fixed the reliability~~ added a better status page. Baby steps.

Last week, Microsoft Fabric went down yet again for hours on multiple continents.

Oh, you didn’t hear about it? Let’s talk about why.

First, Fabric’s status page is fabricated bullshit. The link https://aka.ms/fabricsupport takes you to a localized status page that almost always shows all green checkmarks – even when the service is on fire. During last month’s 12+ hour overnight outage, people were screaming on Reddit that things were down, but the status dashboard was showing all green. When Microsoft employees woke up, they asked if people were still having problems – and then eventually got around to updating the status page to reflect the outage once it was clear that things were really borked.

Redditors have resorted to reporting Fabric outages to StatusGator, which then tracks the gap between a burst of user outage reports and the time Microsoft actually updates their status page – and it ain’t pretty.
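If you want to measure that kind of gap yourself, here's a minimal sketch, assuming you've collected the timestamps of user outage reports and the time the official status page finally changed. The function names, the burst definition (5 reports within 10 minutes), and the sample timestamps are all made up for illustration; this is not how StatusGator actually computes it.

```python
from datetime import datetime, timedelta


def first_burst(reports, threshold=5, window=timedelta(minutes=10)):
    """Return the start of the earliest window containing `threshold` reports.

    `threshold` and `window` are hypothetical knobs, not anyone's real settings.
    Returns None if the reports never cluster tightly enough to count as a burst.
    """
    ordered = sorted(reports)
    for i in range(len(ordered) - threshold + 1):
        # A burst: `threshold` consecutive reports that all fit inside `window`.
        if ordered[i + threshold - 1] - ordered[i] <= window:
            return ordered[i]
    return None


def acknowledgement_lag(reports, status_updated_at, **burst_options):
    """Time between the first burst of user reports and the official update."""
    start = first_burst(reports, **burst_options)
    if start is None:
        return None
    return status_updated_at - start


# Made-up timestamps for illustration only, not real outage data.
base = datetime(2025, 1, 1, 1, 0)
sample_reports = [base + timedelta(minutes=m) for m in (0, 90, 91, 93, 95, 97, 120)]
sample_status_update = datetime(2025, 1, 1, 6, 0)

print(acknowledgement_lag(sample_reports, sample_status_update))  # 3:30:00
```

The lone report at the start is ignored because one complaint isn't a burst; the clock starts when five reports land within ten minutes of each other.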

Second, the post-mortems are just as fabricated. After last month’s outage, the team posted on Reddit, and opened with this whopper:

“Fabric/Power BI is deployed in 58+ regions worldwide and serve approximately 400,000 organizations, and 30 million+ business users every month. This outage impacted 4 regions in Europe and the US for about 4 hours.”

See what they did there? They used big giant numbers to talk about the subscriber base, and then switched units of measure to talk about the affected population (just “the US”.) That’s like saying, “We served over 30 billion hamburgers last month, but unfortunately, just 1 country (the US) came down with food poisoning.” Gimme a break. Furthermore, the 4-hour thing is just wildly incorrect, as evidenced by the people screaming on Reddit overnight and into the morning.

“The combination of factors that triggered this issue did not occur until we hit specific regions and usage patterns. This was caught at that point through automated alerting, and our incident management team initiated a rollback.”

Specific regions like, uh, Europe and the United States. You know, small places. Villages, practically.

I absolutely love the second sentence as a world-class example of fabrication. Microsoft is accidentally admitting that their own internal alerting showed that Fabric was broken – but not their external alerting, aka their status page. They’re accidentally showing their cards: the status dashboard just doesn’t show the truth.

Next, Microsoft hides the Fabric outage history as quickly as they can. The status dashboard has no list of recent outages. I feel genuinely sorry for Fabric admins who struggle to troubleshoot failed Fabric processes that were supposed to run overnight. They don’t realize there was an overnight outage that Microsoft has simply swept under the rug. The admin checks the status page, sees nothing, and continues troubleshooting, thinking it’s their own problem.

Contrast this with the overall Azure status page, which has a prominent link to Azure status history, publicly calling out major outages and their post-mortems. Microsoft knows how to do this – but the Fabric team ain’t doin’ it.

I don’t understand why the Fabric team is so secretive about the outages.

It’s not like Microsoft Fabric even has a service level agreement.

It’s not like they’re giving refunds when your data is gone for hours at a time.

Oh, you didn’t realize that?

That brings me to the only reason I can think of that someone would recommend Microsoft Fabric as a critical part of a company’s infrastructure today: ignorance. That’s where this blog post comes in, dear reader – I don’t want you to be ignorant.
