Service Blog - Azure DevOps

Featured posts

We’ve Moved! – Introducing Azure DevOps Service Status Portal

Latest posts

Azure DevOps Availability Issues – 19 April 2019

We’ve Moved! – Introducing Azure DevOps Service Status Portal

Azure DevOps SRE

Today, we’re happy to introduce the Azure DevOps service status portal, which provides real-time insight into active service events and further details on each event being investigated. This portal replaces our current experience using this Service blog. No new posts will be published to this blog, and existing subscribers are encouraged to use the RSS feed available in the new portal. To help clarify which specific aspects of the service are affected, we will communicate the impact of all active events in a two-dimensional service matrix that maps services against the geographic regions of impacted organizations. ...

Postmortem: Azure DevOps Service Outages in October 2018

Azure DevOps SRE

Earlier this month, Azure DevOps experienced several significant service outages, for which we are deeply sorry. As with every significant live site incident, we have completed a detailed root cause analysis for these. Due to the proximity of these incidents and common underlying causes, we wanted to share the details with you to ensure that you know what happened and what we’re doing to prevent them from recurring.

October 3, 4 and 8 Incidents

The incident on Wednesday, 3 October 2018 started with a networking issue in the North Central US region. Since our authentication service, SPS, is in this region the issu...

Postmortem – VS Marketplace outage – 4 September 2018

Azure DevOps SRE

On Tuesday, 4 September 2018, Visual Studio Marketplace suffered an extended outage affecting most of its customers. Marketplace hosts and serves extensions for the Visual Studio IDE, Visual Studio Code, and Azure DevOps. This was the first instance of the Marketplace service going down completely, and we sincerely apologize for the outage.

What happened and resultant customer impact

Azure resources that Marketplace depends on (largely Compute, Storage and SQL) were down during the incident in Azure South Central US, and this took down the single-instance Marketplace service completely from 2018-09-04 09:45 UTC to...

Postmortem: VSTS 4 September 2018

Azure DevOps SRE

Postmortem – VSTS Outage – 4 September 2018

On Tuesday, 4 September 2018, VSTS (now called Azure DevOps) suffered an extended outage affecting customers with organizations hosted in the South Central US region (one of the 10 regions globally hosting VSTS customers). The outage also impacted customers globally due to cross-service dependencies. It required more than 21 hours to recover all VSTS services in South Central US because the recovery of VSTS services was dependent upon Azure restoring the data center. After VSTS services were recovered, we had an additional incident which lasted 2 hours impacting Release...

Postmortem: Global VSTS availability issues – 22 May 2018

Azure DevOps SRE

Customer Impact: On 22 May 2018, Visual Studio Team Services (VSTS) experienced a major incident across multiple regions between 15:00 and 16:55 UTC. An event in a Western European scale unit of the Team Foundation Service (TFS) caused a chain reaction that sporadically took TFS scale units in other regions offline. Based on our telemetry, we estimate that a total of 20,800 users were impacted during the incident.

[Figures: Impacted users over time; Total request volume over time]

What Happened: First, some background on a few components in VSTS. In the example: So,...

Updated and Completed Postmortem: Performance Issues and failures in VSTS West Europe – 7 February 2018

Azure DevOps SRE

A week ago we posted an incomplete postmortem and are now following up with the completed version. If you want the full story of how we progressed through this incident, start by reading that. This postmortem covers the full root cause analysis but does not rehash the first part of the investigation.

Customer Impact

On 7 February 2018 we had an incident which impacted users in our Western European scale unit. During this time, users experienced slow performance and 503 errors (service unavailable) when interacting with VSTS services. Close to 5,000 users were impacted at the peak of the incident. The incid...

Preliminary Postmortem: Performance Issues and failures in VSTS West Europe – 7 February 2018

Azure DevOps SRE

Edit February 26, 2018: We have just posted an updated and complete postmortem here: https://devblogs.microsoft.com/devopsservice/?p=16295

Customer Impact

On 7 February 2018 we had an incident which impacted users in our Western European scale unit. During this time, users experienced slow performance and 503 errors (service unavailable) when interacting with VSTS services. Close to 5,000 users were impacted at the peak of the incident. The incident lasted for two and a half hours on 7 February 2018, from 10:10 to 12:40 UTC.

What Happened

Our root cause analysis (RCA) has not gone well for this incident. ...

Postmortem – Intermittent Failures for Visual Studio Team Services on 14 Dec 2017

Azure DevOps SRE

Beginning on 14 December 2017, we had a series of incidents with Visual Studio Team Services (VSTS) over several days that seriously impacted the availability of our service for many customers (incident blogs #1 #2 #3). We apologize for the disruption these incidents caused you and your team. Below we describe the cause and the actions we are taking to address the issues that caused these incidents.

Customer Impact

The issues caused intermittent failures across multiple instances of the VSTS service within certain US and Brazilian data centers. During this time, we experienced failures within our applicati...