Atlassian Exceeds 99.9999% of Availability Using Sidecars, Fault-Tolerant Design

atlassian.com

17 points by stelliosk 3 years ago · 27 comments (26 loaded)

sebslomski 3 years ago

Interesting.

* Atlassian: We estimate the rebuilding effort to last for up to 2 more weeks: https://news.ycombinator.com/item?id=30990697

* Inside the longest Atlassian outage: https://news.ycombinator.com/item?id=31015813

* Atlassian products have been down for 4 days https://news.ycombinator.com/item?id=30973808

* Post-incident review on the Atlassian April 2022 outage https://news.ycombinator.com/item?id=31210469

  • jsiepkes 3 years ago

    Yes, but you see, that wasn't a full outage because not all customers were affected. So therefore it doesn't count as downtime according to the SLA... :')

  • oxfordmale 3 years ago

    Just checking the Atlassian status page and there is an active incident:-)

xorcist 3 years ago

An availability of 99.9999% means a maximum of 31 seconds unavailable per year. The usual "five nines" is 5 minutes, and that's a tough target for anyone.

Given that their outage was from April 4 to April 19 this year, they should reach their target availability on average at the earliest in the year 45222. If they keep perfect uptime in the meantime, that is.
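The arithmetic behind these figures can be checked with a short sketch. It assumes a 365-day year and a 15-day outage (April 4 to April 19); the function names are illustrative only. With the exact six-nines budget of about 31.5 seconds per year, the outage takes roughly 41,100 years to average out. The year 45222 matches a budget rounded to 30 seconds per year (1,296,000 s / 30 s = 43,200 years after 2022).

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def downtime_budget_seconds(availability: float) -> float:
    """Allowed downtime per year, in seconds, for an availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def years_to_amortize(outage_seconds: float, availability: float) -> float:
    """Years of observation needed for one outage to average out to the
    target availability, assuming no other downtime in that window."""
    return outage_seconds / downtime_budget_seconds(availability)


six_nines = downtime_budget_seconds(0.999999)   # about 31.5 seconds
five_nines = downtime_budget_seconds(0.99999)   # about 315 seconds
outage = 15 * SECONDS_PER_DAY                   # April 4 to April 19
years = years_to_amortize(outage, 0.999999)     # roughly 41,100 years

print(f"six nines:  {six_nines:.1f} s/year")
print(f"five nines: {five_nines / 60:.2f} min/year")
print(f"years to amortize a 15-day outage: {years:,.0f}")
```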

warent 3 years ago

lol. They just had a multiple week outage this year. No, they cannot claim this level of availability until around May 2023. This is marketing nonsense trying to cover their massive April mistake.

  • h2odragon 3 years ago

    "fault tolerant design" == "we knew the design was faulty, we tolerate that"

  • rozenmd 3 years ago

    This part of the system didn't go down.

    They just deleted a shit ton of customer data, and had to manually restore it. The system itself was still available if your data wasn't part of the deletion script.

  • miscaccount 3 years ago

    Well, technically their system was up and running. Except that they did not have data to work on. /s

mkl95 3 years ago

> Atlassian Engineering recently published how it exceeded 99.9999% of availability with its Tenant Context Service (TCS).

What a misleading and cynical headline. Literally all Atlassian products I work with have some unexpected downtime every now and then.

posnet 3 years ago

The title is very misleading: it is just one of their microservices that has that uptime.

  • MiscIdeaMaker99 3 years ago

    Yes, exactly. It's their Tenant Context Service. The headline is misleading, but really only for those who don't bother to read the article.

    • grnmamba 3 years ago

      Headline: "Smoking does not cause cancer."

      Article: "This study proves that smoking does not cause skin cancer."

CyanLite2 3 years ago

Misleading Title.

Should be: "Besides that Mrs. Lincoln, how was the play?"

kayodelycaon 3 years ago

Exactly which part of their system has 6 9s? It certainly hasn’t been Jira.

grnmamba 3 years ago

This is the worst attempt at corporate propaganda I've seen in a while.

https://www.atlassian.com/engineering/post-incident-review-a...

dang 3 years ago

Url changed from https://www.infoq.com/news/2022/09/atlassian-high-availabili..., which points to this.

  • itsdrewmiller 3 years ago

    Can you also change the title to something that doesn't seem like they are trying to mislead people? The real title is "Here’s how one of Atlassian’s critical services consistently gets above 99.9999% of availability"

    • dang 3 years ago

      If someone can suggest an accurate, neutral title, preferably using representative language from the article itself, we'll happily change it.

      (I'd do it myself but am just being pulled away)

0xbadcafebee 3 years ago

Atlassian's status pages have had "active incidents" for the last two days straight: https://status.atlassian.com/

Six nines of availability means no more than 30 seconds downtime per year.

Maybe the fault tolerance of one system isn't such a big deal if you depend on 30 other systems?
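A rough way to see the dependency point: if a request needs every service in a chain to be up, and failures are independent, the combined availability is the product of the individual ones. The sketch below is only an illustration. The figure of 30 dependencies at 99.9% each is a hypothetical assumption, not Atlassian data, and `serial_availability` is an illustrative name.

```python
import math


def serial_availability(availabilities):
    """Availability of a chain where every component must be up,
    assuming failures are independent (a simplifying assumption)."""
    return math.prod(availabilities)


own_service = 0.999999        # the six-nines service from the headline
dependencies = [0.999] * 30   # hypothetical: 30 dependencies at three nines

combined = serial_availability([own_service] + dependencies)
print(f"combined availability: {combined:.4%}")
```

Under these assumptions the chain ends up at roughly 97%, so the six-nines component barely changes what a user experiences.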

hericium 3 years ago

Didn't Atlassian irreversibly lose the Confluence data of some of their clients this year after a weeks-long outage?

fipar 3 years ago

I think this is relevant regarding the very misleading availability percentage in the title: https://rachelbythebay.com/w/2019/07/15/giant/

atulvi 3 years ago

Is JIRA not included in this calculation? They were down many times last year.

jayanmn 3 years ago

> achieved this high availability by implementing highly-autonomous client sidecars, able to proactively shield themselves from complete AWS region failures.

Complete region failure? How often does that happen?

rwbhn 3 years ago

Actual title: Here’s how one of Atlassian’s critical services consistently gets above 99.9999% of availability

jtthe13 3 years ago

Escaping confluence and transitioning to a competing service was the highlight of my summer.
