Failure Friday: How We Ensure PagerDuty is Always Reliable

blog.pagerduty.com

82 points by DougBarth 12 years ago · 6 comments

jedberg 12 years ago

I posted this on the blog but I thought I'd repeat it here:

The simian army isn't AWS only. :) Some of it runs on other stacks.

And the best part is, it is open source! So if you wanted to leverage the simian army, it wouldn't be that hard to modify it to run on whatever stack you want and then submit the changes back. :)

teh_klev 12 years ago

We just started using PagerDuty to deliver our Nagios alerts to landlines and mobile phones after losing confidence in Vodafone's pager network.

The other thing we like is the integration with HipChat to deliver alerts into our NOC chat room.

Overall we've been quite impressed... and we'll be even more impressed if you folks run into actual trouble and we still get our alerts :)

mjallday 12 years ago

Anecdotal, I know, but PagerDuty is the only service we rely on that has yet to go down on us. These guys are solid!

I like that tip on how to simulate a slow network too.

kapitalx 12 years ago

My first impression from the title was that this is a post-mortem for an actual failure on Friday. But after reading your post the title made more sense ;)

Great post!

iLoch 12 years ago

It's Wednesday!