Show HN: OpenStatus – Open-source monitoring with incident managements

openstatus.dev

171 points by tibozaurus 2 years ago · 72 comments · 1 min read

Hey HN!

We’re Max and Thibault, building OpenStatus.dev, an open-source synthetic monitoring platform with incident management.

1 min demo: https://twitter.com/mxkaske/status/1685666982786404352

We have just reached 2000 stars on GitHub

https://github.com/openstatusHQ/openstatus

We are really excited to hear your feedback/questions and connect further: our emails are max@openstatus.dev and thibault@openstatus.dev.

Thank you!

kc10 2 years ago

Congrats on the launch!!

I was previously the founder of a synthetic monitoring startup, devraven.io.

Just sharing my experience: monitoring is brutally competitive. From my conversations, most large enterprises have very little synthetic monitoring; they use DDOG or other APM tools and do not want to try any new tools for a few thousand dollars in savings. And in a lot of cases they are comfortable with their custom test frameworks that use Selenium. Some are even worried that setting up synthetic monitoring will bring down their environment or trash their database with junk data ::sigh::

Most smaller companies we spoke to were not mature enough to have monitoring and did not have people who could set it up. They used to ask us to build tests for them. Requests for discounts on the $29.99/mo price point were not uncommon.

After a few months of operating the product, we did find a few angels who were interested in investing in us (not the product). But in the end, we did not feel that we could make good use of investor money and provide a decent return, so we ended up backing out of the investment and chose to shut down the product.

CAP_NET_ADMIN 2 years ago

In what ways is it better than Uptime Kuma [1], which doesn't require a bunch of SaaS spaghetti to run and has much broader community support?

[1] https://github.com/louislam/uptime-kuma

  • gettodachoppa 2 years ago

    As a power user (not professional sysadmin) I love Uptime Kuma. A simple docker image that uses 110MB RAM and gave me all the monitoring I need for my home lab and my cloud VPS. Very easy to use too, lovely UI.

    • tibozaurusOP 2 years ago

      Uptime Kuma is great, but some orgs and users don't want to have another stack to manage.

oooyay 2 years ago

First, congrats on the launch!

Why did you end up going with a SaaS model? 30 Euros or $31.50 USD is pretty expensive for something like a status site. You'd have a lot less to manage day to day and be able to focus more on innovating the product if you just sold the software, imo.

Why the focus on synthetic monitoring? As a SRE, I actively eschew synthetic monitoring. It's highly error prone and doesn't actually indicate regional availability. I'd like a status site that I could push a certain internally derived SLA for a given service to and the status site reflects the average over time of that windowed SLA.

SLAs are intended to incur customer refunds when they're violated, if they're meaningful. If your synthetic monitoring shows an SLA of 4 nines but it was actually closer to 4.8 or 4.9 then you could be on the hook for causing your customers a good bit of legal pain. Just something to think about in this space.
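
To put the "nines" above into concrete terms, here is a rough sketch of the usual arithmetic: n nines corresponds to an availability of 1 - 10^(-n), so the downtime allowed in a window is the window length times 10^(-n). The function names and the 30-day default window are assumptions chosen for illustration, not something from OpenStatus.

```python
def availability_from_nines(nines: float) -> float:
    """Convert a 'number of nines' into an availability fraction (3 -> 0.999)."""
    return 1.0 - 10.0 ** (-nines)


def allowed_downtime_minutes(nines: float, window_days: float = 30.0) -> float:
    """Downtime budget, in minutes, for the given nines over a window.

    The 30-day default is an illustrative assumption; real SLAs define
    their own measurement window.
    """
    window_minutes = window_days * 24 * 60
    return window_minutes * 10.0 ** (-nines)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, availability_from_nines(n), allowed_downtime_minutes(n))
```

With these definitions, 4 nines over 30 days leaves a budget of roughly 4.3 minutes, which is why small measurement errors matter once refunds are attached.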

Other status sites don't build external SLAs off of internal metrics because the process of deriving internal metrics that align with external outcomes is sufficiently difficult. Instead, they calculate an SLA based off of posted statuses over a period of time eg: Degraded, Down, Up. Supporting both modes could be a boon to potential customers.

Overall looks like a great start; good luck on your venture!
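
The status-based calculation mentioned above (availability derived from posted statuses such as Up, Degraded, Down) could be sketched roughly as follows. The `StatusInterval` type and the weights are hypothetical; in particular, counting "degraded" as half available is only an assumption, and a real status page would pick its own rule.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StatusInterval:
    status: str     # "up", "degraded", or "down"
    minutes: float  # how long the posted status was in effect


# Hypothetical weights: how much of each minute counts as "available".
DEFAULT_WEIGHTS: Dict[str, float] = {"up": 1.0, "degraded": 0.5, "down": 0.0}


def availability(intervals: Iterable[StatusInterval],
                 weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Time-weighted availability over the posted status intervals."""
    items = list(intervals)
    total = sum(i.minutes for i in items)
    if total <= 0:
        raise ValueError("need at least one interval with positive duration")
    credited = sum(i.minutes * weights[i.status] for i in items)
    return credited / total


if __name__ == "__main__":
    window = [
        StatusInterval("up", 900),
        StatusInterval("degraded", 60),
        StatusInterval("down", 40),
    ]
    print(availability(window))
```

Supporting both modes, as suggested, would mean feeding the same function either from manually posted statuses or from an internally derived metric.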

  • lucgagan 2 years ago

    > Why the focus on synthetic monitoring? As a SRE, I actively eschew synthetic monitoring. It's highly error prone and doesn't actually indicate regional availability. I'd like a status site that I could push a certain internally derived SLA for a given service to and the status site reflects the average over time of that windowed SLA.

    As an end user, hard disagree.

    GitHub is a great example of this. Their status almost always shows 100% uptime while the service is entirely unstable.

    It is clear that their uptime SLAs do not align with end user experience.

    As an end user, I care whether I can access and use the service. I don't care what broke in between.

    • oooyay 2 years ago

      I suspect on GitHub's front this has to do with how they populate their status site. They may update it manually once they identify customer impact. If they're using internal metrics to qualify the status site then they're likely not using all of the needed metrics to reflect customer impact. There's also a third possibility which is that between you and GitHub there's something that causes a partition or failure that is outside of GitHub and your domain of control.

      I agree with you that the ultimate value is in customer impact. I was saying "that's hard" but synthetic monitoring is not the solution because it doesn't achieve what it sounds like it achieves.

  • 101008 2 years ago

    I don't know much about status pages; I just check them to see if the services I use are having an issue. It's the first time I've read about "synthetic monitoring", and from a quick Google search, it seems to refer to "automatic monitoring". A basic version of this would be to ping the server to see if it's responding, or to send an HTTP request to see if it's returning a 200 status code.

    However, if I read your comment carefully, you are suggesting to provide an alternative where the company (owner) could decide manually when a system is down or up. If that's the case, wouldn't the status page be just a page template where someone logs into a panel and toggles a button to say "down" or "up" and posts updates? If there is no automatic monitoring, the service would look more like a blog/tumblr/twitter than anything else.

    Or I'm probably missing something because of my lack of experience. I'm curious and I'd like to know!
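
The "basic version" described above (send an HTTP request and see whether it returns 200) can be sketched with the Python standard library. This is a minimal illustration, not how OpenStatus implements its checks; `CheckResult`, `fetch_status`, and `check` are hypothetical names, and treating only a 200 as healthy is an assumption taken from the comment.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CheckResult:
    url: str
    ok: bool
    status: Optional[int]  # None when no HTTP response was received
    latency_ms: float


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Send a GET request and return the HTTP status code."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; they still carry a code.
        return err.code


def check(url: str, fetch: Callable[[str], int] = fetch_status) -> CheckResult:
    """Run one synthetic check; the fetcher is injectable so it can be stubbed."""
    started = time.monotonic()
    try:
        status: Optional[int] = fetch(url)
    except OSError:
        # DNS failures, refused connections and timeouts all derive from OSError.
        status = None
    latency_ms = (time.monotonic() - started) * 1000.0
    return CheckResult(url=url, ok=(status == 200), status=status, latency_ms=latency_ms)


if __name__ == "__main__":
    print(check("https://www.openstatus.dev/"))
```

Running `check` on a schedule from one or more locations is essentially what a synthetic monitor does; everything beyond that (alerting, status pages) is built on top of results like these.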

    • oooyay 2 years ago

      Good question. Status sites usually advertise the availability of features. When your service to feature mapping is 1:1 with just a load balancer or a cache in between then it's relatively simple to calculate. The number of 500s on the load balancer, cache, or both indicates errors sent to users. As a company grows several services usually combine to form a single feature; think about how a company has a "sign in" feature. There's likely a service that handles typical username password auth, then one for SSO, one for passkey, etc... at this rate, you have several inputs but the outputs remain somewhat consistent. 500s seen on your most externally facing endpoints are errors to users.

      Now combine all of the above with a client that has retry capabilities. That client could be a modern web app or a desktop app. Eventually consistent systems often rely on retry behavior and rate limiting to achieve smooth user transitions. Now I can't simply rely on 500s being sent because they may indicate a timeout or a caching problem. Now I need to rely on statistics on specific endpoints that will definitely result in a user facing error. Collecting that in real-time (real-time enough for alerting, anyway) is challenging as a company at that scale could be dealing with an abundance of requests per second.

      When SREs get into an incident they'll often try to determine customer impact in order to know what hemorrhaging to stop first. Looking at a list of 500s in a system like that is often unhelpful, so we'll build dashboards of specific endpoints that show a level of degradation eg: "Show me all requests that did not have 2xx where the number of retries is 3". In my contrived example the client shows an error after the third exponential retry. If you were calculating availability purely off of the number of 500s you're not actually calculating customer impact, you're calculating the number of errors. That said it's a lot easier said than done to build a data system to make a query like what I described, much less to export it. So in order to provide accurate information the status site is updated manually.
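
The contrived query above ("all requests that did not have 2xx where the number of retries is 3") might look like this over a list of request records. The record shape, field names, and sample endpoints are hypothetical; the point is only that customer impact is counted from requests that exhausted their retries, not from every non-2xx response.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RequestRecord:
    endpoint: str
    status: int   # final HTTP status code of this attempt
    retries: int  # how many retries the client had already made


# Assumption from the example: the client shows an error after the third retry.
MAX_RETRIES = 3


def is_user_facing_error(record: RequestRecord, max_retries: int = MAX_RETRIES) -> bool:
    """True when a non-2xx response arrived on the final retry."""
    is_2xx = 200 <= record.status < 300
    return (not is_2xx) and record.retries == max_retries


def impact_by_endpoint(records: Iterable[RequestRecord]) -> Dict[str, int]:
    """Count user-facing errors per endpoint."""
    counts: Dict[str, int] = {}
    for record in records:
        if is_user_facing_error(record):
            counts[record.endpoint] = counts.get(record.endpoint, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        RequestRecord("/login", 500, 1),   # retried later, user saw nothing
        RequestRecord("/login", 500, 3),   # retries exhausted, user saw an error
        RequestRecord("/login", 200, 3),   # succeeded on the last retry
        RequestRecord("/search", 504, 3),  # retries exhausted
    ]
    print(impact_by_endpoint(sample))
```

In the sample, four non-ideal attempts collapse to one user-facing error per endpoint, which illustrates the gap between "number of errors" and "customer impact".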

      On the flip side of what you described, some errors don't have a statistic. For instance, if I force rotate everyone's password and kill logins then I might post that on the status site as well. If it's the result of a security action or vulnerability I might declare the service degraded for a period of time.

      • 101008 2 years ago

        Thank you very very much for taking the time to write this explanation. I learnt a lot today :)

  • tibozaurusOP 2 years ago

    Thanks again!

    Tbh we haven't thought about SLA violations.

    For regional availability we are planning to add multi-region checks per monitor.

    At the moment you can only set one region per monitor.

  • paulddraper 2 years ago

    > Why did you end up going with a SaaS model?

    Convenience.

    More companies want Datadog, etc. than want to manage Datadog, etc.

impulser_ 2 years ago

Why is it normal in the Typescript community to rely on a lot of other SaaS providers to build a simple application?

This project relies on 4 different paid services. Why?

Why do you need a SaaS to handle your auth, mailing, database, and logging?

Aren't there libraries for these things in Typescript? Why pay for them?

  • ies7 2 years ago

    Disclaimer: This is my personal opinion/experience. Not all people/startups do this.

    In 2018-2020 my company (an FMCG company) asked me to temporarily lead the IT & product teams of 2 startups that it had invested in (as the majority investor) a few years before. One was a telemedicine company and the other an e-commerce company.

    Both of them had almost all of their auth, DB, etc. on SaaS from other little-known, newly founded startups.

    After a few meetings I realized that these startups were "guided" by the VC to use the services of other startups that the VC had invested in, and in return the other startups would (must) use our service (telemedicine) for their employees.

    That way all of these startups could claim the monthly active users and companies using their products, we also got the topline revenue, and then those numbers would be included in a pitch deck for the next round of investment.

    To top that, for the telemedicine startup I also got a KPI to hire 200 programmers so we could include that number in the pitch deck too. In 2 years, I got 3 talented ones and fewer than 30 who could code FizzBuzz or simple CRUD (in their language of choice).

  • cchance 2 years ago

    Not the dev, but I know for me it mostly comes down to not reinventing the wheel, and it makes it a lot easier to scale while also allowing free operation for indies.

    Turso: Has an insanely large free tier and means no need to run your own DB (though you can run your own SQLite locally); their free tier even just got drastically expanded.

    Clerk: 5000 free users, not having to deal with your own authentication.

    Resend: Avoids managing mail yourself, dealing with spam filtering, etc. I don't know if they allow just using an internal SMTP server, but it seems OK given 3,000 mails per month.

    Tinybird: I don't know enough about it, but it also has a free plan...

    So I'd imagine most of these aren't about paying for third-party platforms; it's about offloading tasks you don't want to worry about implementing yourself, and they also give you the ability to scale beyond the small initial deployment for a cost.

    • impulser_ 2 years ago

      You don't have to reinvent the wheel. You just have to use a library instead of using a library and paying someone lol.

      There are hundreds of auth libraries out there that you can use. Not one of them charges you per user lol. We been doing this for decades. Why are we now paying companies to do it for us?

      The same can be said about mailing, logging, and databases. I spent decades building web applications, and not once was it hard to implement these features using libraries.

      In fact it's easier than ever with the tooling we have today.

      No wonder 99% of startups are losing money and going out of business. They are giving all their money away to the few that survive lol.

      I guess the typescript people don't appreciate frameworks like Rails, Django, and Phoenix that implement all these features for you lol.

      • arrowsmith 2 years ago

        > We been doing this for decades. Why are we now paying companies to do it for us?

        Probably for the same reason that every trivially simple web app is now a bloated React SPA - because it's all that many devs have ever known.

        They graduate and join companies where everything is built in an insanely over-complicated, over-engineered fashion using fifty layers of complex tooling because "that's how Google does it" and the senior engineers wanted to learn something new. So they assume that this level of complexity is necessary because companies wouldn't do everything this slowly and painfully unless they really needed to, right?

        Then they pass this lack of wisdom onto the next generation of juniors, and the cycle continues until no-one remembers that it doesn't have to be this way, to the point where people think that paying $99/month for a third-party tool makes more sense than "having to deal with your own authentication", as if authentication is some huge burden and not a basic cookie-cutter feature that's as old as the internet.

        By the way, in Phoenix you can get a fully-featured authentication system for free in literally ten seconds: just run `mix phx.gen.auth Accounts User users` then `mix ecto.migrate`. There, I just saved you hundreds of dollars.

        • 59nadir 2 years ago

          > Then they pass this lack of wisdom onto the next generation of juniors, and the cycle continues until no-one remembers that it doesn't have to be this way, to the point where people think that paying $99/month for a third-party tool makes more sense than "having to deal with your own authentication", as if authentication is some huge burden and not a basic cookie-cutter feature that's as old as the internet.

          It's significantly worse than that. You have developer YouTubers now that are megaphoning this exact lack of wisdom out to thousands of developers, either misguided older ones or completely new ones.

          The four listed things by one of the ancestors in this thread look like they basically came from one fairly popular YouTuber who advocates for exactly this type of thinking very loudly and with dubious credentials to do it.

      • lucideer 2 years ago

        > We been doing this for decades

        This is rarely a solid argument for doing anything in & of itself. Even less so when security is involved.

  • lucideer 2 years ago

    > Why do you need a SaaS to handle your auth, mailing, database, and logging?

    There's absolutely no reason to require SaaS to handle database & logging, but:

    1. For mail, in 2023, it's a de facto requirement, for any app. Sure you can do it yourself, but handling spam filters will be a challenge. Defaulting to SaaS on this is extremely defensible.

    2. For auth, in 2023, rolling your own auth that is secure & offers decent MFA is a similarly daunting task. Would it be nice if they offered an optional local auth backend, maybe. Would it be nicer if they offered a choice of multiple SaaS backends, definitely. But it's ultimately pretty defensible.

    3. It seems to me the DB can be local sqlite / libsqld (looks primarily aimed at dev envs but at least it's an option).

    ---

    On aggregate though you're right, this does seem excessively SaaS-y.

  • josevalerio 2 years ago

nodesocket 2 years ago

I run Uptime Kuma[1] in my home to monitor all my homelab and Kubernetes services. It's really awesome. How does OpenStatus compare to it?

[1] https://github.com/louislam/uptime-kuma

jaxn 2 years ago

I had just been looking at open source status pages this morning, and this was not in the list I was looking at.

OP, you might want to do a PR here: https://github.com/ivbeg/awesome-status-pages

Everyone else might be interested in that list of similar projects.

donavanm 2 years ago

Hey Max & Thibault, interesting approach. It seems like you're going after a specific feature and (unintentionally?) pulling in some product areas that are very hard businesses. I believe you're reusing existing SaaS and framework tools to make your effort more effective, but you may be asking your customers to adopt new dependencies as well.

I think your core offering is around status tracking and stakeholder notification. However, you're also pulling in Monitoring/APM by running your own status checks, for example. I would expect any paying customer to already have monitoring and alerting of some type: New Relic, Datadog, Amazon CloudWatch Synthetics, etc. Wouldn't your customers want to use their own existing metrics for SLOs, or existing alarms & alerts for incident detection? Similarly, it seems like you're implementing alerting/engagement as well. Are you asking your customers to reimplement their PagerDuty/OpsGenie/VictorOps configuration? There's a lot of organisational inertia around business processes that define alerting & engagement. I haven't looked at userbase numbers in a long time but I would guess the vast majority of your target customers are using one of those three already.

If I was to guess, initial adoption would be aided by "ease of use", particularly integration with the customer's existing tools & processes. Then differentiation and value are based on what those existing monitoring/alerting tools can't do, e.g. alternative data sources (APM vs RUM), automated/predefined response, approval processes, customized visibility & communication per client, etc.

disclosure: Principal at AWS. Comments are my own personal opinion, based on public information only.

jacooper 2 years ago

Looks good, however I find the pricing a bit on the high side, especially compared to others like UptimeRobot and HetrixTools.

  • tibozaurusOP 2 years ago

    Yep, but we are in the same pricing range as BetterStack or Checkly, based on the number of requests we make per month.

snowstormsun 2 years ago

10 minute intervals and only 5 monitors is very limited for the hobby plan. Why shouldn't I use UptimeRobot (or any other alternative) instead which has 5 minute intervals and also only one status page for free?

  • iimblack 2 years ago

    For me uptime robot has been very buggy and unreliable so I’m looking for alternatives. Parts of the dashboard break often. Maintenance windows not respected. Alerts just not working at all when a real outage happens.

    • compumike 2 years ago

      If you're looking for UptimeRobot alternatives, would you be open to taking a look at what we've been building at Heii On-Call https://heiioncall.com/ ? Curious to hear if that covers what you need or if there are any critical features we're missing.

throwawake 2 years ago

Not to downplay the product, which is awesome, but how common is it for a single logo to be used by multiple commercial entities?

For example, this one seems to be from https://creativemarket.com/Mujigraphic/27123272-Logo-S-desig...

wicktron 2 years ago

Speaking of status pages, are there any that exist that can aggregate the status pages of various SaaS apps?

Meaning - Let's say I'm a company that subscribes to many SaaS apps (ie: Google Workspace, Slack, Zoom, etc.), but want to create an internal dashboard to monitor those SaaS apps and alert my internal users. What options are available?

tnolet 2 years ago

Checkly founder / CTO here. Got a ping we got mentioned. Good luck to the OP!

Checkly started bootstrapped (not open source) and it’s indeed a tough market. Very exciting also, but not super easy to get a foothold.

Anyway! Good luck again and will be following your efforts.

deadlast2 2 years ago

On your homepage https://www.openstatus.dev/ the Star on Github button shows 0. I think it should show the actual count 2.3k.

tiberriver256 2 years ago

The site looks visually very good. Lots of typos in your English translations though.

bluehatbrit 2 years ago

Congrats on your launch! It seems well designed and well thought through from a UX perspective. The landing page is also very simple and easy to wrap my head around.

I'd be tempted to use this for a small side project, but I'm not sure it has a huge draw when getting to the paid tiers. There are plenty of competitors out there and this doesn't seem to have a niche that this serves any better than them. Open source is great but I'm not sure it's a strong differentiator in this case.

I'd probably compare this to something like Plausible Analytics. They lean heavily into privacy-friendly, and easy GDPR/CCPA/PECR compliance. It being open source is what gives you confidence to use it, knowing that if the small business did disappear you can continue using the product on your own infra for as long as you want. OpenStatus seems to be trying to lean on the fact it's open source, but if their SaaS business failed it would be trivial to move to a competitor without much consideration. It's not delivering anything different, and it's not baking itself into my app or infrastructure, so moving is the case of setting up with a competitor and moving a subdomain.

I really hope you're able to make it work and get profitable though. Having high quality open source options is really fantastic, I just think it's going to take more to cement a place in the market.

todotask 2 years ago

I see that this was written with the Hono web framework, an interesting choice for OpenStatus.

ilrwbwrkhv 2 years ago

Maybe it's just me, but seeing pricing on the first page of an open source project makes me puke a little in my mouth. I know that technically the code is open source and people need to eat and all that, yet, I can't help it but feel a little bit scammed.

mrfynd 2 years ago

congrats on the launch! product looks cool. just one thing though: the dots make it difficult to focus on the design or read text. or is it just me?

octagons 2 years ago

I'd recommend working on editing any public-facing copy and unifying your message. I'm assuming I know what this tool does, but these examples do not validate that assumption. Is it a platform or a service? In 3 different locations, OpenStatus is listed as the "open source monitoring XYZ with...": "on-call managements", "Incident Management", and "beautiful status page".

Examples copied from the landing page and front page of the GH repo.

> Open-source monitoring service

> OpenStatus is an open source monitoring services with on-call managements.

> The Open-Source Synthetic Monitoring Platform with Incident Management

> OpenStatus is open-source synthetic monitoring platform with beautiful status page.

> The open-source monitoring platform

  • tibozaurusOP 2 years ago

    Thanks for the feedback.

    Tbh we have struggled to find the perfect messaging over the last month.

    But we will take it into account, thanks again.

jmartens 2 years ago

Pings don’t seem like enough info

typosaur 2 years ago

Do you really need 4 SaaS providers to run this?

From your docs: Tinybird, Turso, Clerk, Resend

  • tibozaurusOP 2 years ago

    We use Tinybird to store the request data payload, Turso for hosted SQLite, Clerk for auth, and Resend to send email.

    We could have built everything ourselves or just used some providers to build faster. When we launched, we chose the latter.

  • drorn 2 years ago

    What are you specifically worried about?

ushakov 2 years ago

Why should I use this instead of the 100s of other paid and open-source alternatives?

  • tibozaurusOP 2 years ago

    Because using our hosted solution you don't have to care about the infra :)

    • remram 2 years ago

      If I use the hosted version, I don't care that it's open source.

      If I run your open source software myself, I don't care that you have a hosted offering.

    • ushakov 2 years ago

      That’s a naive assumption. I can go with any of your competitors (Datadog, Checkly, BetterStack) and not care about the infra

      • tibozaurusOP 2 years ago

        And mostly closed source :)

        • typosaur 2 years ago

          Is being open-source the only differentiator you have?

          Only software devs care about this. Your competitors make millions of $ annually, without being open-source.

          • robertlagrant 2 years ago

            This seems a bit inappropriately toned. Plenty of businesses care about this, as it de-risks things if you know you can self-host if necessary.

            • chrisandchris 2 years ago

              Yes, sure. But the difficulty with monitoring is not the hosting per se, but hosting in different datacenters throughout the world and keeping all these services up.

              My monitoring should not show "just down" if users from location A can't reach it but everyone else can.
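
One way to avoid reporting "just down" when only one location fails is to aggregate per-region check results before deciding on a status. The sketch below is only an illustration of that idea; the region names, the three status labels, and the rule "some regions failing means degraded" are assumptions, not OpenStatus behavior.

```python
from typing import List, Mapping, Tuple


def overall_status(region_ok: Mapping[str, bool]) -> Tuple[str, List[str]]:
    """Combine per-region results into one status plus the failing regions.

    Returns "up" when every region succeeded, "down" when every region
    failed, and "degraded" when only some regions failed.
    """
    if not region_ok:
        raise ValueError("at least one region result is required")
    failing = sorted(region for region, ok in region_ok.items() if not ok)
    if not failing:
        return "up", failing
    if len(failing) == len(region_ok):
        return "down", failing
    return "degraded", failing


if __name__ == "__main__":
    # Hypothetical region identifiers, used only for the example.
    results = {"us-east": True, "eu-west": True, "ap-south": False}
    print(overall_status(results))
```

Reporting the failing regions alongside the status lets an alert say where the service is unreachable instead of declaring a global outage.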

              • ushakov 2 years ago

                How many web-services out there are actually geographically distributed? Most companies just host everything in us-east1

                • chrisandchris 2 years ago

                  Uptime monitoring services? I think most of those I tried (and I tried many in the past weeks).

            • lucgagan 2 years ago

              I never understood the "de-risk" things angle. Is the idea that you'd self host if the service went under?

              • robertlagrant 2 years ago

                Yes, or a new company could post on Reddit, "would anyone like this service?" and spin up a replacement. Probably not worth it in this instance, but I imagine a lot of people sleep better at night knowing that Postgres is available to self-host, or from a variety of cloud providers.

        • lmeyerov 2 years ago

          It's worth coming up with a stronger public-facing answer

          We went through this last year, I think we have a public one, a private one, + our actual more 'serious' telemetry (opentelemetry, ...). For the status pages, I think one we don't pay for, and the other is like $20/yr.

          It's a crowded space, both open + closed, so clearer differentiation seems useful for your users and for your own journey: https://github.com/ivbeg/awesome-status-pages

designdev1996 2 years ago

congrats on the launch!! Something i've been looking for, for a while now

jjtang1 2 years ago

Congrats on the launch, the early traction is great to see.

Would be happy to jam more on my experiences building Rootly.com, an incident management platform on Slack used by Canva, Cockroach Labs, and others! :)

-JJ
