Ask HN: Why are there so many different monitoring and observability tools?
Can't there be one tool that collects metrics and logs, parses them, and visualizes them, all under one roof?
I am writing this after my experience setting up the trio of OpenTelemetry, Prometheus, and Grafana.

I think there are such diverse needs in this space, and everyone has such a different budget for observability, that a single tool just wouldn't make sense.

> everyone has a different budget for observability

Wouldn't open source software make sense here? I may be too naive, but running 1 service that handles all of this, instead of hosting 3 different services that communicate with each other, would make more sense to me. Just my opinion, as I have only recently started working in the observability space.

I think even stuff like the storage model can change who deems a tool acceptable. A decade or two ago, I liked Ganglia, which used a round-robin database. You'd lose granular data over time, but you'd keep aggregate data. I.e., one-minute metrics may be available for a week, but after that you'd have 15-minute metrics stored for a month, and then maybe hourly metrics for a year. Disks were expensive and small, so this model made monitoring feasible for some people. Other tools just dropped old data, so you had to choose between retaining data for a long time ($$$) and keeping only recent data. Storage is now quite cheap, and we're storing even more metrics, including deep insights into application behavior. With many tools you're back to storing as much data as your budget allows. There's a lot of focus on building queries over the data, turning those into alerts, and keeping those performant.

Open source often means self-hosting. If you've got an Ops team, sure, go for it. Otherwise, focus your engineers on your app and use a managed tool.

Really great insights. Could you classify which tier of companies/projects needs one-minute, 15-minute, or hourly metrics? The frequency of collecting metrics is set by the scrape interval on the receiver's end, right? A SaaS could offer different granularity based on certain factors, such as peak traffic events, etc.
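
To make the round-robin retention idea from the Ganglia comment concrete, here is a minimal sketch in Python. It is not how Ganglia or RRDtool is actually implemented. The tier values simply mirror the example figures in the comment (one-minute data for a week, 15-minute data for a month, hourly data for a year), averaging is an assumed aggregation, and the names `TIERS`, `downsample`, and `resolution_for_age` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MINUTE, HOUR, DAY = 60, 3600, 86400

# (step_seconds, retention_seconds), finest tier first.
# Assumed values, taken from the example figures in the comment above.
TIERS = [
    (1 * MINUTE, 7 * DAY),     # one-minute points kept for a week
    (15 * MINUTE, 30 * DAY),   # 15-minute points kept for a month
    (1 * HOUR, 365 * DAY),     # hourly points kept for a year
]


def downsample(samples, step):
    """Average (timestamp, value) samples into buckets `step` seconds wide."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its bucket.
        buckets[ts - ts % step].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def resolution_for_age(age_seconds):
    """Return the finest step still retained for data this old, or None if expired."""
    for step, retention in TIERS:
        if age_seconds <= retention:
            return step
    return None
```

For example, `resolution_for_age(10 * DAY)` falls past the one-week tier, so only 15-minute aggregates would still be available for that data. The trade-off is the one the comment describes: old data stays queryable at a bounded storage cost, but the fine-grained detail is gone.
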
The "budget" you mentioned can also encompass the costs to run or configure the tool. That may be implied in what you meant by "budget", but in my experience it is worth highlighting, even if one isn't subject to Datadog-billing-nonsense. Plus, there are also the differing licensing terms <https://github.com/grafana/grafana/blob/v10.2.3/LICENSE>, which often differ even between versions of the same project <https://github.com/grafana/grafana/blob/v7.5.16/LICENSE>.