Prometheus reaches 1.0

152 points by grobie 10 years ago · 59 comments

Congrats! Prometheus + Grafana are a killer duet! Easy to set up and use. Same goes for node_exporter. I only wish more JVM apps included a configuration file for the jmx_exporter, and that there were an easier-to-set-up nginx_exporter. :)

Incidentally yesterday I created a simple exporter for (linux) coretemp, hddtemp and NUT's upsc: https://github.com/andmarios/sensor_exporter

rektide 10 years ago

I really hope upstream node_exporter is happy adding new exports!
Still slowly hacking on my homelab Prometheus bringup, but I've already got a start developing a battery level/power supply exporter. By far the most important Node information I want metrics and alarms on in my developer life! https://github.com/rektide/node_exporter/tree/export-battery
Almost went with a standalone Node exporter, but I decided I'd try some Go, and tour some of the Prometheus codebase. MustNewConstMetric is a very confusing thing to me, but hopefully I'm on the right track! Feeling double-inspired to get my homelab Prometheus up and going right now, between renewed excitement for these exporters and the 1.0! So close!
- jrv 10 years ago
  
  If the interface you're getting the metrics from is generic and standardized enough to work on most Linux systems, it has a good chance of making it in. "/sys/class/power_supply/..." probably fits that requirement, but bbrazil (same name on GitHub) would be the best to give a final judgement on that kind of thing.
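As a rough illustration of what such an exporter boils down to, here is a minimal sketch that reads the per-supply `capacity` files under that sysfs directory and renders them in the Prometheus text format. The metric name `node_power_supply_capacity_percent` and the `supply` label are made up for this example and are not what node_exporter uses; the function name is hypothetical too.

```python
import os


def render_battery_metrics(root="/sys/class/power_supply"):
    """Return Prometheus text-format lines for each supply that reports a capacity."""
    lines = [
        # The metric name below is hypothetical, chosen only for this sketch.
        "# HELP node_power_supply_capacity_percent Battery charge level (hypothetical metric name).",
        "# TYPE node_power_supply_capacity_percent gauge",
    ]
    for name in sorted(os.listdir(root)):
        capacity_file = os.path.join(root, name, "capacity")
        try:
            with open(capacity_file) as f:
                value = float(f.read().strip())
        except (OSError, ValueError):
            continue  # supplies without a readable capacity file are skipped
        lines.append(f'node_power_supply_capacity_percent{{supply="{name}"}} {value}')
    return "\n".join(lines) + "\n"
```

Serving the returned string over HTTP on a /metrics path would be the remaining step; that part is left out here.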
  - jrv 10 years ago
    
    Ah sorry, he is "brian-brazil" on GitHub.
gophernaught 10 years ago

Funny coincidence, yesterday I did almost the same thing, though mine's based on lm-sensors and hddtemp:
https://github.com/ncabatoff/sensor-exporter
- andmarios 10 years ago
  
  Nice! I didn't know about the gosensors library.
  I started with the same path, using github.com/prometheus/client_golang/prometheus, but it exported too many application metrics (an order of magnitude more than the sensors I exported :p), so I went to a more custom approach.
  - grobie (OP) 10 years ago
    
    You can disable the default metrics by using the uninstrumented handler https://godoc.org/github.com/prometheus/client_golang/promet...
    It's a best practice to export all available metrics though; you'll likely run into situations where you'll need them. The default set is not that many time series, so it shouldn't cause any problems.
  - gophernaught 10 years ago
    
    Metrics are pretty cheap, unless you have a ton of exporters I wouldn't worry too much about that. Plus it can come in handy: the ZFS exporter I wrote uses a CGO library that had a memory leak, which I discovered thanks to those app metrics. And they made it easy to infer that the leak was in the C heap rather than in Go.

Rapzid 10 years ago

Is Prometheus still pull only or is there a first class push option? This always rubbed me the wrong way.. of course most metrics collection scrapes at some level, but push/streaming at the infrastructure level is easier to integrate with and compose processing pipelines with..

jrv 10 years ago

It's still focused on pull, while there is the Pushgateway (see https://prometheus.io/docs/instrumenting/pushing/ and https://prometheus.io/docs/practices/pushing/) for dealing with one-off situations where you cannot scrape something. Pull works great in most situations where people have their own private clouds or datacenters, but it's less suitable for very restrictive environments where you can't run Prometheus on the same network or behind the same firewall as the targets you want to monitor.
In the usual cases, pull has many benefits however:
- You can get high availability by simply running two identically configured, independent Prometheus servers. No clustering required.
- You can run a copy of production monitoring (or similar) on your laptop without changing production. This is great for experimentation and testing changes.
- You get free up-ness monitoring via scrapes and can use this for alerting.
- When there's an HTTP pull endpoint on service instances, you can also go there manually as a human and check out the current metrics state of any target, independent of the Prometheus server.
- Knowledge of service identities is inverted: instead of each service instance having to know its own identity (usually instance="hostname:port" and some job/service name), the monitoring system knows (usually via some form of service discovery) what instances should be there and how they are labeled, and proactively checks on them. Services have no knowledge of where the monitoring system lives anymore, enabling the above use cases.
- Debatable, but push-based monitoring systems can make it easier for someone to accidentally DDoS your monitoring. (still possible with pull, but you have one central place where you know what you pull from)
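To make the "free up-ness monitoring" and "inverted identity" points concrete, here is a toy sketch of a pull loop. It is not how Prometheus is implemented; `scrape_all`, `alerts`, and the `fetch` callable are hypothetical names for this example. The monitoring side owns the target list and labels, and a failed scrape becomes an `up` value of 0 that can be alerted on.

```python
def scrape_all(targets, fetch):
    """Scrape every known target once and record a synthetic `up` value for each.

    `targets` maps an instance label to its job label; `fetch` is any callable
    that returns the metrics text for an instance or raises on failure.
    """
    up = {}
    samples = {}
    for instance, job in targets.items():
        # Labels are assigned by the monitoring side, not by the service.
        labels = (("instance", instance), ("job", job))
        try:
            samples[labels] = fetch(instance)
            up[labels] = 1
        except Exception:
            up[labels] = 0  # the failed scrape itself is the signal
    return up, samples


def alerts(up):
    """Return the label sets whose last scrape failed, like alerting on up == 0."""
    return [dict(labels) for labels, value in up.items() if value == 0]
```

Because the services never need to know where the scraper lives, a second copy of this loop on a laptop or a second server can run against the same targets without any change on their side.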
bbrazil 10 years ago

There is no first-class push option.
I don't see why processing pipelines would be special for push vs. pull, it's generally a wash.

jrv 10 years ago

Prometheus cofounder here - we're happy to take any questions. Huge congrats to everyone who made this release possible and for all the excellent work over the years that led up to this!

atombender 10 years ago

Any plans on more native support for Kubernetes? The relabeling spaghetti config you end up with is very confusing and unreadable.
Granted, you don't need to touch this configuration very often, but anyone who's going to operate the cluster will need to understand it thoroughly.
The fact that it's ad-hoc (prometheus.io/probe etc. aren't built in) means everyone's config is probably going to be unique and not portable. For example, we found the current config example to be insufficient, since blackbox-exporter needs information about whether its endpoint is HTTP or HTTPS.
Kubernetes' template system, combined with variable expansion, seems like it would be a better model for what you're currently trying to do with service discovery.
Also: I'm setting this up right now, but it seems there's no exporter for the Kubernetes API proper, just Kubelet?
lyonlim 10 years ago

Is it even right to try to compare this with New Relic? Prometheus looks very interesting and I'm trying to figure out if this is something useful to us. Thanks!
- jrv 10 years ago
  
  Not quite, though there is some overlap. Someone else asked me to compare Prometheus to other tools in the APM (application performance monitoring) space, and I'm going to share the summary I came up with:
  The way I would describe Prometheus in relation to those other tools:
  - Prometheus is open-source and self-hosted.
  - Prometheus is about dimensional numeric time series metrics only (no log-based analysis, no per-request tracing, etc.).
  - Prometheus has a strong focus on systems and service monitoring, not so much on business metrics.
  - Prometheus is more of a Swiss army knife of monitoring rather than a ready-to-drop-in package that starts monitoring everything automatically.
  - Prometheus is very much about whitebox monitoring and manually defining any metrics that could be useful for you (although we support blackbox exporting and bridging metrics from existing systems as well).
  - We don't do machine-learning-style anomaly detection, but we do alerting based on manually defined rules.
  - For a purely metrics-based solution, the insight we deliver is one of the best in the field (via the dimensional data model and the query language to go with it).
  - Many open-source projects are starting to expose native Prometheus metrics (like k8s, etcd, ...), which gives Prometheus an advantage when being used together with those.
  EDIT: Also try the "Getting Started" tutorial - that should only take a couple of minutes to try it out: https://prometheus.io/docs/introduction/getting_started/
  - lyonlim 10 years ago
    
    Thanks! I always found NR Servers to be lacking, and frankly, I wanted a more service-oriented monitoring system that tells me how certain services are performing over time. I.e. our API performance, jobs..
    
    ergo14 10 years ago
    
    Try appenlight, it's open source now - https://getappenlight.com/. (disclaimer: I wrote it). I think it does exactly what you are looking for.
  - e12e 10 years ago
    
    > Prometheus is about dimensional numeric time series metrics only (no log-based analysis, no per-request tracing, etc.).
    Any thoughts on a companion (open source) system that focuses more on logs and such?
    
    bbrazil 10 years ago
    
    For logs you're looking at the ELK stack, for tracing Zipkin.
    
    apurvadave 10 years ago
    
    For tracing check out sysdig's new 'tracers' functionality. Tracing for Microservices, transactions, all the way down to system calls. www.sysdig.org
    
    sagichmal 10 years ago
    
    All that and the only cost is a third-party kernel module :)
_lbaq 10 years ago

What is the typical user/customer of this system? I've found it very hard to get customers to spend time configuring a monitoring system, let alone writing any scripts to gather data.
- bbrazil 10 years ago
  
  I'm not sure there is a typical user. We've got everything from single-sysadmin small companies to Fortune 500s to companies looking to integrate it into their products.
  https://prometheus.io/blog/2016/03/23/interview-with-life360... and https://prometheus.io/blog/2016/05/01/interview-with-showmax... look at two of our users.
  - _lbaq 10 years ago
    
    What is the business model, do you charge per input, server core or ?
    
    jrv 10 years ago
    
    No business model - Prometheus is 100% free and open-source and independent of any one company. We have also joined the Cloud Native Computing Foundation (https://cncf.io/) as the second member project after Kubernetes: https://cncf.io/news/announcement/2016/05/cloud-native-compu...
    Read more here: https://prometheus.io/docs/introduction/overview/ and here: https://prometheus.io/community/
    
    bbrazil 10 years ago
    
    It's open source, so there's no charge for the software.
- jrv 10 years ago
  
  Yeah, it goes from people putting Prometheus on their Raspberry Pi all the way to companies like DigitalOcean monitoring millions of machines with Prometheus (https://promcon.io/talks/scaling_to_a_million_machines_with_...).
  A lot of open-source projects (Kubernetes, Etcd, ...) are also exposing native Prometheus metrics now, making it easier to integrate with those.
  EDIT: also check the PromCon schedule for other user companies giving talks: https://promcon.io/schedule/
  And the sponsors are also users: https://promcon.io/#our-sponsors
  - tex0 10 years ago
    
    We write most of our stack ourselves in Go and integrate the Prometheus instrumentation right away.
vegabook 10 years ago

Could you elaborate on what seems to be your focus on logging only? Would Prometheus be relevant for, say, high frequency financial markets data? I note that Prometheus "has knowledge about what the world should look like" and "actively tries to find faults" [1]. Isn't this something which is applicable to other fields than simply monitoring? I spend a lot of my time watching and managing "bad data" coming through in finance....
[1] https://prometheus.io/docs/introduction/comparison/
- jrv 10 years ago
  
  Our focus is not logging only, it's the opposite: we don't support storing logs of individual events. What Prometheus does is store dimensional numerical time series. See the data model: https://prometheus.io/docs/concepts/data_model/
  So it's a question of whether you can squeeze your data into that model and whether you need per-event details or whether aggregated time series are ok.
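A small sketch of what "squeezing data into that model" can look like: individual events are folded into counters, and each series is identified only by a metric name plus a set of labels. The metric name `trades_total` and the `symbol` and `side` labels are invented for this example, and the helper names are hypothetical.

```python
from collections import Counter


def series_key(name, labels):
    """Identify a time series by metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


def aggregate(events):
    """Fold individual events into per-label-set counters, discarding event detail."""
    counters = Counter()
    for event in events:
        # `trades_total`, `symbol`, and `side` are hypothetical names.
        key = series_key("trades_total", {"symbol": event["symbol"], "side": event["side"]})
        counters[key] += 1
    return counters
```

Anything not captured in a label (price, exact timestamp, order ID) is gone after aggregation, which is the per-event detail trade-off mentioned above.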

woodcut 10 years ago

We've been using Prometheus in production for a while now and found it to be rock solid. It's never an issue and never requires much of an afterthought, other than extending our current use of it. All of which says a lot. Our only "feature request" is that it stays stable and the project doesn't lose focus. Cheers!

gophernaught 10 years ago

Hooray! Couldn't come at a better time for me, we're about to roll it out to all our customers and the API stability promises are great news.

Many thanks to the authors for all their hard work, and congrats.

j_s 10 years ago

What is the best GitHub issue to follow to find out when Prometheus supports events older than the configurable 5 minute limit?

I'm blocked because a network partition can mean no stats.

Edit: Maybe https://github.com/prometheus/prometheus/issues/398 ? Also, limit is configurable.

jrv 10 years ago

That would be the correct issue about the staleness limit, yes. Note that Prometheus does not track individual events, but only numerical time series and their current and historical values. What's the exact use case you're stuck on? I guess you are using the pushgateway with client-side timestamps?
- j_s 10 years ago
  
  Yes, I have to collect metrics in a very restricted and separated production environment then ship them over to a completely separate system for reporting. My impression is that Prometheus just isn't the right fit.
  - jrv 10 years ago
    
    That depends - but yeah, those situations can be tricky sometimes.
    If you are pushing metric states regularly (more often than every 5m) or don't set client-side timestamps, that usually works though. But maybe you have an even more special use case there regarding those metrics and the staleness?
    
    j_s 10 years ago
    
    Basically I don't want to hassle with transferring stats until it's worth it (collecting enough before making the transfer), rather than being forced into streaming them due to this constraint.
    Thanks for your responses.

Comradin 10 years ago

Congratulations and thumbs up to reaching the one point ohhh

xvf33 10 years ago

I may have misread the documentation but there seems to be no way to get the same output as the highestAverage or highestCurrent functions from graphite.

Not sure how to filter through say 300 servers and select out the top 10 for a particular time span. I would think that it would be a common need but I guess I'm missing something?

jrv 10 years ago

Prometheus range queries work a bit differently, so this is not 100% reproducible in a graph query, but a similar thing is possible. A given PromQL expression is evaluated at every resolution step along the graph and doesn't have context about what the "graph range" is. At every evaluation point, it can still look back over a given time window, but that's more of a sliding window approach then and independent of what the visible graph time range is.
For example, instead of highestCurrent, you could do something like:
topk(3, my_metric)
This would at every point along the graph select the current top 3 series that have the my_metric metric name.
Or if you want to average each series over the last 10 minutes at every point in the graph before selecting the top 3:
topk(3, avg_over_time(my_metric[10m]))
Note that due to the reasons mentioned above, topk() here does not select whatever line has the largest area under the entire visible graph range, but whatever is at the top at each given resolution step. So you may actually get more than 3 series in your graph, but only 3 at a time at any given X.
There's also an issue asking about this, but we're not sure if that is fundamentally compatible with Prometheus's query execution model without major changes: https://github.com/prometheus/prometheus/issues/586
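To see why a per-step topk can put more than k lines on a graph, here is a toy simulation in plain Python. It is not PromQL's implementation; `topk_per_step` and the sample data are made up for illustration.

```python
def topk_per_step(series, k):
    """For each step, return the names of the k series with the highest value.

    `series` maps a series name to a list of values, one per resolution step.
    """
    steps = len(next(iter(series.values())))
    result = []
    for i in range(steps):
        ranked = sorted(series, key=lambda name: series[name][i], reverse=True)
        result.append(set(ranked[:k]))
    return result


# Three hypothetical series sampled at three resolution steps.
data = {
    "a": [9, 9, 1],
    "b": [8, 1, 9],
    "c": [1, 8, 8],
}
per_step = topk_per_step(data, 2)
shown = set().union(*per_step)  # every series that appears at some step
```

With k=2, each step selects exactly two series, but the selection changes from step to step, so all three series end up in `shown`.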
- xvf33 10 years ago
  
  I suppose what I'm asking for isn't really possible yet. Hopefully it ends up getting implemented at some point.
  Still look forward to rolling out Prometheus for all of the other great features. Congrats on the release!
  - jrv 10 years ago
    
    Thanks!
bbrazil 10 years ago

We've a slightly different computational model, so that takes two passes.
You can calculate the highest value now (topk) or the highest averages (topk+avg_over_time) and now that you know which timeseries you want, graph those. I believe this is doable in Grafana.
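The two passes can be sketched in plain Python like this. It is a toy model of the idea, not something Prometheus or Grafana ships; the function names and the assumption that all series cover the same range are mine.

```python
def top_series_by_average(series, k):
    """Pass 1: pick the k series names with the highest average over the whole range."""
    averages = {name: sum(values) / len(values) for name, values in series.items()}
    return sorted(averages, key=averages.get, reverse=True)[:k]


def select(series, names):
    """Pass 2: keep only the chosen series, with all of their points, for graphing."""
    return {name: series[name] for name in names}
```

The first pass fixes the set of series once, so the graph in the second pass shows exactly k lines across the whole range instead of a selection that changes at every step.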

alainchabat 10 years ago

Is anyone using Prometheus for monitoring micro-services deployed with Kubernetes? Any feedback?

netingle 10 years ago

We are! And unsurprisingly it's a great fit given k8s is inspired by Borg and Prometheus is inspired by Borgmon.
bbrazil 10 years ago

There's a few, here's a recent talk by Weave for example: http://www.slideshare.net/weaveworks/kubernetes-and-promethe...
philips 10 years ago

Here is the CoreOS blog on using it: https://coreos.com/blog/prometheus-and-kubernetes-up-and-run...
rvanniekerk 10 years ago

Yes, it works wonderfully. I published a Grafana dashboard for this here - https://grafana.net/dashboards/162

akbar501 10 years ago

What is the scale out process with Prometheus? Is sharding/replication a manual setup process or is it automated? What's involved in scaling out?

netingle 10 years ago

There is a project we've just started: https://docs.google.com/document/d/1C7yhMnb1x2sfeoe45f4mnnKC...
Hopefully have something to show for promcon.
jrv 10 years ago

Usually you start with some functional sharding (giving each team of services their own Prometheus servers), but also having per-datacenter Prometheus servers and then some hierarchical federation layer on top of that. There's no built-in automatic horizontal scaling though (which would be going against the design goal of not having a clustered system, for reliability).
Some resources:
- Scaling in general: http://www.robustperception.io/scaling-and-federating-promet...
- Federation: https://prometheus.io/docs/operating/federation/
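As a very rough mental model of the hierarchy described above (not how federation is actually implemented or configured), each per-datacenter server keeps instance-level series, and the layer on top pulls only job-level aggregates from each one. All names in this sketch are hypothetical.

```python
def aggregate_by_job(instance_series):
    """Sum instance-level values per job, as a datacenter server might precompute."""
    totals = {}
    for (job, _instance), value in instance_series.items():
        totals[job] = totals.get(job, 0) + value
    return totals


def federate(datacenters):
    """Pull only job-level aggregates from each datacenter and label them by source."""
    return {
        (dc, job): value
        for dc, series in datacenters.items()
        for job, value in aggregate_by_job(series).items()
    }
```

The global layer ends up with one series per datacenter and job rather than one per instance, which is what keeps the top of the hierarchy small.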
bbrazil 10 years ago

http://www.robustperception.io/scaling-and-federating-promet... explains it. Unless you're absolutely massive, it's fairly easy.

daniel_levine 10 years ago

Is there a company offering a SaaS version of Prometheus?

otterley 10 years ago

Datadog and SignalFX are both far more scalable, easier to use, and have more features than Prometheus and are SaaS offerings. Prometheus is about 5 years behind them in terms of engineering effort, I believe.
jrv 10 years ago

Not yet, but one or more companies are working on this as part of their offerings in the future. Stay tuned!
Note that a hosted Prometheus service has different design tradeoffs. As Brian mentioned, Prometheus as it is is really meant to be run as close as possible to your monitored services for maximum reliability. Also, there's no clustered and long-term durable storage for a similar reason, which will likely be different in hosted versions.
bbrazil 10 years ago

There's no one presently offering that, it's designed more with a view to being on-prem for reliability (http://www.robustperception.io/monitoring-without-consensus/).
- daniel_levine 10 years ago
  
  Understood and generally agreed. That said, often when monitoring matters most is the exact moment when you least want to worry about the integrity or uptime of your monitoring.
  - bbrazil 10 years ago
    
    That's exactly our thinking, and why we want to minimise dependencies - including internet access.
