Polar Signals Cloud: Always-on, zero-instrumentation continuous profiling
polarsignals.com

Hey HN! Several members of the Polar Signals team and I have been long-time maintainers in the observability ecosystem, of projects such as Prometheus, Thanos, Cortex, Kubernetes, and many more. In 2018 we read the "Google-Wide Profiling" paper and immediately realized we had to treat profiling just like any other observability data: collect it systematically and comprehensively across our entire infrastructure and applications, and store it over time.
At the same time, eBPF was on the rise, and it was a match made in heaven: we can provide an incredible user experience without requiring any code changes. All our users need to do is deploy an agent (a single command to run against a Kubernetes cluster), and we can profile just about any popular language (those we can't profile yet are on the roadmap). Not only does eBPF give us an incredible user experience, it also lets us collect this data at unprecedentedly low overhead (<1%), since we grab exactly the data we need and nothing more.
In addition, we built a custom columnar database to back this product, allowing a Prometheus-style, label-based query language to work on this very high-cardinality data.
In a world where the economics of Moore's Law have stagnated, everyone will have to learn to build more efficient software, and the root of all evil in performance work is optimizing without measurements.
We're going to be hanging out in the comments all day, so please feel free to ask us any questions you might have, as well as any feedback you have for us!
The pricing tiers seem a bit rough for small shops. One to 44 CPUs cost the same 50 dollars per month, right? I work at a very small shop, with just me as the developer and just one server. I'd really like something smaller.
Pricing is never final, so that feedback is valuable, thank you! If you want to schedule a chat I'd love to learn more about your use case and happy to figure out what works for you!
https://calendly.com/frederic-branczyk/getting-to-know-polar...
> In addition to that we built a custom columnar database
I did some digging in your blog history and it seems that refers to https://www.polarsignals.com/blog/posts/2022/07/22/frostdb-i... Digging into the "but why?" section <https://github.com/polarsignals/frostdb#why-you-should-use-f...> seems to imply you favored the embedded feature over having something standalone, but I would enjoy hearing (or reading a blog post!) about why you felt it was a better use of your engineering time to build your own columnar DB versus using one of the existing columnar DBs I have seen referenced in a ton of other Show HN announcements around both logging and metrics services.
Great question!
The big one that existing columnar databases can't do (or can't do well) is searching and aggregating on user-defined dimensions (think Prometheus-style labels). InfluxDB 3.0 is the only other columnar database now available that was engineered specifically to do this. The good folks at Honeycomb came to the same conclusion: this type of (wide-column) columnar database is necessary to build observability with an exceptional user experience. Plus, we now own our future and can do any kind of optimization, while other companies (and I've spoken to them) constantly fear ClickHouse relicensing or otherwise destroying their business.
//edit: we call this feature dynamic columns: https://github.com/polarsignals/frostdb#dynamic-columns
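FrostDB's real API differs, but the core idea behind dynamic columns can be sketched in a few lines of Go: each row arrives with an arbitrary set of user-defined labels, and the schema's label columns are simply the union of whatever label names the rows bring with them, materialized as sparse columns. `Row`, `columns`, and the `labels.` prefix below are illustrative assumptions, not FrostDB's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// Row carries a fixed value plus arbitrary user-defined labels,
// mirroring Prometheus-style dimensions.
type Row struct {
	Value  int64
	Labels map[string]string
}

// columns builds the union of label names seen across rows. In a
// wide-column layout, each distinct label name becomes its own sparse
// column; rows that lack the label simply hold NULL there.
func columns(rows []Row) []string {
	seen := map[string]struct{}{}
	for _, r := range rows {
		for name := range r.Labels {
			seen[name] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, "labels."+name)
	}
	sort.Strings(out)
	return out
}

func main() {
	rows := []Row{
		{Value: 10, Labels: map[string]string{"namespace": "default", "pod": "api-0"}},
		{Value: 20, Labels: map[string]string{"namespace": "kube-system", "node": "n1"}},
	}
	fmt.Println(columns(rows))
	// → [labels.namespace labels.node labels.pod]
}
```

The payoff is that queries can filter and aggregate on any label a user ever sent, without the schema being declared up front.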
Congratulations on the launch, it's been great watching your journey so far.
Thank you for the kind words! Try the product and let us know what you think! :)
I rarely come out and post on HN, but I couldn't resist.
First off, congratulations on the launch!
I am doing some work around Python tracing using eBPF tools and spending a lot of time reading about the CPython interpreter and runtime implementation. I came across this Polar Signals blog post[0] last week. Fantastic, fantastic work! I learned a ton about Python internals in just a few minutes that I had completely missed even after reading the code for days. This post also set a benchmark for technical writing for me.
Huge kudos for all the great work! And thank you for sharing all of this!
[0] https://www.polarsignals.com/blog/posts/2023/10/04/profiling...
Thanks for the shout out! The team has done a phenomenal job.
If you enjoyed that post, I think you’ll also like the one we wrote about native unwinding without frame pointers: https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-bas...
Congrats on the launch!
Continuous profiling is a game changer; once you use it on your production systems, you can't do without it.
Thank you! Agreed, it's a real super power!
The first time we got differential flame graphs working, we were so mesmerised that we could finally see exactly what the difference between a low point and a high point (like a CPU spike) was that we clicked around our own code for a good hour.
It's pretty hard to tell from the website what this product does, and the submitted title ("Polar Signals Cloud Is Generally Available") doesn't explain anything, so I replaced it with a phrase from https://www.polarsignals.com/blog/posts/2023/10/10/polarsign... that seems to say something.
Thank you! We'll work on the messaging on the website!
One of my most popular HN comments ever has some suggestions:
Congratulations to you and the Polar Signals team!
Wishing you all the best with your launch and looking forward to seeing this product help countless developers and businesses!
Congratulations on the launch! I'm a huge proponent of using eBPF for profiling and observability in general.
I noticed pricing doesn't include anything about data retention.
Does the data live indefinitely? How far back can I query?
Great question! At the moment it is 30 days. We've had a few requests to make that configurable. The reality is that the most costly part is laying the data out in an optimized way, which happens very early in the ingestion path; from there on, it's only object-storage cost. So if there is customer demand, we can increase this.
Any plans on the roadmap for aggregation (compression) and long term retention or would you instead ship data back out to a separate tool?
I’ve enjoyed having 3-5 years of Prometheus metrics retention to look at seasonal traffic trends, but we spent a fair bit of CPU aggregating raw metrics down to the right granularity for that kind of long-term retention. My feeling is that the observability world is moving towards small, localized installations with real-time data, and separate systems with cheaper, slower, but long-term retention. Curious how you see it as you’re building a product in the space.
Yeah, that’s basically how this system is built as well. We buffer a certain amount of data in the ingestion nodes and then offload it to object storage, where it’s laid out in the Parquet format, so we could allow users to export/download it. Or, like I said, since object-storage cost is really the smallest piece in the system, we might even decide not to have any retention limit at all.
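The granularity-reducing aggregation the parent comment describes (rolling raw samples up into coarser buckets for long-term retention) can be sketched as follows; `Sample` and the mean-per-bucket policy are illustrative assumptions, not any particular system's implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a raw metric point: unix timestamp (seconds) and value.
type Sample struct {
	Ts  int64
	Val float64
}

// downsample averages raw samples into fixed-width buckets of `step`
// seconds: the kind of aggregation used to keep years of metrics at a
// coarser granularity than the raw scrape interval.
func downsample(samples []Sample, step int64) []Sample {
	type acc struct {
		sum float64
		n   int
	}
	buckets := map[int64]*acc{}
	for _, s := range samples {
		b := s.Ts - s.Ts%step // align to bucket start
		if buckets[b] == nil {
			buckets[b] = &acc{}
		}
		buckets[b].sum += s.Val
		buckets[b].n++
	}
	out := make([]Sample, 0, len(buckets))
	for ts, a := range buckets {
		out = append(out, Sample{Ts: ts, Val: a.sum / float64(a.n)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Ts < out[j].Ts })
	return out
}

func main() {
	raw := []Sample{{0, 1}, {15, 3}, {60, 10}}
	fmt.Println(downsample(raw, 60))
	// → [{0 2} {60 10}]
}
```

Real systems typically keep min/max/count alongside the mean so that later queries can still answer more than one kind of question about the rolled-up window.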
I was excited to see Haskell in one of the screenshots (“difference detection”) only to realize that Haskell is not in the list of supported languages :(
Haskell support should land sooner rather than later, actually! We are very, very close; it’s just that the Haskell compiler unfortunately creates binaries that don’t comply with the x86 ABI. It’s not too dramatic, though, it just shuffles around a few registers’ purposes.
Prerequisites:
A Kubernetes cluster, with nodes running Linux 5.4 or newer.
Dang.. but it does look like a tool I would want if I used K8s!
Oh, we’ll make this clearer: actually only Linux 5.4 is required; Kubernetes is only used to add more metadata. It still works just fine, and we have customers running it outside of Kubernetes, provisioned by Ansible.
Is there any support for profiling js applications? I see node.js but seems like browser js isn’t supported?
Great question! That’s correct; for the moment we’re focusing on everything that happens in the backend. That said, the system will accept anything in the pprof format, so I’d love to see something like “1% of users get profiled and the data sent” from the frontend. That’d be awesome!
We’ll think about building something like this but if there is someone with a use case I’d love to chat and figure out how we can make it work together!
Congrats on GA! Excited to try this out.
Thanks for posting! We'll be hanging out in the comments all day if you have any questions or feedback!