Open Sourcing Peloton, Uber’s Unified Resource Scheduler
eng.uber.com
Is it just me or is every big release from Uber just a custom rewrite of an existing technology? It seems their engineering department has a large not-invented-here attitude. I could be wrong - they're certainly large enough to have custom requirements that aren't met with what's on the market but the pattern is just becoming suspect.
I think you are right.
I think that what is going on there is also a bit political. They grew their engineering department so fast that they now need to justify the headcount. So each team is trying to invent new projects all the time. Anecdotally, this was partially confirmed to me by a friend working there.
I said this before, but I still cannot understand why a service like Uber needs so many engineers in the backend (multiple thousands). It is a complex distributed application, but nowhere near the scale or complexity of a Facebook or Google.
>I said this before, but I still cannot understand why a service like Uber needs so many engineers in the backend (multiple thousands). It is a complex distributed application, but nowhere near the scale or complexity of a Facebook or Google.
Thank you so much, I thought I was going crazy. I understand the demands of running a service at Uber's level, but I can't imagine what kind of computational workload or infrastructure requirements would make developing your own resource scheduler a reasonable option - for a taxi app? With machine learning that isn't essential to the core product?
Forgive me if I'm ignorant, but what exactly does Uber's engineering team do?
edit: On their blog I found that, among other things, they "forecast rider demand", described in a relatively small article [0], compared to the article [1] about what is essentially "just" data visualization. That doesn't help my confusion much.
0 - https://eng.uber.com/neural-networks/
1 - https://eng.uber.com/maze/
Those cars generate a lot of sensor data (TB per drive?). I'd imagine that data needs to be made actionable and separated into training and simulation sets pretty quickly. Mapping is a massive problem to automate.
Oh right, the self-driving cars. Well that's starting to make sense now.
Makes it possible for me to get a ride, process payments and refunds even when the data centers are having issues or when there are temporary internet problems.
What does the facebook engineering team do?
Sure, but that doesn't require thousands of backend engineers unless they are reinventing everything... and I'd be left wondering what they are rebuilding since, over the past year, every one of my trips with Uber has been a bar-lowering experience...
To be fair, operations becomes a much bigger deal as you get bigger. It's not just the app, it's keeping your infrastructure from falling over (because a 0.1% failure rate means losing a lot of money).
Think of all the random bugs you’ve seen in your job and told yourself “eh, this would take someone 2 weeks to fix and is almost never hit by customers”
I think one of the bigger challenges is that when you become bigger, you launch more projects to handle the scale, and each of those projects introduces new bugs for which you need a new team of engineers.
Basically, once you hit scale, I think you end up either with a very small team that managed to keep things simple (Instagram, for example) or with a huge team that explodes in complexity and needs to grow exponentially to handle all the extra complexity. Uber is very obviously in the latter bucket.
I periodically see the same content on my FB feed because, well, it is NBD if I see the same update from my friend several times.
Let me assure you, it is a BFD if I get billed twice for the same trip. So I am pretty sure Uber needs quite a few engineers to make sure that their stuff works correctly every time in every market for every customer.
I appreciate their efforts on open-source projects. Jaeger is wonderful, and the effort they put into both making something great and supporting the open standards (OpenTracing and the legacy Zipkin propagation) is greatly appreciated. I recently had to write a service in TypeScript (almost everything else is Go), and I felt very at home using the Jaeger Node bindings. It felt like I wasn't losing any features for using a less-popular language, and everything just worked.
Sure, they just reinvented Dapper from Google... but unlike Dapper I can download and use Jaeger. That counts for a lot. Do I use their ride sharing service? Nope. But I do like their open source projects.
Maybe not at the same scale, but IMO their core system has a lot of technical challenges:
* Lots of realtime
* Resource scheduling problems
* Route optimization problems (especially with shared uber or shared lyft rides)
ok, but:
* Everything they do is low data (no video, images or anything high bandwidth).
* Their whole model can be subdivided into smaller local problems (all users/drivers in the Bay Area have nothing to do with the users/drivers currently in NYC).
Yes, there are a couple of algorithms to develop for Uber Pool and for the real-time matching, but everything else looks like a fairly simple app backend to me.
> Everything they do is low data (no video, image or anything high bandwidth).
So is everyone else? Storage and CDN aren't nearly as complex as ad serving on Facebook (which is like Uber's matching: it's a real-time marketplace). Ad serving uses relatively few bits.
I'm not a network engineer, so I'm unfamiliar with how problems scale with bandwidth. I do know that solving NP-hard problems is difficult, so I respect Uber engineers for that at least.
> need to justify the headcounts
Something similar happened at LinkedIn too, I guess: multiple teams building very similar tools for closely related problems.
I wouldn't say this is a rewrite of existing technology. They borrowed concepts from other well-known open source projects, but this is substantially a wrapper around Mesos, not a competing project. The technical overview of Peloton[1] is more clear about this than the open source announcement, which is what's featured here.
Thanks for sharing! That is definitely a better link than the one posted.
Anyone who had to take an Uber after they switched away from Google Maps and onto their in-house half-baked mapping/navigation solution knows this is a huge problem.
I'm assuming that, like all mapping solutions, it'll get better, but for now it's full of bad routes, over-optimized turns, out-of-date detours (for MONTHS!) and nonsensical U-turns.
Not to defend Uber/NIH syndrome, but if Google wanted to specifically charge Uber more for maps because of usage, could they legally do that?
Maybe the rewrites are risk management?
They could just use a third party (and at their scale, they can definitely negotiate a custom deal where they feed back usage data to improve the third party's service) or even use open source solutions like OpenStreetMap. Even the latter (with the overhead of hosting it themselves) makes total sense at their scale.
That'd be in a contract though, so what if Google decided to not renew and only let Uber know like a month before the contract ended?
If these questions seem dumb, it's because I know very little about legal battles in software.
I'd wager this contract has a 90+ day termination notification requirement at a minimum.
Google increased maps cost for everybody though, not just Uber.
https://gadgets.ndtv.com/apps/features/google-maps-apis-new-...
I'm asking whether only Uber could be charged more.
You've hit the nail on the head. Uber can't risk being one Google management decision away from shutting down.
They are already forced to be like that by being in the app store.
Were they not recently embroiled in a lawsuit, too?
The problem with a comment like this is that it only takes into account when something was released publicly and not when the problem was first worked on and implemented internally.
Many internal projects that eventually become open source are not NIH projects, because when the project was proposed there may have been no public open source alternatives, or at least none mature enough. Even if something exists but is still in its early stages, it presents a lot of risk because your company isn't in the driver's seat building and maintaining it.
Claiming something is NIH based on when it first became polished enough to be open-source ignores all the history behind the state of the world when a project was first worked on.
> Is it just me or is every big release from Uber just a custom rewrite of an existing technology?
I'm a little sad that this is the top comment here. I mean, maybe you're right. But so what? Some people find this useful, and some won't. Same as anything else.
At the end of the day, every line of code added to the world's pool of OSS code is a Good Thing™ as far as I'm concerned. Even if it's something I personally don't have a use for.
I think we should encourage companies to release code as open source, and give Uber at least some small measure of "props" for the stuff they release. Maybe none of their stuff is a game changer like Linux, but it doesn't need to be.
Their urban GPS positioning technology, based on satellite masking, is pretty unique as far as I know.
Does anyone know an open source Uber Michelangelo?
There's pipeline.ai, Airbnb said they'd open source theirs this year, and the TFX suite is getting there. Platforms are becoming popular.
I've worked with Mesos pretty extensively before, and when Uber first announced Peloton last year I was intrigued. Peloton seems to be a wrapper around Mesos that lets you run smaller, unique jobs without having to write a Mesos framework for each. Writing a Mesos framework for every small job you have gets annoying when you just want to define how your job should run and don't really care about the resource or task allocation of the job, and Peloton seems to solve this on Mesos. It's similar to YARN but not limited to Hadoop. It would have been useful for the project I worked on, because it was better suited to our use case, and shifting from Mesos to k8s would've been a huge engineering project.
Mesosphere created dcos-commons to reduce the friction of getting frameworks up and running.
Just call it (Uber) customized Mesos. I find this article somewhat misleading and boring. I am pretty sure I can run this Peloton thingy with most Mesos API calls.
It pretty much is, but the appeal (at least to me) is abstracting away all the framework writing. It seems like it’s easier to run small, unique jobs on it similar to something like YARN.
What exactly is Uber running in its clusters?
- Route Optimization
- Demand Forecasting
- Rider Hotspot Prediction
This post doesn't exactly tell us the true nature of their workloads (other than the crude categorization - batch, stateful, stateless), nor does it talk about the inflection points where off-the-shelf solutions don't cut it anymore and such customization is required. I mean some before & after numbers / graphs on resource utilization would have really helped.
At first glance I thought the "resources" here might be Uber drivers, which would be interesting.
Is this thing/Apache Mesos abstract enough to allow for such a use?
The best thing I can think of that fits what you're asking is an Actor framework (which abstracts the compute and message passing between objects for you).
Think Akka or Microsoft Orleans, etc.
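To make the actor idea concrete for the "drivers as resources" question, here is a toy sketch in plain Go using goroutines and channels. It is not Akka's or Orleans' API, and `driverActor`, `assignMsg` and `Assign` are hypothetical names invented for illustration. Each actor owns its state and processes one mailbox message at a time, so callers never touch that state directly and no locks are needed.

```go
package main

import "fmt"

// assignMsg is a hypothetical message asking a driver actor to take a ride.
type assignMsg struct {
	rideID string
	reply  chan bool // the actor answers whether it accepted the ride
}

// driverActor is a hypothetical actor; its state lives only inside loop().
type driverActor struct {
	id      string
	mailbox chan assignMsg
}

// newDriverActor starts the actor's goroutine and returns a handle to it.
func newDriverActor(id string) *driverActor {
	a := &driverActor{id: id, mailbox: make(chan assignMsg)}
	go a.loop()
	return a
}

// loop handles mailbox messages one at a time, so the busy flag
// is only ever read or written by this single goroutine.
func (a *driverActor) loop() {
	busy := false
	for msg := range a.mailbox {
		if busy {
			msg.reply <- false // already on a ride: reject
			continue
		}
		busy = true
		msg.reply <- true // accept the ride
	}
}

// Assign sends a message to the actor and blocks until it replies.
func (a *driverActor) Assign(rideID string) bool {
	reply := make(chan bool)
	a.mailbox <- assignMsg{rideID: rideID, reply: reply}
	return <-reply
}

func main() {
	d := newDriverActor("driver-1")
	fmt.Println(d.Assign("ride-A")) // true
	fmt.Println(d.Assign("ride-B")) // false: the driver is already busy
}
```

Frameworks like Akka and Orleans add what this sketch leaves out, such as supervision, persistence and distributing actors across machines; the sketch only shows the core "private state plus message passing" idea.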
Me too. Is anyone aware of such an open-source project?
This sounds a bit like Two Sigma’s Cook scheduler for Mesos. https://github.com/twosigma/Cook
Immediate reaction: Well, now there's Peloton, the fitness tech company, Peloton, the self-driving truck caravan company, and Peloton, the cluster scheduler...
...and peloton, CMU's NVMe-first db
lol "Latest development updates"
> Added notice that the project is dead
https://github.com/cmu-db/peloton/commit/484d76df9344cb5c153...
...and https://www.peloton.com and their oil & gas software.
And Peloton the open source RDBMS.
I remember when Mozilla took such heat for "usurping" the Firebird name (because the name was already used by Firebird DB); they then changed it to Firefox.
Does Uber get held to the same standard or do we just assume all names are overloaded now?
Where do all these names come from? Is it Latin or Greek or something?
Peloton is a somewhat common French word for "ball" that became common in English as a sports term for a grouping in a bicycle race, by way of the Tour de France. https://www.merriam-webster.com/dictionary/peloton
Small detail: that's not what it means in French. It's basically used the same way as in English (a group of cyclists, a group of race car drivers, etc.).
It is however pelota, ball, in Spanish.
https://en.wikipedia.org/wiki/Peloton I would guess
I searched on google, a "peloton" is "the main field or group of cyclists in a race".
In addition to what others have said here, "peloton" also means "fearless" in Finnish.
I have no idea if or why they would've used that or if they're just referring to the cycling thing, but I guess "fearless" could also be kind of fitting for this project.
Kraken was already taken[0]
My first reaction was "this just sounds like Mesos?". And it's cited on their page (which on first read I thought meant they were trying to act as a single pane of glass for Mesos/k8s/etc.):
In the OP blog post though, they assert "to our knowledge, there is no other open source scheduler which combines all types of workloads for web-scale companies like Uber."
And then, when you dig...it's just Mesos. They built a framework for Mesos. So, that's cool. But man, the puff piecery borders on dishonesty. I mean--Singularity has existed, and is implemented at very large scales, for a while. I'm sure Peloton is a fine scheduler, but there's a lot of huffing-one's-own-farts in the documentation here.
Singularity isn't a scheduler. It's generally invoked by traditional HPC schedulers like Moab or Slurm.
edit: clearly I'm thinking of a different Singularity than you.
Yeah - on a phone but I'm referring to the Mesos framework by HubSpot.
Very early on in the article there is an "Alternate cluster schedulers" section which cites Mesos.
I think eropple's point might be that it's listed as an alternate without a (or much) difference.
I second you
You mean not a bike company?!
Peloton should make a framework called Uber in response.
The term "peloton" obviously predates the company of the same name...
The term "uber" obviously predates the company of the same name...
Would people use open source stuff from a morally questionable company? Especially when it's just a rewrite of existing technology posted to a different GitHub repo?
EDIT: People get so up in arms about Google and Microsoft working with China and the military, but Uber has done some horrendous stuff on their own. Just curious where people think the line should be.
Using their open source stuff doesn't fund them to do bad things.
Arguably it does indirectly - e.g. being able to say that you're responsible for Popular Tool X might help your brand, make it easier to make sales, etc
The thing with morality is that it's highly subjective. What's horrible for one person is perfectly acceptable for another.
React and Angular are quite popular.