Hugging Face and Google partner for AI collaboration
huggingface.co

What will each party gain from the contract they signed? The first few sections left me with a lot of questions.
From what I can tell, based on the later sections:
> new experiences for Google Cloud customers to easily train and deploy Hugging Face models within Google Kubernetes Engine (GKE) and Vertex AI
I assume that means a new API field like `huggingface_model: "google/flan-t5-base"`?
> Models will be easily deployed for production on Google Cloud with Inference Endpoints
That seems to mean the GCP button, which is currently disabled on the Inference "Create a new endpoint" page (https://ui.endpoints.huggingface.co), will now be enabled. That's the clearest part of the announcement.
From Vertex AI inside GCP, there's a whole bunch of models that aren't just Google's (e.g. YOLO, Llama) available in their model "garden"[1] that can be deployed relatively easily in GCP. It sounds like what's available in the model garden will be extended with what Hugging Face has to offer?
[1] https://cloud.google.com/vertex-ai/docs/start/explore-models
Google is terrified of open source AI and the only thing the people in charge understand is how to manipulate people with money, they lack other ideas or methods. So yeah, that would explain it.
They can’t be that terrified of it or they never would have open sourced TensorFlow.
They're even more terrified of Torch or another framework taking over the industry. As ML is a research-heavy space, getting ML researchers to use your platform has benefits.
Rational smart people would think so; but the group in charge at Google now just sees something that doesn’t have ads on it and kills it. They hit OSS hard in the first round of layoffs.
But as people who only understand money they give HF a pile of cash to hedge against a FOSS risk to the balance sheet. It’s cheap insurance. Same with Claude.
Actuarial tables and bank statements are all they understand. There is no technical leadership at Google; accounting took over, and the company only makes sense once you realize all decisions are made by the CFO.
This partnership is way too vague. I have to assume Google is just investing money into HF and getting some marketing and integrations in return. When deciding where to host models or which services to use for my startup, though, I have a hard time trusting GCP services to be around for the long term. On the flip side, you know Azure and AWS will have support and staying power for as long as their respective companies are around.
+1, I bet it amounts to the same as with AWS/Microsoft: they get contact info for biz dev, a button in the Inference Endpoints deployment flow for GCP, and mayyyyybe some custom integration to let them offer models without having a literal HF repo with the weights.
GCP sometimes deprecates its own services, still supporting them for a few years while encouraging customers to switch to partner products.
One such example is the human-in-the-loop feature of Document AI.
Maybe it’s a naive question, but here we go: having followed HF in the industry for a while, UI/UX aside, what’s the business case for using them instead of an “S3” to distribute models, or some kind of torrent for decentralised model distribution?
After the whole debacle around GPT-4chan[1] and the gating mechanism for models, it’s hard for me to see how any entity can trust that they won’t shut down or gate models due to some ToS shenanigans. In other words: if you’re the man in the middle between models and clients, isn’t it better to treat yourself as a “dumb pipe”?
N.B.: I think the company has a great culture, seen from the outside, and I accept that I may be misinformed about their business model.
It's not naive and it's hard to answer, because the answer is equally naive:
It's sort of like asking what the business case is to have repos on GitHub instead of having a private git server / GitLab.
The value is that "that's where everything is happening": e.g. I just did a 4-day hackathon wrapping llama.cpp on _all_ platforms for my as-yet-unreleased app. If you need a local AI / llama.cpp model, you go to HuggingFace, full stop.
Then, I wanted to host these models on my own: I don't want to rely on 3rd parties' HF repos being stable. A few clicks later, I'd started my own repo and uploaded the models. Then I translated a Python function to Dart, and the app can download these models, ranging from 2 GB to 28 GB, for free, without an API key.
That's much easier than S3, both in cost and integration time.
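For what it's worth, the "no API key" download flow works because every file in a public HF repo resolves to a predictable URL. A minimal sketch of that URL scheme (the repo and file names below are illustrative, not from this thread):

```python
# Public Hugging Face repos expose every file at a stable "resolve" URL:
#   https://huggingface.co/<repo_id>/resolve/<revision>/<filename>
# No API key is needed for public models, which is why a plain HTTP client
# in any language (Python, Dart, ...) can fetch the weights directly.

def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for a file in a public HF repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Hypothetical repo/file, just to show the shape of the URL:
url = hf_file_url("someuser/some-model-GGUF", "model.Q4_K_M.gguf")
```

In practice you'd hand that URL to whatever HTTP stack your platform has; on the Python side, huggingface_hub's `hf_hub_download` wraps the same scheme with caching.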
But still, the answer sounds naive and marginal I'm sure.
But in this case, GitHub had 3 major tailwinds: (1) the popularization of VCS, (2) the network effect due to the number of users, and (3) it's more or less a mandatory part of modern software engineering.
I agree the use case you mentioned is important, and I understand it. Still, for the folks who aren't relying on LLMs, or are doing vanilla/traditional ML in some laggard industry, I have a hard time believing they are going to HF.
(1) popularization of pytorch
(2) network effect due to the number of users
(3) uploading weights, distributing weights, and running demos on GPUs are a mandatory part of ML engineering
=== interlude ===
I hope it doesn't sound like I'm being argumentative; discussion is especially interesting to me because it often weighs on me how hard it is to explain HuggingFace. So I enjoy trying and improving at it.
=== longer analogy ===
Imagine if all mobile developers in 2008* needed to host demos on iPhones captive in a server farm somewhere.** Some company offered that for free. On top of that, apps were 30 GB, but the company hosted downloads for free. So everyone put their stuff on there. Then that feedback loop continued while the field took a historic spike in interest, and now it's 4 years later.
* AI developers in 2020.
** GPUs captive in a server farm somewhere.
== Musings ==
This sort of highlights a thread of discussion for startups: the unreasonable effectiveness of specialization. Data scientists in 2020 use Python because they can; they're not really familiar with GitHub as a VCS, so their mental model of it is more like Dropbox. All of a sudden there's an $X billion (so far) opportunity to clone GitHub, make it marginally easier to use by hiding stuff that's necessary for all other software, and then light money on fire hosting GPUs and S3.
My experience with HF is that it's incredibly useful for exploring models, and for making prototypes, demos, and MVPs.
Once you want to scale to production, you're right: it doesn't make sense to pull from the HF repository. It makes more sense to clone it into S3 or something else that you have more ownership over.
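A rough sketch of that "clone into your own storage" step, assuming you first fetched a local snapshot (e.g. with huggingface_hub's `snapshot_download`) and that `upload` wraps your storage client (e.g. boto3's `upload_file`); both are assumptions about the stack, not anything from the thread:

```python
import os

def mirror_snapshot(local_dir: str, upload) -> list:
    """Walk a locally downloaded model snapshot and push every file to
    your own storage. `upload(path, key)` is caller-supplied, e.g. with
    boto3: lambda p, k: s3.upload_file(p, "my-bucket", f"models/{k}").
    Returns the object keys so you can record what was mirrored."""
    keys = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            # Keep the repo's relative layout as the object key.
            key = os.path.relpath(path, local_dir).replace(os.sep, "/")
            upload(path, key)
            keys.append(key)
    return sorted(keys)
```

Injecting the `upload` callable keeps the walk logic independent of whether the target is S3, GCS, or a plain file server.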
It sounds like SketchUp for LLMs.
To be honest, I do not care about this GPT-4chan controversy, and I doubt most people who use them do. They have built up a community and ecosystem around their offerings.
I am also surprised this has its own wiki page, although it looks like it was put together rather quickly, with not the most fluid writing.
I do care in the sense that I don’t want to live in some sterilized clean room Disney world.
See also the neutering of all the big commercial models. No one is running / giving access to a high quality virtually (or completely) uncensored model.
Horses for courses...
Big Corp wanting to replace their call center with the robots will certainly not want them to be offensive to someone who just wants to pay their bill.
> have built up a community and ecosystem around their offerings
The community is great and I have been a user for a while. My doubt is whether there are many use cases where companies and/or MLEs/DSs will do a “git pull model_v0.1” from the HF model store.
I work at a major tech company and we do this.
Congrats to HF on their pathway to being acquired by Google.
Most efficient route to shut it down.
It would presumably replace https://cloud.google.com/model-garden, like how YouTube replaced Google Video.
Please no. Oh god please no.
Meh, HF already has similar or better relations with Microsoft. If anyone, Microsoft would be the one acquiring HF. And it makes sense: it aligns with GitHub, in a way.
Soooo, ai.azure.com coming soon for GCP?
Idk what this means but, yes, HF does have identical partnerships with AWS and MS Azure.
I still remember Google partnering up with XMPP. Could be a lead-and-consume strategy (or lead, make confusing, and slow down development).
Though TensorFlow was made by Google, and it is a pretty good library.
Embrace, extend, and extinguish
Can you elaborate on how the internal strategy at Microsoft from 1996 applies here?
More AI bad news
How so?
Because the sides of the partnership are disproportionately unequal and this imbalance creates incentives that are in the opposite direction of more openness.
Ahh the old "We've partnered with one of the largest proprietary software developers in the world to somehow improve open source things" trope never gets old.
Google is also one of the largest open source developers in the world, too.
So partnering with Google on Open Source can make a lot of sense.
But it's in vogue to hate everything Google does on HN at the moment, so oh well.
While the comment you are replying to does sound a bit cynical, the reality is HF already signed similar agreements with AWS and Azure several months ago.
I imagine that there’s not much Hugging Face can do about it if they’re getting pressure to make money by their investors.
"We've partnered with $company, but we don't really know what that means, so stay tuned" is the oldest PR stunt ever.
if google is largest proprietary software developer, which company is not?
all the other ones, hence "largest".
How sure are you that Google ranks lowest by contributions to open source projects?
Your original question leaves no other answer, rephrase perhaps.
Can't edit, it is locked; I should have put more thought in before typing. Others already pointed out what I intended to say.
Look, I know this is going to paint me as the stereotypical "Pedantic HN commentor", but sincerely, the only response to
"if google is largest proprietary software developer, which company is not?"
is "all the other ones".
I promise I'm not being purposefully obtuse, I really can't find any other meaning to that comment.
I really don't know how Google "ranks" w.r.t OSS contributions.
I think there is a language issue here (it doesn’t seem like English is their first language?), so I suspect the original comment’s intent was different from what it actually says.
I mean pedantically, both things could be simultaneously true:
Company XYZ is the largest developer of proprietary software
Company XYZ is the largest contributor to open source
If we're talking about raw amounts, not %-of-company's-output
A lot of folks here are being critical of the partnership because Google hasn’t open sourced all of their models, etc.
Not to defend Google, but they have arguably the deepest AI knowledge in the industry and released many of the fundamental building blocks for today’s AI boom (transformers, TensorFlow, etc.)
Yet their models are always deemed second class (see, recently, Gemini), so I think they are trying to pretend to be "catching up" with vague announcements like this.
Good for shareholders, that's all. Not really sure I believe their "open science" argument.
If Google has such "deep AI knowledge" why do 100% of their AI products suck?
They have the best unreleased model, for a few years already.
They certainly “had” a lot of the building blocks and creators, but most of those people seem to have moved on, and libraries like TF have become less used. I don’t think you can say today’s Google, with its hand-wavy AI products (Gemini with fake demos and still unreleased) and lack of core open-source ML tools, has the deepest AI knowledge anymore (see Meta for who has overtaken them among the FAANGs, plus startups like OpenAI who have taken a lot of the other talent).
The proof is in the pudding as the saying goes, and right now Google's llm pudding is neither the tastiest nor the most popular around.