Many large language models (LLMs) are described as open source, but if one looks a bit deeper it turns out that is not actually so; the model may be free to download, it may be "open weight", but it does not fit the Open Source Initiative (OSI) Open Source Definition (OSD). Assessing the actual openness of models is not easy, as Arnaud Le Hors explained in his talk about the Model Openness Tool (MOT) at Open Source Summit North America 2026. The tool is designed to help users of LLMs understand to what degree a model is (or is not) open, and to combat the openwashing that is prevalent with LLMs.
The problem
Le Hors began by asking the audience a rhetorical question,
"do you think that all the models that are on Hugging Face are open
source? Are they even open models?" Hugging Face, of course, is a
popular site for sharing and downloading LLMs, data sets, and
applications for working with them.
Much of what is available on Hugging Face, he said, falls short
of the basic requirements of an open-source license. Many vendors or
projects are creating their own licenses for models. Le Hors said
that this was not unlike the early days of open source; that created
"a lot of chaos", which led to the creation of OSI and its
definition of open source. "Now, many years later, we're seeing a
similar type of challenge with 'open' AI."
The models are often described as open source, or just open, which
causes many problems. He said that, in fact, "there are a lot of restrictions
associated with the licenses under which they are made
available". For example, some licenses try to limit the number of
users or try to place restrictions on the types of use: "They can
say, well, you can use my model, but not for military use." That
kind of limitation may be well-intentioned, but a license with use
restrictions still falls short of being open source.
People believe that if something is on Hugging Face, they can
simply download it and do whatever they want with it. He said that
those users may be infringing on the licenses and taking a legal risk.
Worse, some users download a model, do their own fine-tuning,
and then republish the model under a different license. This would be
the equivalent of downloading software under the GPL and then
republishing it under the Apache License. "You just
can't. Legally, it's not allowed."
Model Openness Framework
Le Hors said that those were the kinds of problems that the Generative AI Commons working group of the Linux Foundation's AI & Data Foundation has been trying to solve with the Model Openness Framework (MOF). One might wonder about the OSI's Open Source AI Definition (OSAID). He did not address the OSAID during the talk, perhaps because the work on MOF proceeded separately from it: a final version of MOF was introduced in April 2024, while OSI was still working on OSAID, which was not finalized until October 2024.
The MOF provides a structure for
evaluating machine-learning models and provides a framework for
describing how open (or not) a model actually is. The specification
sets up a tiered system with three classes that represent
"ascending levels of model completeness and openness", with a
Class III ("Open Model") being the least open and a Class I ("Open
Science Model") being the most open because it not only allows
distribution and tuning, but also enables others to study how the
model was created as well as the data used to train it. If a model's
terms are too restrictive, it does not receive a classification at
all.
According to the specification, a Class III model would allow
fine-tuning of the model, unrestricted usage, and creation of a product
or service based on the model. To meet the Class II definition, a
model would also need to include supporting libraries and tools,
inference code, evaluation code, as well as code for training the
model. A Class I model would have all the components included with
the previous classes, as well as a research paper that explains the
model, the components that would be needed to reproduce a similar
model, and the training data "used for any form of model
training" that users could examine.
Openness, he said, has to do with the license a model and its artifacts are provided under, while completeness refers to what is included with the distribution. The framework covers 17 components that fall into three categories: code, data, and documentation.
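The cumulative nature of the three classes can be sketched in a few lines of Python. This is only an illustration, not the way the MOT works internally: the component labels are hypothetical names drawn from the description above, the Class III set in particular is an assumption, and the specification itself defines the 17 components and which class requires each of them.

```python
from typing import Iterable, Optional

# Hypothetical component labels for illustration; the MOF specification
# defines its own list of 17 components and assigns them to classes.
CLASS_III_REQUIRED = frozenset({  # assumed baseline, not taken from the spec
    "model_architecture", "model_parameters", "model_card",
})
CLASS_II_EXTRA = frozenset({
    "supporting_libraries_and_tools", "inference_code",
    "evaluation_code", "training_code",
})
CLASS_I_EXTRA = frozenset({"research_paper", "training_data"})

# Ordered from least to most open; each tier adds to the previous one.
TIERS = (
    ("Class III", CLASS_III_REQUIRED),
    ("Class II", CLASS_II_EXTRA),
    ("Class I", CLASS_I_EXTRA),
)


def classify(open_components: Iterable[str]) -> Optional[str]:
    """Return the highest class whose cumulative requirements are met.

    Returns None when even the Class III requirements are not satisfied.
    """
    available = set(open_components)
    required = set()
    achieved = None
    for label, extra in TIERS:
        required |= extra
        if not required <= available:
            break  # a gap at this tier rules out every higher tier too
        achieved = label
    return achieved
```

Under these assumptions, a model that ships everything in the first two sets comes back as "Class II", while one that adds a research paper and training data but omits the Class II code stays at "Class III", since each class builds on the one before it.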
For example, code might include the model's architecture and
training code; data would include the model parameters and training
data sets; documentation includes Hugging Face
model cards, technical reports, research papers about the model,
and so forth. He did not go through each of the separate components,
but said that "every component must have an open license" that
is based on the principle of open-source software according to the
OSI. See slide 5 in his presentation
for a graphic that lists all 17 components.
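The distinction between completeness (what is distributed) and openness (how it is licensed) can be made concrete with a small sketch. Everything here is an assumption for illustration: the component names are hypothetical, and the allowlist is just three well-known OSI-approved licenses by their SPDX identifiers, not the set of licenses that the framework actually accepts for code, data, and documentation.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Illustrative allowlist of SPDX identifiers for OSI-approved licenses.
# This is an assumption for the example; it is not the list used by MOF.
OPEN_LICENSES = frozenset({"Apache-2.0", "MIT", "BSD-3-Clause"})


def assess(
    expected: Iterable[str],
    released: Dict[str, Optional[str]],
) -> Tuple[List[str], List[str]]:
    """Split a release's shortcomings into completeness and openness gaps.

    expected: component names that a given class calls for (hypothetical).
    released: maps each distributed component to its license identifier,
              or to None if no license was stated.
    """
    # Completeness: which expected components were not distributed at all?
    missing = sorted(set(expected) - set(released))
    # Openness: which distributed components lack an open license?
    not_open = sorted(
        name for name, license_id in released.items()
        if license_id not in OPEN_LICENSES
    )
    return missing, not_open
```

A release with weights under a custom license and no training code would show up in both lists: the training code as a completeness gap, and the weights as an openness gap.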
A lot of the licenses, he said, fall short of addressing the
specifics of the different types of artifacts. For example, there are
not many licenses that are specifically designed to cover data. There is
a license, OpenMDW, that is meant to
cover machine-learning models and all of their artifacts, "but it's
not generally used yet." There is a blog
post that goes into detail about the OpenMDW license and the
intent behind it.
Model Openness Tool
The MOT, "which is really what I want to talk about today",
is an online registry and tool for classifying models. Many of the
models listed on the registry, he said, don't even qualify as Class
III because they do not have an open license at all.
The site has a list of models that have been submitted; it displays each model's classification, as well as information about the model, such as links to Hugging Face and GitHub with model resources, the organization supplying the model, etc. The information on the site is taken from YAML files in the MOT GitHub repository.
Le Hors spent some time showing off the site, exploring the model pages, looking at the YAML syntax for the model information, and so forth. He demonstrated the model evaluation form, which takes user input about a model and then provides a classification for the model. As an example, a user might put in all of the available data about a model and receive an evaluation that indicates the model only meets the criteria for Class III. They could then submit the model to be included on the MOT site as-is or make changes to the documentation, license, and so forth to improve the score. Once they are satisfied, they can either sign into MOT with their GitHub account and send a submission directly from the site, or download the YAML file and manually create a pull request. The documentation for the process is fairly comprehensive.
He used the Aquila-VL-2B
model from the Beijing
Academy of Artificial Intelligence (BAAI) as an example. He said that BAAI had
originally submitted a model that "completely failed to
qualify", and then spent time working to have a completely open
model. "They came up with a new version that actually qualifies,
and they did a stellar job at filling out the record."
Other topics
Once he had finished with the demo, Le Hors said that he wanted to talk about some other work that the Generative AI Commons working group had been engaged in. The group has been working on a Responsible Generative AI Framework (RGAF) as part of its Responsible AI effort. He did not go into details, but invited the audience to look into it; there is a blog post about RGAF from March 2025, and version 0.9 of the document is available.
He also mentioned that the commons had started an exploration
working group within the past few weeks that is meant to "be a really
open space for people to come and discuss and explore different topics
related to generative AI or agentic AI". He invited anyone who
might be interested to visit the web site and join one of the
bi-weekly calls that the group holds.
With just a bit of time left over, he opened the floor for questions. I asked a two-part question about how the submissions to MOT were audited, and why the group was using manual submissions instead of some form of LLM to create entries for the site.
Le Hors said that the project relied on the community to audit
submissions. "And just like you do for anything else like this, if
you lie and you get caught, you'll get a black eye, right?"
As to why the group wasn't using LLMs, he said that people have tried
but so far have not had much success. "We haven't had
anybody really committed to this in a long period of time to really
make it work." Part of the problem, he said, was that there is no
standard for model data. The Hugging Face model card is unstructured
Markdown "with a little bit of metadata". But he did think it
was possible to do, "it just needs somebody who's really motivated
to work through it".
Another member of the audience asked if MOT was an independent project, or if it was an IBM project (Le Hors is an IBM employee). He reiterated that it was a Linux Foundation (LF) project as the session's time ran out.
[Thanks to the Linux Foundation, LWN's travel sponsor, for funding my travel to Minneapolis to attend the Open Source Summit.]
| Index entries for this article | |
|---|---|
| Conference | Open Source Summit North America/2026 |