MOT: a tool to fight openwashing in AI


Many large language models (LLMs) are described as open source, but if one looks a bit deeper it turns out that is not actually so; the model may be free to download, it may be "open weight", but it does not fit the Open Source Initiative (OSI) Open Source Definition (OSD). Assessing the actual openness of models is not easy, as Arnaud Le Hors explained in his talk about the Model Openness Tool (MOT) at Open Source Summit North America 2026. The tool is designed to help users of LLMs understand to what degree a model is (or is not) open, and to combat the openwashing that is prevalent with LLMs.

The problem

[Photo: Arnaud Le Hors speaking at Open Source Summit North America 2026]

Le Hors began by asking the audience a rhetorical question, "do you think that all the models that are on Hugging Face are open source? Are they even open models?" Hugging Face, of course, is a popular site for sharing and downloading LLMs, data sets, and applications for working with them.

Much of what is available on Hugging Face, he said, falls short of the basic requirements of an open-source license. Many vendors or projects are creating their own licenses for models. Le Hors said that this was not unlike the early days of open source; that created "a lot of chaos", which led to the creation of OSI and its definition of open source. "Now, many years later, we're seeing a similar type of challenge with 'open' AI."

The models are often described as open source, or just open, which causes many problems. He said that, in fact, "there are a lot of restrictions associated with the licenses under which they are made available". For example, some licenses try to limit the number of users or to restrict the types of use: "They can say, well, you can use my model, but not for military use." That kind of limitation may be well-intended, but a license with use restrictions still falls short of being open source.

People believe that if something is on Hugging Face, they can simply download it and do whatever they want with it. He said that those users may be infringing on the licenses and taking a legal risk. Worse, some users download a model, do their own fine-tuning, and then republish the model under a different license. This would be the equivalent of downloading software under the GPL and then republishing it under the Apache License. "You just can't. Legally, it's not allowed."

Model Openness Framework

Le Hors said that those were the kinds of problems that the Generative AI Commons working group of the Linux Foundation's AI & Data Foundation has been trying to solve with the Model Openness Framework (MOF). One might wonder about the OSI's Open Source AI Definition (OSAID). He did not address the OSAID during the talk, possibly because the MOF was developed separately: a final version of the MOF was introduced in April 2024, while OSI was still working on the OSAID, which was not finalized until October 2024.

The MOF provides a framework for evaluating machine-learning models and describing how open (or not) a model actually is. The specification sets up a tiered system with three classes that represent "ascending levels of model completeness and openness": Class III ("Open Model") is the least open, and Class I ("Open Science Model") is the most open because it not only allows distribution and tuning, but also enables others to study how the model was created and the data used to train it. If a model's terms are too restrictive, it does not receive a classification at all.

According to the specification, a Class III model would allow fine-tuning of a model, unrestricted usage, and creation of a product or service based on the model. To meet the Class II definition, a model would also need to include supporting libraries and tools, inference code, and evaluation code, as well as code for training the model. A Class I model would have all the components included with the previous classes, as well as a research paper that explains the model, the components that would be needed to reproduce a similar model, and the training data "used for any form of model training" that users could examine.
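
The cumulative nature of the three classes can be sketched in a few lines of Python. This is only an illustration, not the MOF's actual algorithm or MOT's implementation: the component names are paraphrased from the summary above rather than taken from the specification, the contents of the Class III set are an assumption, and `classify` is a hypothetical helper.

```python
from typing import Optional

# Assumption: these requirement sets are simplified for illustration and
# do not reproduce the official MOF component list.
CLASS_III = frozenset({"model architecture", "model parameters", "model card"})
CLASS_II = CLASS_III | {
    "supporting libraries and tools",
    "inference code",
    "evaluation code",
    "training code",
}
CLASS_I = CLASS_II | {"research paper", "training data"}

# Ordered from most open to least open, so the first match is the best class.
TIERS = [("Class I", CLASS_I), ("Class II", CLASS_II), ("Class III", CLASS_III)]


def classify(open_components: set[str]) -> Optional[str]:
    """Return the most open class whose requirements are all met, else None."""
    for name, required in TIERS:
        if required <= open_components:  # every required component is present
            return name
    return None  # too incomplete or too restricted to be classified
```

Under these assumptions, a model that releases everything in `CLASS_II` but no training data or research paper is reported as Class II, and a model that releases only its parameters gets no classification at all.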

Openness, he said, has to do with the license a model and its artifacts are provided under, while completeness refers to what is included with the distribution. The framework covers 17 components that fall into three categories: code, data, and documentation.

For example, code might include the model's architecture and training code; data would include the model parameters and training data sets; documentation includes Hugging Face model cards, technical reports, research papers about the model, and so forth. He did not go through each of the separate components, but said that "every component must have an open license" that is based on the principle of open-source software according to the OSI. See slide 5 in his presentation for a graphic that lists all 17 components.
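
The distinction between completeness (what is shipped) and openness (how it is licensed) can be made concrete with a small check. The sketch below is hypothetical: `OPEN_LICENSES` is a placeholder allow-list of SPDX identifiers chosen for illustration, not the set of licenses the MOF accepts, and `audit` is not part of MOT.

```python
# Assumption: a placeholder allow-list, not the MOF's accepted licenses.
OPEN_LICENSES = {"Apache-2.0", "MIT", "CC-BY-4.0", "CDLA-Permissive-2.0"}


def audit(required: list[str], released: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split problems into completeness gaps and openness gaps.

    `released` maps each shipped component to its license identifier.
    """
    # Completeness: required components that were not released at all.
    missing = sorted(c for c in required if c not in released)
    # Openness: released components whose license is not on the allow-list.
    restricted = sorted(
        c for c in required if c in released and released[c] not in OPEN_LICENSES
    )
    return missing, restricted


missing, restricted = audit(
    ["model parameters", "training code", "training data"],
    {"model parameters": "Apache-2.0", "training code": "Custom-NonCommercial"},
)
print("missing:", missing)
print("restricted:", restricted)
```

Here the training data is a completeness gap because it was never released, while the training code is an openness gap because it was released under a license that is not on the allow-list.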

A lot of the licenses, he said, fall short of addressing the specifics of the different types of artifacts. For example, there are not many licenses that are specifically designed to cover data. There is a license, OpenMDW, that is meant to cover machine-learning models and all of their artifacts, "but it's not generally used yet." There is a blog post that goes into detail about the OpenMDW license and the intent behind it.

Model Openness Tool

The MOT, "which is really what I want to talk about today", is an online registry and tool for classifying models. Many of the models listed on the registry, he said, don't even qualify as Class III because they do not have an open license at all.

The site has a list of models that have been submitted; it displays each model's classification, as well as information about the model, such as links to Hugging Face and GitHub with model resources, the organization supplying the model, etc. The information on the site is taken from YAML files in the MOT GitHub repository.

Le Hors spent some time showing off the site, exploring the model pages, looking at the YAML syntax for the model information, and so forth. He demonstrated the model evaluation form, which takes user input about a model and then provides a classification for the model. As an example, a user might put in all of the available data about a model and receive an evaluation that indicates the model only meets the criteria for Class III. They could then submit the model to be included on the MOT site as-is or make changes to the documentation, license, and so forth to improve the score. Once they are satisfied, they can either sign into MOT with their GitHub account and send a submission directly from the site, or download the YAML file and manually create a pull request. The documentation for the process is fairly comprehensive.

He used the Aquila-VL-2B model from the Beijing Academy of Artificial Intelligence (BAAI) as an example. He said that BAAI had originally submitted a model that "completely failed to qualify", and then spent time working to have a completely open model. "They came up with a new version that actually qualifies, and they did a stellar job at filling out the record."

Other topics

Once he had finished with the demo, Le Hors said that he wanted to talk about some other work that the Generative AI Commons working group had been engaged in. The group has been working on a Responsible Generative AI Framework (RGAF) as part of its Responsible AI effort. He did not go into details, but invited the audience to look into it; there is a blog post about RGAF from March 2025, and version 0.9 of the document is available.

He also mentioned that the commons had started an exploration working group within the past few weeks that is meant to "be a really open space for people to come and discuss and explore different topics related to generative AI or agentic AI". He invited anyone who might be interested to visit the web site and join one of the bi-weekly calls that the group holds.

With just a bit of time left over, he opened the floor for questions. I asked a two-part question about how the submissions to MOT were audited, and why the group was using manual submissions instead of some form of LLM to create entries for the site.

Le Hors said that the project relied on the community to audit submissions. "And just like you do for anything else like this, if you lie and you get caught, you'll get a black eye, right?"

As to why the group wasn't using LLMs, he said that people have tried but so far have not had much success. "We haven't had anybody really committed to this in a long period of time to really make it work." Part of the problem, he said, was that there is no standard for model data. The Hugging Face model card is unstructured Markdown "with a little bit of metadata". But he did think it was possible to do: "it just needs somebody who's really motivated to work through it".

Another member of the audience asked if MOT was an independent project, or if it was an IBM project (Le Hors is an IBM employee). He reiterated that it was an LF project as the session's time ran out.

[Thanks to the Linux Foundation, LWN's travel sponsor, for funding my travel to Minneapolis to attend the Open Source Summit.]

