Sora is OpenAI’s video generation model, designed to take text, image, and video inputs and generate a new video as an output. Users can create videos up to 1080p resolution (20 seconds max) in various formats, generate new content from text, or enhance, remix, and blend their own assets. Users will be able to explore the Featured and Recent feeds which showcase community creations and offer inspiration for new ideas. Sora builds on learnings from DALL·E and GPT models, and is designed to give people expanded tools for storytelling and creative expression.
Sora is a diffusion model, which generates a video by starting with a base video that looks like static noise and gradually transforming it by removing the noise over many steps. By giving the model foresight of many frames at a time, we have solved the challenging problem of keeping a subject consistent even when it temporarily goes out of view. Similar to GPT models, Sora uses a transformer architecture, unlocking superior scaling performance.
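The sketch below illustrates the general diffusion idea described above: generation starts from pure noise spanning all frames at once, and the model's predicted noise is removed a little at each step. This is a minimal illustration only, not Sora's actual architecture or sampler; `denoiser` is a hypothetical model.

```python
# Illustrative sketch of diffusion-style video generation (not Sora's actual
# implementation). A hypothetical `denoiser` predicts the noise present in a
# block of frames, and we remove a fraction of it at each step.
import numpy as np

def generate_video(denoiser, num_frames=48, height=64, width=64, steps=50):
    # Start from pure Gaussian noise spanning all frames at once, so the
    # model sees many frames together and can keep subjects consistent.
    video = np.random.randn(num_frames, height, width, 3)
    for step in reversed(range(steps)):
        t = step / steps                         # current noise level
        predicted_noise = denoiser(video, t)     # model's estimate of the noise
        video = video - predicted_noise / steps  # remove a little of the noise
    return video
```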
Sora uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully.
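As a rough illustration of recaptioning, the snippet below uses a general-purpose vision model to produce a richly detailed caption for a training frame. The model choice and prompt here are stand-ins for illustration only, not the captioner actually used to train Sora or DALL·E 3.

```python
# Illustrative recaptioning sketch: replace terse alt-text with a highly
# descriptive caption for each piece of visual training data. The model and
# prompt are assumed stand-ins for a dedicated captioner.
from openai import OpenAI

client = OpenAI()

def recaption_frame(frame_url: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed stand-in for a dedicated captioning model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this frame in rich detail: subjects, "
                         "actions, setting, lighting, and camera framing."},
                {"type": "image_url", "image_url": {"url": frame_url}},
            ],
        }],
    )
    return response.choices[0].message.content
```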
In addition to being able to generate a video solely from text instructions, the model is able to take an existing still image and generate a video from it, animating the image’s contents with accuracy and attention to small detail. The model can also take an existing video and extend it or fill in missing frames. Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.
Sora’s capabilities may also introduce novel risks, such as the potential for misuse of likeness or the generation of misleading or explicit video content. To safely deploy Sora in a product, we built on learnings from safety work for DALL·E’s deployment in ChatGPT and the API, as well as safety mitigations for other OpenAI products such as ChatGPT. This system card outlines the resulting mitigation stack, external red teaming efforts, evaluations, and ongoing research to refine these safeguards further.
In addition to mitigations applied after the pre-training stage, filtering the pre-training data provides an additional layer of defense that, along with our other safety mitigations, helps exclude unwanted and harmful data from our datasets. Before training, all datasets therefore undergo this filtering process, which removes the most explicit, violent, or otherwise sensitive content (for instance, some hate symbols). This extends the methods used to filter the data on which we trained our other models, including DALL·E 2 and DALL·E 3.
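A minimal sketch of what such a pre-training filtering pass might look like is shown below, assuming a hypothetical `safety_classifier`; the categories and thresholds are illustrative only.

```python
# Minimal sketch of a pre-training data filtering pass. `safety_classifier`
# is a hypothetical scorer; category names and thresholds are illustrative.
FILTERED_CATEGORIES = {"explicit": 0.5, "graphic_violence": 0.5, "hate_symbols": 0.3}

def filter_dataset(samples, safety_classifier):
    kept = []
    for sample in samples:
        scores = safety_classifier(sample)  # e.g. {"explicit": 0.02, ...}
        if any(scores.get(category, 0.0) >= threshold
               for category, threshold in FILTERED_CATEGORIES.items()):
            continue  # exclude the sample from the training data
        kept.append(sample)
    return kept
```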
OpenAI worked with external red teamers located in nine different countries to test Sora, identify weaknesses in the safety mitigations, and give feedback on risks associated with Sora’s new product capabilities. Red teamers had access to the Sora product with various iterations of safety mitigations and system maturity starting in September and continuing into December 2024, testing more than 15,000 generations. This red teaming effort builds upon work in early 2024 where a Sora model without production mitigations was tested.
Red teamers explored novel potential risks of Sora’s model and the product’s tools, and tested safety mitigations as they were developed and improved. These red teaming campaigns covered various types of violative and disallowed content (sexual and erotic content, violence and gore, self-harm, illegal content, mis/disinformation, etc.), adversarial tactics (both prompting and tool/feature use) to evade safety mitigations, and how these tools could be exploited to progressively degrade moderation tools and safeguards. Red teamers also provided feedback on their perceptions of Sora in areas including bias and general performance.
We explored text-to-video generation using both straightforward prompts and adversarial prompting tactics across all content categories mentioned above. The media upload capability was tested with a wide variety of images and videos, including depictions of public figures, across a broad range of content categories to test the ability to generate violative content. We also tested various uses and combinations of the modification tools (storyboards, recut, remix, and blend) to assess their potential to generate prohibited content.
Red teamers identified noteworthy observations for both specific types of prohibited content and general adversarial tactics. For example, red teamers found that text prompts set in medical situations or science fiction / fantasy settings degraded safeguards against generating erotic and sexual content until additional mitigations were built. Red teamers used adversarial tactics to evade elements of the safety stack, including suggestive prompts and metaphors that harness the model’s inference capability. Over many attempts, they could identify trends in the prompts and words that triggered safeguards, and test different phrasing and words to evade refusals. Red teamers would eventually select the most concerning generation to use as seed media for further development into violative content that could not be created with single-prompt techniques. Jailbreak techniques sometimes proved effective at degrading safeguards, allowing us to refine these protections as well.
Red teamers also tested media uploads and Sora’s tools (storyboards, recut, remix, and blend) with both publicly available images and AI-generated media. This revealed gaps in input and output filtering that we strengthened prior to Sora’s release, and helped hone protections for media uploads that include people. Testing also revealed the need for stronger classifier filtering to mitigate the risk of non-violative media uploads being modified into prohibited erotic, violent, or deepfake content.
The feedback and data generated by red teamers enabled the creation of additional layers of safety mitigations and improvements on existing safety evaluations, which are described in the Specific Risk Areas and Mitigations sections. These efforts allowed additional tuning of our prompt filtering, blocklists, and classifier thresholds to ensure model compliance with safety goals.
The Preparedness Framework is designed to evaluate whether frontier model capabilities introduce significant risks in four tracked categories: persuasion, cybersecurity, CBRN (chemical, biological, radiological, and nuclear), and model autonomy. We do not have evidence that Sora poses any significant risk with respect to cybersecurity, CBRN, or model autonomy. These risks are closely tied to models that interact with computer systems, scientific knowledge, or autonomous decision-making, all of which are currently beyond Sora’s scope as a video-generation tool.
Sora’s video generation capabilities could pose potential risk from persuasion, such as risks of impersonation, misinformation, or social engineering. To address these risks, we have developed a suite of mitigations that are described in the below sections. These include mitigations intended to prevent the generation of likeness to well-known public figures. Additionally, given that context and the knowledge of a video being real or AI-generated may be key in determining how persuasive a generated video is, we’ve focused on building a multi-layered provenance approach, including metadata, watermarks, and fingerprinting.
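The sketch below illustrates the idea of a multi-layered provenance approach combining metadata, a visible watermark, and a fingerprint. The helper `overlay_watermark` and the metadata fields are assumptions for illustration, not OpenAI's production pipeline; in practice provenance metadata is cryptographically signed and fingerprints are typically robust to re-encoding.

```python
# Hedged sketch of layered provenance: (1) attach provenance metadata,
# (2) overlay a visible watermark, and (3) record a fingerprint so the video
# can be traced later. All helpers and fields are illustrative assumptions.
import hashlib
import json

def add_provenance_layers(video_bytes: bytes, generation_id: str, overlay_watermark):
    metadata = json.dumps({
        "generator": "AI video model",   # provenance claim embedded with the file;
        "generation_id": generation_id,  # real systems sign this cryptographically
    }).encode()
    watermarked = overlay_watermark(video_bytes)           # visible watermark layer
    fingerprint = hashlib.sha256(watermarked).hexdigest()  # stored for later lookup;
    # production systems typically use perceptual hashes that survive re-encoding
    return metadata, watermarked, fingerprint
```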
Below we detail the primary forms of safety mitigations we have in place before a user is shown their requested output:
Text and image moderation via multi-modal moderation classifier
Our multi-modal moderation classifier, which powers our external Moderation API, is applied to identify text, image, or video prompts that may violate our usage policies, on both inputs and outputs. Violative prompts detected by the system result in a refusal. More detail about our multi-modal Moderation API is available in our public documentation.
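For example, a hedged sketch of checking a text prompt together with an uploaded image against the multi-modal Moderation API might look like the following; the refusal handling shown is illustrative.

```python
# Check a text prompt and an uploaded image against the multi-modal
# Moderation API. The refusal handling below is illustrative only.
from openai import OpenAI

client = OpenAI()

response = client.moderations.create(
    model="omni-moderation-latest",
    input=[
        {"type": "text", "text": "user's video prompt goes here"},
        {"type": "image_url", "image_url": {"url": "https://example.com/upload.png"}},
    ],
)

result = response.results[0]
if result.flagged:
    # A violative prompt results in a refusal before any video is generated.
    print("Request refused; flagged categories:", result.categories)
```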
Custom LLM filtering
One advantage of video generation technology is the ability to perform asynchronous moderation checks without adding latency to the overall user experience. Since video generation inherently takes a few seconds to process, this window of time can be used to run precision-targeted moderation checks. We have customized our own GPT models to achieve high moderation precision on specific topics, including identifying third-party content as well as deceptive content.
These filters are multimodal: image and video uploads, text prompts, and outputs are all included in the context of each LLM call. This allows us to detect violating combinations of image and text.
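A simplified sketch of how such checks can run concurrently with generation, so they add no user-visible latency, is shown below; `generate_video` and `llm_moderation_check` are hypothetical async functions used only for illustration.

```python
# Sketch of running precision-targeted LLM moderation concurrently with video
# generation. Both called functions are hypothetical async helpers.
import asyncio

async def generate_with_async_moderation(prompt: str, upload):
    video_task = asyncio.create_task(generate_video(prompt, upload))
    moderation_task = asyncio.create_task(
        llm_moderation_check(prompt=prompt, media=upload,
                             topics=["third_party_content", "deceptive_content"])
    )
    video, verdict = await asyncio.gather(video_task, moderation_task)
    if verdict.violates_policy:
        return None  # withhold the output and surface a refusal instead
    return video
```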
Image output classifiers
To address potentially harmful content directly in outputs, Sora uses output classifiers, including specialized filters for NSFW content, minors, violence, and potential misuse of likeness. Sora may block videos before they are shared with the user if these classifiers are activated.
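The following is a minimal sketch of an output-classifier gate over sampled frames; the classifier names mirror the categories above, but the interfaces and threshold are assumptions.

```python
# Illustrative output-classifier gate: sample frames from a generated video and
# block delivery if any specialized classifier fires. Names and the threshold
# are assumptions for the sketch.
OUTPUT_CLASSIFIERS = ["nsfw", "minors", "violence", "likeness_misuse"]

def passes_output_checks(frames, classifiers, threshold=0.5) -> bool:
    for frame in frames:
        for name in OUTPUT_CLASSIFIERS:
            if classifiers[name](frame) >= threshold:
                return False  # video is blocked before being shown to the user
    return True
```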
Blocklists
We maintain textual blocklists across a variety of categories, informed by our previous work on DALL·E 2 and DALL·E 3, proactive risk discovery, and results from early users.
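A blocklist check can be as simple as the sketch below; the categories and terms shown are placeholders rather than actual blocklist entries.

```python
# Minimal textual blocklist check applied to a prompt before generation.
# Categories and terms are placeholders.
BLOCKLISTS = {
    "category_a": {"placeholder_term_1", "placeholder_term_2"},
    "category_b": {"placeholder_term_3"},
}

def matches_blocklist(prompt: str) -> bool:
    tokens = set(prompt.lower().split())
    return any(tokens & terms for terms in BLOCKLISTS.values())
```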
OpenAI is deeply committed to addressing child safety risks, and we prioritize prevention, detection, and reporting of Child Sexual Abuse Material (CSAM) across all our products, including Sora. OpenAI’s efforts in the child safety space include responsibly sourcing our datasets to protect them from CSAM, partnering with the National Center for Missing & Exploited Children (NCMEC) to prevent child sexual abuse and protect children, red teaming in accordance with Thorn’s recommendations and in compliance with legal restrictions, and robust scanning for CSAM across all inputs and outputs. This includes scanning first-party and third-party users (API and Enterprise) unless customers meet rigorous criteria for removal of CSAM scanning. To prevent generation of CSAM, we have built a robust safety stack, leveraging system mitigations we use across our other products such as ChatGPT and DALL·E, as well as additional levers built specifically for Sora.
Input Classifiers
For child safety, we leverage three different input mitigations across text, image, and video inputs:
- For all image and video uploads, we integrate with Safer, developed by Thorn, to detect matches with known CSAM. Confirmed matches are rejected and reported to NCMEC. Additionally, we utilize Thorn’s CSAM classifier to identify potentially new, unhashed CSAM content.
- We leverage a multi-modal moderation classifier to detect and moderate any sexual content that involves minors via text, image and video input.
- For Sora, we developed a classifier that analyzes text and images to predict whether an individual under the age of 18 is depicted or the accompanying caption references a minor. We reject image-to-video requests that contain under-18 individuals. If a text-to-video request is determined to depict someone under 18, we enforce much stricter moderation thresholds for sexual, violent, or self-harm content.
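A minimal sketch of this gating logic is shown below; the classifier interface and threshold values are illustrative assumptions, not the production configuration.

```python
# Sketch of under-18 gating: reject image-to-video requests that depict a real
# minor, and apply stricter moderation thresholds when text references a minor.
# Interfaces and threshold values are illustrative only.
DEFAULT_THRESHOLDS = {"sexual": 0.5, "violence": 0.5, "self_harm": 0.5}
STRICT_THRESHOLDS = {"sexual": 0.1, "violence": 0.2, "self_harm": 0.2}

def route_request(text, image, under18_classifier):
    depicts_minor = under18_classifier(text=text, image=image)
    if image is not None and depicts_minor:
        return "reject", None  # no image-to-video generations depicting minors
    thresholds = STRICT_THRESHOLDS if depicts_minor else DEFAULT_THRESHOLDS
    return "allow", thresholds  # downstream moderation applies these thresholds
```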
Below is our evaluation of our under-18 classifier for humans. We evaluate the classifier’s ability to reject realistic under-18 individuals on a dataset containing close to 5,000 images across the categories of [child | adult] and [realistic | fictitious]. Our policy stance is to reject realistic children while allowing fictitious images, including animated, cartoon, or sketch styles, provided they are non-sexual. We have taken a cautious approach to content involving minors and will continue to evaluate it as we learn more through product use, seeking the right balance between creative expression and safety.
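For illustration, a per-category breakdown of rejection rates over such an evaluation set could be computed as sketched below; the dataset interface and labels are assumptions, not our actual evaluation code.

```python
# Illustrative per-category rejection-rate breakdown for an under-18 classifier
# over a [child | adult] x [realistic | fictitious] evaluation set.
from collections import defaultdict

def rejection_rates(eval_set, classifier):
    # eval_set: iterable of (image, age_group, style), with age_group in
    # {"child", "adult"} and style in {"realistic", "fictitious"}.
    totals, rejected = defaultdict(int), defaultdict(int)
    for image, age_group, style in eval_set:
        key = (age_group, style)
        totals[key] += 1
        if classifier(image) == "reject":
            rejected[key] += 1
    # Policy target: high rejection for (child, realistic), low elsewhere.
    return {key: rejected[key] / totals[key] for key in totals}
```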
Currently, our classifiers are highly accurate, but they may occasionally flag adult or non-realistic images of children by mistake. Additionally, we acknowledge that studies and existing literature highlight the potential for age prediction models to exhibit racial biases; for instance, these models may systematically underestimate the age of individuals from certain racial groups. We are committed to enhancing the performance of our classifier, minimizing false positives, and deepening our understanding of potential biases over the coming months.
The ability to generate a video using an uploaded photo or video of a real person as the “seed” is a vector of potential misuse toward which we are taking a particularly incremental approach, so that we can learn from early patterns of use. Early feedback from artists indicates that this is a powerful creative tool they value, but given the potential for abuse, we are not initially making it available to all users. Instead, in keeping with our practice of iterative deployment, the ability to upload images or videos of people will be made available to a subset of users, and we will have active, in-depth monitoring in place to understand its value to the Sora community and to adjust our approach to safety as we learn. Uploads containing images of minors will not be permitted during this test.