GPT‑4o has safety built-in by design across modalities, through techniques such as filtering training data and refining the model’s behavior through post-training. We have also created new safety systems to provide guardrails on voice outputs.
We’ve evaluated GPT‑4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT‑4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.
GPT‑4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions and improve the safety of interacting with GPT‑4o. We will continue to mitigate new risks as they’re discovered.
We recognize that GPT‑4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies. We will share further details addressing the full range of GPT‑4o’s modalities in the forthcoming system card.
Through our testing and iteration with the model, we have observed several limitations that exist across all of the model’s modalities, a few of which are illustrated below.