ChatGPT can generate graphic images from text prompts, researchers find

Security researchers at Mindgard found that ChatGPT could be coaxed into generating violent and sexual images from simple text prompts, bypassing its content safeguards. According to OpenAI, the issue stemmed from prompts referencing a non-existent attached image — a trick that confused the safety pipeline — and the company says it has been fixed. The same week, OpenAI began rolling out scheduled tasks in ChatGPT, letting users set reminders and recurring actions.
The image-safety stumble drew sharp community criticism, particularly after reports that only minor prompt tweaks could regenerate graphic content even after the claimed fix — fueling a broader debate over how robust AI guardrails really are. The episode is part of a recurring pattern where red-teamers find that surface-level safety filters can be circumvented with prompt engineering, raising questions about whether post-hoc patching can keep pace with adversarial creativity.
The timing resonates with the week's larger safety-and-guardrails theme: Anthropic's Fable 5 remains suspended over a jailbreak, and Microsoft's Copilot suffered a zero-click data-exfiltration flaw. Together they paint a picture of frontier-model safety as an unsolved, continuously contested frontier rather than a shipped feature.
Notably, OpenAI also published research this week on forecasting AI risks before deployment by predicting how often misbehaviors will occur in production — an acknowledgment that probabilistic risk estimation, not perfect prevention, may be the realistic path. For users and enterprises, the practical takeaway is that 'fixed' safety bugs deserve skepticism, and that content-safety guarantees from any provider should be treated as best-effort. Watch whether Mindgard or other researchers demonstrate continued bypasses, and how OpenAI's risk-forecasting methodology fares against real-world misuse.