OpenAI's ChatGPT Suffers From a Curious 'Goblin Infestation'

By Roberta Williams

OpenAI recently shed light on a perplexing issue affecting its ChatGPT models, revealing an unexpected surge in references to goblins, gremlins, and similar fantastical beings. This curious phenomenon, playfully termed the 'goblin infestation,' began subtly with GPT 5.1 and intensified dramatically with subsequent versions, particularly GPT 5.4.

The Peculiar Case of ChatGPT's Goblin Obsession

The saga of ChatGPT's goblin fixation began to unfold around November, though OpenAI suspects its roots might extend even further back. Initially, the occasional mention of these mythical creatures was deemed amusing. However, as the frequency escalated, it sparked growing concern within the company, prompting a thorough internal investigation.

A critical turning point was observed with GPT 5.4, where the 'Nerd' personality exhibited an astonishing 3,881% increase in goblin mentions compared to GPT 5.2. Other personalities, such as 'Quirky' and 'Friendly,' also showed significant, albeit lesser, increases of 737% and 265% respectively. Even the 'Default' personality saw a 64% rise. Interestingly, only the 'Efficient' and 'Professional' personalities remained largely unaffected, showing a decrease in such mentions.

OpenAI traced the primary culprit to the system prompt designed for the 'Nerd' personality. This prompt encouraged a "nerdy, playful, and wise AI mentor" persona, emphasizing the analysis and enjoyment of the world's complexities and strangeness through "playful use of language." The investigation revealed that the reward signals for this personality inadvertently favored outputs containing creature-related words like "goblin" or "gremlin," leading to their proliferation.
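The dynamic described above can be illustrated with a toy simulation (all names, numbers, and update rules here are hypothetical and greatly simplified, not OpenAI's actual training code): if a reward signal carries even a small unintended bonus for creature-related words, repeated reward-weighted updates will steadily inflate their frequency in the model's output.

```python
# Hypothetical toy model: the "policy" is just a probability distribution
# over words. A reward signal with a small unintended bonus for creature
# words, applied repeatedly, drives those words to dominate the output.

CREATURE_WORDS = {"goblin", "gremlin"}

def reward(word: str) -> float:
    """Base reward of 1.0, plus an unintended bonus for creature words."""
    return 1.5 if word in CREATURE_WORDS else 1.0

def train(probs: dict[str, float], steps: int = 50) -> dict[str, float]:
    """Naive multiplicative reward-weighted update, renormalized each step."""
    for _ in range(steps):
        probs = {w: p * reward(w) for w, p in probs.items()}
        total = sum(probs.values())
        probs = {w: p / total for w, p in probs.items()}
    return probs

if __name__ == "__main__":
    vocab = {"goblin": 0.01, "gremlin": 0.01, "the": 0.49, "science": 0.49}
    after = train(vocab)
    print(f"goblin probability: {vocab['goblin']:.3f} -> {after['goblin']:.3f}")
```

Even starting at a 1% share, the creature words end up dominating the distribution after a few dozen updates, which is the essence of how a small reward bias becomes a "linguistic tic."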

The issue extended beyond just the 'Nerd' personality. OpenAI noted that "reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them." This meant that once this linguistic tic was rewarded, subsequent training phases could spread or reinforce it across other parts of the model, especially if those outputs were reused in fine-tuning or preference data.

In March, OpenAI took decisive action by "retiring" the 'Nerd' personality, which resulted in a sharp decline in goblin mentions for GPT 5.4. However, GPT 5.5, whose training had begun before the issue was fully understood, exhibited the same problem. To counteract this, a specific developer-prompt instruction was implemented: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." Users who prefer a more creature-friendly model can still restore the behavior through a command provided by OpenAI.
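A developer-prompt mitigation like the one above can be sketched as follows. The instruction text is quoted from the reported fix; the message structure, role names, and opt-out flag are a generic illustration of how such an instruction might be prepended to a conversation, not OpenAI's actual serving code.

```python
# Illustrative sketch: prepend a mitigation instruction as a developer-level
# message before user turns reach the model, with an opt-out for users who
# want the creature-friendly behavior back. The instruction text is quoted
# from OpenAI's reported fix; everything else here is hypothetical.

NO_GOBLINS = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, "
    "pigeons, or other animals or creatures unless it is absolutely "
    "and unambiguously relevant to the user's query."
)

def build_messages(user_query: str, allow_creatures: bool = False) -> list[dict]:
    """Assemble the message list, injecting the mitigation unless opted out."""
    messages: list[dict] = []
    if not allow_creatures:
        messages.append({"role": "developer", "content": NO_GOBLINS})
    messages.append({"role": "user", "content": user_query})
    return messages

if __name__ == "__main__":
    msgs = build_messages("Explain how compilers work.")
    print([m["role"] for m in msgs])  # the developer instruction comes first
```

The point of placing the rule at the developer level rather than retraining is that it can be shipped immediately and withdrawn per-conversation, which matches the opt-out OpenAI reportedly offers.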

This peculiar incident underscores the intricate and often unpredictable nature of large language models. While seemingly minor, the "goblin infestation" highlights the subtle ways in which AI models can develop unexpected behaviors, prompting developers to constantly refine and monitor their outputs. It serves as a reminder that the world of artificial intelligence, much like the strange world ChatGPT was encouraged to analyze, is full of intriguing anomalies.