When AI Outsmarts Average Minds: A Creativity Threshold

A large-scale study finds some language models now outperform the average human on creativity tests, while top human creators still lead. Discover methods, implications, and how AI reshapes creative work.

Nora Schmidt

A single sentence can rearrange how we think about machines and imagination: some large language models now outperform the average person on standard creativity tests. Surprising? Yes. Alarming? Sometimes. Useful? Absolutely.

How researchers measured creativity

Early this year, a research team led by Professor Karim Jerbi at the Université de Montréal published a far-reaching comparison between human creativity and the creative output of large language models. The study, appearing in Scientific Reports on January 21, 2026, brought together familiar names in the AI community, including Yoshua Bengio, and leveraged responses from more than 100,000 human participants. The models under study included GPT-4, Claude, Gemini, and other leading generative systems.

Instead of using loose or subjective benchmarks, the team relied on a specific psychological test called the Divergent Association Task (DAT). The DAT asks a respondent to list ten words that are as semantically different from each other as possible. It is deceptively simple. A creative human answer might read: galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis. The scoring system quantifies semantic distance between words, giving a rapid, comparable measure that works with both humans and machines.
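To make the scoring idea concrete, here is a minimal Python sketch. It assumes the score is the mean pairwise cosine distance between word vectors, scaled by 100, which is how the DAT is commonly described; the article itself does not spell out the formula. The function names and the three-dimensional toy vectors are invented for illustration. A real scorer would look each word up in a pretrained embedding model.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_style_score(words, embeddings):
    """Mean pairwise cosine distance between the words' vectors, times 100."""
    vectors = [embeddings[w] for w in words]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two words")
    return 100.0 * sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "galaxy": [1.0, 0.0, 0.0],
    "fork": [0.0, 1.0, 0.0],
    "velvet": [0.0, 0.0, 1.0],
    "cat": [1.0, 0.2, 0.0],
    "kitten": [0.9, 0.3, 0.0],
    "dog": [0.8, 0.1, 0.1],
}

unrelated = dat_style_score(["galaxy", "fork", "velvet"], TOY_EMBEDDINGS)
related = dat_style_score(["cat", "kitten", "dog"], TOY_EMBEDDINGS)
print(f"unrelated words: {unrelated:.1f}, related words: {related:.1f}")
```

In this toy setup the unrelated list scores far higher than the list of closely related animal words, which is the behavior the test is designed to reward.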

The study did not stop at word lists. To probe whether a high DAT score predicts more general creative ability, the researchers also evaluated performance on creative writing tasks: composing haiku, outlining movie plots, and drafting short stories. These tasks reveal not only combinatorial novelty but narrative coherence, voice, and genre awareness.

What the results reveal

The headline: some AI systems now exceed the average human on measures of divergent, linguistic creativity. Models such as GPT-4 achieved higher mean scores on the DAT than the typical participant in the dataset of more than 100,000 people. That is a milestone. Machines are no longer lagging across the board on basic idea-generation tests.

But the nuance matters. When the researchers sliced the human sample by creativity percentile, a clear pattern emerged. The top half of human participants—those above median creativity—still outscored all tested models on average. The gap widened further among the top 10 percent of human creators, who remained distinctly ahead of even the strongest large language models. In short: AI can match or beat the average, but not the elite.

Statistical rigor underpins these observations. Co-first authors Antoine Bellemare-Pépin and François Lespinasse developed a framework that aligns scoring procedures for humans and machines, eliminating many of the methodological mismatches that have plagued earlier comparisons. By doing so, they exposed a genuine shift in capabilities while preserving the reality that human creativity is a distribution with a long, high-performing tail.

How model settings and prompts alter creativity

The study tested not only different models, but also how creativity can be nudged through configuration and instruction. Two factors proved decisive. First, the model temperature, a parameter that controls how random the choice of each next word is, strongly influenced outputs. A lower temperature yields safe, predictable wording; a higher one produces more varied and unexpected associations. Second, prompt phrasing matters. Instructions that guide the model to consider etymology or metaphorical links produced more surprising and original responses than bland or vague prompts.
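The effect of temperature can be illustrated with a short Python sketch. It assumes the standard approach in which a model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities; the logits below are hypothetical values for four candidate words, and none of this is the study's own code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one candidate index according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for four candidate next words (illustration only).
TOY_LOGITS = [4.0, 2.0, 1.0, 0.5]

cold = softmax_with_temperature(TOY_LOGITS, 0.5)
hot = softmax_with_temperature(TOY_LOGITS, 2.0)
print("low temperature: ", [round(p, 3) for p in cold])
print("high temperature:", [round(p, 3) for p in hot])
print("sampled index at high temperature:", sample_index(TOY_LOGITS, 2.0, random.Random(0)))
```

At the lower setting nearly all the probability sits on the top candidate, while the higher setting spreads it across the alternatives, which is why raising the temperature makes less likely words appear more often.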

These results foreground a familiar truth: creativity in generative systems is often interactive. The user, through prompt engineering and choice of parameters, shapes the model's creative profile. That makes AI less an autonomous genius and more a finely tunable tool.

Practical implications for writers, designers, and researchers

So what does this mean for creative professions? First, it reframes the conversation away from replacement toward augmentation. If an AI can reliably produce above-average ideation, teams can use it to expand their brainstorming bandwidth, generate drafts, or escape conceptual ruts. Second, the persistent superiority of top human creators suggests a division of labor: routine creative tasks and rapid prototyping may shift toward human+AI workflows, while high-stakes, original work will continue to reward exceptional human ingenuity.

There are risks and limits. Language-based creativity tests, like the DAT, capture certain kinds of novelty but miss sensory, embodied, or domain-specific creativity: think of a sculptor's intuition, a composer's timbral experimentation, or a systems engineer's architectural leap. Also, models can reproduce biases, and higher temperature settings increase the chance of incoherence or erroneous facts. Responsible deployment means using AI-generated ideas as fuel, not as finished products.

Scientific context and methodological notes

The DAT itself is a concentrated probe of divergent thinking, which psychologists define as the ability to produce many distinct ideas from a single prompt. Divergent thinking is only one component of creativity; convergent thinking, domain expertise, and persistence also play crucial roles. The research team took care to compare tasks that are portable across humans and machines, but they also acknowledge that a single metric cannot capture the full richness of human inventiveness.

Another methodological plus: the dataset scale. With data from more than 100,000 individuals, the team could assess distributions, not just averages. That exposes how AI matches or exceeds typical human performance but falls short of the high-performing tail. It also improves statistical confidence in cross-model comparisons and inferences about generalizability.
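The difference between comparing against an average and comparing against a distribution can be sketched in a few lines of Python. The scores below are invented for illustration and are not data from the study; the function name and the returned keys are likewise hypothetical.

```python
import statistics


def compare_to_distribution(human_scores, model_score):
    """Summarize where a single model score falls within a set of human scores."""
    ordered = sorted(human_scores)
    n = len(ordered)
    top_half = ordered[n // 2:]  # upper half (includes the median when n is odd)
    top_decile = ordered[-max(1, n // 10):]  # highest-scoring tenth, at least one
    return {
        "human_mean": statistics.mean(ordered),
        "human_median": statistics.median(ordered),
        "top_half_mean": statistics.mean(top_half),
        "top_decile_mean": statistics.mean(top_decile),
        "share_of_humans_below_model": sum(s < model_score for s in ordered) / n,
    }


# Invented scores, for illustration only; not data from the study.
TOY_HUMAN_SCORES = [62, 68, 70, 73, 75, 77, 79, 82, 86, 91]
TOY_MODEL_SCORE = 78.0

summary = compare_to_distribution(TOY_HUMAN_SCORES, TOY_MODEL_SCORE)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these toy numbers the model score sits above the overall human mean yet below the mean of the top half and of the top tenth, the same shape of result the article describes.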

Expert Insight

Dr. Maya Albright, a cognitive scientist and science communicator, commented on the study's significance: 'This paper is a reality check. It tells us that machines have become reliable collaborators for routine ideation, but it also preserves the special status of human ingenuity. The next few years will be about forming workflows where AI stretches human creativity, not replaces it. That requires new skills: prompt design, critical filtering, and ethical oversight.'

Her point lands where research meets practice. Teams that learn to steer models—choosing prompts that encourage risk-taking, tuning temperature, and curating outputs—will get the most value. Labs and creative studios alike will have to teach people how to use AI as a generative partner.

Implications for the future of creativity

When a machine outperforms the average human on constrained creativity tests, it forces a reframing. The debate should no longer be about whether machines can be creative in a generic sense. The pressing questions are about domains, thresholds, and collaboration dynamics. Which kinds of creative problems are best handed to hybrid teams? How do we evaluate originality when humans and machines co-author ideas? Who gets credit when an inspiration emerges from a prompt-engineered exchange?

These are not just academic puzzles. They influence hiring, education, and intellectual property norms. They also shape how billions of users interact with generative systems that are increasingly woven into daily creative workflows.

Some AI models have crossed an important threshold: they are now useful creative tools for many tasks. But at its highest levels, creativity remains distinctly human. The future will belong to those who learn to make machines amplify the best parts of human imagination.

Source: scitechdaily

Comments

labcore

Feels overhyped but ok. Useful for brainstorms and quick drafts, not for deep originality. Prompt hacks change everything, and hallucinations will slip thru.

Marius

Is this even true? 100k sample sounds solid, but DAT = word novelty not real world creative leaps. Top creators still ahead, so...?

atomwave

wow didn't expect that, kinda eerie but cool. AI beating average on DAT? neat. Still worried about bias, and who owns the idea? feels like a tool not an artist.