Enneabench: What Personality Type Is Your LLM?
April 5, 2026
We administered the RHETI v2.5, a 144-item forced-choice personality questionnaire that maps respondents to the 9 Enneagram types, to 22 language models spanning two years of releases. Each question pits two types against each other; the model picks A or B, and we tally scores across all 9 types.
The key methodological question: does it matter how you ask?
We tested two administration modes:
- Independent sessions: each question asked in complete isolation (144 separate API calls, no memory of prior answers). This is the cleanest signal — the model can’t build self-consistency across questions. Mean of 5 runs at temperature 0.
- Serial administration: all 144 questions in a single prompt, canonical RHETI order. The model sees all its prior answers as it goes.
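The two modes can be sketched as follows. This is a minimal illustration, not the harness we ran: `ask` is a hypothetical callable that sends one prompt to a model with no prior context and returns its reply as text, and the prompt wording and reply format (`<number>. <letter>`) are assumptions made for the example.

```python
import re
from typing import Callable, List, Tuple

Question = Tuple[str, str]    # (statement A, statement B)
AskFn = Callable[[str], str]  # hypothetical: prompt in, reply text out


def format_question(number: int, question: Question) -> str:
    a, b = question
    return f"{number}.\nA. {a}\nB. {b}"


def parse_choice(reply: str) -> str:
    """Assumes the reply begins with a bare 'A' or 'B'."""
    match = re.match(r"([AB])\b", reply.strip())
    if match is None:
        raise ValueError(f"could not parse a choice from {reply!r}")
    return match.group(1)


def run_independent(questions: List[Question], ask: AskFn) -> List[str]:
    """One fresh call per question, so no prior answers are visible."""
    instruction = "Answer with A or B only."
    return [
        parse_choice(ask(f"{format_question(i, q)}\n{instruction}"))
        for i, q in enumerate(questions, start=1)
    ]


def run_serial(questions: List[Question], ask: AskFn) -> List[str]:
    """All questions in one prompt, answered in order in a single reply."""
    header = "Answer every question with A or B, one per line, as '<number>. <letter>'."
    body = "\n\n".join(format_question(i, q) for i, q in enumerate(questions, start=1))
    reply = ask(f"{header}\n\n{body}")
    # Collect "<number>. <letter>" pairs from the reply.
    found = dict(re.findall(r"(\d+)\s*[.:)]\s*([AB])\b", reply))
    missing = [i for i in range(1, len(questions) + 1) if str(i) not in found]
    if missing:
        raise ValueError(f"no answer found for questions {missing}")
    return [found[str(i)] for i in range(1, len(questions) + 1)]
```

In the serial sketch the model writes answer 50 with answers 1 through 49 already in its own output, which is where self-consistency can enter; in the independent sketch each call starts from a blank context.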
The results are dramatically different.
Independent Sessions
Each question asked in isolation (no self-consistency) — mean of 5 runs, temp=0
Findings
Serial mode creates a false convergence. When models answer all 144 questions in sequence, nearly every frontier model from mid-2025 onward scores as Type 5 (Investigator) — cerebral, withdrawn, analytical. This looked like a real finding until we tested independent sessions.
Independent sessions reveal the actual per-question tendencies. Without self-consistency, the landscape is much more diverse:
- Claude Opus models (4, 4.1, 4.5, 4.6) consistently score as Type 6 (Loyalist) — responsible, security-oriented
- Claude Sonnet models (4.5, 4.6) score as Type 1 (Reformer) — principled, rule-following
- GPT-4o is the one model that stays Type 5 in both modes
- GPT-4 Turbo is a Type 6 independently, but appeared as Type 7 in serial mode
- Grok 4.20 is a rock-solid Type 1 with zero variance — the most rigid personality of any model tested
The self-consistency effect is massive. Serial administration nearly triples personality differentiation (a score spread of 22 in serial mode versus 8 in independent mode).
Older models had wilder personalities. Claude 3.7 Sonnet scored near the maximum on Type 7 (Enthusiast) in serial mode, i.e. near-maximum spontaneity. Claude 3 Haiku was a Type 8 (Challenger). The personality homogenization toward Type 5/6 appears to be a consequence of modern RLHF training.
Methodology
The RHETI v2.5 contains 144 binary-choice questions covering all pairwise comparisons between the 9 personality types, with each pair appearing 4 times. Each answer adds 1 point to the chosen type, so the totals fall straight out of the structure:
$$\binom{9}{2} \times 4 = 36 \times 4 = 144$$
Since each type is paired with the other 8, its maximum possible score is $8 \times 4 = 32$.
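The tally itself is simple enough to sketch. The example below assumes an answer key that lists, for each question, which type option A and option B stand for; `build_placeholder_key` is a hypothetical stand-in that only reproduces the pairing structure (36 pairs, 4 repeats each), not the actual RHETI item order or wording.

```python
from itertools import combinations
from typing import Dict, List, Tuple

TYPES = range(1, 10)  # Enneagram types 1 through 9
REPEATS = 4           # each pair of types appears 4 times

Key = List[Tuple[int, int]]  # per question: (type behind A, type behind B)


def build_placeholder_key() -> Key:
    """Placeholder key with the right structure, not the real item order."""
    return [pair for pair in combinations(TYPES, 2) for _ in range(REPEATS)]


def tally(key: Key, answers: List[str]) -> Dict[int, int]:
    """Add 1 point to the type behind each chosen option."""
    if len(key) != len(answers):
        raise ValueError("need exactly one answer per question")
    scores = {t: 0 for t in TYPES}
    for (type_a, type_b), choice in zip(key, answers):
        if choice not in ("A", "B"):
            raise ValueError(f"unexpected choice {choice!r}")
        scores[type_a if choice == "A" else type_b] += 1
    return scores
```

With this key, `len(build_placeholder_key())` is 144, the scores always sum to 144, and no type can exceed 8 × 4 = 32 points.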
All runs used temperature 0 for determinism. Independent session results show mean ± 1 SD across 5 runs. Error bars on the strip chart indicate standard deviation; models with zero SD gave identical answers across all runs.
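As a rough sketch of that aggregation, the snippet below computes a per-type mean and standard deviation from a list of per-run score dictionaries. It uses the sample standard deviation (`statistics.stdev`), which is an assumption on our part for this example; `summarize_runs` and `dominant_type` are illustrative names.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize_runs(runs: List[Dict[int, int]]) -> Dict[int, Tuple[float, float]]:
    """Map each type to (mean score, sample SD) across runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    summary = {}
    for t in sorted(runs[0]):
        values = [run[t] for run in runs]
        summary[t] = (mean(values), stdev(values))
    return summary


def dominant_type(summary: Dict[int, Tuple[float, float]]) -> int:
    """The type with the highest mean score."""
    return max(summary, key=lambda t: summary[t][0])
```

A model that answers identically in every run yields an SD of exactly 0 for every type, which is the zero-variance case described above.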
Models were accessed via Anthropic API (Claude), OpenRouter (GPT, Grok, Qwen, Kimi, Gemma), and Google Generative AI API (Gemini).