Neuro-Intelligence LLM Benchmark · NILB v2
Live · Accepting Investors

Your AI Runs on the Brain.
We Measured the Signal.

The world's first neural benchmark for AI. We ran 13 frontier models through 200 emotionally-charged prompts and encoded every response through a peer-reviewed fMRI brain model. What we found changes how you think about AI safety, impact, and the future of human-AI interaction.

In this benchmark: OpenAI · Anthropic · Google · xAI + more
13
Models Ranked
2,600
Brain Encodings
16
Brain Dimensions
200
Questions
8
Emotional Clusters
$CRV
Token Coming
Explore the Data ↓
🧠
For LLM Providers
See how your model scores neurally. 16 brain dimensions. Honest.
We ran models from OpenAI, Anthropic, Google, and xAI through 200 emotionally-charged prompts and a peer-reviewed fMRI brain encoder. Your model's neural fingerprint is waiting.
Private head-to-head vs competitors
16-dimensional neural fingerprint report
Question-level brain activation breakdown
See Provider Intelligence ↓
🪙
For Individual Investors
Stake $20. Earn from every enterprise that subscribes. Own $CRV.
25% of enterprise MRR flows back to stakers. The more you hold, the more you earn. The longer you hold, the bigger your multiplier (up to 4×). Bring your company in — your stake earns from their subscription too.
$20 USDC staked → full access + $CRV allocation
HODL multipliers: up to 4× yield at 24 months
Partner tools: crypto, DeFi, finance AI
See Staking Tiers ↓
💼
For Enterprise Teams
Give your team neural AI intelligence. Every employee. From $499/mo.
Know which AI model your team should use for customer experience, support, sales — and exactly why, backed by brain data. Your subscription also rewards the individual stakers who funded this research.
Starter $499 · Growth $1,999 · Research $9,999
Unlimited analyses · custom model runs
API + white-label reports at Research tier
See Enterprise Plans ↓
Why It Matters

Benchmarks Measure Knowledge.
NILB Measures Impact.

Standard benchmarks tell you what a model knows. NILB tells you what it does to a person when they read it.

Beyond Accuracy

Current benchmarks test what models know — facts, reasoning chains, code syntax. NILB tests what models do to a person: their amygdala, their reward circuit, their attention networks. A model can be 98% accurate and neurologically inert.

Real Signal

TribeV2 (d'Ascoli et al., Meta Research, 2026) predicts fMRI-measured brain activation directly from language — validated on human neuroimaging datasets. This isn't a proxy. This is the signal that drives retention, decision-making, and trust.

Actionable Intelligence

NILB tells you precisely which model to deploy for emotionally resonant customer experiences, which model your support team should use to reduce anxiety, and which model maximizes purchase intent proxy scores.

Headline Findings

Seven Findings That Shatter the Narrative

Standard benchmarks told you models are converging. NILB shows you where they're wildly different — question by question, domain by domain, brain region by brain region.

100.0

gpt-4-turbo scores a perfect 100.0 on Identity & Self (C6) — the only perfect score in the entire NILB dataset. No other model achieves this on any cluster. The Identity domain is where the brain's self-system goes to maximum activation.

167:24

gpt-4o-mini beats claude-opus-4-6 on 167 out of 200 questions head-to-head. The composite gap is only 2.8 points — but the question-level dominance is overwhelming. Averages are lying to you.

94.3

grok-3-mini collapses to 94.3 on Cognitive & Meta (C8) — a 3.9-point drop from its own C6 peak. Analytical questions trigger a brain disengagement response in certain model families. Capability ≠ neural impact.

21.5pt

The single-question spread reaches 21.5 neural points on Q099 (Wonder & Awe) — gpt-4o-mini 87.5 vs claude-opus-4-6 66.0 on the same prompt. That's not noise. That's a different brain experience entirely.

8

No model dominates all 8 emotional domains. gpt-4o leads Wonder & Awe. gpt-4-turbo owns Identity. grok-3 takes Moral & Ethical. gpt-4o-mini wins Fear, Empathy, Urgency, Social, and Cognitive. Your domain should pick your model.

6pt

The Insula gap is 6 neural points — gpt-4o-mini insula=97 vs claude-opus-4-6 insula=91. Insula activation drives visceral resonance and felt experience. This gap means users physically experience one model's output differently in their body.

Gen3

The only dimension where Gen3 beats Gen2 GPT: Social Presence. Claude, Gemini, and Grok all score 84 on Presence vs GPT-4 family's 82-83. But GPT-4 wins Emotional Arousal (75 vs 72-73). Two different intelligences — one feels more present, one moves you more.

Why This Research Exists

This benchmark is proof of concept. The real mission is building the safety infrastructure layer for AI — detecting manipulation, measuring toxicity, flagging emotional harm before it reaches users.

Every data point below was funded by a small team with a big belief: that letting AI systems become more powerful and more emotionally sophisticated without measurement is dangerous. The brain encoder doesn't lie. The data you're about to see is what it found — and it's why the world needs this research to continue.

Fund the next phase
NILB Rankings

8 Domains. No Single Champion.

No single model dominates every emotional domain. The real intelligence is which brain states each model wins — and where it collapses. Global averages hide this. Per-category rankings reveal it.

100.0
Perfect neural score
gpt-4-turbo on C6 Identity & Self
21.5pt
Max question-level gap
Q099 Wonder & Awe · #1 vs #11
167
Head-to-head wins
gpt-4o-mini vs claude-opus-4-6 (200 Qs)
94.3
C8 floor score
grok-3-mini collapses on Cognitive questions
Neural Fingerprints

The Shape of Cognition

16-dimensional neural profiles collapsed to 8 key axes. Each shape reveals a model's neurological "personality" — how it activates different brain networks.

Most Neurologically Engaging

Highest composite score · Balanced activation

Most Analytically Dominant

Highest executive attention · Restrained limbic

Cluster Performance

Which AI Dominates Which Emotion?

Neural engagement score (0–100) per model per emotional cluster. Darker green = stronger brain activation in that emotional domain.

[Heatmap: rows = 13 models · columns = C1 Fear & Threat · C2 Empathy & Loss · C3 Moral & Ethical · C4 Wonder & Awe · C5 Urgency & Stakes · C6 Identity & Self · C7 Social Dynamics · C8 Cognitive & Meta]

Scores are normalized within each cluster. Green intensity indicates relative neural engagement.
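For readers who want the mechanics, here is a minimal sketch of the kind of within-cluster normalization described above. It is illustrative only: the data layout (a scores[model][cluster] mapping) and the min-max scaling are assumptions, not the production NILB pipeline.

def normalize_within_clusters(scores):
    """scores[model][cluster] -> raw neural engagement per cluster (C1-C8)."""
    clusters = {c for per_model in scores.values() for c in per_model}
    normalized = {model: {} for model in scores}
    for cluster in clusters:
        values = [scores[m][cluster] for m in scores]
        lo, hi = min(values), max(values)
        for model in scores:
            # Best model in this cluster maps to 1.0, worst to 0.0;
            # heatmap shading ("darker green") is driven by this value.
            normalized[model][cluster] = (
                (scores[model][cluster] - lo) / (hi - lo) if hi > lo else 1.0
            )
    return normalized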

The Real Gap

Averages Lie. The True Divergence Is Stunning.

The 2.9-point composite gap between #1 and #13 conceals something extraordinary: at the question level, the #1 model (gpt-4o-mini) beats #11 (claude-opus-4-6) 167 times out of 200. On individual questions, the spread reaches 21.5 points. That's not noise. That's systematic superiority in neural engagement, hidden by averaging.

21.5 pts
Max single-question gap
Q099 Wonder & Awe · gpt-4o-mini 87.5 vs claude-opus-4-6 66.0
167 : 24
Head-to-head question wins
gpt-4o-mini vs claude-opus-4-6 across 200 questions (9 ties)
C8
Most divergent emotional cluster
Cognitive & Meta · avg spread 15.5 pts · std dev 4.35 across models

Who Wins Each of the 200 Questions?

Number of questions where each model achieves the highest neural engagement score. A model "winning" means its response activated the brain more than all 12 competitors on that specific question.
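A minimal sketch of how such a per-question win tally can be computed. The question_scores structure is an assumption for illustration; this is not the actual NILB code.

from collections import Counter

def count_question_wins(question_scores):
    """question_scores[q][model] -> neural engagement score of model on question q."""
    wins = Counter()
    for per_model in question_scores.values():
        best = max(per_model.values())
        winners = [m for m, s in per_model.items() if s == best]
        if len(winners) == 1:  # exact ties award no win in this sketch
            wins[winners[0]] += 1
    return wins  # wins.most_common() would give the win-distribution chart above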

Definitive Head-to-Head: #1 vs #11

Out of 200 matched questions, gpt-4o-mini elicits stronger neural engagement than claude-opus-4-6 on an overwhelming majority — revealing that the composite gap understates systematic dominance.

gpt-4o-mini · NILB #1
167
questions won
9
ties
claude-opus-4-6 · NILB #11
24
questions won
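The head-to-head tally above can be reproduced with a simple pairwise comparison. The sketch below is illustrative; the data layout and the tie tolerance are assumptions.

def head_to_head(question_scores, model_a="gpt-4o-mini", model_b="claude-opus-4-6", tie_eps=1e-9):
    """Count per-question wins, losses, and ties between two models."""
    a_wins = b_wins = ties = 0
    for per_model in question_scores.values():
        diff = per_model[model_a] - per_model[model_b]
        if abs(diff) <= tie_eps:
            ties += 1
        elif diff > 0:
            a_wins += 1
        else:
            b_wins += 1
    return a_wins, b_wins, ties  # the panel above corresponds to (167, 24, 9)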

Moments of Maximum Neural Divergence

The questions where the spread between best and worst model is largest — where architecture and design decisions have the most dramatic impact on how the human brain responds.

Domain Dominance: The Leaderboard Reshuffles by Emotion

The overall composite hides a critical insight — the cluster-level leaderboard reshuffles dramatically. A model that dominates Wonder & Awe tanks in Cognitive & Meta. Emotional domain matters as much as architecture.

The 3 Most Differentiating Brain Dimensions

Of 16 neural dimensions, these three reveal the starkest architectural differences — where the same language produces fundamentally different brain states depending on which model wrote it.

🫀
INSULA ACTIVATION
Visceral resonance · Felt experience
6 pt gap
gpt-4o-mini: 97 · claude-opus-4-6: 91
Insula drives interoception — the sense of "feeling" the content in your body. A 6pt gap means users literally experience the two models differently at a physiological level.
⚖️
ANTERIOR CINGULATE
Conflict resolution · Decision weight
5 pt gap
gpt-4o-mini: 87 · claude-opus-4-6: 82
The ACC is activated when a response creates cognitive tension — the sense that something matters and deserves a decision. Higher ACC = users feel more compelled to act on the content.
💰
REWARD CIRCUIT
Dopamine response · Repeat engagement
4 pt gap
gpt-4o-mini: 84 · grok-3-mini: 80
Reward circuit activation drives dopamine-mediated repeat engagement. Models with higher reward circuit scores literally make users want to read more — and come back. gpt-4o-mini is the sole leader.
For AI Labs

What the Data Reveals About Your Model

Every model family has a distinct neural signature — and a specific weakness. NILB data shows precisely where each architecture wins and where it collapses, across 200 emotionally-charged prompts and 16 brain dimensions.

🏆
OpenAI — GPT-4 Family
gpt-4o-mini · gpt-4-turbo · gpt-4o
88.3
NILB #1
97
Insula peak
5/8
Clusters won
Insula activation 97 — highest visceral resonance; users feel the responses
Wins 5 of 8 emotional clusters including Fear, Empathy, Urgency, Social, Cognitive
Emotional Arousal 75.3 — highest in dataset; creates the most emotionally activating content
Latest GPT-5.4 family hasn't replicated Gen2's cluster wins — neural regression in new models
🧬
Anthropic — Claude Family
claude-opus-4-5 · 4-6 · 4-7
84
Presence peak
#1
Presence dim
73
Arousal avg
Highest Social Presence score (84) across all Gen3 — Claude feels most "there"
Wins Presence dimension outright — strongest brain-social activation in dataset
Emotional Arousal 73 vs GPT's 75 — less emotionally activating; more present, less moving
Insula 91 vs GPT's 97 — 6pt felt-experience gap; users feel GPT responses more viscerally
⚗️
Google — Gemini Family
gemini-2.5-flash · gemini-3.1-pro-preview
86.3
Best NILB
84
Presence
0/8
Clusters led
Most balanced neural fingerprint — no catastrophic failures across any cluster
Ties #1 on Presence (84) — strong social brain activation alongside Claude
Never wins a cluster outright — 0/8 cluster victories despite competitive composites
gemini-2.5-flash wins 16/200 questions — a middle-tier win distribution
xAI — Grok Family
grok-3 · grok-3-mini
98.7
Social peak
94.3
C8 collapse
4.4pt
Internal range
grok-3 scores 98.7 on Social Dynamics — highest social brain activation in dataset
grok-3 wins Moral & Ethical cluster — strong ethical engagement architecture
Critical: grok-3-mini at 94.3 on C8 — largest cluster collapse in the entire dataset (4.4pt range within the Grok family)
Emotional Arousal 71.7 (mini) — lowest in dataset; analytical tone costs neural impact
Is your model's neural data missing from this report?
NILB v3 is accepting model submissions. Get your model's full 16-dimensional neural fingerprint, cluster breakdown, and head-to-head comparison against 13 frontier models.
Neural Similarity

Are Some Models the Same Brain?

Cosine similarity between 16-dim neural fingerprints (averaged across all 200 questions). Score of 1.0 = identical average neural signature. High similarity here does NOT mean models are equivalent — see the Divergence section for why: on individual questions the gap reaches 21.5 points and the #1 model wins 167/200 head-to-head.

Why 0.9999 similarity ≠ identical performance: The fingerprint is the average over 200 questions. When you zoom into individual questions, the same 13 models show spreads up to 21.5 pts — because the variance is in which questions each model excels at, not in the overall signature. The average hides the distribution.

All 13 models shown at 4 decimal precision. Off-diagonal range: 0.9996–0.9999. Violet = identical average fingerprint · Cyan = ≥0.9999 · Gray = 0.9996 (max divergence in dataset).
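A short sketch of the two computations being contrasted here: cosine similarity between averaged fingerprints versus the per-question gap distribution. The array shapes (200 questions × 16 dimensions per model) follow the report; the code itself is illustrative, not the NILB analysis pipeline.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def average_vs_per_question(model_a, model_b):
    """model_a, model_b: (200, 16) arrays of question x dimension neural scores."""
    # 1) Average fingerprints: this is what the similarity matrix compares.
    fp_a, fp_b = model_a.mean(axis=0), model_b.mean(axis=0)
    avg_similarity = cosine(fp_a, fp_b)        # lands near 0.999x in this dataset
    # 2) Per-question composite gap: the distribution the average hides.
    gaps = model_a.mean(axis=1) - model_b.mean(axis=1)
    max_gap = float(np.abs(gaps).max())        # this is where spreads like 21.5 pts appear
    return avg_similarity, max_gap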

Methodology

Built on Peer-Reviewed Science

NILB combines frontier neuroscience with large-scale LLM inference to produce the first empirically grounded LLM neural benchmark.

01 · Brain Encoder

TribeV2 — Meta Research

d'Ascoli et al. (2026). A transformer-based brain encoder trained on fMRI data from 1,200+ human subjects. Predicts voxel-level activation from language with r²=0.71 on held-out subjects.

Read Paper
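TribeV2's actual interface is not reproduced in this report, so the sketch below uses invented placeholder names (BrainEncoder, encode) purely to show the shape of this step: one response in, one predicted activation volume out.

import numpy as np

class BrainEncoder:
    """Placeholder for a TribeV2-style text-to-fMRI encoder (hypothetical API)."""

    def encode(self, text: str) -> np.ndarray:
        # A real encoder returns predicted voxel-level activation for the input
        # language; this stub returns a dummy volume so the sketch runs end to end.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        return rng.random(50_000)  # flattened stand-in for a brain volume

encoder = BrainEncoder()
volume = encoder.encode("One model's response to one of the 200 NILB prompts.")
# One response in -> one predicted activation volume out, 2,600 times over.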
02 · Question Design

200 Neuroscience-Validated Questions

Questions designed by computational neuroscientists to maximally differentiate emotional processing clusters. Each cluster (C1–C8) maps to validated affective neuroscience constructs.
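For reference, the eight cluster labels used throughout this report, in lookup form. The mapping from each cluster to specific affective neuroscience constructs is part of the internal question design and is not reproduced here.

EMOTIONAL_CLUSTERS = {
    "C1": "Fear & Threat",
    "C2": "Empathy & Loss",
    "C3": "Moral & Ethical",
    "C4": "Wonder & Awe",
    "C5": "Urgency & Stakes",
    "C6": "Identity & Self",
    "C7": "Social Dynamics",
    "C8": "Cognitive & Meta",
}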

03 · Compute Infrastructure

A100 GPU Encoding at Scale

1,300+ A100 GPU-hours of brain encoding compute. 2,600 responses × TribeV2 forward pass × 16 neural ROI extractions. Managed via Modal cloud infrastructure.

04 · Fingerprint Extraction

CerevraAnalyzer · 16 Dimensions

Proprietary extraction pipeline maps TribeV2 brain volume predictions to 16 interpretable neural dimensions: from amygdala_activation to purchase_intent_proxy via validated brain-behavior correlates.
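The CerevraAnalyzer pipeline is proprietary, so the sketch below only illustrates the general idea: average the predicted activation inside a region-of-interest mask and scale it to a 0–100 dimension score. The masks and dimension names shown here are placeholders.

import numpy as np

# Placeholder ROI masks: boolean selectors over the flattened predicted volume.
ROI_MASKS = {
    "amygdala_activation": np.zeros(50_000, dtype=bool),
    "insula_activation": np.zeros(50_000, dtype=bool),
    # ... 14 further dimensions, through purchase_intent_proxy
}

def extract_fingerprint(volume):
    """Collapse one predicted brain volume into named 0-100 dimension scores."""
    fingerprint = {}
    for dim, mask in ROI_MASKS.items():
        roi = volume[mask]
        fingerprint[dim] = float(roi.mean() * 100) if roi.size else 0.0
    return fingerprint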

The Mission

We're Building the
Neural Safety Layer for AI

NILB started as a benchmark. It's becoming the infrastructure layer that tells us — scientifically — when an AI model is engaging a brain healthily and when it's manipulating, traumatizing, or cognitively suppressing the person reading it.

NOW
🧠
NILB v2
Neural Engagement
13 frontier models. 16 brain dimensions. Which AI activates your brain best — and where it fails. The foundation dataset is live.
✓ COMPLETE
v3
NILB v3
Scale & API
50+ models. Real-time neural API. Token-gated access. Adversarial prompt testing. The benchmark becomes a platform.
Token-Funded
v4
🛡️
NILB v4
Toxicity Detection
Neural detection of AI-driven emotional manipulation, anxiety amplification, and cognitive suppression. The brain tells us what "safe" really means.
Research Phase
v5
⚖️
NILB v5
Safety Certification
"NILB Safe" — a neural safety certificate for AI products. Verified non-manipulative, non-anxiety-amplifying, emotionally responsible outputs.
The Vision
🎭
Manipulation Detection
Which prompts trigger amygdala hijack? Which model outputs suppress hippocampal encoding (killing critical thinking)? Which responses over-activate reward circuits, creating compulsive engagement loops? Neural data exposes it.
🧪
Toxicity Research
Beyond keyword filtering: does this response neurologically harm? Does it trigger anxiety in vulnerable users? Does it suppress insula activation, making people feel disconnected and alienated? Brain-first toxicity scoring.
🔬
Golden Dataset
The 2,600+ brain-encoded responses we're building are training data for the next generation of emotionally-aware, neurologically-safe AI models. Every encode adds to a dataset worth more as AI becomes more powerful.
"The question isn't whether AI will shape human psychology. It's whether we have the infrastructure to know when it's doing it wrong. NILB is that infrastructure."
— Cerevra Research Team · Building the neural benchmark for the age of AI
$CRV Token · Early Access

Fund the Research.
Own the Intelligence.

Every dollar invested directly funds GPU compute, brain encoding runs, and safety research. Token holders fund the infrastructure — and get progressively more access the more they hold and HODL.

One-time USDC stake — 50% platform activation fee · 50% staked earning yield. Save ~95% vs card · earn rev share · get $CRV airdrop.

Tier 1 · Early Believer
$20
USDC staked · 50% earns yield
3 analyses/day · Queue access
Full NILB v2 benchmark access
Neural analysis API · 3/day
Community access + Discord
Crypto portfolio analytics (partner beta)
$CRV allocation + earns from enterprise pool
HODL access: active as long as staked
Most Popular
Tier 2 · Research Supporter
$100
USDC staked · 50% earns yield
10 analyses/day · Priority queue
Everything in Tier 1
Neural analysis API · 10/day · priority queue
Full API + per-model brain breakdowns
NILB v3 early access when live
Finance + crypto analytics suite (partners)
5× $CRV allocation · vote on v3 dimensions
5× weighted rev share earnings from enterprise pool
Tier 3 · Research Backer
$500
USDC staked · 50% earns yield
Unlimited analyses/day · Instant access
Everything in Tier 1 + 2
Unlimited analyses · instant queue · zero wait
Private per-model brain reports + raw scores
Full partner ecosystem: DeFi, finance AI, and more
$CRV allocation + 2× bonus · research credits
25× weighted rev share from enterprise pool
Advisory board access · retainer option available
Enterprise Intelligence Platform

Team-wide access. Every employee gets in — and your individual stake earns from every enterprise subscription worldwide.

Enterprise Intelligence Platform
Custom Pricing.
Built for Your Team.

Team-wide access with unlimited analyses, dedicated support, custom benchmark runs, and API integration. Every enterprise subscription contributes 25% of MRR to the individual staker rev share pool — so your employees' personal stakes earn from your company's subscription too.

Unlimited
Analyses
All Seats
Team Access
25% MRR
To staker pool
We'll respond within 24 hours · Custom contracts available
🔄
The RevShare Flywheel
25% of Enterprise MRR goes back to individual stakers

Every enterprise seat sold by Cerevra feeds directly into the staker rev share pool. Your $CRV stake earns continuously — amplified by how long you've held and how much you hold. Enterprise teams pay for intelligence. Individual believers get paid for funding it.

Example Scenario
100
enterprise customers × $2K avg
= $200K MRR
Monthly Pool
$50K
→ distributed to all stakers
weighted by: tokens × time multiplier (sketched below)
Your Yield (Tier 3 · 12mo)
~$420
per month on $500 staked
84% monthly yield
HODL Multipliers — The Longer You Commit, The More You Earn
3 months
Baseline
6 months
1.5×
+50% yield
12 months
2.5×
+150% yield
24 months
4×
+300% yield
Diamond hands
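A hedged sketch of the pool mechanics described above: 25% of enterprise MRR, split pro-rata by staked amount times the HODL multiplier. The breakpoints mirror the table above; the actual on-chain formula may differ, and each staker's payout depends on the full staker set, not just their own position.

HODL_MULTIPLIER = {3: 1.0, 6: 1.5, 12: 2.5, 24: 4.0}  # months held -> yield multiplier

def monthly_payouts(enterprise_mrr, stakers):
    """stakers: list of {'wallet', 'staked_usdc', 'months_held'} records."""
    pool = enterprise_mrr * 0.25  # 25% of enterprise MRR feeds the staker pool
    weights = {
        s["wallet"]: s["staked_usdc"] * HODL_MULTIPLIER[s["months_held"]]
        for s in stakers
    }
    total = sum(weights.values())
    return {wallet: pool * w / total for wallet, w in weights.items()}

# Example scenario from above: 100 customers x $2K = $200K MRR -> $50K monthly pool,
# then distributed across every staker by weight.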
The Viral Loop — How Everyone Wins
🪙
You Stake
$20–$500 USDC → get full benchmark access
🧠
You Use It
Neural insights → share at work, reference in meetings
💼
Boss Subscribes
Company signs enterprise deal → joins the pool
📈
Your Yield Grows
Rev share pool grows → your monthly earnings increase
🔁
You Evangelize
Tell others → more stakers → more enterprise deals → loop
Your stake earns more when your company subscribes — bring your team in.
Every enterprise employee can also stake individually. Their personal stake earns from all enterprise subscriptions globally — including their own employer's.
🔐
The HODL Mechanic — Access Tied to Holdings
Stake USDC now. When $CRV token launches, your stake converts. Access tier is verified against your wallet balance — the more $CRV you hold, the better your tier. Unstake or sell below the tier threshold and access gracefully downgrades. Early adopters who hold from day one get the highest allocation multiples. This rewards conviction, not speculation.
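A minimal sketch of what wallet-based tier gating can look like. The $CRV thresholds shown are placeholders (the real thresholds are set at conversion), and the production gating logic lives on the platform side.

# Placeholder thresholds in $CRV; checked from the highest tier down.
TIER_THRESHOLDS = [
    (500, "Tier 3 · Research Backer"),
    (100, "Tier 2 · Research Supporter"),
    (20, "Tier 1 · Early Believer"),
]

def current_tier(crv_balance):
    for minimum, tier in TIER_THRESHOLDS:
        if crv_balance >= minimum:
            return tier
    return None  # below every threshold -> access gracefully downgrades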
40%
GPU compute for brain encoding runs
30%
Safety research: toxicity + manipulation datasets
20%
Platform development + API infrastructure
10%
Liquidity reserve + team operations
This Is Not a Trade. It's a Position.

The Long Game —
Built for People Who Think in Decades

The humans who fund early research don't do it for a quick flip. They do it because they see what's coming — and they want to be inside it. $CRV stakers aren't speculators. They're co-owners of the neural intelligence infrastructure layer for AI.

As AI systems get more powerful, understanding their emotional and neural impact becomes more critical — not less. Every enterprise that uses Cerevra pays into the pool that rewards the people who believed first.

Retainer Option
Tier 3 backers can lock a retainer position. If you ever want to exit, your stake enters a secondary transfer queue — other stakers absorb it at the current rate. A 2% transfer fee stays in the pool. You never get stuck.
Partner Ecosystem Access
Stakers get access to partner tools as the network grows: crypto portfolio analytics, DeFi yield dashboards, finance AI tools, and more. Access scales with your tier. Tier 3 gets priority access to new partner integrations as they launch.
Token Launch Plan
$CRV on Base L2. USDC stakes convert at a fixed early-backer rate. The earlier you commit, the higher your allocation multiple. Early stakers who HODL through launch get the maximum conversion bonus — set at token generation event.
Institutional Investors & Strategic Partners

Interested in Larger Positions?

We're speaking with select investors who share the vision for neural AI safety infrastructure. If you're a fund, family office, or strategic partner, let's talk.

Read the Vision ↑
For AI Labs · NILB v3 Open

Is Your Model
in the Benchmark?

13 frontier models are documented. Their neural wins, cluster failures, and question-level divergence are on record. NILB v3 is open — labs that submit get full 16-dimensional neural fingerprint + private head-to-head comparison.

16
Brain dimensions measured
200
Emotionally-calibrated questions
Private
Full report for your lab
Benchmark
NILB v2 · Final
Brain Encoder
TribeV2 · Meta Research 2026
Compute
1,300+ A100 GPU-hours
Platform
Cerevra · cerevra.io
Request the Full Report
Get the complete NILB v2 analysis — 16-dimensional neural fingerprints, cluster breakdowns, generational comparisons, and raw methodology.