Persona Evaluation

persona-bench: An Evaluation Harness for Personalization & Reproducible Pluralistic Alignment

Human vs AI Personalization Challenge

You'll be competing against a frontier AI model in crafting personalized responses.

Disclaimer: Some questions may touch on sensitive topics. Please engage thoughtfully and respectfully. If you feel uncomfortable with any question, feel free to skip it.

Current Language Models' Ability to Successfully Personalize for a Known Demographic Varies Widely

Want to see how your model performs?

Did you prompt?

Evaluate your model's chat style across 1,000+ personas

Did you tune?

Fine-tuning can break your model's ability to personalize for specific sub-demographics...

Developer? Bulk evals

Connect to our held-out evaluation

Key Features

Rapid Evaluation

Assess performance across 1,000+ personas quickly

Published Research

Backed by our paper on arXiv

Proven at Scale

Tested with leading AI models

Enterprise Support

Reach out for custom solutions

Seamless Integration

Easy to Implement

Compatible with popular AI frameworks and easy to integrate into your existing infrastructure.
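
As a rough illustration of what a persona evaluation loop could look like when wrapped around an arbitrary model, here is a minimal sketch. It is not persona-bench's actual API: the names `Persona`, `build_prompt`, `evaluate`, and the `scorer` callback are all hypothetical, and the model is assumed to be any callable that maps a prompt string to a response string.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List

# Hypothetical types for illustration only; not persona-bench's API.
Persona = Dict[str, str]                       # e.g. {"age": "30", "region": "US"}
Model = Callable[[str], str]                   # prompt -> response
Scorer = Callable[[Persona, str, str], float]  # (persona, question, response) -> score


def build_prompt(persona: Persona, question: str) -> str:
    """Prefix the question with a plain-text description of the persona."""
    traits = ", ".join(f"{key}: {value}" for key, value in sorted(persona.items()))
    return f"Answer for a person with these traits ({traits}).\n{question}"


def evaluate(
    model: Model,
    scorer: Scorer,
    personas: Iterable[Persona],
    questions: List[str],
    group_by: str,
) -> Dict[str, float]:
    """Return the mean score for each value of the `group_by` persona attribute."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for persona in personas:
        group = persona.get(group_by, "unknown")
        for question in questions:
            response = model(build_prompt(persona, question))
            scores[group].append(scorer(persona, question, response))
    return {group: mean(values) for group, values in scores.items()}


# Toy stand-ins (assumptions): a model that only personalizes for one age
# group, and a scorer that checks whether the persona's age is reflected.
def toy_model(prompt: str) -> str:
    return prompt if "age: 30" in prompt else "generic answer"


def toy_scorer(persona: Persona, question: str, response: str) -> float:
    return 1.0 if f"age: {persona['age']}" in response else 0.0


if __name__ == "__main__":
    personas = [{"age": "30"}, {"age": "70"}]
    print(evaluate(toy_model, toy_scorer, personas, ["q1", "q2"], group_by="age"))
```

Grouping scores by a persona attribute is what makes per-demographic gaps visible: a single overall average would hide the case where a model personalizes well for one group and poorly for another.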

Grounded in Frontier LLM Research

Our work is backed by rigorous academic research and collaborations. Read our paper for in-depth insights into our methodology and findings. We're also providing academic access to our datasets.

Read Our Research Paper