PERSONA: A Reproducible Testbed for Pluralistic Alignment

Louis Castricato*, Nathan Lile*, Rafael Rafailov, Jan-Philipp Fränken, Chelsea Finn

The rapid advancement and adoption of language models (LMs) has highlighted critical challenges in aligning these models with the diverse values and preferences of global users. Existing reinforcement learning from human feedback (RLHF) approaches often fail to capture the plurality of user opinions, instead reinforcing majority viewpoints and marginalizing minority perspectives. To address this, we introduce PERSONA, a comprehensive and reproducible test bed designed to evaluate and improve pluralistic alignment in language models.

Our approach utilizes synthetic personas, crafted through a combination of US census data and procedural generation, to simulate a wide array of user profiles with diverse demographic and idiosyncratic attributes. We present a detailed methodology for constructing a representative demographic of 1,586 personas, each enriched with individualistic personality traits and core values. Leveraging this synthetic demographic, we generate a large-scale preference dataset containing 3,868 prompts and 317,200 pairs of diverse feedback.

This dataset enables the evaluation of language models' ability to align with both group-level and individual preferences across various controversial and value-laden topics. Our contributions include a systematic evaluation of current LM capabilities in role-playing diverse users, verified through human judges, and the establishment of a benchmark for pluralistic alignment approaches. Our work aims to facilitate the development of more inclusive and representative language models, paving the way for future research in global pluralistic alignment.

arXiv:2407.17387v1 [cs.CL] 24 July 2024

Interested in Collaboration?

We're always open to new collaborations and ideas. If you're interested in working with us or have any questions about PERSONA, please reach out!

Samples from the PERSONA dataset

We've included a number of randomly sampled, non-cherry-picked samples from the PERSONA dataset, along with an outline below of the process we used to construct them. We hope these samples give a good overview of the kinds of personas included in the dataset and of the kinds of personalization it makes possible.

Dataset Creation

In the pluralistic alignment space, personas often lack significant grounding. To build demographic personas that reflect the challenges of pluralistic alignment in a realistic setting, we construct a set of personas whose demographics closely follow those of the US population.
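As a rough illustration of what demographically grounded persona sampling can look like, here is a minimal sketch in Python. It is not the pipeline used to build PERSONA: the attribute names, the proportions in `MARGINALS`, the `TRAITS` list, and the helper functions are all hypothetical placeholders, and the proportions are not real census figures. It also samples each attribute independently, which ignores the correlations between attributes that real census data would capture.

```python
import random
from collections import Counter

# ILLUSTRATIVE placeholder marginals (assumption): NOT real census figures.
MARGINALS = {
    "age_group": {"18-29": 0.20, "30-44": 0.25, "45-64": 0.33, "65+": 0.22},
    "region": {"Northeast": 0.17, "Midwest": 0.21, "South": 0.38, "West": 0.24},
}

# Hypothetical idiosyncratic traits, assigned procedurally (assumption).
TRAITS = ["introverted", "risk-averse", "optimistic", "skeptical"]


def sample_persona(rng):
    """Draw one persona: weighted demographic attributes plus one random trait."""
    persona = {}
    for attribute, distribution in MARGINALS.items():
        values = list(distribution.keys())
        weights = list(distribution.values())
        # Weighted draw so the population tracks the target proportions.
        persona[attribute] = rng.choices(values, weights=weights, k=1)[0]
    persona["trait"] = rng.choice(TRAITS)
    return persona


def build_population(n, seed=0):
    """Build n personas from a seeded generator so the result is reproducible."""
    rng = random.Random(seed)
    return [sample_persona(rng) for _ in range(n)]


# 1,586 matches the number of personas reported for the dataset.
population = build_population(1586)
region_counts = Counter(p["region"] for p in population)
print(region_counts)
```

Fixing the seed makes the synthetic population reproducible, and comparing `region_counts` against the target proportions gives a quick check that the sample tracks the intended demographics.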

Dataset Access

Important: The full PERSONA dataset is available only under an academic-use-only licensing agreement. To access it, you need to apply both on Hugging Face and through the form below. Access is subject to approval and restricted to academic research purposes.