The Unjournal · Pivotal Questions Initiative

WELLBY Reliability Discussion

Is the linear WELLBY reliable enough for cross-intervention comparison?

SEGMENT 2 · 30 minutes (11:40 AM–12:10 PM ET)

Presenter: Matt Lerner (Founders Pledge) · Discussant: Caspar Kaiser (U of Warwick)

Lerner presents the practitioner perspective on PQ1 (WELLBY reliability for funding decisions). Kaiser discusses key barriers to WELLBY adoption: comparability, linearity, the neutral point problem, and whether WELLBYs capture the right concepts. The segment also previews Benjamin et al. on scale-use heterogeneity, which is presented in detail after the break.

Focal Question (WELL_01)

What combination of (a) subjective wellbeing survey data, (b) income and health-outcome data, (c) metrics based on this data (e.g., linear or logarithmic WELLBYs, standard deviations, scale-use adjustments), and (d) possible conversions between different measures would be "best" for making funding choices between interventions which may impact mental health, physical health, and/or consumption?

Overview

This open discussion segment addresses the core reliability question: given what we know about scale-use heterogeneity[1] and measurement challenges, is the linear WELLBY[2] measure reliable enough for comparing interventions across mental health, physical health, and consumption domains?
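To make the equal-interval assumption concrete, here is a minimal Python sketch. It assumes the usual reading of a WELLBY as a one-point change on the 0–10 life satisfaction scale for one person for one year. The `Effect` record, the function names, and all numbers are hypothetical. The logarithmic variant is only one illustrative concave alternative, not a definition taken from this page or from the literature.

```python
import math
from dataclasses import dataclass


@dataclass
class Effect:
    """Hypothetical summary of an intervention's effect on 0-10 life satisfaction."""
    baseline: float  # average score before the intervention
    endline: float   # average score after the intervention
    years: float     # how long the change lasts
    people: int      # number of people affected


def linear_wellbys(e: Effect) -> float:
    # Linear WELLBY: every scale point counts the same wherever it occurs.
    return (e.endline - e.baseline) * e.years * e.people


def log_gain(e: Effect) -> float:
    # Illustrative concave alternative (an assumption for this sketch):
    # gains are measured on the log of the score instead of the raw score.
    return (math.log(e.endline) - math.log(e.baseline)) * e.years * e.people


# Two made-up interventions: the same one-point gain at different scale levels.
low_end = Effect(baseline=3, endline=4, years=1, people=100)
high_end = Effect(baseline=7, endline=8, years=1, people=100)

print(linear_wellbys(low_end), linear_wellbys(high_end))
print(round(log_gain(low_end), 1), round(log_gain(high_end), 1))
```

Under the linear measure the two interventions tie at 100 WELLBYs each. Under the concave variant the low-end gain counts for more, so the ranking depends on which cardinality assumption is adopted.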

Discussion Prompts

Relevant Pivotal Questions

This discussion directly addresses several of our Pivotal Questions:

Institutional Context

Collaborative Notes


Questions & Comments

Add questions and comments directly to the collaborative notes above.

📄 Background: Linear WELLBY Analysis

This document maps the key issues we'll discuss: cardinality assumptions, Bond & Lang's identification critique, scale-use heterogeneity (shifters vs. stretchers), and what calibration methods can and can't fix.
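The shifter/stretcher distinction can be sketched with a toy response model. The code below assumes, purely for illustration, that each respondent reports `shift + stretch * latent`. This linear form, the function name `reported`, and the parameter values are assumptions of the sketch, not estimates from Benjamin et al. or from the background document. No clipping to the 0–10 range is applied, so the values are chosen to stay inside it.

```python
def reported(latent: float, shift: float = 0.0, stretch: float = 1.0) -> float:
    """Hypothetical linear response model: report = shift + stretch * latent."""
    return shift + stretch * latent


# Three made-up respondents with the same latent wellbeing before and after.
latent_before, latent_after = 5.0, 6.0
styles = {
    "reference": {"shift": 0.0, "stretch": 1.0},
    "shifter": {"shift": 1.0, "stretch": 1.0},    # reports everything 1 point higher
    "stretcher": {"shift": 0.0, "stretch": 1.5},  # spreads answers over more of the scale
}

# Reported levels at baseline differ even though latent wellbeing is identical.
levels = {name: reported(latent_before, **s) for name, s in styles.items()}

# Within-person changes: the shift cancels out, the stretch does not.
changes = {
    name: reported(latent_after, **s) - reported(latent_before, **s)
    for name, s in styles.items()
}

print(levels)
print(changes)
```

In this toy model a pure shifter distorts comparisons of levels but not within-person changes, while a stretcher distorts both. That is one way to see why within-person designs address only part of the problem.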

View Analysis →

AI-assisted draft (Mar 2026); annotate errors directly.

Notes

  1. Scale-use heterogeneity: different individuals interpret and use the 0–10 life satisfaction scale differently. Benjamin et al. (2023) estimate this can bias cross-group comparisons by 30–50%.
  2. "Linear" WELLBY assumes equal intervals: moving from 3→4 is the same welfare gain as moving from 7→8. This cardinality assumption enables summing across people, but may not hold at scale extremes.
  3. Linear WELLBY tends to work better when: (1) similar populations are compared, (2) effect sizes are large relative to measurement noise, (3) within-person longitudinal designs are used.
  4. Options include: vignette anchoring, calibration questions, multi-item scales, experience sampling. Trade-offs involve respondent burden vs. precision. See Benjamin et al. for an empirical comparison.
  5. GiveWell's StrongMinds analysis explored valuing mental health benefits via WELLBYs rather than income-equivalents. They concluded SWB "deserves more study" but did not adopt WELLBYs as their primary metric.
  6. IDinsight's research uses stated preference surveys to understand how beneficiaries weigh different outcomes (income, health, life satisfaction). This provides an alternative approach to comparing welfare across domains.