The Unjournal · Pivotal Questions Initiative

About This Workshop

Why we're bringing researchers, evaluators, and funders together to discuss how we measure and compare wellbeing across interventions.

💬 Annotate this page — select any text to comment via Hypothes.is (free account to post; anyone can read)

The problem

Organizations ranging from Effective Altruism-aligned funders like Founders Pledge, GiveWell, and Open Philanthropy to government agencies and development NGOs compare interventions across very different domains—physical health, mental health, poverty alleviation—to decide where resources can do the most good. To make these comparisons, they need a common unit of measurement.

Two measures feature prominently in these analyses. The DALY (disability-adjusted life year) comes from health economics and captures years of healthy life lost to disease or disability. The WELLBY (wellbeing-adjusted life year) is based on self-reported life satisfaction, typically measured on a 0–10 scale. Each has strengths and limitations—and how they relate to each other, and whether either reliably captures what matters for human welfare, directly affects which interventions get prioritized.

This is part of The Unjournal's Pivotal Questions initiative: working with impact-focused organizations to identify their highest-value research questions, connect them to evidence, and commission expert evaluations that can inform real decisions.

What sparked this workshop

This workshop emerged from converging work streams. First, we collaborated with Founders Pledge to identify their highest-value research questions—Pivotal Questions where credible evidence could most shift their funding decisions. WELLBY reliability and DALY-WELLBY interconvertibility ranked among the most decision-relevant.

Second, our evaluation of StrongMinds—a mental health intervention whose cost-effectiveness depends heavily on WELLBY measurement—highlighted practical stakes: how you interpret self-reported life satisfaction changes can swing an intervention from "highly effective" to "uncertain." This sharpened the need for clarity on WELLBY validity.

Third, we commissioned an evaluation of Benjamin, Cooper, Heffetz, Kimball & Zhou's paper "Adjusting for Scale-Use Heterogeneity in Self-Reported Well-Being." This paper addresses whether people use wellbeing scales in comparable ways. If differences in reported life satisfaction (not just absolute levels) aren't comparable across individuals, that poses a challenge for the WELLBY as a tool for comparing interventions. The paper develops methods using calibration questions and vignette exercises to detect and adjust for scale-use heterogeneity. The evaluators' verdict was encouraging but nuanced: differences in scale use may not be as severe as some feared, but more work is needed, particularly on whether the calibration methods generalize to low-income settings and whether scale-use heterogeneity differs systematically across treatment and control groups.

Together, these considerations led us to propose this workshop to Founders Pledge, who agreed it would be valuable—bringing together researchers, evaluators, and funders to make progress on questions that directly affect funding priorities.

What we want to achieve

This workshop brings together authors of several papers in this area, Unjournal evaluators, funders who use these measures in their work, and researchers with relevant expertise. We're organizing the discussion around four key questions:

1. Is the linear WELLBY reliable enough?

Can we treat a 1-point improvement in life satisfaction as meaning the same thing for different people and starting points? Does improving one person's wellbeing from 1→3 equal improving two people's wellbeing from 1→2? Does a move from 3→4 mean the same as 7→8? Where is the "neutral point" on the scale—and why does it matter for comparing interventions?
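The linearity question can be made concrete with a toy calculation. In this sketch everything is hypothetical: the two interventions, the numbers, and the `wellbys` helper are illustrative, and the concave transform is just one possible alternative reading of the 0–10 scale, not a claim about how people actually use it.

```python
import math

# Illustrative sketch (all numbers are made up): how the linearity
# assumption behind the WELLBY can decide a comparison.

def wellbys(before, after, transform=lambda s: s):
    """Sum of per-person gains on the (possibly transformed) 0-10 scale."""
    return sum(transform(a) - transform(b) for b, a in zip(before, after))

# Hypothetical intervention A: one person moves 1 -> 3.
# Hypothetical intervention B: two people each move 7 -> 8.
a_before, a_after = [1], [3]
b_before, b_after = [7, 7], [8, 8]

print(wellbys(a_before, a_after))  # 2 points gained
print(wellbys(b_before, b_after))  # 2 points gained: a tie under linearity

# If the true scale is concave (respondents "stretch" the top of the
# scale), gains near the bottom count for more than gains near the top,
# and the tie breaks in favor of A.
concave = lambda s: math.log1p(s)
print(wellbys(a_before, a_after, concave) > wellbys(b_before, b_after, concave))  # True
```

Under the linear reading the two interventions tie; under this particular concave reading, intervention A dominates, which is exactly why the cardinality assumption matters for prioritization.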

2. How should we convert between DALYs/QALYs and WELLBYs?

Current approaches are rough. A 1 SD change in WELLBY is often treated as equivalent to ~1 SD in DALYs (or QALYs), but is this defensible? How sensitive are funding decisions to the conversion factor used?
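The sensitivity question also lends itself to a toy calculation. In this sketch the two programs, their effects, and their costs are entirely hypothetical; the 5–10 range for the conversion factor echoes the empirical estimates mentioned in the pre-read materials.

```python
# Illustrative sketch (all programs, effects, and costs are hypothetical):
# how the DALY-to-WELLBY conversion factor can reorder a comparison.

def wellbys_per_dollar(effect, unit, cost, dalys_to_wellbys):
    """Express a program's effect in WELLBYs per dollar."""
    in_wellbys = effect * dalys_to_wellbys if unit == "DALY" else effect
    return in_wellbys / cost

health = {"effect": 100, "unit": "DALY", "cost": 50_000}     # DALYs averted
therapy = {"effect": 600, "unit": "WELLBY", "cost": 50_000}  # WELLBYs gained

for factor in (5, 7, 10):  # spans the empirical range cited in the pre-reads
    h = wellbys_per_dollar(**health, dalys_to_wellbys=factor)
    t = wellbys_per_dollar(**therapy, dalys_to_wellbys=factor)
    print(factor, "health" if h > t else "therapy")
```

With these made-up numbers the ranking flips within the plausible range of conversion factors: the therapy program wins at 5, the health program at 7 and 10. That is the sense in which funding decisions can be sensitive to the factor used.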

3. Could methodological adjustments improve things?

Benjamin et al. provide evidence suggesting that calibration questions and vignette exercises may reduce bias from scale-use differences. Should funders encourage these methods in future RCTs? Adding such instruments comes at a cost—increased survey length, respondent burden, and comprehension challenges—so the benefits must be weighed. Are there other refinements—such as multi-item scales—that could help?

4. What should funders do now?

When comparing interventions across domains—where one might be measured in WELLBYs and another in DALYs—what's the defensible approach? (Note: either type of intervention could in principle be measured using either approach.) What conversion factors and uncertainty ranges should CEA organizations use today—and what would change their minds?

Confirmed Participants (15)

Date: Monday, March 16, 2026 · 11am–4pm ET (3pm–8pm UK)

Presenters

  • Dan Benjamin (UCLA) — Benjamin et al. paper / broader approach
  • Miles Kimball (CU Boulder) — Paper co-author
  • Julian Jamison (University of Exeter) — DALY↔WELLBY conversion (PQ2)
  • Caspar Kaiser (University of Warwick) — WELLBY barriers discussion
  • Matt Lerner (Founders Pledge) — PQ1, Beliefs Elicitation, Practitioner panel
  • Peter Hickman (Coefficient Giving) — Stakeholder 5-min + Practitioner panel

Other Confirmed Participants

  • Christian Krekel (LSE)
  • Anthony Lepinteur (University of Luxembourg)
  • Loren Fryxell (City St George's, University of London)
  • Daniel Rogger (World Bank Group)
  • Zhuoran Du (UNSW)
  • Yaniv Reingewertz (University of Haifa) — implementing Benjamin et al. in Israel
  • Anirudh Tagat
  • Valentin KlotzbĂĽcher (University of Basel / UJ Team)

Alberto Prati confirmed for async participation (paternity leave March 14+).

How the workshop is structured

Note: The exact structure is still being determined based on participant feedback. We will announce the precise agenda before the workshop date, so you can drop in for just the segments that interest you.

The workshop is fully online, with approximately 3.5 hours of live sessions scheduled in discrete segments. We also support asynchronous participation: you can submit beliefs and comments before or after the live event, and we'll integrate these into the discussion.

Proposed segments. See also the Live Sessions page for the interactive workshop structure.

  • Stakeholder Problem Statement & Pivotal Questions (~25 min): Representatives from Coefficient Giving and Founders Pledge (~10 min each) explain why this matters for their work—how they currently weigh WELLBYs vs DALYs in cost-effectiveness analyses and what uncertainties they face. Then we introduce the key Pivotal Questions (~5 min) and invite initial belief estimates from participants who have bandwidth.
  • Paper Presentation: Benjamin et al. (~25 min): The research team presents their findings on scale-use heterogeneity in self-reported wellbeing—how people use satisfaction scales differently, and what calibration methods can do about it.
  • Evaluator Responses & Discussion (~25 min): Our independent evaluators share their assessment of the paper's methodology and findings, followed by author responses and open discussion.
  • WELLBY Reliability Discussion (~25 min): Focused discussion on whether the linear WELLBY is reliable enough for comparing interventions. Covers cardinality assumptions, neutral points, and measurement challenges.
  • DALY/QALY↔WELLBY Conversion (~25 min): How should we translate between health measures (DALYs, QALYs) and subjective wellbeing (WELLBYs)? Examines current approaches and what's missing.
  • Beliefs Elicitation (~15 min): A guided exercise where participants state their probabilities on key operationalized questions (WELL_01, DALY_01, etc.), capturing expert views before and after discussion.
  • Practitioner Panel & Open Discussion (~30 min): Representatives from Founders Pledge, Coefficient Giving, and other organizations discuss how they currently handle WELLBY-DALY comparisons, what they'd need to change their approach, and concrete recommendations for CEA practitioners. Aim: actionable takeaways that organizations can apply immediately.

We plan to record the workshop and make it publicly available by default, with an AI-queryable transcript so researchers and funders can easily search the discussion. Participants can opt out of recording for specific segments if needed, and we will ask for final approval before posting anything.

Outputs: We hope to produce a practitioner-focused summary document, belief elicitation results with confidence intervals, and structured notes. We will share outputs with interested organizations that couldn't attend live, so the discussion can inform decisions beyond those in the room.

Pivotal Questions & Beliefs

As part of this project, we've developed specific, operationalized questions (codes WELL_01–07 on WELLBY reliability, DALY_01–05 on interconvertibility) designed so that experts and stakeholders can state their beliefs quantitatively—and so that answers can directly inform funding decisions. We want to elicit beliefs before, during, and after reviewing the evidence, to see how expert and stakeholder views evolve. See the canonical formulations on Coda.

Three of these questions will also be posted on our Metaculus forecasting page. See key questions and share your beliefs →

Pre-Read Resources: Framing the Discussion

To help participants get on the same page before the workshop, we've prepared two analysis documents that map the key issues, assumptions, and evidence. These aren't meant to resolve debates—they're meant to structure them: "Here are the important issues; let's discuss them one by one."

đź“„ Linear WELLBY Analysis

When is the "linear WELLBY" defensible for comparing interventions? Covers the core assumptions (cardinality, comparability, neutral point), the Bond & Lang identification critique, scale-use heterogeneity (shifters vs. stretchers), and what calibration methods can and can't fix.

Read the analysis →

📄 DALY↔WELLBY Conversion

How should we translate between DALYs/QALYs and WELLBYs? Reviews empirical anchors (UK Green Book ~7:1, empirical estimates 5–10), conceptual issues (health vs. subjective wellbeing), and practical implications for CEA decisions.

Read the analysis →

Note: These documents were AI-assisted drafts (March 2026) that integrate deep research and annotation feedback. They're shared for workshop discussion—please annotate errors or concerns directly on the pages.