The Unjournal · Pivotal Questions Initiative

Wellbeing Pivotal Questions

State your beliefs on specific, operationalized questions about WELLBY reliability and DALY–WELLBY interconvertibility.

💬 Annotate this page — select any text to comment via Hypothes.is (free account to post; anyone can read)
Privacy Notice

Your responses will be used in our research synthesis and may be shared in aggregated or anonymized form. We may quote specific responses with attribution unless you request otherwise. If you prefer your responses remain anonymous, please indicate this in the "other thoughts" field at the bottom of the form.

These are some of the key operationalized questions from our Wellbeing Pivotal Questions project.[1] We want to elicit expert and stakeholder beliefs—before, during, and after reviewing the evidence and key arguments—to see how views evolve and where consensus exists. (All questions are optional.)

📋 Full question specifications: For more detail, context, and the complete set of operationalized questions, see the canonical Wellbeing PQ formulations on Coda →

You don't need to be a specialist to contribute. We want your honest assessment and reasoning, whether you feel highly confident or very uncertain. Your input helps us understand the range of views in the field.

🔮 Related forecasting: Some of these questions may be posted to The Unjournal's Metaculus forecasting page for crowd prediction. If you forecast on Metaculus, please share your username below so we can link your contributions.

How to respond

Shared Definitions

Suppose Founders Pledge is considering whether to donate $100,000, either:
  • to StrongMinds (to treat depression in women in low-income settings through group interpersonal psychotherapy)
  • or to extend a seasonal malaria chemoprevention campaign.
Suppose they have substantial evidence on the impact of each intervention, drawn from RCTs that combine typical self-reported wellbeing surveys with objective income and health outcomes. They also have the opportunity to fund the collection of more data in future studies.

They want to allocate the funds to the intervention that leads to greater "social wellbeing or welfare" in expectation.

For the current context, we define a WELLBY (Wellbeing-Year) as one point of self-reported life satisfaction measured on a 0-to-10 Likert scale for one individual for one year (following Frijters et al., 2020; Frijters and Krekel, 2021).

We follow the definition from Frijters et al., 2024, based on a life satisfaction scale (acknowledging that WELLBY has been defined differently in other contexts).
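As a quick illustrative sketch (hypothetical numbers, not from any study), the WELLBY unit scales linearly in life-satisfaction points, people affected, and years:

```python
# Illustrative sketch of the WELLBY unit (hypothetical numbers):
# one point of 0-10 life satisfaction, for one person, for one year.

def wellbys(ls_gain_points: float, n_people: int, years: float) -> float:
    """WELLBYs = LS gain (points) x people affected x duration (years)."""
    return ls_gain_points * n_people * years

# An intervention raising LS by 0.5 points for 200 people over 2 years:
print(wellbys(0.5, 200, 2))  # 0.5 * 200 * 2 = 200.0 WELLBYs
```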

"Best" = leads to the decisions that yield the highest "true welfare" on average, in the particular relevant domain (e.g., in comparing mental health interventions in Africa), perhaps taking into account the cost of doing the measurements.

More precisely: the "best" measures and aggregations would be those that, if we collected and made decisions based on them, would yield policy and funding choices with the highest overall wellbeing or welfare in expectation. Consider reliability, practicality, cost, comparability, and other real-world considerations.

The "best" mappings would be those that, if used to make conversions between WELLBYs, DALYs, etc., would be likely to lead to the better/best decisions in most relevant situations.

When we ask for a probability, we're asking for your best calibrated subjective probability—your honest credence given everything you know.

One way to think about this: Imagine an ideal research team with unlimited resources, time, and data—perhaps even a kind of omniscience where they could perfectly understand the welfare and psychological states of everyone affected. What probability would you assign that this idealized team would ultimately conclude the statement is true?

Note: We avoid anchoring to "0% = impossible" and "100% = certain" because perfect certainty is rarely justified. If you believe something is extremely unlikely but not literally impossible, you might say 2-5%; if nearly certain but not absolutely, perhaps 95-98%.

1. WELLBY Reliability and Value

How reliable is the linear WELLBY measure for comparing interventions?

PQ1a · WELLBY Usefulness · WELL_01/07

How reliable is the linear WELLBY measure [...] relative to other available measures in the 'wellbeing space'? How much insight is lost by using linear WELLBY and when will it steer us wrong?

Adapted from WELL_07: "How reliable is the WELLBY measure of well-being/mental health (as defined above) relative to other available measures in the 'wellbeing space' (including other transformations of the 0-10 life satisfaction scale)?"

The WELLBY is used by several major funders[2] (Happier Lives Institute, Founders Pledge) to compare interventions across domains. The reliability of this approach matters for resource allocation decisions.

PQ1b · Best Measure · WELL_02/03

Given the available collected data [...], how should [funders] measure the impact on wellbeing? [...] What measures of well-being should charities, NGOs, and RCTs collect for impact analysis?

Even if the WELLBY is "good enough," there might be better options—multi-item scales, log-transformed life satisfaction, or standardized composites. Switching measures has costs, so the improvement needs to be meaningful.

Adapted from WELL_02: "Given the available collected data from surveys and intervention trials, how should Founders' Pledge measure the impact on wellbeing in the context of mental health interventions? [...] Consider reliability, insight, and practicability."

And WELL_03: "What measures of well-being [...] should charities, NGOs, and RCTs collect for impact analysis, particularly in contexts that may involve less tangible well-being outcomes (such as mental health interventions)? This could also include stated-preference and calibration surveys."

  • Candidates include: multi-item life satisfaction scales (e.g. SWLS), experience sampling, the WB-Pro, WEMWBS, log-transformed 0-10 LS, or domain-specific instruments.
  • Diener et al. (2018) found that single-item life satisfaction measures have moderately high reliability (~0.70 correlation with multi-item scales) with little validity loss.
  • WELL_03 also asks: "How should these [measures] be used?"—considering not just what to collect but how to combine and interpret the data.
WELL_01a · Cost Ratio Extension

If you propose a measure other than linear WELLBY in your answer above, how much more would it cost to achieve the same welfare improvement using linear WELLBY instead?

Consider the welfare-improvement from allocating $100,000 among a large set of charities/interventions given the information provided by the "best measure" you propose. How much more would it cost to achieve the same outcome using the linear WELLBY? (E.g., 1.1 = 10% more, 1.5 = 50% more, 3 = 3x as much.) If you think WELLBY is optimal, skip this question. This is inherently speculative—rough estimates based on your intuition are welcome.

WELL_04 · Single vs Combined Measures

In contexts where interventions impact mental health, physical health, AND consumption: is it better to use a single WELLBY measure, or measure each dimension separately and then convert/combine?

WELL_07 · What Is Lost?

How much insight is lost by using WELLBY relative to other available measures in the "wellbeing space"? When will it steer us wrong?

WELL_08 · Life Satisfaction vs Experience

Would it be better to base the metric on life satisfaction or instantaneous experience measures (e.g., happiness, affect balance)?

WELL_09 · Cantril Ladder Conversion

If we must rely on the Cantril ladder measure, how would we best convert it into a welfare metric for comparing interventions?

2. Conversions Between Measures

How should we convert between WELLBYs and DALYs/QALYs?

PQ2 · DALY/QALY–WELLBY Conversion · DALY_01/03/05

If some programs are measured in WELLBYs and others in DALYs/QALYs, what is the best numerical conversion or mapping between them—and what method or approach should we use?

From DALY_01: "If the impact of one program is measured in WELLBYs [...] and another in DALYs, what is the best numerical conversion or mapping between them?" Also from DALY_03: "What method or 'mapping structure' should we use?" (Note: QALYs may be more relevant than DALYs for this conversion—see context.)

"Best" here means: the mapping that, if used for funding decisions, would lead to the highest expected welfare. Getting this conversion wrong means systematically over- or under-investing in mental health versus physical health interventions.

  • DALY vs QALY: DALYs measure health burden (years lost to disease/disability); QALYs measure health gained. For conversion purposes, QALYs are often more directly comparable—the canonical questions note "replace DALY with QALY" may be appropriate.
  • Some organizations (including HLI and Founders Pledge) currently treat SDs on different mental health instruments as interconvertible with WELLBY SDs on a roughly 1:1 basis.
  • The conversion between DALYs/QALYs and WELLBYs depends on the "neutral point"[6] on the LS scale—the point below which life has negative value. This is currently unknown; one small study (Peasgood et al. 2018) suggested LS ≈ 2, but this is tentative.
  • The relationship may also be non-linear—e.g., a WELLBY gained at very low wellbeing could be worth more than one gained at high wellbeing.
  • Approaches include: SD-equivalence (current practice), regression-based approaches (linking LS data to DALY weights in the same populations), time-tradeoff surveys, or maintaining separate analyses and comparing rankings.
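To make the neutral-point dependence concrete, here is a minimal sketch (a hypothetical conversion, not an endorsed method) of how the welfare sign of a given LS score flips with the assumed neutral point:

```python
# Minimal sketch (hypothetical, not an endorsed conversion): welfare per
# person-year implied by a life-satisfaction score, relative to an assumed
# "neutral point" where welfare is zero.

def welfare_per_year(ls: float, neutral_point: float) -> float:
    return ls - neutral_point

# The same LS = 3 can imply negative or positive welfare:
print(welfare_per_year(3, 5))  # -2: negative welfare if neutral = 5
print(welfare_per_year(3, 2))  # 1: positive welfare if neutral = 2
```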

An 80% credible interval represents the range you believe has an 80% probability of containing the true value. There should be roughly a 10% chance the true value is below your lower bound, and a 10% chance it's above your upper bound.

This is more informative than a single "confidence" percentage because it captures both your best guess and how uncertain you are. For help calibrating your uncertainty estimates, try the Clearer Thinking calibration tool.
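For instance, if your uncertainty can be represented by draws from a distribution, the 80% interval is simply the 10th and 90th percentiles. A rough sketch using NumPy (assumed available; the distribution and numbers are purely illustrative):

```python
import numpy as np

# Rough sketch: an 80% credible interval from samples representing your
# uncertainty (illustrative distribution, not a real elicitation).
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.5, scale=0.5, size=10_000)

# Leave ~10% of probability in each tail.
lower, upper = np.quantile(samples, [0.10, 0.90])
print(round(lower, 2), round(upper, 2))  # roughly 0.86 and 2.14
```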

DALY_02 · Founders Pledge Specific

Which mapping between WELLBYs and DALYs should Founders Pledge specifically use for comparisons like the focal example (StrongMinds vs malaria)?

This asks about the best mapping for their particular use case, rather than a general-purpose conversion.

DALY_05 · Loss from SD-SD Approach

What is the loss from the "1 SD change in WELLBY ≈ 1 SD change in DALY" approach currently used by some funders, relative to the best feasible approach?

Where will this approach be particularly incorrect? Consider different intervention types, populations, or contexts.

3. Predictions and Policy

Forecasting questions about expert consensus, research uptake, and measurement impact.

PQ3b · Metaculus-style · Expert Consensus

If The Unjournal were to survey development economists and research-informed practitioners (before end of 2027), what share would agree that "the linear WELLBY (as defined above) is a reasonably useful measure in this context, and switching to a different measure is unlikely to add much value"?

(Note: This is a hypothetical scenario for discussion. We are not currently planning to conduct such a survey, though we would like to if feasible.)

PQ3a · Metaculus-style · Research Uptake

By 2030, will more than 50% of GiveWell's top charities include a WELLBY-based cost-effectiveness analysis alongside or instead of DALY-based analysis?

This illustrative forecasting question gauges whether the WELLBY will gain institutional traction. (Note: This is a discussion question for the workshop, not from the canonical PQ table.)

PQ3c · Metaculus-style · Calibration Impact

If calibration questions[4] and/or vignettes[5] (as in Benjamin et al.) were added to the major wellbeing surveys used in global health RCTs, would the resulting adjustments meaningfully change the cost-effectiveness ranking of the top 5 interventions recommended by Founders Pledge?

  • "Meaningful change" = at least one intervention currently in the top 5 moves out of the top 5, OR the #1 ranked intervention changes.
  • This assumes future RCTs incorporate these methods and Founders Pledge updates their CEA accordingly.
  • Note: This question is somewhat speculative—it asks about counterfactual methodology adoption and its downstream effects.

About You

Your responses are stored securely and will be used to inform the synthesis report.

Questions adapted from the canonical Wellbeing PQ formulations (codes: WELL_01–09, DALY_01–05). Last updated: February 2026.

Notes

  1. Pivotal Questions: research questions where credible evidence could most shift funding decisions. We identify these through stakeholder collaboration, prioritizing by expected value of information.
  2. Happier Lives Institute uses WELLBYs as their primary metric; Founders Pledge incorporates them alongside DALYs. GiveWell has explored WELLBY-based analysis but hasn't fully adopted it.
  3. Scale-use heterogeneity: different people use the 0-10 scale differently—what one person calls "7" might correspond to another's "5." This creates bias when comparing across individuals or groups.
  4. Calibration questions have objectively correct answers (e.g., "1+1=?") that reveal how respondents use scales. If someone rates 2 as "very certain," we know they compress the scale.
  5. Vignettes describe hypothetical people ("John has X, Y, Z characteristics"). By having respondents rate these standardized scenarios, researchers can compare individual scale use.
  6. The neutral point is the life satisfaction level where welfare equals zero—below this, welfare is negative. If neutral=5, then LS=3 represents negative welfare; if neutral=2, LS=3 is positive.