⚠️ DRAFT: AI-Generated Content — Requires Verification
This page was largely generated with AI assistance (Claude Code + deep research, March 2026) and has not yet been fully fact-checked. It is shared for workshop discussion and review purposes only.
- Conversion factor ranges (e.g., "7 WELLBYs per QALY") are illustrative and require primary-source verification
- Citations to specific papers (Peasgood et al., EEPRU report, etc.) should be checked against originals
- Interactive demos use simplified assumptions — not for production cost-effectiveness analysis
- Framing and emphasis may not reflect consensus views in the field
Please annotate errors or concerns directly on this page. Your feedback will improve the final version.
The conversion problem: what are we trying to do?
Funders and evaluators often face a practical comparison problem: one intervention is evaluated in DALYs averted or QALYs gained (health metrics), while another is evaluated in WELLBYs (life satisfaction point-years). To compare them on a common basis requires some form of translation or mapping.
The focal question for this workshop segment, drawn from the Pivotal Questions database (question codes DALY_01 / DALY_03 / DALY_05), is:[1]
Focal Question (PQ2)
"How should we translate between health measures (DALYs/QALYs) and subjective wellbeing (WELLBYs) for cross-intervention comparison?"
Sub-questions include: What numerical conversion factor should we use? Should it vary by domain? How should we treat uncertainty?
A key insight to internalize up front: a "conversion" is not a single fact like a currency exchange rate. DALYs/QALYs were designed to measure health burden or health-related quality of life, while WELLBYs are anchored to self-reported life evaluation intended to capture welfare across all domains.[2] (WHO methods explicitly note that disability weights are intended to quantify "loss of health" rather than general welfare or social undesirability.) These are different targets.
Workshop framing: The goal is not to find "the one true conversion factor," but to choose (and stress-test) a mapping structure that leads to the least expected decision error given the information available.
The measurement-to-decision pipeline
Intervention A (measured in DALYs) → Mapping / conversion choice ← Intervention B (measured in WELLBYs)

Mapping / conversion choice → Common comparison unit (or multi-metric frame) → Ranking / allocation decision
The mapping choice sits between measurement and decision. Different mappings embed different assumptions—about what welfare is, how metrics relate to it, and what level of precision is appropriate.
Why this is a "forced comparison" rather than a scientific question
Funders cannot wait for perfect measurement. If you must allocate $1M between a malaria program and a mental health program this year, you are implicitly adopting some conversion—even if that conversion is "treat them as incomparable" (which is itself a choice).
The practical question is: given imperfect information, what mapping structure leads to least regret?[17] This is different from asking "what is the true conversion factor?"—a question that may not have a coherent answer.
Who uses DALY↔WELLBY conversions in practice?
- Founders Pledge: Compares interventions across its four "ways of doing good" (lives, DALYs, WELLBYs, income doublings) and explicitly flags DALY↔WELLBY conversion as a key uncertainty.[6]
- Happier Lives Institute: Uses WELLBY-based cost-effectiveness analysis for mental health interventions, with explicit WELLBY→monetary conversion methods controversially applied to StrongMinds.[16]
- GiveWell: Primarily DALY-based but engages with WELLBY evidence when evaluating mental health; its StrongMinds analysis explicitly discusses the WELLBY→DALY mapping problem.[11]
- UK Government: Uses WELLBYs for policy appraisal via the Green Book; HM Treasury (2021) provides an explicit QALY↔WELLBY conversion methodology.[7]
Definitions and notation
Before discussing conversion, we need clear definitions of the objects being converted. These definitions follow primary sources (WHO, NICE, UK Green Book) rather than informal usage.
DALY (Disability-Adjusted Life Year)
A measure of health burden combining:
- YLL (Years of Life Lost): Deaths × years lost to premature mortality
- YLD (Years Lived with Disability): Incidence × duration × disability weight

$$\text{DALY} = \text{YLL} + \text{YLD} = (\text{deaths} \times \text{years lost}) + (\text{incidence} \times \text{duration} \times DW)$$

where $DW \in [0,1]$ is a disability weight: 0 = full health, 1 = death-equivalent.[3] (WHO GHE Methods, 2020. Note that GBD 2010+ moved to simplified prevalence-based YLD and removed discounting/age-weighting.)
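The YLL + YLD arithmetic can be sketched in a few lines; the function name and all figures below are invented for illustration, not GBD estimates:

```python
def daly(deaths, years_lost_per_death, cases, duration_years, disability_weight):
    """DALYs = YLL + YLD (GBD 2010+ style: no discounting or age-weighting)."""
    yll = deaths * years_lost_per_death               # years of life lost
    yld = cases * duration_years * disability_weight  # years lived with disability
    return yll + yld

# A hypothetical condition: 10 deaths (30 years lost each) plus
# 200 non-fatal cases lasting 2 years at DW = 0.2.
burden = daly(deaths=10, years_lost_per_death=30, cases=200,
              duration_years=2, disability_weight=0.2)
# 10×30 = 300 YLL plus 200×2×0.2 = 80 YLD, ≈ 380 DALYs
```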
Where do disability weights come from?
Disability weights are elicited through population surveys using choice-based methods:
- Paired comparison: "Which of these two health states would you consider worse?"
- Population health equivalence: "Which is worse: 1000 people with condition X or 2000 with condition Y?"
The GBD 2019 study collected ~60,000 responses from 9 countries to set disability weights for 234 health states.
Importantly, respondents are asked about health loss specifically—not overall welfare, social functioning, or quality of life. This is a deliberate methodological choice that makes DALYs narrower than some alternatives.
QALY (Quality-Adjusted Life Year)
A measure of health benefit adjusting life-years by health-related quality of life:

$$\text{QALYs} = \sum_{t=1}^{T} q_t$$

where $q_t \in [0,1]$ is the health utility weight in year $t$ (sometimes negative for "worse than death" states). 1 QALY = 1 year in perfect health.[4] (NICE Glossary. QALYs are widely used in health technology assessment.)
DALY vs. QALY: What's the relationship?
DALYs and QALYs are often treated as "opposites" (DALYs measure burden; QALYs measure benefit), but the relationship is more complex:
- Disability weights ≠ 1 − utility weights. The elicitation methods are different, and empirical mappings show poor correspondence at the extremes.
- Reference states differ. DALYs reference "full health"; QALYs allow negative values for "worse than death."
- Aggregation differs. DALYs are population-summed burdens; QALYs are individual-level benefit calculations.
For conversion purposes, 1 DALY averted ≈ 1 QALY gained is a common working assumption, but it's not definitionally true.
WELLBY (Wellbeing-Adjusted Life Year)
A measure of welfare based on self-reported life satisfaction:

$$\text{WELLBYs} = \sum_{i}\sum_{t} \Delta LS_{it}$$

where $LS_{it}$ is reported life satisfaction (0-10 scale) for person $i$ at time $t$, and $\Delta LS_{it}$ is the change in that score attributable to the intervention. One WELLBY = one person experiencing a one-point LS increase for one year.[5] (UK Green Book Wellbeing Guidance, 2021. This definition is also used by OECD and in the World Happiness Report.)
What question produces the LS score?
The standard OECD/ONS question is:
"Overall, how satisfied are you with your life nowadays?"
(0 = "Not at all satisfied" to 10 = "Completely satisfied")
This is an evaluative measure—it asks for a cognitive assessment of one's life, not a measure of current mood or momentary affect.
Other common SWB questions (happiness, worthwhileness, anxiety) capture different constructs and don't combine into WELLBYs the same way.
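The WELLBY definition reduces to summing life-satisfaction point-years. A minimal sketch, with invented figures:

```python
def wellbys(ls_changes, years):
    """Sum of LS point changes (0-10 scale), each sustained for `years`."""
    return sum(delta * years for delta in ls_changes)

# 100 people each gain 0.5 LS points, sustained for 2 years:
gain = wellbys([0.5] * 100, years=2)   # 100 person-point-years = 100 WELLBYs
```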
Why are these different objects?
The metrics have different scopes and measurement architectures:
- DALYs/QALYs focus on health—they do not necessarily capture non-health welfare (income, relationships, meaning).
- WELLBYs aim to capture overall evaluative wellbeing via self-report—they integrate across domains but depend on how people interpret and use response scales.
- DALYs/QALYs use externally-elicited weights (disability weights, health utility tariffs), while WELLBYs use direct self-reported scores.
Important conceptual note: DALYs measure "loss of health," not necessarily welfare
WHO's methods documentation explicitly states that the DALY framework evolved from earlier "welfare/quality of life" framing toward quantifying loss of health (departures from perfect health). Disability weights are intended to reflect health states, not social value, stigma, or general quality-of-life.
This means DALYs and WELLBYs are measuring different targets, even in principle. A "conversion" is really a mapping between proxies, not a unit transformation within the same construct.
Why this matters in practice
The conversion problem is not merely academic. Funders like Founders Pledge, GiveWell, and Open Philanthropy must compare interventions across different outcome spaces. Their four main "ways of doing good" include: lives saved, DALYs averted, WELLBYs generated, and income doublings.[6] (Founders Pledge internal framing, also reflected in EA-adjacent cost-effectiveness analyses.)
Canonical comparison examples
Malaria bednets vs. mental health treatment
Malaria interventions are typically evaluated in DALYs averted (mortality risk reduction + reduced morbidity). Psychotherapy programs like StrongMinds are often evaluated with depression scales or life satisfaction. Comparing them requires mapping across metrics.
Cash transfers vs. health interventions
Cash transfer RCTs often measure consumption, income, and life satisfaction. Health interventions measure DALYs or QALYs. Cost-effectiveness comparison requires choosing how to weight these.
The stakes are real: different conversion assumptions can materially change which intervention appears more cost-effective. This is not a reason to avoid conversion, but a reason to be explicit about what is being assumed.
The StrongMinds / HLI / GiveWell controversy
The Happier Lives Institute (HLI) ranked StrongMinds—a mental health NGO—as potentially more cost-effective than GiveWell's top charities, primarily based on WELLBY-measured impacts from group therapy (HLI 2023).[16]
GiveWell's reanalysis raised several concerns:
- Depression → LS mapping: Studies measured depression severity (PHQ-9), not life satisfaction. Converting requires assumptions about the depression-LS relationship.
- Effect durability: What persistence should we assume for psychotherapy effects? HLI's assumptions were more optimistic.
- Spillover effects: HLI counted benefits to household members; GiveWell was more skeptical of the evidence base.
This controversy illustrates how DALY↔WELLBY mapping issues can materially affect charity recommendations, not just academic debates.
Implicit conversions: what happens when we don't choose?
Refusing to make a conversion explicit doesn't avoid the problem—it just hides it. Common implicit approaches include:
- "Only compare within metrics": This effectively assigns infinite value to one metric and zero to the other across boundaries.
- "Fund both categories proportionally": This implicitly assumes some conversion ratio equal to budget shares, regardless of cost-effectiveness.
- "Use expert judgment case-by-case": This may embed inconsistent conversions across decisions.
Making conversion assumptions explicit—even with wide uncertainty ranges—is almost always preferable to leaving them implicit.
Candidate conversion approaches
There is no single "correct" method for converting DALYs to WELLBYs. Instead, there are several candidate approaches, each with different data requirements, assumptions, and failure modes.
| Approach | How it works | Main strengths | Main limitations |
|---|---|---|---|
| Fixed conversion factor | Assume 1 QALY ≈ X WELLBYs (constant X) | Simple; easy sensitivity analysis; explicit | Hides domain variation; X is contested; may mislead outside calibration range |
| Anchor-span mapping | Use LS span from "full health" to "as bad as death" to define X | Explicit anchors; traceable to UK guidance | Anchors are empirically uncertain; death-equivalence point is contested |
| Monetary peg ratio | Convert each metric to £/$ using WTP values, then take ratio | Leverages existing valuations; policy-consistent | Inherits valuation uncertainties; circular if values are WELLBY-derived |
| SD-equivalence | Treat 1 SD improvement in one metric ≈ 1 SD in another | Standardizes across scales; used in practice | SDs depend on population variance; not measurement-invariant |
| Empirical crosswalk | Estimate LS = f(health utility, covariates) from datasets with both | Data-driven; can estimate domain-specific mappings | Population-specific; may not generalize; requires joint measurement |
| Component decomposition | Convert YLL and YLD separately (mortality vs. morbidity paths) | Localizes disagreements; transparent | More complex; requires separate evidence for each path |
| Multi-metric sensitivity | Present rankings under multiple X values; report robustness | Explicit uncertainty; avoids false precision | Less actionable if rankings are unstable; requires interpretation |
When to use each approach
Fixed factor is appropriate when you need a simple baseline for sensitivity analysis, or when communicating to audiences who need a single number.
Anchor-span is preferred when you have good estimates of LS at health extremes and want a traceable derivation. The UK Green Book makes this approach explicit.
Monetary peg works when both metrics already have established £/$ valuations (e.g., from health technology assessment). Useful for policy consistency.
Empirical crosswalk is ideal when you have datasets measuring both health status and life satisfaction for the same individuals. This allows population-specific calibration.
Multi-metric sensitivity is recommended for final decision-making when conversion is contested. It makes uncertainty visible rather than hidden.
Why "empirical crosswalk" is harder than it sounds
The idea of estimating LS = f(health) from datasets that measure both seems straightforward, but complications include:
- Selection: People measured on both instruments may not be representative (e.g., patients vs. general population).
- Simultaneity: Health affects LS, but LS may also affect health behaviors and reporting.
- Ceiling effects: At high health levels, LS variation may reflect non-health factors.
- Instrument mismatch: EQ-5D measures "today," while LS measures "overall evaluation." Timing differences matter.
EEPRU's work found that generic SWB measures are often less sensitive than disease-specific instruments for physical conditions, complicating simple crosswalks (Mukuria et al. 2016).[10]
The UK Green Book anchor-span approach (explicit example)
The UK Green Book wellbeing guidance (HM Treasury 2021) provides an unusually explicit mapping logic:[7]
- Average LS for those with no health problems ≈ 8 on a 0-10 scale
- Assume the LS level equivalent to "as bad as death" (QALY = 0) ≈ 1
- Therefore: 1 QALY ↔ (8 - 1) = 7 WELLBY
This is valuable not as "the answer" but as an explicit, traceable derivation. The guidance itself flags uncertainty about the low-end anchor and cites alternative evidence suggesting ~2 rather than 1.
What if the death-equivalent anchor is 2 instead of 1?
Peasgood et al. (2018), cited in the UK guidance, found an indifference point around LS = 2. Using this anchor:
$X = 8 - 2 = 6$ WELLBY per QALY
A one-point shift in the anchor changes the conversion factor by ~14%. This illustrates why the neutral/death-equivalence debate is not semantic—it is numerically load-bearing for mortality comparisons.
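The anchor-span derivation and its sensitivity can be written out directly; the function name is ours, and the anchor values follow the text above:

```python
def wellbys_per_qaly(ls_full_health, ls_death_equivalent):
    """Green Book-style span: X = LS at full health minus death-equivalent LS."""
    return ls_full_health - ls_death_equivalent

base = wellbys_per_qaly(8, 1)            # 7 (Green Book working assumption)
alt = wellbys_per_qaly(8, 2)             # 6 (Peasgood et al. anchor)
relative_change = (base - alt) / base    # ≈ 0.14: one anchor point shifts X by ~14%
```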
Core assumptions behind a simple conversion
Any fixed conversion factor (e.g., "1 DALY ≈ X WELLBYs") implicitly assumes several things. Making these explicit helps identify where conversion may be most fragile.
WELLBY cardinality
Equal steps on the 0-10 LS scale correspond to equal welfare changes. A move from 3→4 has the same welfare meaning as 7→8.
If violated: Summing LS points across people/time may distort welfare comparisons.
Interpersonal comparability
A 1-point LS change means the same welfare change for different people (at least approximately).
If violated: Equal reported changes may hide unequal welfare impacts.
Domain invariance
The conversion factor is stable across domains—the same X applies whether the DALY is from malaria, depression, or chronic pain.
If violated: A single factor may systematically over- or under-weight certain domains.
Baseline invariance
The conversion factor doesn't depend on the starting LS or health state of the beneficiary.
If violated: The same health improvement may yield different LS gains at different baselines.
Linearity (no saturation)
Marginal improvements in health produce proportional LS gains across the severity spectrum.
If violated: Linear extrapolation may miss ceiling/floor effects or diminishing returns.
Stable link function
The relationship between health status and life satisfaction is consistent across contexts, populations, and time.
If violated: Cross-study and cross-context comparisons become unreliable.
Plant's Cardinality Thesis decomposition
Plant (2025) provides a useful decomposition of the assumptions needed for cardinal WELLBY use:
- C1: Phenomenal cardinality (subjective experiences have inherent magnitudes)
- C2: Linearity (equal scale steps = equal welfare differences)
- C3: Intertemporal comparability (same person uses scale consistently over time)
- C4: Interpersonal comparability (different people's reports are comparable)
This framework helps locate which assumptions are most load-bearing for a given comparison.
Where linear conversion is most likely to go wrong
A constant DALY↔WELLBY factor may be useful as a temporary decision heuristic, but it can fail in predictable ways. This section maps the main failure modes.
Severity and baseline dependence
The LS impact of a given health improvement may depend on the severity of the condition and the baseline LS of the beneficiary. Evidence suggests that people at very low baselines may show either larger or smaller LS responses to health changes, depending on context.
Mental health vs. physical health
EEPRU research found that SWB measures are generally less sensitive to physical health conditions than EQ-5D/SF-6D, while results for depression/mental health are more mixed. This suggests a single conversion factor may systematically mis-weight mental vs. physical health domains.
Duration and adaptation effects
DALYs treat duration as additive (more years = more burden), but LS may show adaptation effects. Someone who adapts to a chronic condition may report similar LS to a healthy person, even though DALYs continue accumulating.
The adaptation paradox for conversion
Adaptation creates a fundamental tension between DALY and WELLBY accounting:
- DALY perspective: A person with chronic paraplegia accumulates ~0.3-0.4 YLD per year indefinitely (based on disability weights).
- WELLBY perspective: After initial adjustment, LS may return close to pre-injury levels (substantial evidence of hedonic adaptation).
If adaptation is complete, a simple conversion implies the person is "not losing welfare" each year—even though DALYs continue accruing. This is either:
- Evidence that DALYs over-count chronic morbidity (the WELLBY perspective wins), or
- Evidence that LS under-counts genuine welfare losses that people have adapted to accepting (the DALY perspective wins).
The correct interpretation is a substantive philosophical question, not just a measurement issue (Frijters et al. 2024 discuss adaptation as a challenge for WELLBY interpretation).[9]
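The divergence can be made concrete with a toy simulation; the geometric adaptation path and every parameter below are invented assumptions, not empirical estimates:

```python
def cumulative_yld(years, disability_weight=0.35):
    """DALY accounting: YLD accrues linearly for as long as the condition lasts."""
    return disability_weight * years

def cumulative_wellby_loss(years, initial_ls_drop=2.0, recovery_rate=0.5):
    """WELLBY accounting: the LS deficit decays geometrically (hedonic adaptation)."""
    return sum(initial_ls_drop * (1 - recovery_rate) ** t for t in range(years))

daly_burden_10y = cumulative_yld(10)          # ≈ 3.5 YLD, still growing each year
wellby_loss_10y = cumulative_wellby_loss(10)  # ≈ 4.0 point-years, nearly flat after year 5
```

Under these assumptions the two accounts agree roughly over a decade but diverge without bound thereafter: the DALY total keeps climbing while the WELLBY loss plateaus.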
Nonlinearity at extremes
Both scales have ceiling and floor effects:
- LS is bounded at 0 and 10; people at high baselines have limited room to improve
- Disability weights are bounded at 0 and 1; "worse than death" states are controversial
- The relationship between health and LS may be nonlinear, especially at extremes
SD-equivalence fragility
The "1 SD ≈ 1 SD" mapping is particularly vulnerable to variance heterogeneity. If two interventions produce identical welfare impacts but are measured in populations with different baseline variance, they will generate different z-scores. The conversion becomes a function of sample properties, not just treatment effects.
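A two-line illustration of the variance problem, with invented numbers:

```python
def standardized_effect(raw_effect, population_sd):
    """z-style effect size: raw change divided by population SD."""
    return raw_effect / population_sd

# The same 0.5-point LS improvement measured in two populations:
low_var = standardized_effect(0.5, population_sd=1.0)    # 0.50 SD
high_var = standardized_effect(0.5, population_sd=2.0)   # 0.25 SD
# Identical welfare impacts look twice as "large" in the low-variance population.
```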
Where simple conversion may work
- Within-study comparisons using the same instruments
- Similar populations and health domains
- Marginal changes from moderate baselines
- Sensitivity analysis showing robust rankings
Where simple conversion is risky
- Cross-study synthesis with different instruments
- Mortality vs. morbidity comparisons
- Mental health vs. physical health domains
- Extreme severity or extreme baseline LS
- LMIC contexts with limited LS calibration data
Conversion Factor Sensitivity Demo
See how the relative ranking of two interventions changes as you vary the assumed DALY↔WELLBY conversion factor.
Note: This uses simplified assumptions. Real comparisons involve uncertainty in effect sizes, costs, and the conversion factor itself.
Practical guidance for funders now
Given the uncertainties above, what should funders actually do? This section offers a decision-oriented framework, not a single prescription.
Decision framework by data situation
The least harmful decision procedure
Rather than asking "what is the best exact conversion factor?", consider asking: "which mapping structure causes the least expected decision error?"
This reframing suggests:
- Present ranges, not point estimates: A distribution of X values (e.g., 4-10 WELLBYs per DALY) may be more honest than a single number.
- Report ranking robustness: If intervention A beats B under all plausible X values, the comparison is robust. If ranking reverses within the plausible range, flag this as a key uncertainty.
- Consider domain-specific factors: Use different X values for mortality vs. morbidity, or for physical vs. mental health, if evidence supports this.
- Prefer direct measurement where feasible: When possible, measure LS directly rather than converting from DALYs.
Worked example: Finding the "indifference threshold" X*
Suppose you're comparing two interventions:
- Intervention A: $50/person, generates 0.3 WELLBYs/person
- Intervention B: $100/person, averts 0.05 DALYs/person
Cost-effectiveness in WELLBYs per $1000:
- A: 0.3 / $50 × 1000 = 6 WELLBYs/$1000
- B (converted): 0.05 × X / $100 × 1000 = 0.5 × X WELLBYs/$1000
B beats A when 0.5X > 6, i.e., when X > 12.
This tells you: if you believe the conversion factor is below 12, fund A; if above 12, fund B. The "indifference threshold" X* = 12 is the key number for decision-making, not the conversion factor itself.
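The threshold calculation above as code, using the same hypothetical figures (the helper names are ours):

```python
def wellbys_per_1000(effect, cost):
    """Cost-effectiveness in WELLBYs per $1000 spent."""
    return effect / cost * 1000

def indifference_x(a_wellbys, a_cost, b_dalys, b_cost):
    """Conversion factor X* at which B's converted cost-effectiveness equals A's."""
    a_ce = wellbys_per_1000(a_wellbys, a_cost)      # A: 6 WELLBYs/$1000
    b_ce_per_x = wellbys_per_1000(b_dalys, b_cost)  # B: 0.5 × X WELLBYs/$1000
    return a_ce / b_ce_per_x

x_star = indifference_x(a_wellbys=0.3, a_cost=50, b_dalys=0.05, b_cost=100)
# x_star ≈ 12: fund A if your believed X is below it, B if above.
```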
Template: How to report conversion sensitivity in a CEA
When presenting cost-effectiveness analyses that involve DALY↔WELLBY conversion, consider including:
- Base case: State your assumed conversion factor and cite the source (e.g., "7 WELLBYs/QALY per UK Green Book").
- Sensitivity range: Report results at X = 4, 7, and 10 (or whatever range brackets the literature).
- Threshold analysis: Report X* at which ranking reverses. State whether X* falls within the plausible range.
- Robustness statement: "Intervention A is preferred under all conversion factors below [X*]" or "Ranking is sensitive to conversion assumptions."
This template makes your assumptions transparent and allows readers with different priors to interpret your results.
What evidence would reduce uncertainty most?
Priority evidence gaps
- Direct beneficiary tradeoff studies: How do beneficiaries themselves trade off health improvements against LS improvements? Stated preference methods could anchor conversion factors more directly.
- Joint measurement RCTs: Trials that measure both LS and health metrics (EQ-5D, DALYs) for the same intervention, allowing empirical estimation of the LS-health relationship.
- Domain-specific mappings: Evidence on whether mental health → LS mapping differs from physical health → LS mapping, and by how much.
- LMIC scale-use calibration: Cheap methods for identifying and adjusting scale-use heterogeneity in low-resource settings.
- Neutral point studies: Better estimates of the LS level equivalent to "as bad as death" across populations.
- SD interchangeability: Evidence on whether SD changes on mental health instruments (PHQ-9, etc.) correspond to comparable welfare changes as SD changes on LS.
- Conditions under which rankings change: Systematic analysis of when different conversion approaches lead to different top-charity recommendations.
How might these gaps be filled? Practical research designs
1. Beneficiary tradeoff studies:
- Use discrete choice experiments asking beneficiaries to choose between health improvements and income/wellbeing improvements.
- Could be embedded in existing RCT follow-up surveys at low marginal cost.
2. Joint measurement:
- Add LS questions (1-2 items) to health intervention trials that already measure EQ-5D or SF-6D.
- Cost: ~$1-5/participant for additional survey items; high value-of-information.
3. LMIC calibration:
- Benjamin et al.'s (2023, NBER WP 31728) scale-use adjustment methods could be adapted for LMIC populations using vignettes in local languages.[13]
- Could be combined with anchoring vignettes (King et al., 2004) for cross-population comparison.
Neutral Point / Mortality Demo
When comparing mortality-reducing interventions to non-mortality wellbeing programs, the assumed "death-equivalent" LS level becomes central.
Key insight: The UK guidance uses LS = 1 as a working assumption but cites evidence suggesting ~2 may be more accurate. This uncertainty propagates directly into mortality comparisons.
Bottom line
A single, universal DALY↔WELLBY conversion factor probably does not exist in any meaningful sense. The metrics were designed for different purposes, measure different things, and embed different assumptions about what matters for welfare.
The practical goal is not to find "the one true scalar conversion" but to choose the mapping structure that causes the least expected decision error given the information available. Current best practice is likely:
- Multi-method: Use more than one conversion approach and compare results
- Domain-sensitive: Allow different factors for different health domains if evidence supports
- Uncertainty-explicit: Present ranges, scenarios, or distributions rather than single numbers
- Decision-focused: Report whether rankings are robust across the plausible range of conversion factors
This page should be useful even for readers who are skeptical of WELLBYs, or skeptical of DALYs. The underlying question—how do we compare interventions that affect different outcomes?—does not go away by ignoring it. Making the mapping structure explicit is better than leaving it implicit.
A note for WELLBY skeptics
If you believe WELLBYs have fundamental measurement problems (scale-use heterogeneity, demand effects, philosophical objections to hedonism), the conversion framework here still applies—just with much wider uncertainty ranges or higher weight on DALY-based evidence.
The practical value of making conversion explicit is that it lets you:
- See how much your skepticism matters for specific decisions (threshold analysis)
- Communicate your reasoning transparently to others with different priors
- Update systematically as new evidence arrives
"WELLBYs are too unreliable to use" is itself an implicit conversion assumption (X ≈ 0 or undefined). Making it explicit is more honest.
A note for DALY skeptics
If you believe DALYs miss important welfare effects (mental health, non-health domains, adaptation, social context), you might prefer to:
- Use WELLBYs as the primary metric and convert DALYs to WELLBYs (rather than vice versa)
- Apply domain-specific adjustments to DALY-based estimates
- Flag DALY-only evidence as potentially understating welfare effects
The conversion framework accommodates this perspective—just flip the direction and acknowledge that mortality comparisons require additional neutral-point assumptions.
Prompts for workshop discussion
These prompts are designed to elicit participant reasoning and surface disagreements:
1. What conversion factor (or range) do you currently use in practice, and what is the basis for it? Is this explicit or implicit in your cost-effectiveness models?
2. Should the conversion factor vary by domain (mental health vs. physical health, mortality vs. morbidity)? What evidence would convince you to use domain-specific factors?
3. How should we handle the death-equivalence anchor? Is LS = 1, 2, or something else the right assumption? Does this depend on population or context?
4. When mapping depression scale improvements to WELLBYs (as in the StrongMinds analysis), what evidence would make the mapping credible? What is the minimum acceptable standard?
5. Is "minimize expected decision error" the right objective, or should we prioritize other properties (transparency, robustness, theoretical consistency)?
6. What single study or evidence type would most reduce your uncertainty about DALY↔WELLBY conversion?
References
1. The Unjournal Pivotal Questions database, codes DALY_01 / DALY_03 / DALY_05. See also: beliefs elicitation page.
2. WHO (2020). Methods and data sources for global burden of disease estimates. The framework distinguishes "loss of health" from welfare or quality-of-life.
3. WHO (2020). WHO methods and data sources for global burden of disease estimates 2000-2019. Technical paper.
4. NICE. "Quality-adjusted life year (QALY)." NICE Glossary.
5. HM Treasury (2021). Wellbeing Guidance for Appraisal: Supplementary Green Book Guidance.
6. Founders Pledge (2024). Internal cost-effectiveness framework documentation.
7. HM Treasury (2021). The guidance explicitly derives 7 WELLBYs per QALY from an 8-to-1 LS span.
8. Peasgood, T., et al. (2018). "The impact of health on wellbeing: A comparison of SWB and health utility instruments." Cited in UK Green Book guidance.
9. Frijters, P., et al. (2024). "Using wellbeing for public policy: taking stock." Nature Human Behaviour.
10. Mukuria, C., et al. (2016). EEPRU report comparing SWB measures and health measures.
11. GiveWell (2023). "Our Assessment of Happier Lives Institute's Cost-Effectiveness Analysis of StrongMinds."
12. Plant, M. (2025). "A Happy Possibility About Happiness Scales: An Exploration of the Cardinality Assumption." Working paper.
13. Benjamin, D.J., et al. (2023). "Adjusting for Scale-Use Heterogeneity in Self-Reported Well-Being." NBER WP 31728.
14. Bond, T.N., & Lang, K. (2019). "The Sad Truth about Happiness Scales." Journal of Political Economy.
15. OECD (2021). "United Kingdom." Case study in OECD Guidelines on Measuring Subjective Well-being.
16. HLI (2023). "StrongMinds cost-effectiveness analysis." Happier Lives Institute. Available at happierlivesinstitute.org.
17. "Least regret" (or "minimax regret") is a formal criterion from decision theory developed by Leonard Savage (1951). The approach chooses the action that minimizes the maximum regret—the difference between the outcome of the chosen action and the best possible outcome—across all possible states of the world. In the context of DALY↔WELLBY conversion, this means choosing mapping assumptions that minimize expected decision error across plausible true conversion values.