DALY, QALY, and WELLBY Conversion: Mapping Structures, Assumptions, and Practical Guidance for Cross-Intervention Comparison

A Technical Briefing from The Unjournal's Pivotal Questions Initiative

---

Important Note Before We Begin

This document was largely generated with AI assistance and has not yet been fully fact-checked. It is shared for workshop discussion and review purposes. Conversion factor ranges discussed—such as "seven WELLBYs per QALY"—are illustrative and require primary-source verification. Citations to specific papers should be checked against originals. The framing and emphasis may not reflect consensus views in the field.

---

Section One: The Conversion Problem—What Are We Trying to Do?

Funders and evaluators often face a practical comparison problem. One intervention is evaluated in DALYs averted—that is, Disability-Adjusted Life Years—or QALYs gained—Quality-Adjusted Life Years. These are health metrics. Meanwhile, another intervention is evaluated in WELLBYs—Wellbeing-Adjusted Life Years—which measure life satisfaction point-years. Comparing these interventions on a common basis requires some form of translation or mapping. The focal question for this workshop segment is: "How should we translate between health measures—DALYs and QALYs—and subjective wellbeing—WELLBYs—for cross-intervention comparison?" Sub-questions include: What numerical conversion factor should we use? Should it vary by domain? How should we treat uncertainty? A key insight to internalize up front: a "conversion" is not a single fact like a currency exchange rate. DALYs and QALYs were designed to measure health burden or health-related quality of life, while WELLBYs are anchored to self-reported life evaluation intended to capture welfare across all domains. These are fundamentally different measurement targets.
The World Health Organization's methods documentation explicitly notes that disability weights are intended to quantify "loss of health" rather than general welfare or social undesirability. Here is the key workshop framing: The goal is not to find "the one true conversion factor," but to choose—and stress-test—a mapping structure that leads to the least expected decision error given the information available. Let me describe the measurement-to-decision pipeline visually. Imagine a flow chart. On the left, you have Intervention A, measured in DALYs. Below that, Intervention B, measured in WELLBYs. Both feed into a central box representing the mapping or conversion choice. From there, the flow continues to a "common comparison unit or multi-metric frame," and finally to "ranking and allocation decisions." The mapping choice sits between measurement and decision. Different mappings embed different assumptions—about what welfare is, how metrics relate to it, and what level of precision is appropriate. Why is this a "forced comparison" rather than a scientific question? Because funders cannot wait for perfect measurement. If you must allocate one million dollars between a malaria program and a mental health program this year, you are implicitly adopting some conversion—even if that conversion is "treat them as incomparable," which is itself a choice. The practical question is: given imperfect information, what mapping structure leads to least regret? This is different from asking "what is the true conversion factor?"—a question that may not have a coherent answer. Who actually uses DALY-to-WELLBY conversions in practice? Several organizations do: Founders Pledge compares interventions across their four "ways of doing good"—lives, DALYs, WELLBYs, and income doublings. They explicitly flag the DALY-to-WELLBY conversion as a key uncertainty. The Happier Lives Institute uses WELLBY-based cost-effectiveness analysis for mental health interventions. 
GiveWell primarily uses DALYs but engages with WELLBY evidence when evaluating mental health interventions. The UK Government uses WELLBYs for policy appraisal via its Green Book guidance.

---

Section Two: Definitions and Notation

Before discussing conversion, we need clear definitions of the objects being converted. These definitions follow primary sources—WHO, NICE, and the UK Green Book—rather than informal usage. First, the DALY—Disability-Adjusted Life Year. A DALY is a measure of health burden combining two components: YLL, or Years of Life Lost, equals the number of deaths multiplied by the standard life expectancy at the age of death. YLD, or Years Lived with Disability, classically equals incidence multiplied by duration multiplied by disability weight; in current practice it is calculated as prevalence multiplied by disability weight. Mathematically: DALY equals YLL plus YLD. The disability weight is a number between zero and one. Zero represents full health, and one represents a death-equivalent state. From the Global Burden of Disease 2010 study onwards, the calculation moved to this simplified prevalence-based approach, and the earlier discounting and age-weighting were removed. Where do these disability weights come from? They are elicited through population surveys using choice-based methods. In paired comparisons, respondents are asked: "Which of these two health states would you consider worse?" In population health equivalence questions: "Which is worse: one thousand people with condition X or two thousand with condition Y?" The GBD 2019 study collected approximately sixty thousand responses from nine countries to set disability weights for 234 health states. Importantly, respondents are asked specifically about health loss—not overall welfare, social functioning, or general quality of life. This is a deliberate methodological choice that makes DALYs narrower than some alternatives. Second, the QALY—Quality-Adjusted Life Year. A QALY is a measure of health benefit that adjusts life-years by health-related quality of life.
Mathematically: QALY equals the sum over time of the health utility weight multiplied by the time interval. The utility weight is between zero and one—though sometimes negative for "worse than death" states. One QALY equals one year in perfect health. What is the relationship between DALYs and QALYs? They are often treated as opposites—DALYs measure burden while QALYs measure benefit—but the relationship is more complex. Disability weights do not simply equal one minus utility weights. The elicitation methods are different, and empirical mappings show poor correspondence at the extremes. Reference states differ. DALYs reference "full health," while QALYs allow negative values for "worse than death." Aggregation differs. DALYs are population-summed burdens; QALYs are individual-level benefit calculations. For conversion purposes, "one DALY averted equals approximately one QALY gained" is a common working assumption, but it is not definitionally true. Third, the WELLBY—Wellbeing-Adjusted Life Year. A WELLBY is a measure of welfare based on self-reported life satisfaction. Mathematically: WELLBY equals the sum across individuals and time of the discount factor multiplied by life satisfaction. Life satisfaction is measured on a zero-to-ten scale. One WELLBY equals one person experiencing a one-point life satisfaction increase sustained for one year. What question actually produces the life satisfaction score? The standard OECD and UK Office for National Statistics question is: "Overall, how satisfied are you with your life nowadays?" Respondents answer on a scale from zero—meaning "not at all satisfied"—to ten—meaning "completely satisfied." This is an evaluative measure. It asks for a cognitive assessment of one's life as a whole, not a measure of current mood or momentary affect. Other common subjective wellbeing questions—about happiness, worthwhileness, or anxiety—capture different constructs and do not combine into WELLBYs in the same way. 
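The three definitions above can be sketched in a few lines of Python. This is an illustrative toy, not a calibrated model: all numbers are made up, and discounting is omitted from the WELLBY sum for simplicity.

```python
# Toy sketch of the three metric definitions (illustrative numbers only).

def daly(deaths, life_expectancy_at_death, prevalence, disability_weight):
    """DALY = YLL + YLD, with prevalence-based YLD as in GBD 2010 onwards."""
    yll = deaths * life_expectancy_at_death    # Years of Life Lost
    yld = prevalence * disability_weight       # Years Lived with Disability
    return yll + yld

def qaly(utility_weights_by_year):
    """QALY = sum of the health utility weight (0 to 1, occasionally
    negative for worse-than-death states) over each year lived."""
    return sum(utility_weights_by_year)

def wellby(life_satisfaction_point_changes):
    """WELLBY = sum of life satisfaction point-changes (0-10 scale),
    one entry per person-year; discounting omitted for simplicity."""
    return sum(life_satisfaction_point_changes)

# Toy example: 2 deaths losing 30 years each, plus 100 person-years
# lived at disability weight 0.2.
print(daly(deaths=2, life_expectancy_at_death=30,
           prevalence=100, disability_weight=0.2))  # 80.0
print(qaly([1.0, 0.75, 0.5]))                       # 2.25
print(wellby([1.0, 1.0, 0.5]))                      # 2.5
```

The sketch makes the structural contrast concrete: DALYs and QALYs multiply time by externally elicited weights, while WELLBYs sum self-reported score changes directly.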
Why are these different objects? The metrics have different scopes and measurement architectures. DALYs and QALYs focus on health. They do not necessarily capture non-health welfare—income, relationships, meaning. WELLBYs aim to capture overall evaluative wellbeing via self-report. They integrate across domains but depend on how people interpret and use response scales. DALYs and QALYs use externally elicited weights—disability weights and health utility tariffs—while WELLBYs use direct self-reported scores. An important conceptual note: WHO's methods documentation explicitly states that the DALY framework evolved from earlier "welfare" and "quality of life" framing toward quantifying loss of health—specifically, departures from perfect health. Disability weights are intended to reflect health states, not social value, stigma, or general quality of life. This means DALYs and WELLBYs are measuring different targets, even in principle. A "conversion" is really a mapping between different proxies, not a unit transformation within the same underlying construct.

---

Section Three: Why This Matters in Practice

The conversion problem is not merely academic. Funders like Founders Pledge, GiveWell, and Open Philanthropy must compare interventions across different outcome spaces. Founders Pledge's four main "ways of doing good," for example, are lives saved, DALYs averted, WELLBYs generated, and income doublings. Let me give two canonical comparison examples. First example: Malaria bednets versus mental health treatment. Malaria interventions are typically evaluated in DALYs averted—through mortality risk reduction and reduced morbidity. Psychotherapy programs like StrongMinds are often evaluated with depression scales or life satisfaction measures. Comparing them requires mapping across metrics. Second example: Cash transfers versus health interventions. Cash transfer randomized controlled trials often measure consumption, income, and life satisfaction.
Health interventions measure DALYs or QALYs. Cost-effectiveness comparison requires choosing how to weight these different outcomes. The stakes are real. Different conversion assumptions can materially change which intervention appears more cost-effective. This is not a reason to avoid conversion, but a reason to be explicit about what is being assumed. GiveWell's analysis of StrongMinds explicitly highlights that translating depression improvements into life satisfaction is a key uncertainty, because many psychotherapy studies do not report life satisfaction outcomes directly. The mapping model matters for the final cost-effectiveness estimate. Let me elaborate on the StrongMinds, Happier Lives Institute, and GiveWell controversy. The Happier Lives Institute ranked StrongMinds—a mental health NGO—as potentially more cost-effective than GiveWell's top charities. This ranking was primarily based on WELLBY-measured impacts from group therapy. GiveWell's reanalysis raised several concerns. First, the depression-to-life-satisfaction mapping: Studies measured depression severity using instruments like the PHQ-9, not life satisfaction. Converting requires assumptions about the depression-life-satisfaction relationship. Second, effect durability: What persistence should we assume for psychotherapy effects? The Happier Lives Institute's assumptions were more optimistic. Third, spillover effects: HLI counted benefits to household members; GiveWell was more skeptical of the evidence base for these spillovers. This controversy illustrates how DALY-to-WELLBY mapping issues can materially affect charity recommendations—not just academic debates. What about implicit conversions? What happens when we don't choose explicitly? Refusing to make a conversion explicit doesn't avoid the problem—it just hides it. Common implicit approaches include: "Only compare within metrics"—This effectively assigns infinite value to one metric and zero to the other across boundaries. 
"Fund both categories proportionally"—This implicitly assumes some conversion ratio equal to budget shares, regardless of cost-effectiveness. "Use expert judgment case-by-case"—This may embed inconsistent conversions across different decisions. Making conversion assumptions explicit—even with wide uncertainty ranges—is almost always preferable to leaving them implicit.

---

Section Four: Candidate Conversion Approaches

There is no single "correct" method for converting DALYs to WELLBYs. Instead, there are several candidate approaches, each with different data requirements, assumptions, and failure modes. Let me walk through seven approaches. First approach: Fixed conversion factor. This assumes one QALY equals approximately X WELLBYs, where X is a constant. The main strengths are simplicity, easy sensitivity analysis, and explicitness. The main limitations are that it hides domain variation, the value of X is contested, and it may mislead outside its calibration range. Second approach: Anchor-span mapping. This uses the life satisfaction span from "full health" to "as bad as death" to define X. Its strengths are explicit anchors and traceability to UK guidance. Its limitations are that the anchors are empirically uncertain and the death-equivalence point is contested. Third approach: Monetary peg ratio. Convert each metric to pounds or dollars using willingness-to-pay values, then take the ratio. Strengths include leveraging existing valuations and policy consistency. Limitations include inheriting valuation uncertainties, and potential circularity if the monetary values were themselves WELLBY-derived. Fourth approach: Standard deviation equivalence. Treat a one standard deviation improvement in one metric as approximately equal to one standard deviation in another. Strengths include standardization across scales—this is used in practice. Limitations: standard deviations depend on population variance and this approach is not measurement-invariant.
Fifth approach: Empirical crosswalk. Estimate life satisfaction as a function of health utility and covariates from datasets that measure both. Strengths: data-driven; can estimate domain-specific mappings. Limitations: population-specific; may not generalize; requires joint measurement of both outcomes. Sixth approach: Component decomposition. Convert Years of Life Lost and Years Lived with Disability separately—that is, separate mortality and morbidity conversion paths. Strengths: localizes disagreements; transparent. Limitations: more complex; requires separate evidence for each path. Seventh approach: Multi-metric sensitivity. Present rankings under multiple values of X; report robustness. Strengths: explicit uncertainty; avoids false precision. Limitations: less actionable if rankings are unstable; requires interpretation by decision-makers. When should you use each approach? Fixed factor is appropriate when you need a simple baseline for sensitivity analysis, or when communicating to audiences who need a single number. Anchor-span is preferred when you have good estimates of life satisfaction at health extremes and want a traceable derivation. The UK Green Book makes this approach explicit. Monetary peg works when both metrics already have established pound or dollar valuations—for example, from health technology assessment. Useful for policy consistency. Empirical crosswalk is ideal when you have datasets measuring both health status and life satisfaction for the same individuals. This allows population-specific calibration. Multi-metric sensitivity is recommended for final decision-making when conversion is contested. It makes uncertainty visible rather than hidden. Why is "empirical crosswalk" harder than it sounds? 
The idea of estimating life satisfaction as a function of health from datasets that measure both seems straightforward, but complications include: Selection: People measured on both instruments may not be representative—for example, patients versus the general population. Simultaneity: Health affects life satisfaction, but life satisfaction may also affect health behaviors and reporting. Ceiling effects: At high health levels, life satisfaction variation may reflect non-health factors. Instrument mismatch: The EQ-5D measures health "today," while life satisfaction measures "overall evaluation." Timing differences matter. Research from the EEPRU group—Mukuria and colleagues, 2016—found that generic subjective wellbeing measures are often less sensitive than disease-specific instruments for physical conditions, complicating simple crosswalks. Now let me explain the UK Green Book anchor-span approach with an explicit example. The UK Green Book wellbeing guidance—published by HM Treasury in 2021—provides an unusually explicit mapping logic. Step one: Average life satisfaction for those with no health problems is approximately eight on the zero-to-ten scale. Step two: Assume the life satisfaction level equivalent to "as bad as death"—where QALY equals zero—is approximately one. Step three: Therefore, one QALY corresponds to eight minus one, which equals seven WELLBYs. In mathematical notation: X equals life satisfaction at full health minus life satisfaction at death equivalent, which is eight minus one, equaling seven. This is valuable not as "the answer" but as an explicit, traceable derivation. The guidance itself flags uncertainty about the low-end anchor and cites alternative evidence suggesting approximately two rather than one. What if the death-equivalent anchor is two instead of one? Peasgood and colleagues, in 2018—cited in the UK guidance—found an indifference point around life satisfaction of two. 
Using this anchor: X equals eight minus two, which equals six WELLBYs per QALY. A one-point shift in the anchor changes the conversion factor by approximately fourteen percent. This illustrates why the neutral point and death-equivalence debate is not semantic—it is numerically load-bearing for mortality comparisons.

---

Section Five: Core Assumptions Behind a Simple Conversion

Any fixed conversion factor—such as "one DALY equals approximately X WELLBYs"—implicitly assumes several things. Making these explicit helps identify where conversion may be most fragile. Let me describe six core assumptions. First assumption: WELLBY cardinality. This assumes that equal steps on the zero-to-ten life satisfaction scale correspond to equal welfare changes. A move from three to four has the same welfare meaning as a move from seven to eight. If this is violated, summing life satisfaction points across people and time may distort welfare comparisons. Second assumption: Interpersonal comparability. A one-point life satisfaction change means the same welfare change for different people, at least approximately. If violated, equal reported changes may hide unequal welfare impacts. Third assumption: Domain invariance. The conversion factor is stable across domains—the same X applies whether the DALY is from malaria, depression, or chronic pain. If violated, a single factor may systematically over- or under-weight certain health domains. Fourth assumption: Baseline invariance. The conversion factor doesn't depend on the starting life satisfaction or health state of the beneficiary. If violated, the same health improvement may yield different life satisfaction gains at different baselines. Fifth assumption: Linearity, or no saturation. Marginal improvements in health produce proportional life satisfaction gains across the severity spectrum. If violated, linear extrapolation may miss ceiling effects, floor effects, or diminishing returns. Sixth assumption: Stable link function.
The relationship between health status and life satisfaction is consistent across contexts, populations, and time. If violated, cross-study and cross-context comparisons become unreliable. Michael Plant, in a 2025 working paper, provides a useful decomposition of the assumptions needed for cardinal WELLBY use. He identifies four components: C-1: Phenomenal cardinality—subjective experiences have inherent magnitudes. C-2: Linearity—equal scale steps correspond to equal welfare differences. C-3: Intertemporal comparability—the same person uses the scale consistently over time. C-4: Interpersonal comparability—different people's reports are comparable. This framework helps locate which assumptions are most load-bearing for a given comparison.

---

Section Six: Where Linear Conversion Is Most Likely to Go Wrong

A constant DALY-to-WELLBY factor may be useful as a temporary decision heuristic, but it can fail in predictable ways. Let me map the main failure modes. First failure mode: Severity and baseline dependence. The life satisfaction impact of a given health improvement may depend on the severity of the condition and the baseline life satisfaction of the beneficiary. Evidence suggests that people at very low baselines may show either larger or smaller life satisfaction responses to health changes, depending on context. Consider the comparison between mental health and physical health. Research from the EEPRU group found that subjective wellbeing measures are generally less sensitive to physical health conditions than instruments like the EQ-5D or SF-6D, while results for depression and mental health are more mixed. This suggests a single conversion factor may systematically mis-weight mental versus physical health domains. Second failure mode: Duration and adaptation effects. DALYs treat duration as additive—more years equals more burden. But life satisfaction may show adaptation effects.
Someone who adapts to a chronic condition may report similar life satisfaction to a healthy person, even though DALYs continue accumulating. This creates what I call the adaptation paradox for conversion. From the DALY perspective: A person with chronic paraplegia accumulates approximately 0.3 to 0.4 Years Lived with Disability per year, indefinitely, based on disability weights. From the WELLBY perspective: After initial adjustment, life satisfaction may return close to pre-injury levels—there is substantial evidence of hedonic adaptation. If adaptation is complete, a simple conversion implies the person is "not losing welfare" each year—even though DALYs continue accruing. This is either evidence that DALYs over-count chronic morbidity—meaning the WELLBY perspective wins—or evidence that life satisfaction under-counts genuine welfare losses to which people have adapted—meaning the DALY perspective wins. The correct interpretation is a substantive philosophical question, not just a measurement issue. Third failure mode: Nonlinearity at extremes. Both scales have ceiling and floor effects. Life satisfaction is bounded at zero and ten. People at high baselines have limited room to improve. Disability weights are bounded at zero and one. "Worse than death" states are controversial. The relationship between health and life satisfaction may be nonlinear, especially at extremes. Fourth failure mode: Standard deviation equivalence fragility. The "one standard deviation equals one standard deviation" mapping is particularly vulnerable to variance heterogeneity. If two interventions produce identical welfare impacts but are measured in populations with different baseline variance, they will generate different z-scores. The conversion becomes a function of sample properties, not just treatment effects. Let me summarize where simple conversion may work versus where it is risky.
Simple conversion may work in these situations: within-study comparisons using the same instruments; similar populations and health domains; marginal changes from moderate baselines; and sensitivity analysis showing robust rankings. Simple conversion is risky in these situations: cross-study synthesis with different instruments; mortality versus morbidity comparisons; mental health versus physical health domains; extreme severity or extreme baseline life satisfaction; and low- and middle-income country contexts with limited life satisfaction calibration data. The written version of this report includes two interactive demonstrations. Let me describe what they show conceptually. The Conversion Factor Sensitivity Demo allows you to see how the relative ranking of two interventions changes as you vary the assumed DALY-to-WELLBY conversion factor. You can adjust the effect sizes and costs of two hypothetical interventions and observe how the cost-effectiveness comparison changes across different conversion factors. The key output is the "indifference threshold"—the conversion factor at which the ranking reverses.

---

Section Seven: Practical Guidance for Funders Now

Given the uncertainties described above, what should funders actually do? This section offers a decision-oriented framework, not a single prescription. Here is a decision framework organized by data situation. If only DALYs are available: Use DALYs directly. Note the likely under-capture of mental wellbeing and non-health domains. Consider sensitivity analysis with WELLBY-converted values if mental health is relevant to the intervention. If only WELLBYs are available: Use WELLBYs directly. Pressure-test measurement assumptions around scale use and comparability. Note that mortality effects may be under-weighted unless explicitly modeled. If both metrics are available: Use both frames. Show ranking sensitivity to the conversion factor. Report the threshold X-star at which ranking reverses.
If mental health is central to the intervention: Do not assume generic DALY conversions are sufficient. Mental health disability weights are contested. Direct life satisfaction measurement may be more informative. For mortality comparisons: Neutral point assumptions become central. Report results under multiple death-equivalent anchors—for example, life satisfaction equal to one, two, or three.

The least harmful decision procedure

Rather than asking "what is the best exact conversion factor?", consider asking: "which mapping structure causes the least expected decision error?" This reframing suggests four practices. First, present ranges, not point estimates. A distribution of X values—for example, four to ten WELLBYs per DALY—may be more honest than a single number. Second, report ranking robustness. If intervention A beats intervention B under all plausible X values, the comparison is robust. If ranking reverses within the plausible range, flag this as a key uncertainty. Third, consider domain-specific factors. Use different X values for mortality versus morbidity, or for physical versus mental health, if evidence supports this. Fourth, prefer direct measurement where feasible. When possible, measure life satisfaction directly rather than converting from DALYs. Let me walk through a worked example: Finding the "indifference threshold" X-star. Suppose you're comparing two interventions. Intervention A costs fifty dollars per person and generates 0.3 WELLBYs per person. Intervention B costs one hundred dollars per person and averts 0.05 DALYs per person. Calculate cost-effectiveness in WELLBYs per one thousand dollars. For Intervention A: 0.3 divided by fifty, multiplied by one thousand, equals six WELLBYs per one thousand dollars. For Intervention B, converted: 0.05 times X, divided by one hundred, multiplied by one thousand, equals 0.5 times X WELLBYs per one thousand dollars.
Intervention B beats Intervention A when 0.5X is greater than six—that is, when X is greater than twelve. This tells you: if you believe the conversion factor is below twelve, fund A; if above twelve, fund B. The indifference threshold X-star of twelve is the key number for decision-making, not the conversion factor itself. Here is a template for how to report conversion sensitivity in a cost-effectiveness analysis. Step one—Base case: State your assumed conversion factor and cite the source. For example: "seven WELLBYs per QALY per UK Green Book." Step two—Sensitivity range: Report results at X equals four, seven, and ten—or whatever range brackets the literature. Step three—Threshold analysis: Report X-star at which ranking reverses. State whether X-star falls within the plausible range. Step four—Robustness statement: Either "Intervention A is preferred under all conversion factors below X-star" or "Ranking is sensitive to conversion assumptions." This template makes your assumptions transparent and allows readers with different priors to interpret your results.

---

Section Eight: What Evidence Would Reduce Uncertainty Most?

Let me outline the priority evidence gaps. First gap: Direct beneficiary tradeoff studies. How do beneficiaries themselves trade off health improvements against life satisfaction improvements? Stated preference methods could anchor conversion factors more directly. Second gap: Joint measurement randomized controlled trials. Trials that measure both life satisfaction and health metrics—such as EQ-5D and DALYs—for the same intervention would allow empirical estimation of the life-satisfaction-to-health relationship. Third gap: Domain-specific mappings. We need evidence on whether mental health to life satisfaction mapping differs from physical health to life satisfaction mapping, and by how much. Fourth gap: Low- and middle-income country scale-use calibration.
We need cheap methods for identifying and adjusting scale-use heterogeneity in low-resource settings. Fifth gap: Neutral point studies. We need better estimates of the life satisfaction level equivalent to "as bad as death" across different populations. Sixth gap: Standard deviation interchangeability. We need evidence on whether standard deviation changes on mental health instruments like the PHQ-9 correspond to comparable welfare changes as standard deviation changes on life satisfaction. Seventh gap: Conditions under which rankings change. We need systematic analysis of when different conversion approaches lead to different top-charity recommendations. How might these gaps be filled? Here are some practical research designs. For beneficiary tradeoff studies: Use discrete choice experiments asking beneficiaries to choose between health improvements and income or wellbeing improvements. These could be embedded in existing RCT follow-up surveys at low marginal cost. For joint measurement: Add life satisfaction questions—just one or two items—to health intervention trials that already measure EQ-5D or SF-6D. Cost: approximately one to five dollars per participant for additional survey items, with high value of information. For LMIC calibration: Benjamin and colleagues' 2023 methods could be adapted for low- and middle-income country populations using vignettes in local languages. Could be combined with anchoring vignettes from King and colleagues, 2004, for cross-population comparison. The written version includes a second interactive demonstration—the Neutral Point and Mortality Demo—which shows how the assumed "death-equivalent" life satisfaction level affects comparisons between mortality-reducing interventions and non-mortality wellbeing programs. The key insight is that the UK guidance uses life satisfaction of one as a working assumption but cites evidence suggesting approximately two may be more accurate. 
This uncertainty propagates directly into mortality comparisons.

---

Section Nine: Bottom Line

A single, universal DALY-to-WELLBY conversion factor probably does not exist in any meaningful sense. The metrics were designed for different purposes, measure different things, and embed different assumptions about what matters for welfare. The practical goal is not to find "the one true scalar conversion" but to choose the mapping structure that causes the least expected decision error given the information available. Current best practice is likely: Multi-method: Use more than one conversion approach and compare results. Domain-sensitive: Allow different factors for different health domains if evidence supports this. Uncertainty-explicit: Present ranges, scenarios, or distributions rather than single numbers. Decision-focused: Report whether rankings are robust across the plausible range of conversion factors. This analysis should be useful even for readers who are skeptical of WELLBYs, or skeptical of DALYs. The underlying question—how do we compare interventions that affect different outcomes?—does not go away by ignoring it. Making the mapping structure explicit is better than leaving it implicit. A note for WELLBY skeptics: If you believe WELLBYs have fundamental measurement problems—scale-use heterogeneity, demand effects, philosophical objections to hedonism—the conversion framework here still applies. You would simply use much wider uncertainty ranges or place higher weight on DALY-based evidence. The practical value of making conversion explicit is that it lets you see how much your skepticism matters for specific decisions through threshold analysis. It lets you communicate your reasoning transparently to others with different priors. And it lets you update systematically as new evidence arrives. "WELLBYs are too unreliable to use" is itself an implicit conversion assumption—essentially setting X to zero or undefined. Making it explicit is more honest.
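As a concrete illustration of the threshold analysis just mentioned, the Section Seven worked example (two hypothetical interventions: A at fifty dollars and 0.3 WELLBYs per person, B at one hundred dollars and 0.05 DALYs averted per person) can be run as a short Python sweep. The conversion factors swept below are illustrative, not endorsed values, and one DALY averted is assumed equal to one QALY gained for simplicity.

```python
# Threshold-analysis sketch using the hypothetical interventions from the
# Section Seven worked example (illustrative numbers, not real programs).

def wellbys_per_1000(cost_per_person, wellbys_per_person):
    """Cost-effectiveness expressed as WELLBYs per $1,000 spent."""
    return wellbys_per_person / cost_per_person * 1000

def ranking(x):
    """Preferred intervention under conversion factor x (WELLBYs per DALY)."""
    a = wellbys_per_1000(50, 0.3)        # Intervention A, measured in WELLBYs
    b = wellbys_per_1000(100, 0.05 * x)  # Intervention B, DALYs converted at x
    return "A" if a > b else "B"

# Indifference threshold X*: the conversion factor at which the ranking
# reverses, i.e. where the two cost-effectiveness ratios are equal.
x_star = (0.3 / 50) / (0.05 / 100)  # = 12 for these numbers

# Sweep a plausible range of conversion factors and report the ranking.
for x in [4, 7, 10, 12, 15]:
    print(f"X = {x:>2}: prefer {ranking(x)}")
```

The point of the sketch is that the decision-relevant output is not any single conversion factor but whether X* falls inside your plausible range: a WELLBY skeptic and a WELLBY enthusiast can disagree about X while agreeing on which side of twelve their beliefs fall.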
A note for DALY skeptics: If you believe DALYs miss important welfare effects—mental health, non-health domains, adaptation, social context—you might prefer to use WELLBYs as the primary metric and convert DALYs to WELLBYs rather than vice versa. You might apply domain-specific adjustments to DALY-based estimates. Or you might flag DALY-only evidence as potentially understating welfare effects. The conversion framework accommodates this perspective—just flip the direction and acknowledge that mortality comparisons require additional neutral-point assumptions.

---

Prompts for Workshop Discussion

These prompts are designed to elicit participant reasoning and surface disagreements. Prompt one: What conversion factor—or range—do you currently use in practice, and what is the basis for it? Is this explicit or implicit in your cost-effectiveness models? Prompt two: Should the conversion factor vary by domain—mental health versus physical health, mortality versus morbidity? What evidence would convince you to use domain-specific factors? Prompt three: How should we handle the death-equivalence anchor? Is life satisfaction of one, two, or something else the right assumption? Does this depend on population or context? Prompt four: When mapping depression scale improvements to WELLBYs—as in the StrongMinds analysis—what evidence would make the mapping credible? What is the minimum acceptable standard? Prompt five: Is "minimize expected decision error" the right objective, or should we prioritize other properties—transparency, robustness, theoretical consistency? Prompt six: What single study or evidence type would most reduce your uncertainty about DALY-to-WELLBY conversion?

---

This concludes the technical briefing on DALY, QALY, and WELLBY conversion. The full written version with interactive demonstrations and detailed references is available at the Unjournal Wellbeing Workshop website. Thank you for listening.