Speakers: Dan Benjamin (UCLA/NBER), Miles Kimball (CU Boulder)
The authors will share an outline of their presentation, with an invitation for feedback, in advance. Participants are encouraged to read it and submit questions beforehand.
Unjournal Evaluation: Scale-Use Heterogeneity
View the evaluation summary and full paper on PubPub.
Overview
This segment presents the core research findings from Benjamin et al.[1] on scale-use heterogeneity in wellbeing surveys. The authors will discuss how individuals differ in their use of the 0-10 life satisfaction scale[2] and propose calibration methods to address this measurement challenge.
Key Topics
- Evidence for scale-use heterogeneity across populations[3]
- Implications for WELLBY calculations and comparisons
- Proposed calibration approaches (vignette anchoring, etc.)[4]
- Limitations and open questions[5]
Notes
- [1] Benjamin, Cooper, Heffetz, Kimball & Zhou (2023). "Adjusting for Scale-Use Heterogeneity in Self-Reported Well-Being." Full paper available via the Unjournal evaluation.
- [2] Scale-use heterogeneity can manifest as: (1) different "anchoring" points, (2) different ranges used, (3) different interpretations of scale labels. All three can bias cross-person comparisons.
- [3] The paper finds substantial heterogeneity in how people use scales. It also finds that changes in wellbeing may be more comparable than levels, which is encouraging for intervention evaluation.
- [4] Vignette anchoring asks respondents to rate the wellbeing of hypothetical people. Because every respondent rates the same vignettes, differences in those ratings reveal individual scale-use patterns and enable adjustment. Trade-off: increased survey length.
- [5] Key open questions include: (1) Do calibration methods work in low- and middle-income country (LMIC) contexts? (2) Does scale use differ systematically between treatment and control groups, which would bias estimated effects? (3) How much precision do simple methods sacrifice?
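
The vignette anchoring idea in the notes can be illustrated with a small sketch. It assumes a simple linear model in which each respondent's reported rating equals a personal shift plus a personal stretch times a shared underlying value. This is an illustrative simplification, not the estimator used in the paper. The function names and all data are hypothetical, and the "consensus" value of each vignette is taken to be the mean rating across respondents.

```python
from statistics import mean


def fit_scale_use(ratings, consensus):
    """Least-squares fit of rating = shift + stretch * consensus for one respondent.

    Assumption: scale use is linear (a shift plus a stretch). Illustrative only.
    """
    x_bar, y_bar = mean(consensus), mean(ratings)
    sxx = sum((x - x_bar) ** 2 for x in consensus)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(consensus, ratings))
    stretch = sxy / sxx
    shift = y_bar - stretch * x_bar
    return shift, stretch


def adjust(self_report, shift, stretch):
    """Map a raw self-report back onto the shared (consensus) scale."""
    return (self_report - shift) / stretch


# Hypothetical data: three respondents each rate the same three vignettes (0-10).
vignette_ratings = {
    "A": [3.0, 5.0, 7.0],  # uses the scale like the average respondent
    "B": [5.0, 6.0, 7.0],  # compressed range, shifted upward
    "C": [1.0, 4.0, 7.0],  # stretched range
}
# Hypothetical raw life satisfaction self-reports.
self_reports = {"A": 6.0, "B": 6.5, "C": 5.5}

# Consensus value of each vignette: the mean rating across respondents.
consensus = [mean(col) for col in zip(*vignette_ratings.values())]

adjusted = {}
for person, ratings in vignette_ratings.items():
    shift, stretch = fit_scale_use(ratings, consensus)
    adjusted[person] = adjust(self_reports[person], shift, stretch)
    print(f"{person}: raw={self_reports[person]:.1f} adjusted={adjusted[person]:.1f}")
```

In this constructed example the raw self-reports differ (6.0, 6.5, 5.5), but all three map to 6.0 once each respondent's estimated shift and stretch are removed. Under the assumed linear model, a respondent's shift cancels out of within-person changes while the stretch does not, which is one way to see why changes might be more comparable than levels.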