 

Week 4: Examining Metrics

How do we measure impact?

Overview

Millions of pounds are directed to GiveWell's top-rated charities every year, predominantly in global health and development. The impact of these charities is largely assessed using metrics such as QALYs, DALYs and WALYs. Founders Pledge recommends climate change charities according to how many tonnes of CO2e they avert per dollar, and uses the scale, neglectedness and tractability framework to decide which categories of intervention to focus on. Animal Charity Evaluators used "animals spared per dollar" estimates until 2019, when it moved to a more qualitative approach. Countless other organisations use similar metrics to prioritise how they allocate resources. But what do these metrics actually represent, how are they computed, and what are their limitations and assumptions?

Much of the Effective Altruism community’s strength arises from its commitment to allocating marginal physical and human resources effectively in order to contribute to the common good. Understanding how prominent organisations such as GiveWell carry out these analyses, in global health for example, can better inform our giving and cause prioritisation decisions, while also illuminating the strengths and weaknesses of focusing on quantitative evidence.

This week we will dig into what these metrics actually represent and how they’re calculated in practice, and evaluate how useful they are in our quest to improve the world as much as possible.

Goals for this week

  • Note that quantitative estimates are often used to determine how to effectively allocate resources

  • Explore the idea of using quantitative estimates to determine which interventions or charities are most effective

  • Consider the weaknesses of using some of these metrics, and of optimising for metrics, when making decisions

 

Core Reading

 

Metrics and how they are used

 

Human health and wellbeing

Optional preliminary reading if you are unfamiliar with QALYs:

[10m] The cost of NHS health care: Deciding who lives and who dies

 

At least one of

[10m] A guide to Quality Adjusted Life Years (QALYs) - Scottish Medicines

[20m] Calculating QALYs, comparing QALY and DALY calculations (not for the mathematically faint-hearted; includes some calculus)

 

[10m] Cost-effectiveness - GiveWell

[5m] We care about WALYs not QALYs

 

Not core reading, but highly recommended

[30m] Happier Lives Institute Measuring Happiness (7 posts)

 

Animal welfare

At least one of

[20m] Animal Charity Evaluators Use of Cost-Effectiveness Estimates

[5m] Is it better to be a wild rat or a factory farmed cow? A systematic method for comparing animal welfare

 

Climate change

At least one of

[40m] Climate Change Cause Area Report Chapter 2, pp 26-70

[5m] Top charities for climate change

 

Limitations

[15m] QALY league tables: Handle with care

[5m] The Optimizer’s Curse & Wrong-Way Reductions (the Summary and Part 1 are core reading; the rest of the post is highly recommended)

 

Not core reading, but highly recommended

[60m] Book Review: Seeing Like A State

Exercise (45 mins) 

 

Critical analysis (15 mins) 

 

Using metrics in the real world requires an understanding of how they are best used, what they assume about the world, and where their limitations lie. Optimising for the wrong metric, or for a metric that doesn’t reflect the underlying value we care about, can lead to bad outcomes if we aren’t careful.

 

In this exercise, we want to evaluate the strengths and weaknesses of applying metrics such as DALYs when trying to maximise the amount of good we do in the world. This will give us tools we can use to determine which interventions, charities or cause areas are most important and deserve our attention and resources.

 

Choose one established short-term metric from the reading above, and write down 5-10 considerations (or as many as you can) for and against using such a metric to guide your decisions (15 mins)

 

Metric Creation (20 mins)

 

One potential step in deciding which actions to take is evaluating which causes we should spend our time trying to improve or better understand, such as broad categories like “reducing existential risk” and “reducing animal suffering.”

 

A second potential step is evaluating which interventions within those causes are best to undertake / fund in order to make progress on that specific problem, for example evaluating which charities or policies contribute most to reducing existential risk or animal suffering.

 

In this exercise, we’ll try to create a metric that can be used to guide our decisions about which interventions to choose from when trying to make progress on improving a specific cause area. 

 

Note down which cause areas you think are particularly important / pressing. Choose one of them, and try to create a metric that can be used to analyse the effectiveness of interventions that contribute to improving that cause area. Think about how this metric would be measured in practice, and whether optimising for it would lead to the outcomes you’re trying to create. This may require you to break down the metric into smaller “sub-metrics” that are easier to measure or determine. (15 mins)  

 

This could look something like this: 

  • I care about reducing animal suffering, so I want to evaluate charities that reduce animal suffering 

  • I want to maximise the reduction in animal suffering, so a metric could be hours of suffering prevented per pound spent; let’s call this HOSPPS (it doesn’t sound quite as cool as a DALY! Hopefully you have better acronym skills than I do) 

  • So how do we measure a charity’s HOSPPS? Let’s break it down and make some assumptions 

    • Assume the average charity is working in a specific geographical location, looking to improve the wellbeing of chickens in ~ 10 factory farms 

    • Assume hours of suffering are equivalent to the number of factory-farmed chickens multiplied by a number that reflects the “welfare” of those chickens (1 being torturous conditions, 0 being happy chickens) 

    • Analyse the long-term trends in the welfare of these chickens to gauge how things would have changed if the charity had done nothing 

    • Determine how much the chickens’ welfare has changed in these farms since the charity started operating in this region; the difference between this change and the trend-based baseline gives us a counterfactual “welfare improvement” 

    • Multiply the number of chickens affected over the time period analysed in the 10 factory farms by our estimate for the counterfactual “welfare improvement,” then divide this total by the charity’s expenditure over this period 
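The steps above can be sketched in code. This is a minimal, illustrative Python version that rests on assumptions not stated in the example itself: the “do nothing” baseline is a straight-line (least-squares) extrapolation of the pre-charity suffering scores, the counterfactual improvement measured at the end of the period is treated as constant across the whole period, and a hypothetical `hours_per_chicken` input converts chickens into hours so the result comes out in hours per pound. The names `FarmData`, `extrapolate_linear` and `hospps`, and all of the figures, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FarmData:
    """Hypothetical inputs for one charity's region; every value is illustrative."""
    pre_years: list[float]      # years before the charity started operating
    pre_scores: list[float]     # suffering score in each of those years (1 = torturous, 0 = happy)
    end_year: float             # final year of the period analysed
    observed_end_score: float   # measured suffering score in end_year
    chickens: int               # chickens housed in the ~10 farms over the period
    hours_per_chicken: float    # ASSUMPTION: hours each chicken spends in the farm
    expenditure_gbp: float      # charity's spending over the same period


def extrapolate_linear(xs: list[float], ys: list[float], x_new: float) -> float:
    """Fit an ordinary least-squares line to (xs, ys) and evaluate it at x_new."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # needs at least two distinct years
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x_new - mean_x)


def hospps(d: FarmData) -> float:
    """Hours of suffering prevented per pound spent (negative if welfare worsened)."""
    # Baseline: where the suffering score would have been with no charity,
    # clamped to the 0-1 scale.
    baseline = extrapolate_linear(d.pre_years, d.pre_scores, d.end_year)
    baseline = min(1.0, max(0.0, baseline))
    # Counterfactual "welfare improvement": how far below the baseline suffering now sits.
    improvement = baseline - d.observed_end_score
    # Scale by chickens and hours, then divide by expenditure.
    hours_prevented = d.chickens * d.hours_per_chicken * improvement
    return hours_prevented / d.expenditure_gbp


example = FarmData(
    pre_years=[2015, 2016, 2017, 2018],
    pre_scores=[0.90, 0.89, 0.88, 0.87],
    end_year=2022,
    observed_end_score=0.73,
    chickens=200_000,
    hours_per_chicken=1_000,
    expenditure_gbp=500_000,
)
print(f"HOSPPS: {hospps(example):.1f} hours per pound")
```

With these made-up inputs the baseline extrapolates to a suffering score of 0.83, the observed score is 0.73, and the sketch gives about 40 hours of suffering prevented per pound. Swapping the linear trend for a different baseline model, or dropping the constant-improvement simplification, can change the answer substantially, which is worth keeping in mind when listing the ways the metric may miss what you’re trying to measure.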

 

This is clearly a very basic example, but it starts to get us thinking about how to compare different interventions. 

 

Once you’ve got your metric, write down some ways this metric may not quite capture what you’re trying to measure. The purpose of this is not to make the case that metrics are bad (quite the contrary), but to elucidate the things we should be considering when trying to create robustly good metrics that work well in a variety of situations. (5 mins) 

 

  • In what ways could this be missing a big chunk of the charity’s impact? 

    • The charity may have long-term impact that we don’t capture by looking at changes that have already happened 

    • Best practices may spread outside of the geographical location we’re analysing 

    • Our estimate for how the chickens’ welfare may have changed over time without our intervention may not be accurate 

    • Etc. 

 

Economists and organisations such as GiveWell spend years fine-tuning the metrics they use to evaluate which interventions are best, or how well certain systems are doing, so don’t worry at all if your metric looks totally flawed. The benefit of this exercise is the process of thinking hard about what we’re actually trying to achieve, and the myriad ways we can try to measure our progress towards these goals. 

 

Further Reading

 

Human health and wellbeing

[35m] Using Subjective Well-Being to Estimate the Moral Weights of Averting Deaths and Reducing Poverty

 

Climate Change 

 

Animal Welfare 

 