Goal Setting: “Measurable” shouldn't mean “Miserable”

A quick exercise for defining “measurable” OKRs

By now we’re all familiar with OKRs. Objectives and Key Results were invented by Intel’s Andy Grove and passed on to Kleiner Perkins’ John Doerr, who then spread the gospel to his portfolio companies, most notably Google. Larry Page credits OKRs as the managerial secret sauce behind Google’s rapid growth.

As a testament to the power (and the pain) that come with defining OKRs, Google’s CEO Sundar Pichai describes the process as “agonizing.”

“There are single OKR lines on which you can spend an hour and a half thinking, to make sure we are focused on doing something better for the user.” — Sundar Pichai to John Doerr in “Measure What Matters”

(Fun fact: Sundar’s annual comp from Google is ~$80M / year (src), so an hour and a half of his time comes out to a cost of ~$60k per Key Result.)
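For the curious, the back-of-the-envelope math behind that figure can be sketched as follows. The $80M and the hour and a half come from the text above; the 2,080-hour work year (40 hours x 52 weeks) is an assumption added here, and the function name is purely illustrative.

```python
# Back-of-the-envelope cost of one carefully considered Key Result.
ANNUAL_COMP_USD = 80_000_000   # ~$80M / year, from the figure above
WORK_HOURS_PER_YEAR = 40 * 52  # assumption: a standard 2,080-hour work year
HOURS_PER_KEY_RESULT = 1.5     # "an hour and a half", from the quote above


def cost_per_key_result(annual_comp, hours_per_year, hours_spent):
    """Return the implied cost of the time spent on a single Key Result."""
    hourly_rate = annual_comp / hours_per_year
    return hourly_rate * hours_spent


cost = cost_per_key_result(ANNUAL_COMP_USD, WORK_HOURS_PER_YEAR, HOURS_PER_KEY_RESULT)
print(f"~${cost:,.0f} per Key Result")  # ~$57,692 per Key Result
```

Under those assumptions the result lands a little under $58k, which rounds to the ~$60k quoted above. A different guess at hours worked per year would move the number, but not the punchline.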

Key Results (KRs) are “measurable”

Assuming your organization has also imbibed the OKR Kool-Aid, you’ve probably run into this “agonizing” guideline: Key Results are how we “prove” that our team has accomplished the objective, so they should be measurable and indisputable.

“The key result has to be measurable. But at the end you can look, and without any arguments: Did I do that or did I not do it?” — John Doerr (src)

In “How to set good OKRs,” Weekdone offers even stricter guidelines, advising that Key Results should NOT be:

  • “Binary, they should be numeric and measurable”
  • “Tasks to be achieved. Key results are metrics!”

Are Measurable K.R.s just B.S.?

At first I thought strict measurability was just productivity dogma, another thing an agile coach preaches, because their own agile guru passed it down to them. But measurable KRs can really help avoid common goal-setting traps:

If our KRs become a to-do list, we’re incentivized to just check the boxes rather than accomplish the intent of the objective. For example, if our objective is: “Bake the greatest pizza in the world,” it’s easy to a) buy great ingredients, b) find a brick oven, c) bake the pizza, etc., but there is a lot of wiggle room between a, b, c and “the greatest pizza in the world.”

If we dictate tasks, we rob our teams of the chance to develop their own solutions for accomplishing the objective. Giving teams space to solve problems creatively and independently is one of the biggest benefits of human-centered performance metrics.

Thinky Pain

The challenge:

Defining “measurable” Key Results (KRs) is easier said than done. Even after years of practice, I still struggle against the impulse to think “stuff we need to do” instead of “measurable outcomes we want to accomplish.”

Plenty of my clients and teammates struggle with this as well. The root cause: Our brains are better at articulating things that are immediate and familiar. In the case of performance goals, the tasks that we do all-day-every-day are more immediate and familiar, therefore they come to mind first. But metrics are more abstract and removed, so it takes effort to think in those terms.

This struggle is totally normal. The “tasks” we think of first are just our subconscious pointing the way to the “key results” we actually care about.

And just because “measurable” key results don’t come naturally doesn’t mean we shouldn’t strive for them. As Doerr describes it, OKRs are a muscle that people need to exercise to develop.

The Hacks:

One day, the brilliant Liz DeLuca and I were working on OKRs and, unsurprisingly, we noticed many of the teams’ goals weren’t goals at all; they were tasks. The conversation turned introspective (as it often would). Why is it so hard to break this habit? Why isn’t this more intuitive?

During this particular session I happened to be reading Douglas Hubbard’s “How to Measure Anything.” In addition to offering an entire philosophy of measurement and decision-making, the book offers a mountain of useful tactics for thinking about measurement.

As an experiment, we decided to apply some of Hubbard’s approaches to our goal-setting exercises and, after some trial and error, it worked! We used these to successfully define OKRs across a variety of initiatives, and it’s been my go-to approach ever since.

When defining Key Results, instead of fighting against the instinct to generate Tasks, use them as a starting point, then do the following steps to dive deeper:

  1. Ask “Why?” First, ask “Why might this task be important?” What would the task accomplish? Repeat this until you arrive at a root Outcome. This helps decompose the Objective into the Outcomes that really matter.
  2. Ask “What will we observe?” Second, for each Outcome from #1, ask “What might we observe in the world if this were true?” Brainstorm different possible changes you might observe in the world around you.

The Steps: Consider an Objective → Brainstorm Tasks → Ask “Why?” → Outcomes → Ask “What will we observe?” → Observations → Good KRs.
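If it helps to see the chain written down, here is a minimal Python sketch of it as a small data model. The class and function names (Task, Outcome, observations_by_outcome) are hypothetical, invented for illustration, and the sample values borrow from the pizza example explored next. The one design assumption is that different Tasks can ladder up to the same Outcome, so their Observations get merged.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """A root Outcome reached by asking "Why?" about a Task."""
    name: str
    observations: list = field(default_factory=list)  # answers to "What will we observe?"


@dataclass
class Task:
    """A top-of-mind Task, kept only as a starting point."""
    name: str
    outcomes: list = field(default_factory=list)  # answers to "Why?"


def observations_by_outcome(tasks):
    """Merge Observations per Outcome, since several Tasks can share one Outcome."""
    merged = {}
    for task in tasks:
        for outcome in task.outcomes:
            bucket = merged.setdefault(outcome.name, [])
            for observation in outcome.observations:
                if observation not in bucket:
                    bucket.append(observation)
    return merged


# Illustrative values only, borrowed from the pizza example.
tasks = [
    Task("Buy really fresh ingredients",
         [Outcome("tasty", ["People tell us it's good"])]),
    Task("Watch 200 pizza-related shows",
         [Outcome("tasty", ["People choose our pizza over other pizzas"]),
          Outcome("reliable process", ["Few burned test pizzas"])]),
]

merged = observations_by_outcome(tasks)
for outcome_name, observations in merged.items():
    print(outcome_name, "->", observations)
```

Notice that the Tasks themselves drop out of the result. They were only ever a way to get to the Outcomes and the Observations, which become the candidates for Key Results.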

Let’s explore a pizza-based example.

Pizza Examples are the Best Examples

Objective: “Bake the greatest pizza in the world”

Let’s give ourselves a vague, but inspiring objective to test out these techniques. Let’s say we want to “bake the greatest pizza in the world.”

Like most people, when we think “What does it mean to bake the greatest pizza in the world?” we immediately think of the tasks we need to do:

  • Task: Buy really fresh ingredients
  • Task: Watch 200 pizza related shows on Netflix
  • Task: Find a brick oven
  • Task: Settle on a recipe

When we bake the greatest pizza in the world, we might in fact do all of the Tasks listed above, but none of these tasks prove that we’ve baked the greatest pizza in the world. They also don’t leave room for alternative solutions (e.g. there are surely better ways to learn than “watching Netflix”).

These tasks aren’t yet good KRs, but they’re a fair starting point. We just need to dig a bit deeper with some “why?” laddering.

Hack #1: Asking “Why?”

Let’s unpack 2 of the Tasks we generated above as examples.

Example Task #1: Why would we “buy fresh ingredients?”

It seems intuitive that the Task “buy fresh ingredients” is important, but what’s the impact we’re expecting as a result of fresh ingredients? A few possibilities:

  • Fresh ingredients will make the pizza taste better,
  • Fresh ingredients might make it healthier (or healthy-ish?)
  • And maybe fresh ingredients help with texture and structure?

These are a great start. By looking at the potential impacts of the Task “Buy fresh ingredients,” we’ve identified 3 qualities that are important to our objective: “Great pizza” means “good taste, healthy-ish, well-structured.”

Example Task #2: Why “watch 200 pizza-related shows on Netflix?”

While Netflix surely has 200 full episodes of celebrity / comedian chefs sampling the best pizza from Antarctica to Atlantis, spending 100+ hours watching Netflix seems like a circuitous route to baking the greatest pizza in the world.

But let’s go with it. Remember, the tasks we first think of are just our subconscious’s weird way of pointing us to the Outcomes that matter. So what are the positive impacts we might expect from binge-watching cooking shows?

  • Cooking shows help us identify “unknown unknowns” in making pizza
  • Shows help us understand the different varieties of pizza (and nuances to the craft of pizza making)
  • It’ll help us learn new techniques without re-inventing the wheel

We’re getting closer, but these still feel distant. So let’s ask “why?” again.

  • How will “identifying ‘unknown unknowns’” help us bake the greatest pizza in the world? So that we’re not surprised when we try to make the pizza for ourselves. If we invite our friends over for pizza, we want to feel confident we can serve something edible. We want a reliable process for baking the pizza.
  • How will “understanding the nuance of pizza” help us bake the greatest pizza in the world? Maybe because it helps us bake tastier pizza (this is the tasty Outcome from our first example).
  • How will “not reinventing the wheel” help us bake the greatest pizza in the world? It means we don’t need to spend 30 years learning the tricks for ourselves, we can just learn from the great pizza chefs who came before us. We don’t want to spend a million years learning to do this.

Aha! Now we’ve got some real Outcomes… the Task “Watch 200 pizza cooking shows on Netflix” is really about: learning a reliable process, baking tasty pizza and learning quickly.

Recapping Example #1 and #2:

Now we’ll know we accomplished our Objective (bake the greatest pizza in the world) if the following Outcomes are true:

  • The pizza is tasty
  • It’s healthy
  • It’s well-structured (not soggy)
  • We have a reliable process to prepare the pizza
  • We didn’t spend a million years on this. We learned quickly

These are good Outcomes. We’ve decomposed “bake the greatest pizza in the world” into the 5 components that we care about; at the very least these will help us align our team around impact (vs. dictating tasks).

But these Outcomes still aren’t measurable KRs. How do we quantify these?

Hack #2: Ask “What might we observe?”

The next trick comes from “How to Measure Anything,” where Douglas Hubbard argues that if an outcome is worth pursuing, then it must have some observable effect.

“First, we recognize that if X is something we care about, then X, by definition, must be detectable in some way. How could we care about things like “quality,” “risk,” “security,” or “public image” if these things were totally undetectable, in any way, directly or indirectly? If we have reason to care about some unknown quantity, it is because we think it corresponds to desirable or undesirable results in some way. Second, if this thing is detectable, then it must be detectable in some amount. If you can observe a thing at all, you can observe more or less of it… If we can observe it in some amount, then it must be measurable.” - Douglas Hubbard, “How to Measure Anything”

If our KRs correspond to some desirable results, we should be able to observe some effects of those results. So what might we observe in the world that indicates the Key Results were accomplished?

If you get stuck, Hubbard offers a thought experiment you can try:

“Imagine you are an alien scientist who can clone… entire organizations. Let’s say you were investigating a particular fast food chain and studying the effect of a particular intangible ‘employee empowerment.’ You create a pair of the same organization calling one the ‘test’ group and one the ‘control’ group. Now imagine that you give the test group a little bit more ‘employee empowerment’… What do you imagine you would actually observe — in any way, directly or indirectly — that would change for the test organization?”

Let’s try this for each Outcome from above.

  1. tasty: If we’ve baked a truly tasty pizza, then we might observe… People tell us it’s good. Even pizza snobs say they like it. We see people choose our pizza over other pizzas.
  2. healthy: If the pizza is healthy, then we might observe… We don’t feel like falling into a food coma after we eat it. We calculate that the number of calories in our ingredients is lower than for other pizzas.
  3. well-structured: If the pizza is well-structured, then we might observe… When people hold a slice, we don’t see the cheese immediately slide off. We don’t see a bunch of greasy plates after serving a group.
  4. reliable process: If we have a reliable process, then… When cooking test pizzas, our trash can doesn’t fill up with burned pizza. Each pizza requires the same amount of time and ingredients. We never feel surprised.
  5. learned quickly: If we learned quickly then… When we calculate the time we spent on this project, it’s not too many days.

For each Outcome, we’ve now got a few options for things we might observe. So which observations should we use to define the KR?

One consideration: How hard / expensive would it be to actually measure the observation? For example, to measure the tasty Outcome, we could spend $50k hiring a research firm to double-blind-placebo-proof-focus-group the pizza with n=300 participants. Or we could sponsor a few meetups, send them our pizza + 2 competitive pizzas and see which pies get eaten first. Since we’re probably not trying to get these results published in a peer-reviewed journal of pizza, the former is probably overkill. The latter is maybe less precise, but it’s much cheaper and still a fair signal.

Recap:

We originally started with a list of top-of-mind Tasks for the Objective “bake the greatest pizza in the world.” With Hack #1, we asked “Why?” to transform those Tasks into the Outcomes we care about (tasty, healthy, well-structured, reliable process, learn quickly). Then with Hack #2, we brainstormed and chose Observations we might see if the Outcomes were true. The final step is putting it all together and assigning targets for the Observations. This might look like…

Objective: “Bake the greatest pizza in the world”

  • Key Result: At 8/10 meetups, our pizza is eaten before 2 distinctly marked, but unbranded competitive pizza choices
  • Key Result: <200 calories when we add up the ingredients in a slice
  • Key Result: For 100% of test pizzas, the cheese does not slide off when held at a 45° angle
  • Key Result: We bake 3 test pizzas in a row in 30 minutes, w/ total ingredients costing < $5 / pizza
  • Key Result: We spend less than 5 days learning to do this
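One way to keep Key Results like these indisputable is to write each one down as a metric, a target, and a comparison, so that "did we do it or not?" becomes a lookup rather than a debate. Below is a minimal Python sketch under that assumption. The names (KeyResult, score) are hypothetical, the "3 test pizzas" KR is split into a time target and a cost target because it carries two numbers, and the measured values are made up purely to exercise the code.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KeyResult:
    name: str                                # short key used to look up the measurement
    description: str
    target: float
    compare: Callable[[float, float], bool]  # how a measurement is judged against the target

    def is_met(self, measured: float) -> bool:
        return self.compare(measured, self.target)


KEY_RESULTS: List[KeyResult] = [
    KeyResult("meetups_won", "Meetups (out of 10) where our pizza is eaten first", 8, operator.ge),
    KeyResult("calories_per_slice", "Calories in a slice", 200, operator.lt),
    KeyResult("cheese_stays_on", "Share of test pizzas whose cheese stays on at 45 degrees", 1.0, operator.ge),
    KeyResult("bake_minutes", "Minutes to bake 3 test pizzas in a row", 30, operator.le),
    KeyResult("cost_per_pizza", "Ingredient cost per pizza (USD)", 5, operator.lt),
    KeyResult("learning_days", "Days spent learning to do this", 5, operator.lt),
]

# Hypothetical measurements, invented for illustration only.
measured = {
    "meetups_won": 9,
    "calories_per_slice": 240,
    "cheese_stays_on": 1.0,
    "bake_minutes": 28,
    "cost_per_pizza": 4.5,
    "learning_days": 4,
}


def score(key_results: List[KeyResult], measurements: Dict[str, float]) -> Dict[str, bool]:
    """Return, for each Key Result, whether its measurement hits the target."""
    return {kr.name: kr.is_met(measurements[kr.name]) for kr in key_results}


results = score(KEY_RESULTS, measured)
for kr in KEY_RESULTS:
    status = "met" if results[kr.name] else "missed"
    print(f"{status:>6}: {kr.description}")
```

With these invented numbers, everything is met except the calorie target. The point of the structure is that nobody has to argue about that: the target and the comparison were agreed on before the pizza went in the oven.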

Takeaways

  • Measurable Key Results require some extra thinking-pain. But they’re worthwhile because they keep us outcome focused and leave teams space to apply their own creative solutions toward the Objective.
  • When defining KRs for an Objective, it’s okay to start with Tasks. This is your subconscious pointing you to the Outcomes that really matter.
  • To turn a Task into an Outcome, work backwards by asking “Why do this? What do I think this will accomplish?”
  • To turn an Outcome into a quantifiable Key Result, ask “What would I observe in the world if this were true?” Choose an Observation that’s reliable and efficient to track as your KR.
