Receipts or Results – Part 1: The One-to-One Trap

In 1968, the US government passed a federal law requiring all new cars to be fitted with seatbelts but failed to require that people wear them. For the next sixteen years (yes, you read that correctly), the success of the seatbelt was measured by the number of installations, not by their impact, which turned out to be close to non-existent. Fatality rates did fall between 1968 and 1974, but for reasons largely unrelated to seatbelt installation (speed limits, oil crises). Seatbelts only improved safety when people were made to wear them. It took so long to realise the lack of impact because impact was being measured by installation itself. The intervention became the metric of its own success.

This is exactly where human risk management and cybersecurity find themselves today. We measure what we put in place but rarely measure what it did. The problem that fuels this issue is what I’ll be examining across this series – Receipts or Results – starting with the most common offender: the one-to-one trap.

Introducing the one-to-one trap

Most human risk programmes fall into what I call the one-to-one trap – the assumption that one activity metric maps directly to one outcome. Phishing click rate measures phishing risk. Training completion measures awareness. Policy sign-off measures compliance.

None of these tell you whether anything actually changed in how your people think and behave. They tell you what you did. The seatbelt was fitted. The policy existed. The training was completed. And like the empty seatbelt hanging in a 1968 Ford, the metric of success looked great while the problems remained unsolved.

The question isn’t what we measured. It’s what we were actually trying to change. So how do we combat this issue?

Start with the outcome, not the intervention

When a doctor prescribes medication, they don’t measure success by whether the patient collected the prescription. They measure it by whether the patient gets better. The collection is necessary but it’s not the point.

Before you ask what to measure, ask what you actually want to change. Not “we want people to click less on phishing emails” – that’s the intervention goal. The real outcome is something closer to “we want people to make more security-conscious decisions under pressure.” That’s a subtly different thing, and it opens up a much richer set of signals to look for. It will be a constellation of data points that paint a picture, not a single number.

What your data is already telling you

Most organisations already have access to more meaningful signals than they realise. We just need to reframe them as outcome metrics. Below are some easy-win outcome metrics to read alongside your phishing click rates:

Email flagging – volume with quality in mind. Email flagging is a useful and easy-to-calculate outcome metric – but it can be a trap in itself. When people flag suspicious emails, the instinct is to just measure how many. A population that reports more is not automatically a population that is getting better – volume without judgement can quickly become noise. What you’re looking for is whether the things being flagged are genuinely suspicious and whether the reasoning is improving over time. Even when someone flags an email that turns out to be genuine, that is still evidence of people pausing to question rather than acting on impulse, which is exactly the behaviour we need to build.

Time to report is underused. If the average time between receiving a suspicious email and flagging it is falling, that suggests the behaviour is becoming instinctive rather than deliberate. The caveat is that time to report should always be read alongside volume and quality. Someone reporting everything in seconds isn’t exercising judgement; they’re just reporting everything. Falling time to report only means something when the quality of what’s being reported holds up alongside it.
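As a rough illustration of reading time to report alongside volume and quality, here is a minimal Python sketch. The `Report` record, its `warranted` field (a triage judgement on whether the flagged email was genuinely suspicious), and the function name are all hypothetical; your reporting tool will expose different fields, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Report:
    received_at: datetime   # when the email arrived (assumed field)
    flagged_at: datetime    # when the person flagged it (assumed field)
    warranted: bool         # hypothetical triage outcome: genuinely suspicious?


def summarise_reports(reports):
    """Return volume, share of warranted flags, and median minutes to report."""
    if not reports:
        return {"volume": 0, "warranted_share": None, "median_minutes_to_report": None}
    minutes = [(r.flagged_at - r.received_at).total_seconds() / 60 for r in reports]
    warranted = sum(1 for r in reports if r.warranted)
    return {
        "volume": len(reports),
        "warranted_share": warranted / len(reports),
        "median_minutes_to_report": median(minutes),
    }


# Illustrative data only, not real results.
t0 = datetime(2024, 1, 8, 9, 0)
example = [
    Report(t0, t0 + timedelta(minutes=10), True),
    Report(t0, t0 + timedelta(minutes=20), True),
    Report(t0, t0 + timedelta(minutes=60), False),
]
summary = summarise_reports(example)
print(summary)
```

The three numbers are returned together on purpose: a falling median only means something if the warranted share holds up in the same period. A lower warranted share is not a failure in itself, since flagging a genuine email still shows someone paused to question it.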

Repeat clickers over time tell a very different story from one-off clickers. Someone who clicks once and never again is not the same risk as someone clicking repeatedly across multiple simulations. This distinction matters enormously for where you direct your programme and who genuinely needs more support.
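One possible way to separate the two groups, sketched in Python. The input shape (pairs of person and simulation identifiers) and the threshold of two distinct simulations are assumptions for illustration, not a standard definition of a repeat clicker.

```python
from collections import Counter


def split_clickers(click_events, repeat_threshold=2):
    """Split people into one-off and repeat clickers.

    click_events: iterable of (person_id, simulation_id) pairs (assumed shape).
    A person counts as a repeat clicker if they clicked in at least
    `repeat_threshold` distinct simulations (assumed threshold).
    """
    # De-duplicate so several clicks in the same simulation count once.
    distinct = set(click_events)
    counts = Counter(person for person, _ in distinct)
    one_off = sorted(p for p, n in counts.items() if n < repeat_threshold)
    repeat = sorted(p for p, n in counts.items() if n >= repeat_threshold)
    return one_off, repeat


# Illustrative data only.
events = [("ana", "s1"), ("ana", "s1"), ("ben", "s1"), ("ben", "s2"), ("cy", "s3")]
one_off, repeat = split_clickers(events)
print(one_off, repeat)
```

Counting distinct simulations rather than raw clicks keeps a double-click on one email from looking like a pattern.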

Performance on ambiguous simulations is perhaps the most revealing signal of all. Easy simulations tell you very little – almost everyone passes them eventually. Ambiguous ones, where the correct decision is genuinely unclear, tell you whether people are developing real judgement or just pattern recognition.

Read these together and they tell a coherent story. Click rates falling while reporting rates also fall suggests people have learned to spot your simulations, nothing more. Click rates falling while reporting quality improves, time to report decreases, and ambiguous simulation performance climbs – that is a programme that is working well.
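The two readings above can be written down as a small rule of thumb. This is only a sketch: the function name, the use of simple period-over-period changes, and the wording of the labels are assumptions, and real data will often fall into the mixed case.

```python
def read_together(click_rate, report_rate, report_quality,
                  time_to_report, ambiguous_pass_rate):
    """Interpret several trends at once.

    Each argument is the change between two periods (later minus earlier),
    so a negative click_rate means click rate fell.
    """
    # Clicks and reporting both falling: people may just recognise simulations.
    if click_rate < 0 and report_rate < 0:
        return "possible simulation-spotting"
    # Clicks down, quality up, reports faster, ambiguous performance up.
    if (click_rate < 0 and report_quality > 0
            and time_to_report < 0 and ambiguous_pass_rate > 0):
        return "consistent with a working programme"
    return "mixed picture, look closer"


# Illustrative changes only.
print(read_together(-0.05, -0.10, 0.0, 0.0, 0.0))
print(read_together(-0.05, 0.02, 0.08, -12.0, 0.06))
```

The point of the sketch is the shape of the decision, not the labels: no single argument is enough to reach either conclusion.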

Timing matters as much as the metrics

None of this means anything if it is measured at the wrong moment. Immediately after an intervention everything looks better, but that is recency, not behaviour change. When measuring the impact of an intervention, consistent behaviour is what tells you something has actually changed. A one-off or temporary improvement tells you very little – people have good days. What you’re looking for is whether the change still holds 7, 30, and 90 days after the intervention. This is not to say that individual incidents and edge cases don’t matter for security; they matter enormously. But they are a different conversation from whether your programme is working.
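A minimal sketch of the 7, 30, and 90 day check, assuming you have a pre-intervention baseline and one measurement of the same metric at each checkpoint. The function name and the input shape are hypothetical, and the comparison ignores statistical noise, which a real analysis would need to account for.

```python
def improvement_holds(baseline, checkpoints, lower_is_better=True):
    """Report, per checkpoint, whether the metric is still better than baseline.

    checkpoints: dict mapping days after the intervention to the measured value.
    Missing checkpoints are reported as None rather than guessed.
    """
    result = {}
    for day in (7, 30, 90):
        value = checkpoints.get(day)
        if value is None:
            result[day] = None
        elif lower_is_better:
            result[day] = value < baseline
        else:
            result[day] = value > baseline
    return result


# Illustrative click rates only: improvement at 7 and 30 days, gone by 90.
status = improvement_holds(0.20, {7: 0.08, 30: 0.12, 90: 0.21})
print(status)
```

An improvement that shows at day 7 but not at day 90 is the recency effect described above, not lasting change.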

The reinvestment problem

Organisations reinvest in what the numbers tell them is working. If those numbers are receipts rather than results, budget follows the wrong things year after year.

Phishing is just one example of where the one-to-one trap costs you. In part 2 of this series, we’ll look at data handling and DLP – and what your environment is already telling you that your policy acknowledgement rate can’t.

Ask your metrics the same question you’d ask any intervention: not what did we do – but what did it do?
