The Causality Credo in Infosec and how to leave the club
This post is very much inspired by, and follows from, Hollnagel’s book “Safety-I and Safety-II”, which I’d recommend everyone read.
A challenge we have in Infosec, particularly in the disciplines of Risk Management and Compliance, is something Hollnagel termed the “causality credo”.
The causality credo is the “unspoken assumption that outcomes can be understood as effects that follow from prior causes”. That assumption works well in a simple, linear-causality world, but not so much for the complexity we live with all around us.
The credo can be expressed as the following assumptions: [next section is ADAPTED from Hollnagel]
- “Things that go right and things that go wrong both have their causes, but the causes are different. The reason for adverse outcomes (accidents, incidents) is that something has gone wrong. Similarly, the reason for successful outcomes is that everything worked as it should, although this is rarely considered”
- “Since adverse outcomes have causes, it must be possible to find these causes provided enough evidence is collected. Once the causes have been found, they can be eliminated, encapsulated, or otherwise neutralised. Doing so will reduce the number of things that go wrong, and hence improve [security]”
- “Since all adverse outcomes have a cause (or causes) and since all causes can be found, it follows that all accidents can be prevented.”
This credo has a direction: a forward causality model in which the causes we reason about produce certain effects. The problem is that we assume the opposite also holds. It prompts us to reason from an effect back to its cause, and to believe this can be done just as well (backwards causality): if we just observe the effect, we can reason back through time to find a cause. This is often called the rationality assumption. It can be true for simple systems, but not for complicated or complex ones.
This has led to accident causation models such as Heinrich’s Domino Model or the Swiss cheese model. I’m not saying these models are completely useless, as one can use them to engineer a number of robust controls that help prevent security incidents and breaches, but I am saying they are reductionist models that will give you a false sense of security when reasoning about what happens in your organisation, and they have largely been replaced in the Safety field as well (or at least widely debunked for those keeping up with the research).
There are real-world consequences of adopting and cheerleading this causality credo, as it rests on the following assumptions:
- that we can decompose the system into simple parts and understand the behaviour of the whole by looking at what each part does and looks like
- that the performance of each part is bimodal (on or off, true or false, etc.)
- that if a sequence of events was determined to have contributed to an incident, we fully understand, and have modelled, the pre-determined sequence by which those events will happen in the future
- that combinations of sequences are tractable and do not interact with the wider context
- that the influence of the context or environment is limited and quantifiable, so that if we just adjust for the probability of a failure or malfunction we are effectively managing risk
These assumptions tend not to play well in the real world. We now have newer accident analysis models that we can leverage to think differently about cause and effect in what leads to security incidents or breaches; we just need to expand where we’re looking for answers.
And as I keep saying, the Safety sciences are way ahead of us in this. So I present an alternative way to think about incident and breach causality: a model developed by Dr. Walker (you can find her dissertation here) called the “Systemic Contributor Analysis Diagram”.
In this model, we don’t stop at “human error” when looking at the adaptations we find (Bob touched the button, then Alice did that other thing, and together these coalesced into the incident). That’s linear-causality thinking, an embodiment of the causality credo, and it will always leave us ignorant of the wider system dynamics in which those adaptations occurred.
Instead, we go looking for the goal conflicts that were present and that led to humans adapting the way they did. It’s about asking “how and why did it make sense, at the time, for that person to act like that?” But for us to elicit information from others in this regard, we need to get better at asking questions. If you ask “why didn’t you follow the procedure?” you have already lost any opportunity or affordance you had to actually understand what was happening, as your stakeholder is now defensive, knowing full well the culpability is coming. See below, from the “Etsy debriefing facilitation guide”, for this and other examples of how to ask good questions that actually help you understand your context.
And after you understand those goal conflicts, try to go further and understand the business pressures that led to those goal conflicts, which eventually led to those adaptations. This is how we can all get better at resolving the systemic issues in our organisations that are very likely to keep producing incidents and accidents, and at being courageous and willing enough to raise them with the “powers that be” as actual risk management strategies, rather than our typical linear-causality risk analysis that whack-a-moles a bunch of controls onto every entry in your risk register.
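To make that layering concrete, here is a minimal, hypothetical sketch (in Python; the names and structure are mine, not taken from Dr. Walker’s dissertation or any existing tooling) of how a debrief write-up could record the chain from business pressure to goal conflict to adaptation, so the analysis is forced to end at pressures rather than at people:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative record types only; an assumption about how one *might*
# structure the notes from a debrief, not a published method or API.

@dataclass
class Adaptation:
    """A local adaptation observed during the incident (not a 'human error')."""
    description: str          # e.g. "skipped peer review to ship the hotfix"
    made_sense_because: str   # the answer to "how did it make sense at the time?"

@dataclass
class GoalConflict:
    """Two or more goals that could not all be satisfied at the same time."""
    competing_goals: List[str]                       # e.g. ["restore service fast", "follow change process"]
    adaptations: List[Adaptation] = field(default_factory=list)

@dataclass
class BusinessPressure:
    """An upstream pressure that created or sustained the goal conflicts."""
    description: str                                 # e.g. "quarterly uptime target tied to bonuses"
    goal_conflicts: List[GoalConflict] = field(default_factory=list)

def upstream_summary(pressures: List[BusinessPressure]) -> None:
    """Walk the chain pressure -> goal conflict -> adaptation for the write-up."""
    for pressure in pressures:
        print(f"Pressure: {pressure.description}")
        for conflict in pressure.goal_conflicts:
            print(f"  Goal conflict: {' vs '.join(conflict.competing_goals)}")
            for adaptation in conflict.adaptations:
                print(f"    Adaptation: {adaptation.description}")
                print(f"      Made sense because: {adaptation.made_sense_because}")
```

The point of the shape, in this sketch, is that an adaptation cannot be recorded on its own: it has to hang off a goal conflict, which in turn hangs off a business pressure, and there is deliberately no “who is to blame” field anywhere.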
It’s this type of thinking that can help us up our game as an industry, and that I hope we all start adopting more. If we don’t treat and highlight those bigger issues upstream in our organisations, we don’t have a fighting chance of improving the performance of our industry overall, or of your organisation in particular.