Data deficiencies
Dangerous deficiencies: When gaps in data can cost both lives and business
During World War II, it was not unusual for British bombers to return from missions with considerably more holes in them than when they took off. Bullets had punched their way into the fuselage, and the wise heads on the ground immediately began to analyze where the damage was worst.
The plan was straightforward: Reinforce the places where the planes had been hit – and future planes would have better chances of survival. Brilliant, right?
No.
Because they had forgotten something quite important: What about the planes that didn't come home? The planes that took the fatal hits and never made it back to base? They were not included in the analysis, because they were already lying wrecked in a field in France or at the bottom of the English Channel.
What they thought was the answer was actually a huge blind spot. And that's exactly the mistake companies today still make when making decisions based on incomplete data.
(If you want to geek out on the history, you can read more here: Trevor Bragdon: When Data Gives the Wrong Solution)
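The blind spot in the bomber story can be reproduced with a small simulation. The sketch below is purely illustrative: the section names, the per-hit loss probabilities in `LETHALITY`, and the number of hits per plane are invented assumptions, not historical figures. It compares where hits land across all planes with where they are seen on the planes that make it home.

```python
import random

# Hypothetical probability that a single hit in a section downs the plane.
# These numbers are made up for illustration only.
LETHALITY = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=10_000, seed=1):
    """Count hits per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = dict.fromkeys(sections, 0)
    returned_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Each plane takes 1 to 4 hits, spread evenly over the sections.
        hits = [rng.choice(sections) for _ in range(rng.randint(1, 4))]
        # The plane returns only if it survives every single hit.
        survived = all(rng.random() > LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                returned_hits[s] += 1
    return all_hits, returned_hits


def share(counts, section):
    """Fraction of all counted hits that landed in the given section."""
    return counts[section] / sum(counts.values())


all_hits, returned_hits = simulate()
for section in LETHALITY:
    print(
        f"{section:9s} all planes: {share(all_hits, section):.2f}   "
        f"returned planes: {share(returned_hits, section):.2f}"
    )
```

In this toy setup every section is hit equally often, yet the returning planes show few engine and cockpit hits. An analyst who only sees the returned planes would reinforce the fuselage and wings, which is exactly the wrong conclusion.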
Missing data is expensive
The problem is exactly the same today: We make decisions based on the data we have, but we forget to ask: What about the data we don't have?
Yahoo committed one of the biggest blunders in tech history when they declined to buy Google. They looked at the data they had and concluded that search engines were not a big deal. An expensive decision that showed how dangerous it can be to overlook the data you don't have.
For many companies, the customers they don't have are at least as important as the ones they do. But how much do they really know about them? The answer is often: frighteningly little. If data is never collected, you are sailing blind, and this can have disastrous consequences.
Imagine that your company is a ship passing through ice-filled waters without information about icebergs. Without that knowledge, even small mistakes can lead to big disasters. This is not only a problem for companies, but also for researchers and analysts who must constantly navigate incomplete data sets.
Missing data occurs in different ways:
- Missing at random (MAR)
Data is missing, but there is a pattern, and that pattern can be explained by data we do have. For example, we do not have CPR numbers (Danish civil registration numbers) from before 1968, because the Central Person Register did not exist. These gaps can often be compensated for mathematically.
- Missing completely at random (MCAR)
Here, the absence of data is completely random and not related to the values that are missing.
- Missing not at random (MNAR)
This is the most dangerous category. Here, data is missing in a way that is connected to the unobserved values themselves. In other words, we don't know what we're missing, and that can be expensive.
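The difference between the three categories can be made concrete with a small simulation. The sketch below is an illustration under invented assumptions: a hypothetical customer population where age is always recorded and income is sometimes missing, with made-up missingness probabilities. The helper names (`make_population`, `drop_incomes`, `age_adjusted_mean`) are hypothetical and chosen for this example only. The adjustment shown is a simple reweighting by age group, one of several possible ways to compensate.

```python
import random
from collections import defaultdict
from statistics import mean, median


def make_population(n=20_000, seed=7):
    """Hypothetical customers: income rises with age, plus random noise."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        age = rng.randint(20, 70)
        income = 20_000 + 600 * (age - 20) + rng.gauss(0, 5_000)
        people.append((age, income))
    return people


def drop_incomes(people, mechanism, seed=11):
    """Return (age, income or None). Age is always observed."""
    rng = random.Random(seed)
    cutoff = median(income for _, income in people)
    records = []
    for age, income in people:
        if mechanism == "MCAR":
            p_missing = 0.3                       # unrelated to anything
        elif mechanism == "MAR":
            p_missing = 0.6 if age < 40 else 0.1  # depends on observed age
        elif mechanism == "MNAR":
            p_missing = 0.6 if income > cutoff else 0.1  # depends on income itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        records.append((age, None if rng.random() < p_missing else income))
    return records


def naive_mean(records):
    """Average of the incomes we happen to have."""
    return mean(inc for _, inc in records if inc is not None)


def age_adjusted_mean(records):
    """Weight each age group's observed mean by the group's full size.

    Assumes every age group has at least one observed income.
    """
    observed = defaultdict(list)
    group_size = defaultdict(int)
    for age, inc in records:
        group_size[age] += 1
        if inc is not None:
            observed[age].append(inc)
    total = sum(group_size.values())
    return sum(mean(observed[a]) * n for a, n in group_size.items()) / total


people = make_population()
truth = mean(inc for _, inc in people)
print(f"true mean income: {truth:,.0f}")
for mechanism in ("MCAR", "MAR", "MNAR"):
    records = drop_incomes(people, mechanism)
    print(
        f"{mechanism}: naive {naive_mean(records):,.0f}, "
        f"age-adjusted {age_adjusted_mean(records):,.0f}"
    )
```

In this setup the naive average stays close to the truth under MCAR, drifts under MAR but can be pulled back by reweighting on age, and remains off under MNAR even after the adjustment, because the reason for the gap is hidden in the very values that are missing.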
Understanding why data is missing is the first step to closing the gaps. When we locate the hidden flaws, we can not only avoid Yahoo's mistake, we can also discover new possibilities that would otherwise be invisible.
The Swiss cheese dilemma
In reality, most datasets are filled with holes like a classic Swiss cheese. It is our job to identify these gaps, understand why they are there and find out what we can do about them. Ignoring missing data can be like navigating through the fog: without a clear direction, we risk getting lost.
Therefore, it is time to give missing data the attention it deserves.
In my career in Business Intelligence, I have yet to take part in a customer meeting that was about the data we do not have.
Perhaps we can hope that this will change soon.
Isn't it time we started asking the right questions?