Marc Andreessen, the Silicon Valley entrepreneur and investor, once remarked that “software is eating the world” as more businesses and industries are being run on software and delivered as online services, from films and agriculture to national defence. He was right, but he might also have said that data is eating the world.

Thanks to the digital revolution, businesses are so awash with data, from spreadsheets, legal contracts, HR records and PowerPoints to employee holiday snaps and cat videos, that they are at risk of drowning in it. It has never been easier simply to hit the “save” button. But holding on to data is costly (storage is not free, even in the cloud) and could be risky if it contains personal or commercially sensitive information. The scale of the problem is significant, and companies often have little idea what most of the material hoarded on their systems actually is, or whether it has any commercial value.

An estimated 52 per cent of all information stored and processed by organisations around the world is considered “dark” data, whose value is unknown, according to Veritas Technologies, an information management company. A further 33 per cent is classified under the splendid acronym ROT, as in redundant, obsolete or trivial.

Left unchecked, hoarded data on this scale could mean that companies in Europe face $891 billion of avoidable storage costs by 2020, Veritas reckons. That figure is based on storage in on-premises servers, so it may significantly overestimate pure storage costs, as subscription-based cloud computing services are making things cheaper. Even so, maintaining and analysing data stored in the cloud still incurs fees.

Yet it’s not only the cost of storing hoarded data that should alarm companies. Within dark data lies information that may be needed for compliance, that is business-critical, or that breaches copyright or data privacy rights. Some of it might even be harmful. But if you can’t find it, or don’t know it’s there, you’re in a dangerous position. It’s the old problem of the unknown unknowns. Unencrypted files increase the risk of a data breach, and a high volume of data can hamper a quick response. Searching for five target files among five million can take time. And with petabyte after petabyte of information scattered across company servers around the globe, where do the hidden risks lie?

The problem is partly a cultural one. Some employees may bring their bad hoarding habits into the office. Hoarding is, after all, normal human behaviour, an existential imperative, you might even say: we hold on to things that are not useful now in the belief that they will be in future. Indeed, a survey of more than 10,000 IT managers found that 47 per cent were afraid to delete digital information because they thought they might need it again.

But it’s also a systemic issue for companies that, required to keep some documents for legal reasons, err on the side of caution and keep everything, or nearly everything. We’ve all been there. An inbox of several hundred unread emails or ten abandoned drafts of a document can be overwhelming. Far easier to hit “save” than to figure out what to keep. It is often also a question of simple logistics. Most IT decision-makers believe that their company doesn’t give them sufficient time and resources to implement a data management policy, and that non-IT executives don’t understand the extent of the issue. Which they probably don’t.

For companies, there are also more serious compliance issues. The clock is ticking on the new EU General Data Protection Regulation (GDPR), a single EU-wide law designed to harmonise data protection across the region, which takes effect in May 2018. Maximum fines for non-compliance are the higher of €20 million or 4 per cent of annual worldwide turnover. Forget Brexit. The new rules, which will punish companies for lax data security and data misuse, will initially apply in the UK. And even after Britain has finally extricated itself from the EU, companies will still be covered if they offer goods or services to, or monitor the behaviour of, people in the EU. Moreover, as public awareness of the importance and commercial value of stored data continues to grow, it is quite likely that Britain will come up with its own regulations.

Companies such as Veritas have a vested interest in spreading alarm, because they offer information management systems and behavioural advice that help identify redundant, orphaned and duplicated data and ease the worst problems of data hoarding. But with developments in machine learning, it seems highly likely that businesses will soon be able to buy off-the-shelf “robot” software that will do this for them.

In the meantime, the other obvious issue here is one of personal survival. It is estimated that the majority of office workers knowingly and routinely store items that could harm their career prospects. These include unencrypted personal records (from a scanned passport image to saved passwords for online shopping), kept, for example, by 44 per cent of all IT decision-makers (of all people). Also routinely saved by IT professionals are job applications to other companies (44 per cent), unencrypted company secrets (38 per cent), embarrassing employee correspondence (25 per cent) and, doh!, data of a sexually explicit nature (16 per cent). Your employer may well not bother to scrutinise all the data on your workplace devices, but as software becomes increasingly adept at sifting through unstructured data, do you really want to take the risk?