Data is not the New Oil

Data is not the new oil. It is effectively infinite and unconstrained by geography. It is not destroyed when it is consumed and can be used simultaneously or repeatedly without degradation in quality. Countries should not try to regulate data the way they regulate oil, by bringing it under their physical control. Rather than only trying to force Big Tech to share the datasets it has created, they should also make the effort to learn what it takes to build datasets of their own, and then go about building ones relevant to their context.

This article was first published in The Mint.

On 6 May 2017, The Economist published an article that inserted itself into the zeitgeist like nothing that venerable publication had ever published before. It was titled ‘The World’s Most Valuable Resource is No Longer Oil, But Data’, and dealt for the most part with the antitrust implications of Big Tech. So visceral was the analogy that the phrase “data is the new oil” became the defining metaphor of the digital age.

The Economist was trying to make a limited point—that all new industries grow rapidly when they start out, only to be eventually reined in by regulators once they get too big. They literally said as much in the opening lines of the article:

“A new commodity spawns a lucrative, fast-growing industry, prompting antitrust regulators to step in to restrain those who control its flow. A century ago, the resource in question was oil. Now similar concerns are being raised by the giants that deal in data, the oil of the digital era.”

But the analogy they hit upon was too powerful to remain confined to that narrow context. Before long, other parallels were being drawn: between how tech smooths the friction in our daily lives and the lubricating properties of oil, and between the way data fuels the modern economy and the way oil fuelled the Industrial Age.

Oil has been so deeply integrated into the functioning of the economy that its price has long been a key economic indicator. Oil companies were once some of the most valuable on the planet and, despite the dominance of tech giants, still rank high in terms of market capitalisation today. But Big Oil has been dwarfed by Big Tech companies, some of which are among the wealthiest corporate entities ever to have existed. This is at the heart of the comparison between the oil industry and the data industry, but it is also where the similarity ends.

Data Like Oil

Although Big Oil and Big Tech companies are among the most highly valued on earth, at their core the two industries could not be more different. Oil is a scarce natural resource extracted from deep inside the earth in a form that is largely useless until it has been thoroughly processed and refined. It is, by definition, finite, so much so that its market price fluctuates with its availability. In its most common form, oil is a single-use product that literally has to be destroyed in order to release the energy that we use it for.

Data is the opposite in every one of these respects. It is, for all practical purposes, infinite, limited only by our imagination as to what exactly we want to measure and in how much detail. Unless fettered by regulation, it is unconstrained by geography, a fact borne out by the manner in which Big Tech companies collect data from all over the world without ever having to leave the shores of the countries in which they are based. Finally, not only is data not destroyed when it is consumed, but modern data technologies excel at allowing the same item of data to be used by many different people, either simultaneously or again and again, without any degradation in quality. For all these reasons, the value of data bears no relationship to scarcity, the cost of extraction or the lottery of geography.

Which would suggest that in actual fact, data could not be more unlike oil.

Regulating Data Like Oil

This, unfortunately, is a nuance that seems to have escaped those tasked with regulating data. Around the world, more and more laws are seemingly being written on the presumption that datasets are scarce natural resources whose benefits need to properly accrue to those who have contributed their personal data to them and, by extension, to the countries in which they are resident. This appears to be the line of thinking behind many recent judicial and regulatory developments, such as the second Schrems decision in the European Union and India’s inclusion of data localisation provisions in its forthcoming privacy law. But this is an approach that fails to engage with the essential attributes of data and, as a result, not only produces imperfect regulation, but also misses the opportunity to properly capitalise on all that data has to offer.

As long as the internet exists, data is capable of being accessed from anywhere. Regulations that require data to be physically stored within the geographical boundaries of a particular country do not take this essential attribute into account. We would be far better off focusing on improving our ability to access all the data that we need, rather than assuming that by forcing data to be localised, we will have access to all the data we want.

The How

Finally, it is important to recognise that there is more that goes into the value of a dataset than the elements of data of which it is composed. It is far more useful to understand how to create a useful dataset than it is to amass many datasets that we don’t fully know what to do with. To do that, it is essential to develop an understanding of how data is collected and arranged. Determining what to measure, how to collect it, and the fields of data with which it should be associated is a specialised skill. So is ensuring that a dataset, once created, remains usable and free of bias.

Unfortunately, our regulation of data is so focused on asserting control over the datasets accumulated by Big Tech companies that we have forgotten that we also need to build this muscle. If we want data to power decision-making, we need to ensure that instead of simply lifting and shifting datasets, we actually make the effort to learn what it takes to build datasets of our own, and then go about building ones that are relevant to our context. It is only when we make this shift in our thinking that we will become a data-first economy.