The needle in the digital haystack

The extensive surveillance by the US and UK governments revealed by Edward Snowden has prompted a push for data localization laws in various countries. But the sheer volume of data now being generated calls into question the effectiveness of mass data collection in preventing terrorism, and with it the value of data localization, given how difficult it is to extract actionable intelligence from such vast amounts of information.

This article was first published in The Mint. You can read the original at this link.


When Edward Snowden first revealed just how extensively the governments of the US and the UK had infiltrated our personal space, it was almost unbelievable. The National Security Agency (NSA) of the US, in conjunction with the UK’s Government Communications Headquarters (GCHQ), had tapped directly into the fibre optic cables running around the world, intercepting, at source, the data flowing through the pipes of the internet. Operation Dishfire served up nearly 200 million text messages a day, while Operation Prism provided on-demand access to data in the servers of the largest tech companies of the world. Sophisticated tools like XKeyscore allowed them to pore over browsing and search histories, the content of emails and online chats, and other forms of metadata, while an array of decryption programmes was used to bypass standard web encryption.

So extensive was the snooping that even world leaders were not spared. Those targeted included German Chancellor Angela Merkel, Brazilian President Dilma Rousseff and Mexican President Felipe Calderon, as well as the French foreign ministry and leaders at the G8 and G20 summits in Toronto. According to reports, in March 2013 alone, the NSA collected 6.3 billion pieces of information from internet networks in India and 6.2 billion pieces of information from the country’s telephone networks, and had specifically tapped information from the Indian diplomatic mission to the United Nations and the Indian embassy in the US.

The reason it was so easy for the US to insinuate itself into global communications is that almost all the large global technology companies—which between them serve up virtually all the social media and over-the-top communication services used by internet users across the planet—are registered in the US. By virtue of having these companies under its direct legal authority, the US government appears to have been able to coerce them into either handing over information on demand under US law or opening up back doors through which US law enforcement could directly tap in. Had large tech companies been more evenly distributed around the world, it might have been much harder for any one government to assume this sort of disproportionate control.

It is partially in response to this that an increasing number of countries—including India—have begun to enact data localization laws requiring the companies on whose platforms their citizens exchange data to establish local data centres, so that the data their citizens generate remains within national jurisdiction. The thinking seems to be that by forcing big tech companies to localize their data as a condition of market access, these countries will be able to wrest back control over strategically important data that they have no other means to access. As soon as local data centres are established, I have no doubt that law enforcement will ask to be allowed to dip into that local pool of data whenever they need to.

Unfortunately, we can no longer be certain that the approach of collecting vast volumes of data, in the hope that this will allow us to uncover the information hidden within it, still makes sense. Internet users already generate 2.5 quintillion bytes of data a day. By next year, an estimated 1.7 megabytes of data will be created every second for every person on earth. According to some estimates, it would already take a single person 181 million years to download all the data on the internet. If you factor in the data that will soon be generated by the Internet of Things (IoT), the cameras that surround us and look down at us from the sky, and the smart devices we have increasingly welcomed into our homes, the volume of data that law enforcement will have to wade through is truly staggering.
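To put that figure in perspective, here is a rough back-of-the-envelope calculation in Python. The 2.5 quintillion bytes a day comes from the estimates above; the 10 GB per second scanning throughput assumed for a single analysis pipeline is purely illustrative.

# Back-of-the-envelope arithmetic for the scale of daily data generation.
# The 2.5 quintillion bytes/day figure is from the estimates cited above;
# the per-pipeline scanning throughput is a purely illustrative assumption.
BYTES_PER_DAY = 2.5e18            # 2.5 quintillion bytes generated per day
SECONDS_PER_DAY = 86_400

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
print(f"Generation rate: {bytes_per_second / 1e12:.1f} TB per second")   # ~28.9 TB/s

# Suppose, for illustration only, that one analysis pipeline can inspect 10 GB/s.
PIPELINE_THROUGHPUT = 10e9        # bytes per second
print(f"Pipelines needed just to keep pace: {bytes_per_second / PIPELINE_THROUGHPUT:,.0f}")  # ~2,894

Even under that generous assumption, thousands of pipelines running in parallel would be needed simply to touch every byte once, before any actual analysis has taken place.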

We’ve known for some time that despite the efforts of the US government to collect vast amounts of data, its ability to glean actionable intelligence out of it has been poor. In 2009, notwithstanding everything the NSA had gathered, the “underwear bomber” was able to board a flight from Amsterdam to Detroit armed with explosives and only failed to blow up the plane because of his own ineptitude. Even former US President Barack Obama agreed that the problem was not a failure to collect intelligence, but “to integrate and understand the intelligence that we already had”. In 2014, a study by the New America Foundation analysed 225 terrorism cases in the US since 11 September 2001 and found that bulk collection of phone records had little to no discernible impact on preventing terrorism. The breakthroughs in most instances had come from traditional methods of investigation. This corroborated the findings of a review group appointed by the White House, which found that the NSA’s counterterrorism programme did not really prevent attacks.

The world is generating too much data for any agency, no matter how much technology it uses, to be able to separate useful information from the noise. As much as the governments of the world might chafe at the US government’s ability to dip directly into the fire hose of data, they should realize that this is no longer the advantage it might once have been.

And then they should re-evaluate their insistence on data localization as a means of gaining access to useful intelligence. If the chance of finding a needle in the digital haystack is so low, is there still merit in requiring all that data to be brought onshore?