Why community data trustees should also be regulated

The first draft report of India’s Non-Personal Data Committee suggests democratizing data and introducing a new category of data principal: the community. However, absent guidelines for data trusts, politicization and favouritism within community action groups could lead to misuse.

This article was first published in The Mint.


When it was first announced, the creation of a Non-Personal Data Committee caused considerable consternation. Never before had any country shown this level of interest in regulating data that wasn’t personally identifiable, and the implications for data-driven businesses were bound to be significant. In the course of the many months during which it worked, the Kris Gopalakrishnan committee stayed steadfastly out of the public gaze, surfacing only behind closed doors to meet with select stakeholders who were invited to make presentations on specific aspects of non-personal data and how it should be regulated. Which is probably why, when the report was finally made public last Sunday evening, it just materialised, silently and without flourish, on the MyGov website.

In terms of rationale, the report doesn’t stray far from the expected. It argues that data companies have only been able to secure a first-mover advantage because the data environment in India has, so far, been unregulated. This has allowed them to accumulate vast lakes of data that they have then used as a moat to keep new entrants out. This, in turn, has resulted in the creation of data monopolies with such a disproportionate data advantage that even the Indian government has been unable to shake them. The report goes into detail about the risks of leaving non-personal data unregulated—the fact that individuals can be re-identified even after their data has been anonymized, and that large data companies can easily derive personally identifiable insights even from aggregated data sets.

The main focus of the report is to suggest ways in which the value of all non-personal data currently being stored in data silos (whether controlled by the government or private enterprise) can be unlocked. Conceptually, this is an approach that is hard to argue with. After all, if it is possible to unlock more value from data than is currently being done without diminishing the value that the original data collectors derive from it, there is no reason why we should not do so. Data is non-rivalrous, which means that it can be re-used ad infinitum without any degradation in value. That being the case, it stands to reason that greater value can be unlocked from these data silos than is being realized today.

The trouble with this approach is that data collectors collect and maintain vast stores of user-generated data largely because of the ability this gives them to monetize the underlying value of these data sets. If data is democratized, will data collectors simply lose the incentive to collect? The report seems to proceed on the assumption that this would not be the case, trusting that the requirement to make raw user data freely accessible will simply be yet another constraint that data businesses will learn to deal with.

One of the mechanisms that the report has suggested for the purpose of unlocking value from existing stores of user data is the creation of a new category of data principal: the community. The report defines a community as a group of people bound by common interests and involved in social or economic interactions. A data trust representing any such group of people will be able to make a request for any non-personal data currently being held by a data custodian so that it can be accumulated in trust for and on behalf of the entire community. Once these datasets are accumulated, the data trustee will make use of this non-personal data on behalf of the community.

While this is all well and good, what I found missing from the report was a set of rules and guidelines that describe the framework within which data trusts and their respective trustees would be required to operate.

The report seems to have assumed that data trusts will always act in the best interests of the communities they represent. However, we know from lived experience that this is not the way things always pan out. Community action groups are easily and often politicized. Even though they start out with noble objectives, they often stray from that path, and end up catering to the interests of a few powerful individuals, instead of the community at large as originally intended.

If we are going to create a brand-new trustee and invest it with the ability to determine what can or cannot be done with our non-personal data, we must articulate the framework within which this trustee has to function. This should, at the very least, include an obligation to always act in the best interests of the community at large, with appropriate penalties for failing to do so or for showing favour to select members of the community. We should take the trouble to set up systems which ensure that all actions taken by a data trustee vis-à-vis non-personal data are taken transparently, in a public and auditable manner.

I can see why the committee felt the need to articulate a community interest in non-personal data that is distinct from the individual interest. Certain types of data have a different value in the community context, and that should be catered to. However, unless we clearly articulate how the data trustee is expected to behave, there is a risk in vesting a data trustee with the authority to deal with non-personal data on our behalf. We should take care to ensure that in our attempt to loosen the stranglehold that big data companies have over our non-personal data, we don’t end up exchanging one giant evil for many small ones.