Breaking Down Data Silos

Individuals generate vast amounts of medical data through various healthcare interactions and personal devices. Machine learning can unlock proactive diagnoses, but current data silos and over-protective attitudes hinder access and sharing. A proposed electronic data request framework in India aims to place control with the data subject, enabling personal analysis and broader applications, such as credit-worthiness assessments.

This article was first published in The Mint.


In the course of our lives we visit many doctors. We go to the neighbourhood general practitioner for simple ailments, trusting their experience to either cure us of our minor diseases or point us in the direction of experts. For more specific problems or where we know we have a recurring condition, we go to specialists—and when we intuitively know that the situation is critical, we get ourselves admitted into a hospital.

Each of these visits generates a record of our immediate symptoms and the prescribed treatment: a discrete set of data points that forms part of our longitudinal medical record. We have, in the process, submitted our bodies to dozens, maybe hundreds, of tests to supply doctors with the data they need for their diagnosis. And then there is the data generated by activity trackers, those sensor-laden wireless devices that so many of us have taken to wearing on our person. In this manner, each of us has generated a large volume of useful medical data over the years, but so far we’ve had no way to extract value from these data sets.

Given the rapid development of machine learning algorithms designed to find patterns where no human can see any, all that is about to change. Before data was digitally recorded, it would have been near impossible to correlate symptoms observed during a visit to the dentist with subsequent gastroenterological complications. Now we can use machine learning algorithms to make these sorts of correlations—to use the evidence of oral ulcers to diagnose Crohn’s disease, or use diminishing jaw bone density data as an early indicator of osteoporosis. If we can apply big data analytics to our medical history, we will be able to proactively diagnose a number of diseases and mitigate their consequences.

Unfortunately, despite the way in which data has insinuated itself into our world, we lack the structural framework to achieve these results. Though most doctors maintain digital records of their patients and their treatment, they keep that data locked away in private databases under the direct control of the institutions to which they are affiliated. To a hospital, patient data is valuable business information, very often a competitive advantage in the cutthroat world of medical research. Consequently, hospitals have no incentive to share this information with other institutions—even if this non-cooperative attitude denies significant health benefits to the very patient to whom the data pertains.

It is not just the medical industry that has developed this over-protective attitude to data. Businesses across the board have recognized the value of the data they hold and have begun to take steps to secure it. Every consumer-facing company is locking down customer data into silos, securing them to preserve competitive advantage. More often than not they do so even before they know how they are going to monetize this data—relying on the conviction that a business model will eventually present itself.

No one will argue that a data subject should be denied access to his or her data. In fact, many countries require data controllers to take specific actions to ensure that data subjects can review and correct the personal data held about them. However, very few countries go so far as to create a mechanism through which data subjects are able to port their data from one data controller to another.

Last week I participated in a panel discussion on a new framework for electronic data requests. This electronically intermediated, consent-based data request framework has been designed to place the data subject at the centre of the privacy ecosystem. It allows anyone to electronically ask a data controller to extract his or her personal data from the data silo managed by that controller and transfer it to another entity. If implemented, it will transfer control over individual elements of personal data from the data controller into the hands of the data subject.
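
To make the idea concrete, here is a minimal sketch in Python of how such a consent-based request might work. It is an illustration only, not the framework's actual specification: the class names, field names and sample records are all hypothetical, and a real system would also need authentication, signatures and audit trails.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ConsentRequest:
    """Hypothetical consent artefact; every field name here is an assumption."""
    data_subject: str      # the person the data is about
    data_controller: str   # the entity that currently holds the data
    recipient: str         # the entity the data should be transferred to
    data_types: frozenset  # the categories of data the subject agreed to share
    expires_at: datetime   # consent lapses after this moment


class DataController:
    """A toy data silo that releases records only against a valid consent request."""

    def __init__(self, name, records):
        self.name = name
        # records has the shape {data_subject: {data_type: value}}
        self._records = records

    def fulfil(self, request, now=None):
        now = now or datetime.now(timezone.utc)
        if request.data_controller != self.name:
            raise PermissionError("request is addressed to a different controller")
        if now >= request.expires_at:
            raise PermissionError("consent has expired")
        subject_records = self._records.get(request.data_subject, {})
        # Release only the categories named in the consent, nothing else.
        return {k: v for k, v in subject_records.items() if k in request.data_types}


# Usage: a patient asks a hospital to send two categories of data to an app.
hospital = DataController(
    "city_hospital",
    {"asha": {"blood_tests": "report-1", "dental_xrays": "scan-7", "billing": "invoice-3"}},
)
request = ConsentRequest(
    data_subject="asha",
    data_controller="city_hospital",
    recipient="health_analytics_app",
    data_types=frozenset({"blood_tests", "dental_xrays"}),
    expires_at=datetime.now(timezone.utc) + timedelta(days=1),
)
shared = hospital.fulfil(request)
print(sorted(shared))  # ['blood_tests', 'dental_xrays']
```

The point of the sketch is that the request, not the controller, decides what leaves the silo: the hospital's billing records stay put because the data subject never consented to share them.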

There are many use cases in which I can see the benefit of having such a data request framework. It can be used in microfinance and alternative lending to allow borrowers who might otherwise have been ineligible for a loan to present proxies for their credit-worthiness, such as their GST returns or history of mobile payments, to prove to lenders that they have the ability to service the loan. More importantly perhaps, this framework will finally allow patients to free their personal medical data from the silos in which it is currently trapped so that they can analyse their own medical history to get a better assessment of their personal health.

If the Indian government is serious about personal privacy, it would do well to consider innovative frameworks such as these and incorporate them into the design of its data infrastructure for GSTN and other data networks. It might even encourage the Justice Srikrishna Committee, formed to deliberate on a data protection framework for India, to recommend their inclusion in the proposed legal framework.