Store data efficiently to get insights for justice reforms

The Indian judicial system is in dire need of reform, with cases piling up and delays becoming the norm. While some digital information exists, the system remains largely analog, lacking crucial metrics and insights. Agami, an organization supporting legal innovation, is working to build a repository for legal data sets, aiming to develop a cloud storage system for collecting, storing, and updating legal data. This project is seen as a first step towards using data to understand and address the inefficiencies in the Indian legal system.

This article was first published in The Mint.

One rarely hears any argument about the fact that the Indian judicial system needs to be reformed. Our courts are logjams of procrastination where cases have piled up so high that the time taken to arrive at a resolution is often punishment in itself. And while we might reassure ourselves that the Indian legal system eventually ensures that justice is done, beyond the wistful hope that this is the case, there is no data to prove it. While every other sector of the economy is rapidly reinventing itself, it seems that the judiciary in India is doing as little as it can to evolve.

We know that the first step along the path to innovation is digitization. Data provides insights that intuition overlooks. It shines light on inefficiencies in the system in a way that those who live within it cannot, allowing us to better understand what it will take to fix them. Unfortunately, the level of digitization of the Indian legal system is still minimal at best.

To be clear, there is a lot of digital information sloshing around the ecosystem: our cause lists are maintained digitally, decided judgements are generally accessible online, and legal research has almost entirely migrated away from paper to the cloud. However, as much as we have managed to reduce our dependence on paper, when it comes to using data to explain the shortfalls in the ways in which our legal system actually operates, we remain uncomfortably analog in India. We lack metrics on the time taken to complete dispute resolution, or on where delays in the system really lie. We have little information on the manner in which new regimes like the bankruptcy code and competition law are functioning, or on whether statutes like the Protection of Children from Sexual Offences Act have achieved the objective of protecting those they were supposed to.

This is not to say that there is no data at all. As a matter of fact, raw data exists, for the most part in digital form, on almost every facet of the legal system. If one cares to put in the effort to find it, this data is also relatively easy to access. However, at present, these data sets are sub-optimally organized in silos of unstructured information that do not inter-operate efficiently with one another. If this information is to be of sufficient quality to provide useful insights, considerable effort will be required to reorganize these data sets. A necessary prerequisite might be to make sure that they are all stored on a common platform so as to allow greater interoperability between them.

As a first step in this direction, Agami, an organization supporting legal innovation in India, recently selected a candidate to build a repository for legal data sets. The project is intended to develop a cloud storage system on which different types of legal data can be collected, stored and, where necessary, updated in real time. It is hoped that, through the creation of infrastructure like this, we will not only be able to build a storehouse of legal data but also create a community of analysts and researchers who will both contribute to and partake of a growing body of legal data.

It is essential that this repository be designed so that the various databases it hosts are capable of effectively interacting with one another. For this, not only will the data sets need to conform to a certain basic taxonomy, but there will also, to the extent possible, need to be some sort of functional equivalence between the data elements of disparate databases. This will serve as a useful form of cross-validation as well as allow for a wider range of insights to be derived from inter-connected data sets. In order to avoid double counting, identifiers such as the Corporate Identity Number (CIN) should be used to establish linkages between different databases.
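
To make the idea of linking on a shared identifier concrete, here is a minimal Python sketch. It is not a description of how the Agami repository is built. The two record lists, their field names and the CIN values are all made up for illustration, and the assumption is simply that each data set carries a CIN under some field name of its own.

```python
from collections import defaultdict

# Hypothetical sample records. Field names and CIN values are invented
# for illustration and do not come from any real data set.
nclt_cases = [
    {"cin": "U00000KA2010PTC000001", "company": "Acme Pvt Ltd", "case_type": "insolvency"},
    {"cin": "U00000MH2012PTC000002", "company": "Bharat Traders", "case_type": "merger"},
]
contract_suits = [
    {"corporate_id": "u00000ka2010ptc000001", "party_name": "ACME PRIVATE LIMITED", "days_to_decree": 1420},
    {"corporate_id": "U00000DL2015PTC000003", "party_name": "Delhi Mills", "days_to_decree": 980},
]


def normalise_cin(raw):
    """Bring a CIN into one canonical form (trimmed, upper case)."""
    return raw.strip().upper()


def link_by_cin(sources):
    """Group records from several data sets under their shared CIN.

    `sources` maps a source name to a pair: (records, name of the field
    that holds the CIN in that source). This is the functional equivalence
    between differently named data elements.
    """
    linked = defaultdict(dict)
    for source_name, (records, cin_field) in sources.items():
        for record in records:
            cin = normalise_cin(record[cin_field])
            linked[cin].setdefault(source_name, []).append(record)
    return dict(linked)


linked = link_by_cin({
    "nclt": (nclt_cases, "cin"),
    "contracts": (contract_suits, "corporate_id"),
})

# Each company is counted once, however its name is spelt in each source.
distinct_companies = len(linked)

# Companies that appear in both data sets can be cross-validated.
in_both = sorted(cin for cin, by_source in linked.items() if len(by_source) == 2)

print(distinct_companies)
print(in_both)
```

Matching on the identifier rather than on the company name is what prevents "Acme Pvt Ltd" and "ACME PRIVATE LIMITED" from being counted as two different firms.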

None of this will come to pass unless all the data that is contributed to the repository is open and free to use without any licence restrictions. While there is no doubt that the contributors will have invested considerable effort in the collection of the data they contribute, hopefully they will all agree that the true value of this exercise will only be realized once the data stored on these platforms is made amenable to full interoperability.

The benefits of this sort of dedicated legal database will only be evident at scale. It is only when different and disparate data sets are allowed to interact with each other that we will get unexpected insights. If we can create the equivalent of a GitHub for legal research, the data stored on the repository will start to make sense in ways that would otherwise not have been possible.

To start with, Agami has selected four data sets with which the repository will initially be populated. These cover matters as diverse as contract enforcement analysis, death penalty sentencing, prevention of child sexual offences, and an analysis of firm-level litigation at the National Company Law Tribunal. They will use a combination of manual coding and natural language processing to extract the data and assimilate it into useful data sets. Hopefully, in time, more researchers will be inclined to use the platform.

While it is impossible to say exactly what purpose will be served by getting information as diverse as this onto a single platform, that is precisely why we need projects like this. After all, we will only find the unexpected if we make an effort to look for it.