On India’s Data Policy – The promise and peril of big data in India’s policy space

News: Even years after India initiated an ‘open data’ policy, open access to public datasets is still very rare. Unlike in the West, India’s statistical ecosystem does not allow administrative datasets to be scrutinized by independent researchers before they are deployed for policymaking.

What are the issues/challenges in using administrative data for policymaking?

Firstly, government departments are opaque about the data they hold.

Three years before the ‘long walk home’ of migrant workers during the pandemic, a study in India’s pre-budget Economic Survey claimed that the actual number of migrant workers may be roughly double the census estimates. This study was based on a dataset of unreserved passenger traffic between every pair of railway stations in India.

On that basis, the study argued that social security benefits should be portable across states to provide protection to such workers.

However, prominent academics raised questions about the study, arguing that its estimates could not be taken seriously.

If the government had released the raw data behind this study, researchers could have verified its calculations. That could have drawn attention to interstate migrants even before the pandemic.

Even the Census can easily miss short-term and circular migration flows, because it is conducted only once in ten years.

Secondly, there is a lack of respect for basic data norms. For instance, the way the EPFO data-mining exercise was conducted made its results unusable.

The Employees Provident Fund Organisation (EPFO) maintains the digitized records of employees receiving provident fund benefits in this country. This data is a valuable resource for tracking the movement of people in and out of formal jobs across different sectors.

To extract information from this EPFO data, the Niti Aayog invited two economists of its own choosing. However, this went against basic data norms.

In most mature democracies, a public agency would either have published the entire dataset for everyone to use or have invited researchers through a transparent process to mine the dataset for research.

The problem was compounded when the ‘selected’ researchers suppressed the issue of the incompleteness of EPFO records in the published version of their study.

As a result, both the findings from the EPFO data and the data itself became questionable in the eyes of serious researchers.

Thirdly, unverified administrative datasets are used without public scrutiny. For instance, the untested MCA-21 dataset was used to calculate India’s gross domestic product (GDP) for 2014-15, despite a warning from an independent expert. This created a big controversy.

If the government had opened up the MCA-21 dataset, suspicions could have been avoided in the early stages.

What are the reasons for the occurrence of such issues?

The National Statistical Commission (NSC), India’s data facilitator, is severely under-equipped to perform such a role. Its lack of statutory backing and independent funding has disenfranchised it.

What is the way forward?

An empowered statistical regulator should be set up in place of the NSC. It would make sure that clear norms for data sharing and accessibility are in force.

Many of India’s big administrative datasets are flawed. Opening up these databases for public scrutiny would help ensure that errors and inconsistencies are quickly identified. Transparency will lead to accuracy and raise public confidence.

Source: This post is based on the article “The promise and peril of big data in India’s policy space” published in Livemint on 21st Dec 2021.
