What Limits Data-Driven Governance?
Accessing government data takes time - even for the government. Why, and how do we solve it?
The PowerPoint slide displayed a picture of a confused but cute Labrador with paint all over it. The audience laughed - a much needed respite in a full-day technical workshop. Almost a hundred officials from the Ministry of Rural Development - from MIS executives to the Secretary of Rural Development - sat diligently through an internal workshop where Richa (our star product manager) gave a master-class on visualizations, user-centered design and enabling Programme Divisions (PDs) to build their own analytics and dashboards. For the purposes of this piece, PDs are the business units which administer policy (domain, business, bureaucrats etc.).
Traditionally, data analytics and dashboards have had a strong dependency on IT teams. IT has access to the databases, can write code to derive analytics or write JavaScript to build charts. Personally, I've not held a high opinion of dashboards, having built many and having rarely seen any used. But Richa has made me a fence-sitter again on this topic. While I may still remain skeptical about "dashboards" within the context of the public sector, I do believe analytics should be in the hands of domain experts and not IT.
Having been intimately involved in various parts of the data pipeline in the government, I've seen a pattern in how "everyday" data science decisions are taken. By everyday data science, I mean:
ad hoc policy decisions (e.g., identifying a list of poorly performing projects so that warnings can be issued)
RTI requests (many RTI requests ask for data which is not readily available publicly and needs some mining)
Parliamentary questions (same as above, but heavily concentrated when parliament is in session)
Review PPTs
Aakash and I have a not-yet-public working paper on "everyday" data science in government which describes one instance of this practice in detail.
Traditionally, this is how an "everyday" decision is taken: the PD raises a data request with the technical team, which has to find time amidst its development work to write the query and send back an extract.
The time gap between the data request and the data receipt decides whether the decision will be data-driven or merely data-backed. The PDs aren't the only victims here. The technical teams can't focus on development activities and have to context-switch to address these often urgent, one-time data requests - all while already being overburdened, under-capacity and struggling to meet routine development timelines.
In a self-service model, by comparison, the PD accesses a simplified data model directly and the wait largely disappears.
This definition from Gartner (sue me) of self-service analytics is quite exhaustive:
Self-Service Analytics is a form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support. Self-service analytics is often characterized by simple-to-use BI tools with basic analytic capabilities and an underlying data model that has been simplified or scaled down for ease of understanding and straightforward data access.
With equal parts astonishment and begrudging respect, I have to admit that this executive gibberish actually makes sense and is completely on point.
Last year, I was pushing for hiring dedicated data analysts into Programme Divisions. But now, from the vantage point of my current team, I understand that the blocker isn't specialized skillsets but agile data access. I was overestimating the expertise needed for ad-hoc analysis, probably coloured by my own journey within the Government. Most everyday data science tasks within the government require only adequate tooling, i.e. Excel and PowerPoint; even fancier dashboards can be built without writing any code using Power BI, Tableau and the like. So the larger issue is that of agile access to data. Programme Divisions, though, cannot and should not be given access to databases in their raw form. Databases generally evolve over time and carry their quirks. We need simplified versions of these databases that can be shared with Programme Divisions, who in turn can access the data through BI tools, SQL or Python. Such simplified versions are common in digital private-sector firms under names like data models, warehouses or marts.
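To make this concrete, here is a minimal sketch of the idea - the schema, names and figures are all hypothetical, not the Ministry's actual data model. The raw operational tables stay with IT; the PD only ever sees one flat, denormalised view it can query from Excel, Power BI, SQL or Python.

```python
# Minimal sketch of a "simplified data model" (hypothetical schema and data):
# raw operational tables on one side, a single analyst-friendly view on the other.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Raw operational tables, as an MIS might store them (heavily simplified).
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, district TEXT, sanctioned_amount REAL);
CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, project_id INTEGER, amount REAL, paid_on TEXT);

INSERT INTO projects VALUES (1, 'Pune', 500000), (2, 'Nashik', 300000);
INSERT INTO payments VALUES
    (101, 1, 200000, '2021-04-10'),
    (102, 1, 100000, '2021-06-02'),
    (103, 2, 100000, '2021-05-21');

-- The "data mart": one denormalised view a Programme Division can query
-- without knowing anything about the raw schema.
CREATE VIEW project_summary AS
SELECT p.project_id,
       p.district,
       p.sanctioned_amount,
       COALESCE(SUM(pay.amount), 0) AS funds_disbursed,
       COALESCE(SUM(pay.amount), 0) * 100.0 / p.sanctioned_amount AS pct_disbursed
FROM projects p
LEFT JOIN payments pay ON pay.project_id = p.project_id
GROUP BY p.project_id, p.district, p.sanctioned_amount;
""")

# An "everyday" ad-hoc question, answered without touching the raw tables:
# which projects have disbursed less than half of their sanctioned amount?
for row in cur.execute("SELECT * FROM project_summary WHERE pct_disbursed < 50"):
    print(row)
```

The same view can be pointed at from a BI tool just as easily, which is the whole point: the PD gets its answer without raising a ticket, and IT's job narrows to keeping that one view correct.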
Some challenges to this:
Designing and creating simplified data models is an art and requires continuous maintenance; maintenance is a hard problem to solve in the public sector, be it for digital or physical goods.
There will be a lot of initial confusion over what is IT's role and what is the PDs'. Note that we aren't eliminating IT's role from data reporting; long-form reports and ETL stay with IT. Only ad-hoc analysis moves to the PDs.
Culture change: it boils down to how much interest the PD wants to take. While data access becomes faster, it brings additional responsibility; you can no longer shirk data-driven decision making by citing data accessibility issues.
Creating these data models also solves the single-source-of-truth problem. Most government MIS end up with reports that are built over time and sometimes have the same columns displaying differing values. For example, total funds disbursed to contractors may show two different values depending on which report is opened and which developer built it. This happens because the business logic is re-written for each report and the same definition isn't honored.
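A toy illustration of how the drift happens (the payment records and the rule about cancelled payments are invented for the example): when the metric is defined once and every report reuses that definition, two reports cannot disagree about the same column.

```python
# Sketch of the single-source-of-truth idea: define the metric once,
# reuse it everywhere. All names and the cancellation rule are hypothetical.

payments = [
    {"project_id": 1, "amount": 200000, "status": "paid"},
    {"project_id": 1, "amount": 100000, "status": "cancelled"},
    {"project_id": 2, "amount": 150000, "status": "paid"},
]

def total_funds_disbursed(payments):
    """The one agreed definition: cancelled payments are excluded."""
    return sum(p["amount"] for p in payments if p["status"] == "paid")

# Report A re-implements the logic and forgets to exclude cancellations...
report_a_total = sum(p["amount"] for p in payments)   # 450000
# ...while Report B uses the shared definition.
report_b_total = total_funds_disbursed(payments)      # 350000

print(report_a_total, report_b_total)  # two different "truths" for the same column
```

In a warehouse setting the shared definition would live in the data model itself - a view or a documented measure - rather than in application code, but the principle is the same.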
The Ministry of Rural Development has embarked on the journey of moving towards self-service analytics across all its schemes. It is very ambitious and will definitely take a while, but it is nonetheless a step in the right direction. I am optimistic about what the self-service paradigm can achieve; it's not some lofty, futuristic goal. While preparing the presentation for the workshop, I realized I had already been part of a government programme which, unknowingly and by happenstance, operated a radical version of self-service in 2015-16 - but that's a story for later.
Submitted please.