The Opacity of Administrative Data and Limitations of the Desk
[Guest Noting] A public sector data scientist’s recent reflection on the relationship between the field and desk-based data work.
Today’s noting is written by Dibyendu, who has been working as a data scientist at the Ministry of Rural Development’s Data & Insights Unit since mid-2021. He has been involved in projects ranging from writing computer vision models to flag irregular maintenance payments in road works to understanding delays in entitlements for Self Help Groups (SHGs) under NRLM. He constantly questions what we do, strives to be interdisciplinary and has been a joy to work with.
When I joined the government back in 2016, the then COO of DDU-GKY had a very strict policy for newcomers: stay quiet till you’ve actually visited the field. Obvious, right? How can you make solutions or policy without ever having visited the first mile where it gets operationalized? Still, the field remains a romantic idea rather than a routine for people who work at headquarters. I don’t talk about my meetings with the bureaucrats in Delhi with the same tone as my visits to the field, and therein lies the problem. With our team at the Ministry, we’ve tried to carry the same philosophy as my COO and have made first-mile insights a priority. We also have a dedicated HCI Researcher to bring in deeper qualitative insights and critical inquiry into the tech work done by the team.
The following post is Dibyendu’s reflection from some of his visits to the first mile, written from the lens of a data scientist:
A few months into my job as a data scientist in the government, I found myself seated near the overburdened desk of a clerk in Rajasthan who was responsible for managing the MIS of an entire district. While on a tight deadline for the data entry and report making for a government scheme, coordinating with frontline officers on the phone and also prepping for a sudden visit from the chief minister, they were trying, with their patchy internet, to take us through their day-to-day work on the MIS in an effort to help us with our scoping work. Needless to say, we were also amongst the many ‘tasks’ that they had to attend to that day. While the exercise was helping us scope our project and collect qualitative insights, it was also revealing and ‘making real’ the complexities and ambiguities behind producing a single datapoint on my screen. There was hard labour, intricate negotiation, and competing values and perspectives involved in the journey of what would become a neat little cell in my dataframe. Despite having gone through scheme guidelines and circulars, and having been in spaces discussing and working on human-computer interaction, I had still become a disconnected actor. Data, related metrics, and code have this very ability to disguise these convoluted realities in the form of spick and span ‘data realities’. This calls not only for an introspection of a data scientist’s agency and responsibilities in the entire system but also for modifications in the data science life cycle itself.
A few months later and for a different government program, on a rather long visit to government offices at various administrative levels and self help groups in Jharkhand, the extensive and arduous journey of the implementation of a scheme came to light (something that can also remain hidden from a data scientist). What seemed to take a few kB of my laptop’s RAM took up, in reality, thousands of square kilometres of land encompassing lakhs of people. Solutions had now become objects being used by real people with their own complex positioning and objectives. In separating the process into tidy stages, the data left out some very important points of concern. Could my solution space then be sufficient if it were blind to these complexities or did not take into account each actor’s needs? Would my job as a data scientist be complete by just trying to improve my ML model’s accuracy or helping accomplish a certain metric? Have I not been given immense power by getting to decide on these? Sitting in a remote underserved adivasi village, it was becoming increasingly difficult to imagine the sparkling tech solution proposed by a consultant that had caught everyone’s fancy. Can one really design for someone without interacting with them and getting to know their lived experiences?
As has been mentioned in previous notings by Harsh, data scientists/engineers are given a large and shiny seat at the table. They are often at centre stage in formulating problems and deciding on them. Their inherent ideologies and biases frame solutions, and thus the existence of a ‘neutral and apolitical’ data scientist is a fallacy. In fact, this very stance is quite a conservative one. Their decisions could lead to beneficiaries being unfairly denied resources. One could argue that a lot of this could be taken care of by making the data scientists talk to the product managers and researchers on the team, but without recognising their political position, empathising with the beneficiaries, and being thoughtful partners in the process, a lot of their inherent biases would creep into the system in ways that would be hard to keep in check. The life cycle of a data science project hence cannot end at iterating through scoping, collecting data, training a model and deploying it. It must incorporate the process of designing with (as opposed to designing for) the actors and beneficiaries and adapting to their needs at every step, sometimes even scrapping the entire project if deemed necessary.
P.S. All that being said, data scientists should always tag along on field visits and not lead them. The qualitative/policy researchers are better trained to do them. The post only encourages thoughtful collaboration and does not endorse taking up more space, as the field has been weirdly doing for some time now.