If Data Is The New Oil, What’s Happening To Its Precious New Source?
In March 2018, The New York Times reported that researchers had gained access to the data of millions of Facebook users and then misused it for political ads during the 2016 US presidential election. It was one of the most significant data leaks in Facebook’s history.
The data, harvested by Cambridge Analytica, allowed the consulting firm to exploit the private social media activity of as many as 87 million Facebook users, most of them in the United States, without their consent. The firm used it to build psychographic profiles, inferring users’ personality traits from their Facebook activity. The data underpinned the political advertising strategy of President Trump’s 2016 campaign.
In the years following the 2016 election, government regulators and privacy experts ramped up investigations into Facebook practices that failed to protect its users’ personal data. The upshot: Facebook treated user data not as something to be protected but as a tool to be exploited for maximum profit.
Far from the offices of Cambridge Analytica and equally far removed from the 2016 U.S. presidential election, a group of AI researchers is shedding new light on data ethics. They are the authors of a paper titled “Narratives and Counternarratives on Data Sharing in Africa,” presented at the ACM Conference on Fairness, Accountability, and Transparency.
While the ethics of AI models is one of the most talked-about issues in the industry today, the paper shows that the origin, collection, and sharing of the data behind AI research and applications are an often overlooked but equally critical component of AI ethics.
We talked to Abeba Birhane, George Obaido, Kehinde Aruleba, and Sekou Remy to learn more about issues concerning data collection in Africa and the Global South.
What is data ethics?
Authors: While it’s not easy to boil data ethics down to a straightforward answer, it’s essential to acknowledge that many data initiatives in Africa are driven by well-intentioned efforts to alleviate poverty, inequality, and their derivative effects on the continent.
However, the fundamental issue with many of these initiatives is that they are driven by “deficit narratives”: they focus on negative perceptions of the continent and ignore the positive contributions (from music to poetry to medicine) that it has to offer. Furthermore, data sharing tends to be inherently extractive: data is collected from African communities without considering how those communities should benefit in return. What’s more, data sharing initiatives are typically driven by non-African stakeholders, and the data subjects themselves are not even considered stakeholders in the process.
In our paper, we explore the concept of data colonialism, in which data sharing practices by non-African organizations lead to entire heterogeneous populations having their data accessed and shared while realizing little, if any, benefit and often being harmed by these practices.
In data ethics, there is the construct of moral distance. What is it, and how does it apply to Africa and, more broadly, the Global South?
Authors: Simply put, moral distance refers to the use of data by people far removed from the place where it was collected.
For example, suppose you are going to collect data from a particular area of Kenya. In that case, the primary beneficiaries of that project should be the Kenyan community that serves as the data source. Too often, data is collected and shared to reflect the values and interests of organizations that are not connected to and don’t have a vested interest in creating value for the communities that are providing the data.
In the example above, the only way to ensure you are capturing the proper social and cultural context while reducing inequities in data distribution is to involve local Kenyan researchers and community members in every step of the process, from project sponsorship to local community input into how the data is shared.
What have been the barriers to more ethical data collection and use in Africa?
Authors: Data sharing and open data practices in Africa are often informed by Western perspectives and driven by the interests of Western researchers.
We are not opposed to data sharing. But we don’t think data sharing is unequivocally good either. Responsible and equitable data sharing processes can benefit communities as well as science in general. The main barriers to accountable and equitable data sharing include inadequate infrastructure, a lack of trust, limited awareness of the context of the data, and structural obstacles.
What is a scenario where the question of data ethics is raised in your paper?
Authors: One scenario we outline in the paper is that of a doctoral candidate investigating the fertility of soil samples in an African country to find the best ways to assist farmers. The intended recipients of the data are the government and NGOs.
The researcher cannot gain access to samples from the local community, which has already collected such data for its own purposes and is already using it to organize within the community.
Lacking the necessary historical context about the local community, the researcher assumes that the farmers are reluctant to participate because they don’t want their efforts to be replicated.
Further investigation reveals that local farmers are hesitant to share soil samples because they fear that the government might want to claim ownership of their lands.
There is a widespread perception (rooted in the country’s colonial and apartheid past) that people from outside the community are there to steal from them, whether data or resources. This shows the importance of understanding culture, context, and history, and of building trust with communities before embarking on data collection.
Conclusion
Just as the farm-to-table movement grows and harvests local ingredients and uses them to serve meals to the local community, we should think about data in a similar way: using it to nourish local communities rather than to extract resources from them.
Progress within the African continent poses a unique set of challenges to researchers and other data practitioners, requiring a harmonious partnership between all stakeholders. To achieve this, ethical data collection and sharing must be further studied, and the following challenges outlined in the paper must be addressed:
- Colonial-era oppression has been reincarnated in various data practices, including data collection, sharing, and analysis. The AI industry needs to acknowledge the gravity of the power asymmetries in Africa. As a community, we must recognize that data belonging to African communities is being extracted and used in ways that unintentionally harm these communities and disproportionately benefit or empower non-African stakeholders.
- The importance of building trust and the methods used to achieve it are in urgent need of re-evaluation. Trust is the fundamental component of all relationships in a data-sharing ecosystem. In Africa, data-sharing practices can contribute to a lack of trust among stakeholders in the data-sharing ecosystem. Researchers need to respect that people often agree to provide sensitive data because they trust the researcher to take proactive measures to guarantee that the data collected will not be shared in a way that would be harmful to them.
- Awareness of “context and local knowledge” will improve the effectiveness of data. Data researchers can no longer turn a blind eye to local norms and contexts. Contexts are crucial to understanding data thoroughly, and data sharing practices that discard contexts risk becoming irrelevant and potentially harmful to local communities.
AI ethics focuses primarily on models after they’ve been built from data, yet there is an even broader conversation to be had about the ethics of data collection and sharing. As an AI community, to ensure that the systems we build are fair, ethical, and just, we need to start at the beginning: the data.