August 18, 2023

AI’s Data Governance Challenge: Where to Start

Governments and companies are increasingly focused on developing data governance systems that will enable AI progress while minimizing the potential for negative impacts.

The following is an edited version of the content presented in the video.

Governments around the world are drafting legislation and guidelines to strengthen data governance practices related to AI. In this piece, we focus on four areas that are critical to data governance for providers and users of AI models.

Provenance and Privacy

The European Artificial Intelligence Act (EU AI Act), which is expected to pass this year, sets out specific data governance practices for validating the data used in training, as well as for testing AI models to ensure they comply with existing privacy and data protection rules (e.g., the GDPR), including standards related to data quality and integrity. For an overview of the act, see our piece “Primer on the EU AI Act: An Emerging Global Standard for Artificial Intelligence.”

In July, the US Federal Trade Commission sent one of the leading platforms in this space a request for documentation of its AI governance practices. The request included a series of questions that give a sense of the commission’s priorities. Several questions focus on how the company develops and trains its AI models, including what data it uses, how that data is collected, how long it is retained, and whether individuals can opt out of having their data used for training or other purposes.

As a starting point for better data governance, companies should ask the following questions about the data they use to train their AI models:

  • How was the data obtained? For example, was it scraped from the Internet, purchased from third parties, or collected through first-hand interactions?
  • Does the dataset contain personal information, and is it possible to identify who the information is about?
  • Does the dataset contain proprietary information (e.g., trade secrets or other IP)?
  • Is the data accurate? Is it up to date?
  • How is the data stored, and is it adequately protected? How long will it be stored?
  • Could similar outcomes be reached while using strictly de-identified data?

Transparency and Explainability

The concepts of transparency and explainability are at the fore in discussions about ethical and trustworthy AI. They are often core principles in existing and proposed legislation and standards, including in the EU AI Act and frameworks issued by the OECD and the US National Institute of Standards and Technology (NIST).

When it comes to transparency, the focus is often on ensuring users understand when they are interacting with an AI system. Regulatory requirements and customer expectations will push businesses to disclose both that an AI system is involved in an interaction and the logic behind it.

The goal of explainability is to ensure that humans understand the rationale behind an AI algorithm’s output. This is difficult, because the way AI algorithms reach decisions is often opaque, not only to the user but frequently to the developer and provider of the AI model as well. Indeed, this opacity is one of the core features of advanced AI. Because users often bear the risks associated with using AI models, it is important that they understand what those risks are.

When developing explainability standards and policies, it is important to account for the needs of all users, stakeholders, and affected groups, and ensure explanations are not too technical for the general public to understand.

When selecting third-party partners, including those that provide APIs, consider whether they can guarantee the required level of transparency and explainability, particularly if it is not easy to engage with the third party’s developers or evaluate its supply chain.

Bias and Discrimination

Algorithmic due process is central to frameworks for ensuring that AI models are not biased and do not discriminate. This is particularly critical where AI models drive decisions that can significantly affect people’s lives, such as decisions about who has access to credit or education.

This is a central concern of the White House Blueprint for an AI Bill of Rights and of the framework issued by NIST. The California Privacy Protection Agency recently issued rules requiring a focus on eliminating bias and discrimination in automated decision-making. During a visit to the White House in July 2023, leaders from seven AI companies announced a set of commitments that included reducing bias and discrimination and protecting privacy.

The EU AI Act requires companies to validate their training practices and to test their data to ensure it is representative, error-free, and complete. Meeting this requirement calls for statistical approaches that ensure different groups are accurately represented. The EU AI Act also emphasizes diversity, nondiscrimination, and fairness, including equal access, gender equality, and cultural diversity.

Compliance Mechanisms

Governments are increasingly requiring companies that provide or use AI to establish data governance systems that uphold the principles discussed above. These systems typically include mechanisms for senior-level accountability, for tracking and documenting how AI models use data, and for training staff in compliance. Increasing attention is also being paid to whether legislation should require companies to appoint specific people to oversee AI governance within their organizations. Some companies assign this oversight to their chief information officers. However, AI raises complex legal, ethical, cybersecurity, cultural, and organizational issues, so companies may need to create multidisciplinary teams that bring together the range of knowledge and skills needed to meet these requirements.