AI/ML @ AEM

Understanding When AI Needs to be Governed at an Enterprise Level

Written by Stephan J. Lemmer, PhD | Oct 24, 2025

Numerous articles detail the importance of AI Governance, what an AI governance group should do, who should be in it, and how to balance risk and benefit. 

Lacking in these articles, however, is a way to determine whether a project needs to be reviewed by the governance group in the first place: a definition of Governable AI. Such a definition is critical. It is impossible to thoroughly review every new product, and every use of every product, that is marketed as AI, yet AI used inappropriately (even without malicious intent) can still cause significant harm and legal risk.

While there are many excellent definitions of AI that target different industries and agencies, none of them properly capture the essence of whether a project should be reviewed by an AI governance group: some definitions are too broad, which risks overworking the governance team, delaying projects that contain only nominal AI components, and (because of these delays) reducing compliance. Some definitions are too narrow and may miss projects capable of causing significant harm. Still others try to split the difference but create definitions with too much technical nuance for project managers and others who must make project-related decisions but may not have deep AI expertise.

So, in the interest of getting a second deliverable out of something I had to do anyway (or, put more charitably, illuminating an oft-overlooked challenge in AI governance), I’m detailing my thoughts on a definition of Governable AI that project managers can use as a checkpoint during project startup.

Such a definition must be:

•Comprehensive: The governance group should be asked to assess risk and provide mitigations any time a system can cause harm by, for example, making incorrect or biased decisions, providing potentially harmful advice, or leaking sensitive data.
•Precise: The governance group should not need to review proposed tools and uses that are incapable of causing harm, particularly given the proliferation of AI in commercially available tools.
•Approachable: Decisions on whether to consult the governance group will need to be made by individuals with subject matter expertise but limited technical AI background.


Existing Federal Definitions of AI

As a government contractor, my natural first step is to review the definitions present in standards and legal documents. The tangled web of (often unattributed) references reveals five “originating” definitions used in federal policy:

FY 2019 National Defense Authorization Act

In this section, the term ‘‘artificial intelligence’’ includes the following:

(1) Any artificial system that performs tasks under varying and unpredictable circumstances without significant human oversight, or that can learn from experience and improve performance when exposed to data sets.

(2) An artificial system developed in computer software, physical hardware, or other context that solves tasks requiring human-like perception, cognition, planning, learning, communication, or physical action.

(3) An artificial system designed to think or act like a human, including cognitive architectures and neural networks.

(4) A set of techniques, including machine learning, that is designed to approximate a cognitive task.

(5) An artificial system designed to act rationally, including an intelligent software agent or embodied robot that achieves goals using perception, planning, reasoning, learning, communicating, decision making, and acting.

DOD 2018 AI Strategy

AI refers to the ability of machines to perform tasks that normally require human intelligence – for example, recognizing patterns, learning from experience, drawing conclusions, making predictions, or taking action – whether digitally or as the smart software behind autonomous physical systems

National Artificial Intelligence Initiative Act of 2020

The term “artificial intelligence” means a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations or decisions influencing real or virtual environments. Artificial intelligence systems use machine and human-based inputs to—

(A) perceive real and virtual environments;

(B) abstract such perceptions into models through analysis in an automated manner; and

(C) use model inference to formulate options for information or action.


OECD Recommendation of the Council on Artificial Intelligence

AI system: An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

ISO/IEC 22989:2022

[AI system:] engineered system that generates outputs such as content, forecasts, recommendations or decisions for a given set of human-defined objectives


The conditions put forth in these definitions capture three general approaches to defining AI: human-like, techniques and technologies, and what it does. Below, we review each theme in turn: the intuition behind it and how well it satisfies our requirements of comprehensive, precise, and approachable.

Theme 1: “Human-Like”

Relevant Passages

•[…] that solves tasks requiring human-like perception, cognition, planning, learning, communication, or physical action. (FY 2019 National Defense Authorization Act)
•An artificial system designed to think or act like a human […] (FY 2019 National Defense Authorization Act)
•[…] designed to approximate a cognitive task. (FY 2019 National Defense Authorization Act)
•An artificial system designed to act rationally […] (FY 2019 National Defense Authorization Act)
•AI refers to the ability of machines to perform tasks that normally require human intelligence (DOD 2018 AI Strategy)


Intuition

Thinking like a human was the original definition of AI, and if you ask most people what AI is, they will say something along these lines.

How well does it satisfy our requirements?

Academics first defined AI in 1955, when a contemporary computer had 450 vacuum tubes, communicated through a typewriter, and was used mostly for mathematics. Nowadays, a processor commonly used in cellphone chargers has 12,000 transistors[1]. There was no need for nuance when distinguishing “human-like thought” from “computer-like thought” because computer-like thought was so limited.

Today, computer-like thought is a much broader, and fuzzier, category. The goalposts of AI have moved again and again: once a computer does something competently, it becomes computer-like thought (chess is the canonical example of this). This goalpost shifting has already moved the conception of “human-like thought” well beyond algorithms that can cause harm: while many are already using “AI” and “Large Language Models” synonymously, much simpler techniques can cause significant harms when used for critical decisions[2]. Because of this, the definition is neither comprehensive nor precise, although it is approachable.

Speaking of simpler techniques, that brings us to…

Theme 2: Techniques and Technologies

Relevant Passages

•[…] including cognitive architectures and neural networks. (FY 2019 National Defense Authorization Act)
•[…] including machine learning. (FY 2019 National Defense Authorization Act)
•[…] can learn from experience and improve performance when exposed to data sets (FY 2019 National Defense Authorization Act)
•[…] recognizing patterns, learning from experience […] (DOD 2018 AI Strategy)
•[…] abstract such perceptions into models through analysis in an automated manner (National Artificial Intelligence Initiative Act of 2020)
•use model inference […] (National Artificial Intelligence Initiative Act of 2020)

Intuition

Some of the need for AI governance is due to the complexity of modern techniques. Modern learned systems generalize from a relatively small number of samples to very large and complicated distributions, leading to powerful behaviors that can be counterintuitive, unpredictable, and opaque. This is exacerbated by how much easier it is to implement and advertise an AI solution (download scikit-learn or HuggingFace, throw in your data, make some slides) than to understand and evaluate its performance (choose your metrics, define “fair”, evaluate disparities across social groups, determine statistical significance, make sure your data accurately reflects your application, monitor dataset drift, etc…). If governance is triggered by the specific technology used, tools that could accidentally become harmful due to this complexity are caught and reviewed before they cause that harm.
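
To make that asymmetry concrete, here is a minimal sketch (synthetic data, an invented “group” attribute, and scikit-learn assumed): the implementation is a handful of lines, and everything after it is the beginning, not the end, of the evaluation work.

```python
# Minimal sketch of the implement-vs-evaluate asymmetry. The data and the
# "group" attribute are synthetic stand-ins invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                  # five made-up features
group = rng.integers(0, 2, size=n)           # hypothetical demographic group
y = (X[:, 0] + 0.5 * group + rng.normal(size=n) > 0).astype(int)

# The "throw in your data" part: three lines and a bullet for the slide deck.
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.5, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("overall accuracy:", round(model.score(X_te, y_te), 2))

# The evaluation part: per-group error rates are only the first of the
# questions (metric choice, significance, drift, representativeness, ...).
pred = model.predict(X_te)
for g in (0, 1):
    mask = g_te == g
    acc = (pred[mask] == y_te[mask]).mean()
    fpr = ((pred == 1) & (y_te == 0) & mask).sum() / max(((y_te == 0) & mask).sum(), 1)
    print(f"group {g}: accuracy={acc:.2f}, false positive rate={fpr:.2f}")
```

The first half is what gets demoed; the second half, and everything it implies, is what a governance review exists to ask about.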

How well does it satisfy our requirements?

Classifying something as AI or not AI based on the specific techniques that are used has meaningful shortcomings related to all three of our requirements. The most notable shortcoming is with respect to the approachable requirement: to make the determination, the decision maker must both know the underlying technology they are using (information that is often unavailable for commercial products) and understand whether it fits into one of the defined categories.

This definition strategy can also have issues related to the paradox of the heap: how many parameters cause a linear regression to become “AI”? An LLM is just Excel’s trendline function, blown up to a mind-numbingly large scale, and an additive model[3] is technically a one-layer neural network. How many if-then clauses separate an expert system-based AI from ordinary software? Is a logistic regression model automatically safe because it only has two parameters?

The answer to the last question is emphatically no, meaning such definitions are unlikely to be comprehensive. In addition, even a hypothetically perfect technique-based definition written today will become obsolete as new AI technologies are invented.
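
Returning to the heap question above, here is a minimal sketch (numpy and scikit-learn assumed, synthetic data) showing that a fitted logistic regression and a one-unit, one-layer “neural network” compute exactly the same function. Whether either one counts as AI is a labeling decision, not a mathematical one.

```python
# Minimal sketch: a logistic regression and a single sigmoid "neuron" are the
# same function; only the name (and the training recipe) differs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

def one_neuron(X, weights, bias):
    """A 'neural network' with one layer and one unit: sigmoid(Xw + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ weights + bias)))

p_regression = clf.predict_proba(X)[:, 1]
p_neuron = one_neuron(X, clf.coef_.ravel(), clf.intercept_[0])
print(np.allclose(p_regression, p_neuron))  # True: same model, two labels
```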

Precision-wise, technologies that are defined as AI under these regulations are becoming (have become?) ubiquitous in safe everyday applications. Neural networks, the term most closely associated with current technical definitions of AI, have been used for traditionally non-AI tasks such as image and video compression and noise reduction for hearing aids. Should the governance group be consulted if someone wants to make a video call using a program with neural video compression? Or if someone wants to install a new spell checker that uses a transformer?

Theme 3: What It Does

Relevant Passages

•achieves goals using perception, planning, reasoning, learning, communicating, decision making, and acting (FY 2019 National Defense Authorization Act)
•perception, cognition, planning, learning, communication, or physical action (FY 2019 National Defense Authorization Act)
•drawing conclusions, making predictions, or taking action (DOD 2018 AI Strategy)
•make predictions, recommendations, or decisions influencing real or virtual environments. (National Artificial Intelligence Initiative Act of 2020)
•perceive real and virtual environments (National Artificial Intelligence Initiative Act of 2020)
•formulate options for information or action (National Artificial Intelligence Initiative Act of 2020)
•generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. (OECD Recommendation)
•generates outputs such as content, forecasts, recommendations, or decisions (ISO/IEC 22989:2022)


Intuition

In addition to a strong alignment with the notion of “human-like”, processes such as perceiving, formulating options for action, generating forecasts, and (most importantly) making decisions all pose the risk of an AI creating harmful outputs. The way that these outputs influence “real or virtual environments” can amplify those risks[4]. Because these outputs can be harmful, and because those harms can be amplified by their effect on the environment, such tooling should be reviewed by a governance group.

How well does it satisfy our requirements?

While “what it does” is the definition most closely tied to the potential to cause harm, and therefore the need for review, the requirements as written are challenging to interpret and untenably broad. In other words: neither approachable nor precise. The former shortcoming (not approachable) is likely self-evident. As evidence of its poor precision, we apply the OECD’s definition of AI to the wheel of lunch:

AI system: An AI system is a machine-based system (✔) that, for explicit or implicit objectives (✔ the objective is find me a place to eat), infers, from the input it receives (✔ local restaurants[5]), how to generate outputs such as predictions, content, recommendations (✔), or decisions that can influence physical or virtual environments (✔ tells me where to eat). Different AI systems vary in their levels of autonomy and adaptiveness after deployment.
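
For the record, here is the entire “AI system” in question, or at least a minimal stand-in for it (the restaurant list is invented), annotated against the clauses it satisfies:

```python
# The Wheel of Lunch, reimplemented as a stand-in. It satisfies every clause
# of the OECD definition quoted above while being a one-line random choice.
import random

def wheel_of_lunch(nearby_restaurants):
    """Machine-based system (check) that, for the implicit objective of
    finding me a place to eat (check), infers from the input it receives
    (check) how to generate a recommendation (check) that influences a
    physical environment, namely where I end up eating (check)."""
    return random.choice(nearby_restaurants)

print(wheel_of_lunch(["Thai place", "Burrito cart", "That one diner"]))
```

A definition that asks the governance group to review this is a definition that will bury the governance group.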

Creating an Actionable Definition

None of the policy-based definitions meet the requirements of comprehensive, precise, and approachable in a satisfactory way. However, they do provide insights into how to build a usable definition based on the concepts of what it does and techniques and technologies.

A new what it does definition is straightforward: if the application takes an incorrect action (planning AI is planning to fail, remember?), takes incorrect actions repeatedly, or has its outputs used in unintended ways, can it cause harm? While it does not align with the definitions of AI enumerated above, it is also not unprecedented in policy: the NIST AI RMF uses risk as its primary decision variable, while the EU and US Governments have published lists of prohibited, high-risk, rights- and safety-impacting, and high-impact applications. Such policy-backed definitions can be adjusted to your risk tolerance and will improve over time. Such a definition also has the benefit of leveraging expertise most effectively: a project manager knows the harm a system or the data it uses can cause in a way that an AI generalist does not, while an AI expert can then determine the risk and mitigation strategies.

This definition is comprehensive and approachable but not precise. For example, any calculation performed within the field of civil engineering meets this definition. To address this, we draw the line between “safe” engineering practice and Governable AI based on a techniques and technologies-like definition. Specifically, we focus on the interaction between inputs and outputs: if the full range of possible inputs can be defined and the corresponding outputs compared to some failure criteria, then AI governance isn’t required. Like the definition of risk above, this is something that is interpretable by the SMEs who know the risks and potential inputs that they are dealing with. A sketch of what that line looks like in practice follows.
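
Here is a minimal sketch of that line, with invented numbers: for a closed-form engineering calculation, the input domain is explicit and bounded, so the outputs can be checked against a failure criterion across the entire documented range. There is no analogous enumeration for a model whose input is “any string a person might type.”

```python
# Minimal sketch: a bounded, closed-form calculation whose entire documented
# input range can be checked against a failure criterion. All numbers and the
# failure limit are invented for illustration.
import itertools
import numpy as np

def beam_deflection(load_n, length_m, stiffness_nm2):
    """Midspan deflection of a simply supported beam under a central point load."""
    return load_n * length_m**3 / (48.0 * stiffness_nm2)

# The full, documented input range for this (hypothetical) application.
loads = np.linspace(100.0, 5000.0, 50)        # N
lengths = np.linspace(1.0, 10.0, 50)          # m
stiffnesses = np.linspace(1e6, 1e8, 50)       # N*m^2

FAILURE_LIMIT_M = 0.15  # invented failure criterion

worst = max(beam_deflection(p, l, ei)
            for p, l, ei in itertools.product(loads, lengths, stiffnesses))
print(f"worst-case deflection over the whole input domain: {worst:.3f} m")
print("within failure criteria:", worst < FAILURE_LIMIT_M)

# Contrast: a chatbot or classifier has no itertools.product() over its input
# space, which is exactly what pushes it across the Governable AI line.
```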

A brief definition of governable AI---which would be expanded on in a full governance document---could then be:

A Governable AI use is one that:

May receive inputs for which outputs are not explicitly defined or characterized; and

Performs or could be used to perform tasks that are prohibited, high-risk, rights impacting, safety impacting, high-impact, or may otherwise cause harm due to incorrect outputs, misused outputs, or data leakage.

This can be evaluated succinctly in two questions: Can its behavior be predicted in every possible condition? And can it be used such that data loss, incorrect outputs, or biased outputs can cause harm?
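
Expressed as a hypothetical intake checklist rather than policy language, that two-question checkpoint might look like the sketch below; every name and flag is invented, and the judgment calls still belong to the SME answering the questions.

```python
# Hypothetical project-intake sketch of the two-question checkpoint above.
# The class, fields, and examples are invented for illustration.
from dataclasses import dataclass

@dataclass
class ProposedUse:
    inputs_fully_characterized: bool  # Q1: can behavior be predicted for every possible input?
    can_cause_harm: bool              # Q2: could data loss or incorrect/biased outputs cause harm?
    description: str = ""

def needs_governance_review(use: ProposedUse) -> bool:
    """Governable AI = uncharacterized inputs AND the potential for harm."""
    return (not use.inputs_fully_characterized) and use.can_cause_harm

examples = [
    ProposedUse(False, True,  "chatbot that answers benefits questions"),
    ProposedUse(True,  True,  "bounded engineering calculation with validated input ranges"),
    ProposedUse(False, False, "neural spell checker for internal notes"),
]
for use in examples:
    print(f"{use.description}: review required = {needs_governance_review(use)}")
```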

Risks and Mitigations

Classifying projects by risk has the benefit that it aligns which AI tools are reviewed with the goal of AI governance (ensuring there are no unintended harms) and caters to the strengths of the person who needs to make the decision. In other words, a subject matter expert on the project is asked to judge the potential risks, which is something an AI generalist is distinctly unsuited for. However, there are some risks that must still be considered with this definition.

Risk 1: Unforeseen Applications

The biggest risk of enumerating high-risk applications is the diversity of applications that exist. Because of this, a project that would be classified as high-risk could end up being treated as low-risk simply because it wasn’t written into policy. In the proposed definition, we add the clause “may otherwise cause harm” to address this. While this relies on a judgment call on the part of an SME, the SME is also the individual most capable of making this judgment call. If the chance of a high-risk application bypassing review in this manner is unacceptable, the tasks that make an AI governable can be defined in the opposite way: enumerating safe uses and requiring review for all others.

Risk 2: Side Effects

Side effects (unintentional consequences of a low-risk system working as intended) are a significant problem in engineering, traditional computing, and AI: changes in social media feeds can significantly alter users’ moods, social media itself increases political polarization and decreases happiness, and AI chatbots predictably affect people’s political beliefs in discussions despite (or perhaps because of) efforts to ensure they are safe. Although these applications would have passed a governance review at their conception, I would argue that a filter bubble appreciably impairs an individual’s ability to make an informed decision, and operates beyond a person’s consciousness, meaning that, knowing what we know now, social media would be prohibited under Article 5(1)(a) of the EU AI Act.

This isn’t uniquely an AI problem, nor can it be comprehensively anticipated: some problems are only revealed by scale. In the context of AI governance, the mitigation is to ensure, during the initial review, that a plan is developed to monitor the real-world effects of deployed tools.

Risk 3: Misuse

In misuse, an individual creates an unanticipated risk by using an AI-enabled tool outside of its intended design. This may be due to malicious intent, such as generating fake receipts or asking a chatbot how to perform illicit activities, or accidental, due to misaligned expectations (and overzealous marketing). Misuse is a longstanding risk in AI: a classical and readily available tracking-by-detection algorithm that helps self-driving cars avoid pedestrians can be trivially repurposed to help the military find targets. The flexibility of modern generative AI tools exacerbates the problem, as any of the 8.2 billion people on Earth can think of a way to misuse the system.

Misuse is difficult to predict, but good general practice is to ensure that capabilities are made clear and expectations are managed to prevent incidental misuse. Where possible, the interface should constrain the user flow so that the user stays within the system’s designed and validated capabilities, as sketched below. If this is not possible (such as in a natural language chatbot), such risks should be mitigated through thorough red teaming, and usage should be monitored to identify “in-the-wild” failures.
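
As one hedged illustration of constraining the user flow (every capability name and function here is invented), requests can be routed through an explicit allowlist of designed-and-validated capabilities, with anything outside that list refused and logged for monitoring:

```python
# Hypothetical sketch of constraining user flow to validated capabilities.
# Capability names, handlers, and the logging scheme are invented.
VALIDATED_CAPABILITIES = {
    "summarize_report",       # each entry has been evaluated and red-teamed
    "draft_meeting_notes",
    "translate_internal_doc",
}

def log_for_review(capability: str) -> None:
    """Record out-of-scope requests so 'in-the-wild' misuse stays visible."""
    print(f"[misuse-monitor] unsupported capability requested: {capability!r}")

def run_validated_capability(capability: str, payload: str) -> str:
    # Stand-in for the real, evaluated pipeline behind each capability.
    return f"(ran {capability} on {len(payload)} characters of input)"

def handle_request(capability: str, payload: str) -> str:
    if capability not in VALIDATED_CAPABILITIES:
        log_for_review(capability)
        return "This assistant only supports its validated tasks."
    return run_validated_capability(capability, payload)

print(handle_request("summarize_report", "Q3 incident summary..."))
print(handle_request("write_my_performance_review", "please be generous"))
```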

Handling AI Tool Use

The last challenge in defining AI for the purposes of governance is the question: How do we govern employee use of ChatGPT/Gemini/Copilot/Claude/StEVE?[6]

The first---and most important---note about AI tool use is that whether a tool is or isn’t commercially available is unrelated to whether it is governable. The potential harm caused by commercial tools used in unsafe ways is the same as when internally developed tools are used in unsafe ways. This---alongside the definition of governable AI---must be made clear to anyone who will be working with AI tooling.

For uses that aren’t governable, the answer is a little muddier. It is, of course, important to handle concrete requirements: does the contract allow AI use? Does the data need to be protected? Beyond this, the evaluation should be similar to the underlying motivation of governance: what happens when the AI is incorrect, can it be fixed, and is it reasonable to expect that it will be fixed? For common uses, it is worthwhile to provide concrete guidelines on appropriate use and failure mitigation. For other uses (such as those that can be created and passed to ChatGPT on a whim), individuals should be given education, examples, and guidance to evaluate appropriate use and mitigation on their own.

Conclusion

While many definitions of AI have been proposed for other purposes, such definitions variously fail to capture AI uses that may cause harm, produce too many false positives, or cannot be readily applied by the people who have to decide if they are using AI. Because of this, these definitions of AI will result in a governance process with a heavy backlog that simultaneously fails to review potentially unsafe AI.

For this reason, we propose a definition of Governable AI that focuses on being comprehensive, precise, and approachable. This definition downplays the term AI itself, instead treating AI as normal technology and focusing on harms and uncertainties. This is, of course, only a start: socialization is a challenge, loopholes will be found, and any judgment of potential harm is subjective. But a start is better than nothing, and I look forward to honing the process as the number and specificity of AI projects increases.

[1] This number may not be exact, but it works well for illustrative purposes. A friend of mine who works in embedded software once told me “The standard line about your cellphone having more processing power than the Apollo landers is definitely true, but these days, it’s likely that your cellphone charger has more processing power than the Apollo landers,” and I’ve anchored on that despite my dubious understanding of semiconductors.

[2] The earliest record I could find of the controversial COMPAS tool was 2015. While its exact algorithms are proprietary, this makes it reasonable to assume that it used “computer-like” statistical methods: neural networks started appearing frequently in research in 2012, and the transformer architecture that powers modern LLMs was created in 2017.

[3] An additive model is that thing your doctor does where they add up a few numbers and say “your risk score is six, we’d like it to be a four” and you say “There’s no way this is accurate, you just added five numbers together.”

[4] I’m not convinced this phrasing actually narrows the list of things that are AI, but it is illustrative in this case.

[5] Unfortunately, Wheel of Lunch can no longer use the Yelp API. It used to, though, so I’m still counting it.

[6] I made this one up, but wouldn’t an AI assistant with the branding Steve be funny? Instead of “Hey Gemini,” you could say “EH STEVE!”