
From Model Cards to Runtime Governance: Operationalizing AI Risk Management for Production Agentic Systems

Published: at 12:00 AM

For years, AI governance has leaned on documents.

Model cards. Risk assessments. Data sheets. Validation reports. Approval memos. Vendor questionnaires. Policy attestations.

Those artefacts still matter. But for production agentic systems, they are no longer enough.

The reason is simple: agentic systems do not merely produce outputs. They call tools, retrieve context, trigger workflows, update records, hand off to other agents, and sometimes act before a human reviews every step. A static document can describe what the system was supposed to be. It cannot prove what the system did at 10:42 on Tuesday when a customer record changed, a tool call failed, a prompt was updated, and an agent chose an exception path.

That is the shift from model cards to runtime governance.

Static Governance Has a Production Problem

Traditional AI governance assumes a fairly stable object of review. A model is trained. It is validated. It is approved. It is monitored periodically. Changes go through review.

That approach still works for some predictive systems. It is weaker for generative AI, and weaker still for agentic AI.

An agentic system is a live operating pattern, not just a model. Its behaviour depends on the model, prompt, retrieval source, tool permissions, memory, orchestration logic, policy checks, user intent, downstream system state, and external data. Change any of those and the risk profile can move.

I once reviewed a customer-service automation where the approved model was not the problem. The issue was the retrieval layer. A policy document had been updated in the knowledge base, but the retrieval pipeline still surfaced an older version for a narrow class of refund cases. The audit file said the model had passed validation. The customer experience said something else.

That is the danger. Static governance can be technically true and operationally blind.

MAS Moves the Discussion Toward Operational Evidence

The Monetary Authority of Singapore (MAS) pushed this conversation forward in March 2026 when it announced the completion of phase two of Project MindForge and the publication of an AI Risk Management Toolkit for the financial services sector. The toolkit was developed with 24 industry partners and covers traditional AI, generative AI, and emerging agentic AI technologies.

The package includes an AI Risk Management Operationalisation Handbook and a supplement of case studies from financial institutions. MAS said the handbook is organised around four areas aligned with its proposed guidelines: scope and oversight, AI risk management, AI lifecycle management, and enablers. It also said the handbook will be updated periodically as AI use matures and supervisory expectations evolve.

That matters because it signals a practical direction. Regulators are not asking only whether a firm has principles. They are moving toward whether firms can show working governance across the AI lifecycle.

For agentic systems, that means evidence generated during operation.

The Runtime Governance Stack

Runtime governance is the set of controls that operate while the AI system is being used, not only before it goes live.

For CIOs, I would break the stack into the layers covered in the sections below: permission boundaries, behavioural monitoring, drift management, workflow-generated audit evidence, and the operating model that ties them together.

The difference from traditional governance is timing. Runtime governance does not wait for a quarterly review to discover that a control was weak. It makes the control part of execution.

Permission Boundaries Are the New Access Control

Agentic AI turns access control into a business risk issue.

A human user usually has a job role, a manager, training, and disciplinary consequences. An agent has a token, credentials, tool bindings, and whatever constraints engineering and governance teams remembered to implement.

That is not enough.

Microsoft’s April 2026 Agent Governance Toolkit is a useful signal of where runtime governance is heading. Microsoft described open-source runtime security for AI agents, including policy-engine deployment patterns such as running governance as a sidecar container alongside agents or using middleware integration for agents built on Foundry. The important idea is not the specific toolkit. It is the architectural principle: governance should sit in the path of agent action, not only in a document repository.

Permission boundaries should be explicit. An agent may read customer records but not export them. It may draft a refund decision but not approve it. It may update a case note but not change account status. It may call an internal search tool but not an external browser. It may recommend containment during a cyber incident but require human approval before disabling a production account.

The question is not “Can the model do it?” The question is “Should this agent, in this context, with this confidence level, be allowed to do it now?”
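That question can be made executable. Below is a minimal sketch of a permission boundary check in that spirit; the `AgentAction` type, the `POLICY` map, and the 0.8 confidence threshold are all illustrative assumptions, not part of any specific framework.

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    agent_id: str
    action: str        # e.g. "read_customer_record", "approve_refund"
    risk_tier: str     # e.g. "low", "high"
    confidence: float  # agent's self-reported confidence in [0, 1]

# Explicit permission map (hypothetical): what each agent may do on its own,
# and what always requires a human in the loop.
POLICY = {
    "refund-agent": {
        "allowed": {"read_customer_record", "draft_refund_decision", "update_case_note"},
        "human_approval": {"approve_refund", "change_account_status"},
    },
}

def check_permission(act: AgentAction) -> str:
    """Return 'allow', 'escalate', or 'deny' for a proposed agent action."""
    rules = POLICY.get(act.agent_id)
    if rules is None:
        return "deny"              # unknown agents get nothing by default
    if act.action in rules["human_approval"]:
        return "escalate"          # the agent may draft; a human approves
    if act.action in rules["allowed"]:
        # even an allowed action escalates when confidence is low on high-risk work
        if act.risk_tier == "high" and act.confidence < 0.8:
            return "escalate"
        return "allow"
    return "deny"
```

The default-deny stance for unknown agents, and the escalation path for allowed-but-low-confidence actions, are the two design choices that make this a runtime control rather than a static access list.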

Monitoring Must Catch Behaviour, Not Just Availability

Most IT monitoring answers familiar questions. Is the system up? Is latency acceptable? Are errors rising? Is capacity sufficient?

Agentic systems need those metrics, but they also need behavioural monitoring.

Is the agent using the right tools? Is it escalating at the expected rate? Are users overriding its recommendations? Are outputs drifting after a prompt change? Are certain data sources producing weak answers? Are tool calls failing silently? Are actions clustered in unusual patterns? Are agents using more tokens, more retries, or more external context than expected?

NIST’s AI Risk Management Framework and Generative AI Profile both emphasise lifecycle risk management rather than one-off review. The practical implication is that production AI needs continuous measurement across validity, reliability, safety, security, accountability, transparency, and other trust characteristics. For agentic AI, those characteristics must be monitored through runtime signals, not just pre-production tests.

This is where many firms will struggle. They have model validation teams, cyber monitoring teams, and application support teams, but no one owns agent behaviour as a production risk category.

That ownership gap will become expensive.

Drift Is No Longer Just Model Drift

When leaders hear “drift”, they often think of model performance decay.

In agentic systems, drift has more shapes.

Context drift happens when retrieval sources change, business policies change, or stale knowledge appears in answers. Tool drift happens when connected systems change APIs, permissions, fields, or workflows. Behaviour drift happens when prompt changes, model upgrades, or memory patterns alter how the agent plans. Risk drift happens when a low-risk use case gains new data access or starts affecting more critical decisions.

A model card will not catch that by itself.

Runtime governance should track material changes across the whole agent system: model versions, prompts, retrieval indexes, tools, data paths, policy rules, permission scopes, human approval thresholds, and vendor-managed capabilities. Each material change should trigger an appropriate level of retesting and evidence capture.
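One lightweight way to implement that tracking is to fingerprint everything that defines the agent's risk profile and diff it on every deployment. A sketch, with illustrative field names:

```python
import hashlib
import json

def config_fingerprint(cfg: dict) -> str:
    """Stable hash over the components that define the agent's risk profile."""
    canonical = json.dumps(cfg, sort_keys=True)  # canonical form so key order is irrelevant
    return hashlib.sha256(canonical.encode()).hexdigest()

def material_change(old: dict, new: dict) -> list[str]:
    """Return which governed components changed; each changed component
    should trigger an appropriate level of retesting and evidence capture."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

A change to any tracked key, including ones added or removed, shows up in the diff, so a retrieval index refresh or a widened permission scope cannot slip through as a "no-op" release.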

The hard truth is that agentic AI makes change management continuous. If the governance process assumes change is occasional, the control model is already behind.

Audit Evidence Should Be Generated by the Workflow

Audit evidence cannot depend on people reconstructing history from chat logs.

For every material agentic workflow, the system should preserve evidence of the request and its context, the model and prompt versions in use, the tools called and their results, the policy checks that ran, any human approvals or overrides, and the final action taken.

This evidence needs to be searchable, time-stamped, tamper-resistant, and tied to owners. It should support internal audit, regulator review, incident investigation, and operational improvement.
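Tamper resistance can be approximated without special infrastructure by hash-chaining the log, so that altering any past entry breaks every link after it. A minimal sketch, with hypothetical field names:

```python
import hashlib
import json
import time

class EvidenceLog:
    """Append-only, hash-chained evidence log for agent workflow events."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def record(self, workflow: str, agent_id: str, event: dict) -> dict:
        entry = {
            "ts": time.time(),
            "workflow": workflow,
            "agent_id": agent_id,
            "event": event,
            "prev_hash": self._prev_hash,  # links this entry to the one before it
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._prev_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any tampering with a past entry breaks a link."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

In production this would feed a write-once store rather than an in-memory list, but the principle is the one the article argues for: the workflow itself generates the evidence as a side effect of running.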

In finance, this will become especially important. MAS’s toolkit is practical guidance rather than a final rule, but it reflects supervisory direction: AI risk management must be operationalised. A bank will not be able to rely on high-level principles if an agentic system creates customer harm, exposes data, or takes an unauthorised action. It will need to show the control trail.

The Operating Model

Runtime governance is not only a tooling problem.

It needs an operating model.

The CIO should own the technical control environment. Risk and compliance should define policy requirements and evidence standards. Business owners should own the outcome of each agentic workflow. Security should own identity, access, monitoring, and response integration. Internal audit should test whether controls work, not merely whether policies exist.

This is where many organisations need to grow up fast. Agentic AI is too operational for an ethics committee alone, too probabilistic for traditional application support alone, and too business-facing for engineering alone.

I would start with a production readiness gate for agentic systems. Before go-live, require an inventory record, owner, risk tier, permission map, test results, monitoring plan, rollback process, incident playbook, and evidence plan. After go-live, require periodic review of telemetry, exceptions, drift indicators, override rates, and control failures.
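That gate can be enforced mechanically before go-live. A sketch that mirrors the checklist above; the field names are taken from it, and the record shape is an assumption:

```python
# Required checklist items for the production readiness gate (from the text above).
REQUIRED_GATE_FIELDS = {
    "inventory_record", "owner", "risk_tier", "permission_map", "test_results",
    "monitoring_plan", "rollback_process", "incident_playbook", "evidence_plan",
}

def readiness_gaps(record: dict) -> list[str]:
    """Return checklist items that are missing or empty; go-live requires []."""
    return sorted(f for f in REQUIRED_GATE_FIELDS if not record.get(f))
```

A deployment pipeline can then refuse to promote any agentic workflow whose `readiness_gaps` list is non-empty, turning the gate from a review-meeting convention into a hard control.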

That is not slowing innovation. It is making production safe enough to scale.

From Documentation to Proof

Model cards were useful because they forced teams to describe a model’s purpose, data, limitations, and evaluation.

Runtime governance goes further. It asks whether the system behaves within boundaries while real work is happening.

That is the future of AI risk management. Not less documentation, but more live proof. Not only “we approved the model”, but “we can show what the agent did, what control ran, who approved the exception, and how we recovered when the workflow failed.”

For production agentic systems, governance has to move into the runtime. The organisations that understand this will build trust faster. The ones that do not will discover that static documentation is a thin shield against dynamic risk.

