Financial institutions have spent the last two years saying the right things about responsible AI. They have created steering committees, drafted principles, approved pilots and published statements about fairness, transparency and human oversight. Useful work, certainly. But in 2026, the question from regulators, internal audit and boards is becoming much sharper: can you prove it?
That is the real significance of the Monetary Authority of Singapore’s latest Project MindForge output. In March 2026, MAS announced the successful conclusion of phase two of Project MindForge, culminating in an AI Risk Management Toolkit for the financial services sector. The toolkit, developed with a consortium of 24 banks, insurers, capital markets firms and other industry partners, includes an AI Risk Management Operationalisation Handbook and a supplement of case studies from financial institutions.
The title sounds procedural. The impact is strategic. MAS is nudging the industry away from abstract AI ethics towards operational evidence: inventories, risk assessments, lifecycle controls, accountable owners, review records, case studies and implementation resources. Frankly, this is where AI governance becomes real. A principle can sit in PowerPoint. Evidence has to survive supervisory review.
From principles to proof
Banks already know how to manage formal risk domains. Credit risk has models, limits and approvals. Technology risk has controls, testing and incident processes. Outsourcing risk has inventories, due diligence and contracts. AI risk has often been treated as something softer: a blend of innovation policy, data science discipline and legal caution.
That approach will not survive agentic AI. A model that recommends a product is one thing. An AI agent that initiates a remediation workflow, drafts a customer response, opens a fraud investigation or routes a credit exception is closer to an operational actor. It touches customers, systems, staff and controls.
I once advised a Singapore bank where the most difficult part of a data transformation was not the analytics. It was proving who owned each decision. The data team could explain the model. The business could explain the product. Compliance could explain the policy. But when an exception crossed all three, accountability blurred. AI will magnify that problem unless firms deliberately build evidence around ownership.
The MindForge handbook matters because it speaks the language of operating discipline. According to MAS-linked coverage, it is organised around four areas aligned with proposed MAS Guidelines on AI Risk Management: scope and oversight, AI risk management, lifecycle management, and enablers. Those headings may sound dry, but they map neatly to the questions every serious financial institution must answer.
Scope and oversight: what AI do you actually run?
The first evidence burden is simple: know what exists.
Many institutions underestimate this. They maintain formal inventories for approved models, but miss the AI embedded in vendor products, contact-centre tooling, developer assistants, document review platforms and marketing workflows. The board may think the firm has 40 AI use cases. The real footprint, including shadow and vendor-provided AI, may be far larger.
Scope and oversight should force a cleaner conversation. What counts as AI? Which systems are in scope? Who owns an AI use case when the model comes from a vendor but the customer impact belongs to the bank? Which committee sees the full picture rather than a curated innovation dashboard?
The hard truth is that an AI inventory is no longer administrative hygiene. It is the control plane for AI risk. Without it, firms cannot prioritise reviews, monitor high-risk use cases, manage vendors or explain exposure to supervisors. In practical terms, the inventory should include purpose, owner, model or vendor, data used, impacted customers, decision rights, human controls, materiality rating and review date.
That may sound heavy. It is lighter than discovering during an incident that nobody knew a vendor assistant was summarising customer complaints using sensitive data.
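To make the field list above concrete, here is a minimal sketch of what one inventory record could look like, written in Python for illustration. The class name, field names and the example entry are all hypothetical, not drawn from the MindForge handbook; the point is simply that each use case carries an accountable owner, the data it touches, its controls and a review date.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Materiality(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class AIUseCaseRecord:
    """One entry in an AI inventory; field names are illustrative only."""
    name: str
    purpose: str
    owner: str                       # accountable business owner, not the vendor
    model_or_vendor: str             # in-house model ID or vendor product
    data_used: list[str]             # data categories, e.g. "customer complaints"
    impacted_customers: str          # which customer groups see the outcome
    decision_rights: str             # who can approve, override or stop it
    human_controls: list[str]        # e.g. "handler reviews output before action"
    materiality: Materiality
    next_review: date
    status: str = "in production"    # pilot, in production, retired


# Hypothetical entry: a vendor assistant summarising complaints
record = AIUseCaseRecord(
    name="complaints-summariser",
    purpose="Summarise inbound complaints for case handlers",
    owner="Head of Customer Operations",
    model_or_vendor="VendorX Assist (SaaS)",
    data_used=["customer complaints", "account identifiers"],
    impacted_customers="Retail customers with open complaints",
    decision_rights="Operations lead approves workflow changes",
    human_controls=["handler reviews every summary before action"],
    materiality=Materiality.HIGH,
    next_review=date(2026, 9, 1),
)
```

Whether this lives in a governance platform or a structured register matters less than the discipline of filling in every field for every use case, vendor-provided or not.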
Risk management: materiality before bureaucracy
The second burden is materiality. Not every AI use case deserves the same governance weight. A tool that summarises public research does not carry the same risk as a model influencing credit decisions or an agent triaging fraud alerts.
Good AI governance separates low-risk productivity tools from systems that affect customers, markets, capital, conduct or regulatory reporting. The MindForge framing around AI usage identification, risk materiality assessment and inventorisation is useful because it prevents two bad outcomes: over-governing harmless tools and under-governing consequential ones.
I have seen both. One insurer buried simple internal summarisation tools under weeks of review, frustrating staff and encouraging workarounds. Another organisation allowed a seemingly modest analytics model to shape frontline decisions without documenting the data assumptions behind it. The first killed adoption. The second created silent risk. Neither was mature governance.
A practical materiality model should ask five questions:
- Does the AI affect a customer outcome or financial decision?
- Does it use sensitive, confidential or regulated data?
- Can it trigger an action, or only provide advice?
- Is the output explainable enough for review and challenge?
- Would failure create regulatory, conduct, operational or reputational harm?
The answers should determine control depth. High-materiality AI needs stronger testing, independent review, human approval, monitoring and evidence retention. Low-materiality AI needs proportionate guardrails, not a bureaucratic maze.
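To show how those five questions could translate into tiers, here is a minimal sketch in Python. The thresholds and the rule that any customer-impacting, action-taking system is automatically high materiality are assumptions for illustration; each firm would calibrate its own logic.

```python
def materiality_tier(
    affects_customer_or_financial_decision: bool,
    uses_sensitive_or_regulated_data: bool,
    can_trigger_actions: bool,           # acts, rather than only advising
    output_explainable: bool,
    failure_causes_material_harm: bool,  # regulatory, conduct, operational, reputational
) -> str:
    """Map the five screening questions to a governance tier (illustrative thresholds)."""
    red_flags = sum([
        affects_customer_or_financial_decision,
        uses_sensitive_or_regulated_data,
        can_trigger_actions,
        not output_explainable,
        failure_causes_material_harm,
    ])
    if red_flags >= 3 or (affects_customer_or_financial_decision and can_trigger_actions):
        return "high"    # independent review, human approval, monitoring, evidence retention
    if red_flags >= 1:
        return "medium"  # proportionate testing and periodic review
    return "low"         # baseline guardrails only


# A public-research summariser versus a credit-decision model
print(materiality_tier(False, False, False, True, False))  # -> "low"
print(materiality_tier(True, True, False, False, True))    # -> "high"
```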
Lifecycle management: AI risk changes after launch
The third evidence burden is lifecycle control. Financial institutions are comfortable approving systems before go-live. AI makes post-launch discipline more important because performance, usage and context can drift.
A model trained for one customer segment may behave differently when used in another. A fraud model may degrade as criminals adapt. A generative AI assistant may produce acceptable answers in testing but fail when connected to fresh product documents. An agent may behave safely in a pilot, then become risky when teams connect it to more tools.
Lifecycle management means documenting controls from design through retirement. That includes data selection, model validation, testing, approval, deployment, monitoring, incident handling, change control and decommissioning. For agentic AI, it also includes tool permissions, action limits, escalation rules and kill switches.
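For agentic AI in particular, those controls can be expressed as an explicit, versioned policy rather than tribal knowledge. The sketch below, again in Python and with entirely hypothetical tool names and limits, shows the shape of a default-deny guardrail: the agent may only call approved tools, within an action budget, with a named kill-switch owner.

```python
# Illustrative guardrail policy for an agent that triages fraud alerts.
# Tool names, limits and owners are hypothetical; a real policy would sit
# in versioned configuration under change control.
AGENT_POLICY = {
    "allowed_tools": ["read_case", "draft_customer_reply", "open_investigation"],
    "forbidden_tools": ["close_account", "move_funds"],
    "max_actions_per_case": 5,
    "escalate_to_human_if": ["low confidence", "vulnerable customer flag"],
    "kill_switch_owner": "Head of Fraud Operations",
}


def action_permitted(tool: str, actions_taken: int, policy: dict = AGENT_POLICY) -> bool:
    """Return True only if the requested tool call stays inside the policy."""
    if tool in policy["forbidden_tools"]:
        return False
    if tool not in policy["allowed_tools"]:
        return False  # default-deny anything not explicitly allowed
    return actions_taken < policy["max_actions_per_case"]


assert action_permitted("open_investigation", actions_taken=2)
assert not action_permitted("move_funds", actions_taken=0)
```

The design choice that matters is default-deny: a tool the policy does not name is blocked, which keeps the evidence trail aligned with what the agent can actually do.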
This is where many firms will struggle. Their model governance teams, technology teams and business units often operate on different rhythms. Data scientists iterate quickly. Technology release teams focus on deployment. Risk teams look for control evidence. Business teams want outcomes. Without a common lifecycle, AI governance becomes a relay race where each runner uses a different map.
The bottom line is simple: an AI system is not “approved” once. It earns the right to remain in production through continuing evidence.
Enablers: governance needs capacity, not slogans
The fourth area, enablers, is often treated as secondary. It is not. MAS-linked summaries describe enablers as organisational capabilities, infrastructure and resources needed for ongoing responsible AI use and risk management. That is consultant language for a very practical point: governance fails when nobody has the time, skills or tooling to do it.
A bank can approve the most elegant AI policy in the market and still fail if model owners do not understand it, internal audit cannot test it, engineers lack monitoring tools, and procurement does not know how to assess AI vendors.
Financial institutions should therefore invest in three enabling capabilities.
First, they need AI risk literacy beyond the data science team. Product owners, compliance officers, legal teams, operations managers and technology leaders need a shared vocabulary. They do not all need to become machine-learning engineers. They do need to understand materiality, data leakage, hallucination, drift, bias, explainability, tool access and human accountability.
Second, they need evidence infrastructure. Spreadsheets will not scale. Firms need workflow systems, model inventories, approval records, test repositories, monitoring dashboards and audit trails that connect AI use cases to owners and controls.
Third, they need vendor governance. Much of the AI used in financial services will arrive inside platforms, SaaS products and outsourced processes. If procurement only asks whether a vendor uses AI, it is asking the wrong question. It should ask where AI is used, what data is processed, how outputs are tested, whether human review exists, how incidents are reported and how the firm can obtain evidence during audits.
The case-study signal
One of the more useful features of the MindForge toolkit is the case-study supplement. Case studies matter because financial institutions learn risk management from messy implementation, not from perfect diagrams.
A case study can show where ownership broke down, which controls were too slow, how teams measured model performance, what evidence auditors requested, and how business teams interpreted AI outputs. That is far more valuable than another high-level statement that AI should be fair, transparent and accountable.
This is also why Singapore’s collaborative approach is worth noting. MAS has long used industry projects to turn emerging technology into shared practice. Project MindForge follows that pattern. By working with financial institutions and then using BuildFin.ai to support further implementation resources and knowledge sharing, the regulator is signalling that AI risk management must become an industry capability, not a private experiment inside each bank.
What CIOs should standardise now
For CIOs, chief risk officers and AI governance leads, the immediate task is not to wait for perfect regulation. It is to build a minimum evidence baseline now.
Standardise the AI inventory. Standardise risk-tiering. Standardise approval templates. Standardise testing evidence. Standardise human accountability. Standardise third-party AI questions. Standardise incident categories. Standardise retirement criteria. These are not glamorous activities, but they are what make scale possible.
I would also insist on a monthly AI risk pack for senior management. Not a glossy innovation update. A proper operating report: number of use cases by risk tier, high-risk approvals, overdue reviews, incidents, vendor AI exposure, models with declining performance, exceptions accepted by business owners and controls still missing.
That report changes behaviour. When leaders see AI as a live risk portfolio rather than a collection of exciting pilots, funding decisions become more honest. Some projects move faster because they have clear controls. Others stop because the evidence is weak. That is exactly the discipline financial services needs.
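For teams wondering what such a pack looks like in practice, here is a bare-bones skeleton in Python. The field names mirror the metrics listed above; the figures in the example are placeholders, not data from any institution.

```python
from dataclasses import dataclass


@dataclass
class MonthlyAIRiskPack:
    """Skeleton of a monthly operating report; all figures below are placeholders."""
    use_cases_by_tier: dict[str, int]        # e.g. {"high": 6, "medium": 22, "low": 41}
    high_risk_approvals_this_month: int
    reviews_overdue: int
    incidents: int
    vendor_ai_use_cases: int
    models_with_declining_performance: int
    exceptions_accepted_by_business: int
    controls_still_missing: int


example_pack = MonthlyAIRiskPack(
    use_cases_by_tier={"high": 6, "medium": 22, "low": 41},  # illustrative only
    high_risk_approvals_this_month=2,
    reviews_overdue=3,
    incidents=1,
    vendor_ai_use_cases=18,
    models_with_declining_performance=2,
    exceptions_accepted_by_business=4,
    controls_still_missing=5,
)
```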
The new burden is evidence
The MAS handbook should not be read as a Singapore-only compliance artefact. It is part of a broader shift in financial services. Regulators, boards and customers are becoming less impressed by AI ambition and more interested in AI control.
That shift is healthy. Financial institutions do not win trust by promising that AI is responsible. They win trust by proving that AI is inventoried, assessed, tested, monitored, owned and stopped when necessary.
The hard truth is that many firms will discover their AI governance is more performative than operational. They will have principles without inventories, committees without evidence, pilots without owners and vendors without transparent controls. The better firms will treat the MindForge handbook as a rehearsal for the future of supervision. In that future, the winning question will not be “Do you use AI?” It will be “Show me how you know it is under control.”