AI Agents in Banking Need Better Audit Trails. Here's an Azure Pattern Worth Considering.
Treasury's new FS AI RMF gives banks a practical lens for AI governance. Here's an executive view of the audit-trail questions to think through, and why Azure may be a pragmatic option.

On February 19, 2026, the US Treasury published the Financial Services AI Risk Management Framework. It's voluntary. It's non-binding. It contains 230 control objectives across a Risk and Control Matrix, a self-assessment questionnaire, and a Guidebook. In practical terms, it may be the most operationally detailed US framework yet for thinking about AI risk in financial services.
I do not hear enough bankers talking about it.
I've spent over two decades in financial services, and I'll tell you what I told my team when we started looking at AI agents for merchant onboarding and underwriting: I would not assume the audit trail an examiner, auditor, or internal risk team wants already exists in a turnkey product. In many cases, some portion of it will need to be designed deliberately, or at least validated much more rigorously than most teams expect.
The good news is that the components exist, the patterns are knowable, and Azure in particular has assembled a stack that can get you much of the way there. The harder part is that the most important piece of the architecture — the agent decision record — is still something each institution will need to define for itself.
Here is the operating pattern I think is worth considering.
The Gap Nobody Is Talking About
If you ask a mid-sized community bank today how they're logging AI agent activity, you'll typically hear one of three answers:
- "We're using specialized vendors for observability, we capture everything."
- "We have observability on our cloud provider."
- "We're not using AI agents in production yet."
The first two describe operational telemetry. The third is becoming less common. And neither of the first two, by itself, describes the kind of durable audit trail a regulated institution may eventually be asked to produce.
Here are the kinds of questions I would expect risk, audit, and examiner teams to gravitate toward, based on SR 11-7, SR 21-8, the FFIEC IT Examination Handbook, and the new Treasury FS AI RMF:
- Show me your model inventory. Pull three models. Show me the validation report and the last three monitoring cycle outputs.
- For this AI system, show me how you determined whether it qualifies as a "model" under SR 11-7.
- Show me the decision your agent made on this customer's underwriting on this date. Walk me through what the model was asked, what tools it called, what data it retrieved, what it returned, and who approved it.
- If your LLM provider updated the underlying model between January and March, how did you detect that, and what did you do about it?
These are not theoretical questions. Even without a formal published league table of AI exam findings, inventory gaps, weak documentation, and poor reproducibility are all easy places for a program to come under pressure.
General-purpose observability platforms may help with uptime and debugging, but they do not automatically answer those questions. LLM tracing tools help with workflow visibility, but they are not the same thing as a regulated recordkeeping approach. Cloud provider audit logs are useful, but they usually do not capture the full business context, tool activity, retrieval state, or human override story on their own.
What many banks may ultimately need is an agent decision record stored in a tamper-evident system, separated from operational telemetry, retained in line with legal and policy requirements, and organized so it can be reproduced during audit, validation, or examination.
That is the part I would not assume any vendor has fully solved for you.
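To make the idea concrete, here is one minimal sketch of what such a decision record could look like. This is an illustration, not a standard: every field name is an assumption, and a real schema would come out of your own risk and legal review. The one structural idea worth copying is the deterministic content hash, which is what lets you later show a stored record has not been altered.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AgentDecisionRecord:
    """Illustrative schema only; field names are assumptions, not a standard."""
    workflow_id: str        # which agent or workflow ran
    workflow_version: str   # what version was in production
    model_deployment: str   # e.g. a named model deployment identifier
    prompt_hash: str        # hash of the prompt template in force
    tool_calls: tuple       # tools invoked, with sensitive inputs tokenized
    retrieval_corpus: str   # which corpus version was consulted
    output_tokenized: str   # model output with PII tokenized out
    human_action: str       # approved / rejected / modified / none
    timestamp_utc: str      # when the decision occurred

def content_hash(record: AgentDecisionRecord) -> str:
    """Deterministic SHA-256 over a canonical JSON serialization, so the
    stored record can later be checked for completeness and tampering."""
    canonical = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The exact fields will differ by institution; the canonical-serialization-plus-hash pattern is the portable part.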
What This Means for Banks
You do not need a new AI-specific rulebook to see where this is heading.
SR 11-7, SR 21-8, the FFIEC IT Examination Handbook, and now the Treasury FS AI RMF all point in the same direction: know what AI you are using, know how it is governed, know how it is monitored, and be able to explain a specific decision after the fact.
For executives, that translates into four practical questions:
- Do we have a complete inventory, including AI embedded in vendor products?
- Have we decided which use cases fall into model risk governance and documented why?
- Could we reconstruct a meaningful record of a past AI-assisted decision if risk, audit, or exam teams asked for it?
- Are we relying on generic observability tools where we really need governance evidence?
That is why Treasury's FS AI RMF matters. It is voluntary, but it gives banks a usable benchmark and a common language for these conversations.
The Architecture That Matters
The pattern I think is worth considering separates the system into three distinct planes: the live agent runtime, an operational telemetry layer for engineering, and a longer-lived decision record for governance and audit. In my experience, when banks collapse all of this into one observability backend, they often end up with a system that is easier to operate but harder to defend.
Three design choices matter more than the rest:
- For higher-risk decisions, consider making the decision-record write blocking. If the record cannot be written, there is a good argument the workflow should not proceed.
- Separate operational telemetry from governance-grade records. Engineers and auditors usually need different data, different retention, and different access controls.
- Minimize plaintext PII in long-lived logs. Tokenization or vault-based patterns can reduce privacy exposure and make retention decisions easier to manage.
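The first of those choices, the blocking write, can be sketched in a few lines. This assumes generic `record_sink` and `telemetry_sink` callables rather than any specific product API; the point is the fail-closed ordering, not the plumbing:

```python
class DecisionRecordWriteError(RuntimeError):
    """Raised when the governance record cannot be persisted."""

def execute_high_risk_step(decision_payload: dict, record_sink, telemetry_sink) -> dict:
    """Sketch of a blocking decision-record write: the governance record must
    persist before the workflow may proceed. The sinks are assumed callables,
    not a specific vendor API."""
    try:
        # Governance-grade, durable store: this write is on the critical path.
        record_sink(decision_payload)
    except Exception as exc:
        # Fail closed: for a high-risk decision, no record means no decision.
        raise DecisionRecordWriteError("decision record not persisted") from exc
    try:
        # Telemetry is best-effort; its failure should not halt the business flow.
        telemetry_sink(decision_payload)
    except Exception:
        pass
    return decision_payload
```

Note the asymmetry: the governance write can stop the workflow, the telemetry write cannot. That asymmetry is the design choice.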
Why Azure Specifically
Every major cloud provider has the building blocks for some version of this — managed model hosting, WORM object storage, and cloud-native audit logging. They all work if you're already committed to that ecosystem.
But for banks deciding where to land AI workloads in 2026, Azure is worth serious consideration. Not because every individual component is technically superior, but because the compliance posture and the legal/procurement surface can be easier to work through at many institutions that already have Microsoft deeply embedded.
Here is why Azure stands out to me in this context:
Governance and procurement
Microsoft Azure AI Foundry's ISO/IEC 42001:2023 certification is useful for vendor review conversations. More broadly, many banks already know how to diligence Microsoft as a strategic provider, which makes procurement easier than introducing a net-new observability vendor.
Tamper evidence and retention
Azure Immutable Blob Storage gives you practical WORM retention. Azure Confidential Ledger adds cryptographic receipts for the subset of decisions where stronger integrity proof matters. You may not need both for every workflow, but together they create a more defensible story.
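Independent of the Azure specifics, the integrity idea behind ledger-style receipts is worth understanding on its own. A minimal illustration, not the Confidential Ledger implementation, is a hash chain: each record commits to the digest of the one before it, so a later edit to any record breaks every subsequent digest.

```python
import hashlib

def chain_append(chain: list, entry: bytes) -> list:
    """Append an entry to a hash chain: each link's digest covers both the
    entry and the previous link's digest."""
    prev = chain[-1][1] if chain else b""
    digest = hashlib.sha256(prev + entry).digest()
    chain.append((entry, digest))
    return chain

def chain_verify(chain: list) -> bool:
    """Recompute every digest in order; True only if no entry was altered."""
    prev = b""
    for entry, digest in chain:
        if hashlib.sha256(prev + entry).digest() != digest:
            return False
        prev = digest
    return True
```

A managed service does this with stronger guarantees and signed receipts; the sketch only shows why tamper evidence is cheap to verify and expensive to fake.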
Data governance
Azure documents that prompts, completions, and related customer data stay within the Azure service boundary and follow Azure's privacy commitments, though the exact data-processing model still depends on deployment type and geography. For many banks, that is a more straightforward governance conversation than sending prompts to another SaaS platform.
The tradeoff
Azure's weakness is that its native AI monitoring is less polished than the best purpose-built LLM observability tools. For a regulated bank, I would still solve the audit-trail problem first and accept that the debugging experience may be less elegant.
What the Decision Record Should Capture
You do not need a giant technical spec to start. For higher-risk AI decisions, I would want the institution to be able to reconstruct at least this much:
- Which agent or workflow ran, when it ran, and what version was in production
- Which model deployment was used, plus any provider response ID or version metadata available
- What prompt or instruction set was in force, ideally via versioning or hashing
- What tools were called and what outside data materially shaped the result
- What documents or retrieval sources were consulted, and which corpus version was in scope
- What the model produced, with sensitive data tokenized or otherwise protected
- Whether a human reviewed, approved, rejected, or modified the result
- Whether the final record can be shown to be complete and free of tampering
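One of those points, protecting sensitive data in the record, deserves a concrete sketch. This is a toy in-memory illustration of vault-based tokenization, assuming a `TokenVault` class I am inventing for the example; a real deployment would use a managed tokenization or vault service with its own access controls:

```python
import secrets

class TokenVault:
    """Minimal sketch of vault-based tokenization: plaintext PII lives only
    inside the vault, and logs or decision records carry opaque tokens.
    An in-memory dict stands in for what would be a managed, audited store."""

    def __init__(self):
        self._store = {}

    def tokenize(self, value: str) -> str:
        """Replace a sensitive value with a random, non-derivable token."""
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Recover the original value; in production this path should be
        tightly access-controlled and itself logged."""
        return self._store[token]
```

Because the token is random rather than derived from the value, the long-lived record carries no recoverable PII, which simplifies both privacy exposure and retention decisions.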
That is the core idea. The exact schema, tiering logic, retention policy, and failure-mode policy are still yours to define. No vendor can do that institutional thinking for you.
A 90-Day Playbook for Banks
If you're running a community or regional bank looking at AI agents and you haven't started on this yet, here's a practical sequence to consider:
Days 1-30: Assess
- Read the Treasury FS AI RMF Risk and Control Matrix. Map your existing AI controls to its 230 objectives. The gaps are your roadmap.
- Inventory every AI system in production, including vendor AI features embedded in purchased SaaS products. This is one of the simplest places for governance blind spots to form.
- Document your determination on whether each AI system qualifies as a "model" under SR 11-7. Get a defensible written answer on file before an examiner asks.
Days 31-60: Architect
- Pick your platform strategy. If you're already on Azure, this article may be a useful starting point. If you're on another major provider, the same core design questions still apply.
- Decide where the long-lived decision record will live, who can access it, and how long it needs to be retained.
- Decide how sensitive customer data will be protected in prompts, logs, and downstream review workflows.
- Decide which decisions are important enough to justify stronger integrity controls and fuller documentation.
Days 61-90: Build
- Implement the decision record in the workflow itself. For higher-risk decisions, consider requiring the record to be written before the process can continue.
- Create a simple tiering model so not every use case is governed the same way.
- Write the policy for failure handling, retention, review, and human override. Get governance approval.
- Build the crosswalk from your controls to SR 11-7, the FS AI RMF, and NIST AI RMF.
- Document your institution's position on the gray areas before someone asks under pressure.
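The tiering step above can be sketched as a small lookup from risk tier to control requirements. The tiers, thresholds, and controls here are assumptions for illustration, not prescriptions from the FS AI RMF; each institution sets its own in policy:

```python
from enum import Enum

class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Illustrative control mapping; values are placeholders an institution
# would set for itself, not regulatory requirements.
TIER_CONTROLS = {
    RiskTier.LOW:    {"blocking_record": False, "human_review": False, "retention_years": 3},
    RiskTier.MEDIUM: {"blocking_record": True,  "human_review": False, "retention_years": 7},
    RiskTier.HIGH:   {"blocking_record": True,  "human_review": True,  "retention_years": 10},
}

def classify(customer_impact: bool, credit_decision: bool) -> RiskTier:
    """Toy classifier; the real tiering criteria belong in governance policy."""
    if credit_decision:
        return RiskTier.HIGH
    if customer_impact:
        return RiskTier.MEDIUM
    return RiskTier.LOW
```

The value of even a toy version is that tiering decisions become explicit and reviewable rather than implicit in each team's build.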
This is a quarter of disciplined work. For many institutions, it becomes a foundation they can reuse across future AI deployments rather than reinventing controls one use case at a time.
The Window Is Open
Treasury has now given the industry a practical framework it did not have before. The interagency position remains that existing guidance generally applies. My expectation is that AI-specific questions in bank governance, audit, and examination contexts will become more structured from here, even though the FS AI RMF itself is technically voluntary.
Banks that build the decision record, the data-handling controls, the tiering model, and the SR 11-7 / FS AI RMF crosswalk early will likely have a cleaner answer when those questions come.
This is less about a single prescribed architecture than about operating discipline. The components exist. Azure has assembled them in a way that I think is practical for many banks. The harder part is usually not technology. It is deciding what your institution wants to be able to prove later.
Instead of starting with the observability vendor list, start with the audit-trail requirements.
That is the foundation everything else stands on.
Corey Young is EVP of Fintech Banking at Commercial Bank of California and former CEO of Agile Financial Systems. He has over two decades of experience in financial services, payment processing, and fintech.
Further Reading
Regulatory
- SR 11-7 — Federal Reserve Supervisory Letter on Model Risk Management
- FDIC FIL-22-2017 — Adoption of SR 11-7
- SR 21-8 — Model Risk Management for BSA/AML
- Treasury FS AI Risk Management Framework (February 2026)
- Treasury AI in Financial Services Report (December 2024)
- FFIEC IT Examination Handbook DAM Booklet (September 2024)
- GAO-25-107197: AI Use and Oversight in Financial Services
- ABA Statement for the Record on AI Innovation (December 2025)
- NIST AI Risk Management Framework Crosswalks
Industry Frameworks
- FINOS AI Governance Framework v2.0
- JPMorgan Chase: AI and Model Risk Governance
- BPI: Navigating Artificial Intelligence in Banking (April 2024)