Sebastien Rousseau
Get in touch ›

Agentic Engineering for Banks: A 2026 Blueprint for the C-Suite and the Engineers Who Will Build It

Agentic AI has crossed from pilot into production. 70% of banks are using it; only one in five has a mature governance model. Adversaries are operating at machine speed, the legacy estate was written for the batch-processing assumptions of the 1960s, and the EU AI Act's high-risk deadline is twelve weeks away.

34 min read

Agentic Engineering for Banks: A 2026 Blueprint for the C-Suite and the Engineers Who Will Build It

Agentic AI has crossed from pilot into production across global banking. Seventy per cent of institutions are using it to some degree; only one in five has a mature governance model. Meanwhile, autonomous adversaries are operating at machine speed, the legacy COBOL estate the new systems must interoperate with was written for the batch-processing assumptions of the 1960s, and the EU AI Act's high-risk deadline is twelve weeks away. This is the engineering and governance position a bank needs to hold.


Key Takeaways

  • The transition from vibe coding to spec-driven development is no longer aspirational. Andrej Karpathy, who coined "vibe coding" in February 2025, acknowledged a year later ⧉ that the era is ending and that the new default for professionals is agentic engineering — orchestrating agents against detailed specifications with human oversight.
  • Banking adoption is genuine and accelerating. 70% of banking firms ⧉ report using agentic AI to some degree (16% in production, 52% in pilot, EY 2026); 44% of finance teams will use it this year — a 600%+ year-over-year increase per Wolters Kluwer.
  • Governance has not kept pace. Deloitte's State of AI 2026 finds only one in five companies has a mature governance model for autonomous AI agents. Deloitte's analysis of the MIT AI Risk Database identifies more than 350 risks ⧉ that can arise from autonomous or agentic behaviour.
  • The threat landscape has industrialised. Anthropic disclosed in November 2025 that Chinese state-sponsored group GTG-1002 hijacked Claude Code to run autonomous espionage against approximately 30 targets, with the AI handling 80–90% of tactical operations independently. Flashpoint observed a 1,500% rise in AI-related illicit discussions ⧉ between November and December 2025 alone.
  • The legacy estate is the silent constraint. Financial-services IT budgets are 70–75% consumed by legacy maintenance, 63% of banks still rely on code written before 2000, and most banks report only one or two people in-house who can maintain the COBOL their core platforms run on. Agentic AI is now the dominant approach to closing that gap.
  • The regulatory stack is converging. Under the EU AI Act, 2 August 2026 triggers full enforceability for high-risk AI systems (Annex III explicitly includes credit scoring and creditworthiness assessment). DORA is already in force. SR 11-7 has been extended in regulator practice to cover LLMs and agentic systems. The fines for breach reach €35 million or 7% of global annual turnover.
  • Human oversight is not a single concept. The distinction between HITL (Human-in-the-Loop, where the agent cannot execute without explicit human approval) and HOTL (Human-on-the-Loop, where the agent executes autonomously under human monitoring) is now the working framework for EU AI Act Article 14 compliance, and every high-risk agent needs an explicit position on which model applies.
  • Most agents will be bought, not built. Third-party risk management under DORA is the loudest under-recognised challenge of 2026. Vendors will supply most of the agentic capability banks deploy; the regulatory obligation remains with the bank, and most existing vendor contracts cannot satisfy Article 13 documentation requirements.
  • Agentic engineering is not "ChatGPT plus MCP servers." It is a structural ownership position over the end-to-end flows of the institution — customer journeys, transaction lifecycles, control plane, audit substrate, quantum-safe cryptographic foundation — built and operated by the institution's own engineering function, not delegated to a chatbot.

The Year Agentic Engineering Became Inescapable #

The conversation about AI in financial services has, until very recently, been dominated by two adjacent but distinct things: generative chat interfaces (helpful but bounded), and Retrieval-Augmented Generation patterns layered onto enterprise data (useful, also bounded). What changed between late 2025 and early 2026 is that the third category — autonomous agents that plan, execute, and complete multi-step workflows with limited human supervision — moved from technical demonstration into operational reality, and crossed simultaneously into both the enterprise and the threat actor.

Andrej Karpathy, who coined the term "vibe coding" in February 2025 ⧉, spent the following year watching professional engineers move beyond it. His revision — "agentic engineering" — is now the working term across the industry. The substance of the shift is straightforward: in serious software work in 2026, engineers are not writing the code directly 99% of the time. They are orchestrating agents that do, while acting as oversight. The work is no longer typing characters into an editor; it is producing specifications that constrain what the agents can generate, designing the verification gates the output must pass, and curating the architectural decisions the agents implement.

This shift sounds like an engineering-team conversation. In banking it is not. It is a board-level conversation, because the same agentic capability that is rewriting how internal code is produced is also rewriting how external adversaries operate, how regulators expect oversight to be exercised, and how the institutional perimeter is defined. A bank that does not own its position on agentic engineering by the end of 2026 is not a bank that has avoided the question. It is a bank whose vendors, adversaries, and regulators have answered the question for it.

The State of Adoption in Banking #

The aggregate picture is unambiguous. According to research compiled across multiple 2026 surveys, 70% of banking executives ⧉ report their firms are already using agentic AI to some degree. Gartner projects ⧉ that by the end of 2026 approximately 40% of all financial services firms will run AI agents in some form. Financial services AI spending is on track to reach $67 billion by 2028 (IDC). McKinsey estimates agentic AI can return 10–12 hours per week to relationship managers in banking.

The execution picture is less encouraging. KPMG reports ⧉ that 99% of companies plan to put autonomous agents into production but only 11% have done so. EY finds 34% of leaders have started using AI agents and only 14% have fully implemented them. Forrester finds that 57% of organisations believe they lack the internal capabilities to take advantage of agentic AI. The gap between intent and execution is not a marketing artefact. It is a real reflection of the engineering, governance, and cultural work that has not yet been done.

The UK's Financial Conduct Authority has publicly raised concerns ⧉ about the speed of deployment outpacing the maturity of governance — a tension the FCA's Chief Data Officer Jessica Rasu has framed as a near-term retail-consumer risk. McKinsey separately warned that banks that fail to adapt their business models ⧉ risk eroding up to $170 billion in global profits by 2030. Both observations are correct simultaneously. The question is not whether to move; it is how to move with the operational and governance integrity that financial-services regulation has always demanded, and that agentic systems make sharper.

Three Risk Vectors Banks Must Internalise #

Before any architectural conversation, the board's attention should rest on three risks that are specific to agentic systems and that arrive sooner than most banks have planned for.

1. The Autonomous Adversary #

The most disorienting development of 2026 is the operationalisation of agentic AI on the attack side. In August 2025, Anthropic disclosed a category of activity it called vibe hacking: cybercriminals using agentic AI to perform sophisticated attacks at scale, with the AI embedded across reconnaissance, credential harvesting, network penetration, and stolen-data analysis. In November 2025 ⧉, Anthropic disclosed that it had disrupted a campaign by a Chinese state-sponsored group (designated GTG-1002) that hijacked Claude Code instances to run autonomous espionage against roughly thirty defence, energy, and technology targets, with the AI handling 80–90% of tactical operations and operating at thousands of requests per second — speeds impossible for human operators.

In January 2026, Step Finance — a Solana-based DeFi portfolio manager — was compromised in a way that turned a device intrusion into a $27–30 million loss because the firm's AI trading agents had permissions to execute large transfers without human approval. The attacker socially engineered the AI itself, claiming to be running an authorised bug bounty programme. The lesson ⧉ was not that AI was inherently unsafe; it was that an AI agent that accepts claimed authorisation without verification is a perimeter weakness.

The aggregate trend is what banks have to internalise. Flashpoint's 2026 Global Threat Intelligence Report identified a 1,500% rise in AI-related illicit discussions between November and December 2025, with attackers actively developing autonomous systems that scrape data, rotate infrastructure, adjust messaging, and learn from failed attempts without continuous human oversight. JPMorgan's Jamie Dimon has been publicly explicit ⧉ that the initial advantage in this technology goes to offence, not defence. The implication is uncomfortable: a bank running classical security operations against agentic adversaries is, structurally, in the position of a chess player whose opponent has been given a computer.

2. The Code-Quality Regression #

The second vector is internal and quieter. LLM-generated code, in the absence of specification discipline and rigorous verification, ships with defects at a rate substantially higher than human-written code. A SonarQube analysis of five frontier LLMs ⧉ generating Java code found that over 70% of detected vulnerabilities in Llama 3.2 90B output were rated BLOCKER severity, with roughly two-thirds of GPT-4o's and OpenCoder-8B's vulnerabilities rated BLOCKER or CRITICAL. Pearce et al. (IEEE S&P) found approximately 40% of LLM-generated programs in security-sensitive contexts contained vulnerabilities. Yan et al. (2025) put the range at 9.8–42.1% across their benchmarks. A separate catalogue from Fu et al. identified 43 CWEs across three AI code-generation tools.

For a non-regulated industry, this is a productivity tax. For a bank, it is a regulatory and operational risk that compounds. Code that ships with a high vulnerability rate into a system handling payments, settlement, or customer data is not a code-quality issue in the abstract; it is the surface that GTG-1002-class adversaries will be probing in 2027 with the same agentic tools that produced it. The defence is not to ban LLM-generated code (commercially impossible) but to surround it with the verification and specification infrastructure that ensures defects surface before deployment. This is the practical reason spec-driven development is being adopted at speed by enterprise engineering organisations that are not natively technology firms.

3. The Legacy Anchor #

The third vector is the one banks already understand best, and that the agentic transition has made simultaneously more urgent and more tractable. More than 70% of Fortune 500 companies still rely on mainframes, the Computer Weekly analysis notes ⧉, often built on decades of interwoven COBOL and RPG with custom business logic. In financial services specifically, legacy technologies consume 70–75% of annual IT spending. A CIO study cited in 2026 industry analysis found that 63% of banks still rely on code written before 2000, and more than 75% reported having only one or two people in-house with the skills to maintain it.

What changed in February 2026 was the arrival of credible agentic tooling for legacy modernisation. Anthropic's announcement that Claude Code could map COBOL dependencies, document workflows, and identify risks ⧉ that human analysts would take months to surface — paired with similar capabilities from Microsoft (GitHub Copilot for COBOL, Watsonx Code Assistant) and AWS (Mainframe Modernization with agentic AI) — has compressed the modernisation cost curve materially. The reaction in IBM's share price (a 13% drop on the day of the announcement) was an inelegant but accurate market signal. AI now accounts for roughly one-third of enterprise modernisation investment, and more than 75% of enterprises are using AI in their modernisation strategy. The legacy anchor is, for the first time, a tractable engineering problem rather than a generational one.

Why Vibe Coding Cannot Be the Default in Banking #

It is worth being precise about why vibe coding — short prompt, observe output, iterate — fails as a default workflow in a regulated estate. The failure mode is not the obvious one (the LLM occasionally hallucinates). The failure mode is structural and shows up in four places simultaneously.

The first is lack of shared conventions. Multiple engineers working through chat prompts will produce five different ways to do the same thing in the same codebase within a single quarter. In a non-regulated context, this is technical debt. In a regulated context, this is the surface that breaks under examination.

The second is context decay. AI agents are stateless. On a large project, conversations exceed context windows, and the reasoning behind earlier architectural decisions evaporates. The same agent, two weeks later, will make the inverse decision in a new chat because nothing persists the rationale of the first one. For systems that need an audit trail for regulators, this is structurally incompatible.

The third is invisible defect accumulation. The Pearce, Yan, and SonarQube findings cited above are not corner cases. They are the baseline rate at which LLMs generate vulnerable code in the absence of specification discipline and rigorous testing. A bank running vibe-coding workflows in production accumulates these defects at the same rate, without the surface visibility to know what has been shipped.

The fourth is the regulatory traceability problem. Article 12 of the EU AI Act requires automatic logging of inputs and outputs for high-risk AI systems. SR 11-7 requires documented model owner and validator roles, change management for model updates, and board reporting on AI model risk. DORA requires comprehensive ICT risk management with documented evidence. None of these obligations can be satisfied by a workflow whose primary artefact is a chat history that nobody persists.

The conclusion is not that LLMs are unsuitable for banking. The conclusion is that the workflow surrounding them must produce specifications, audit trails, and verification gates as first-class outputs rather than as afterthoughts. This is what spec-driven development is, operationally.

Spec-Driven Development in a Regulated Estate #

Spec-driven development (SDD) inverts the order of work. Instead of jumping into implementation and iterating with an agent, the team produces a specification first — architectural decisions, requirements, interface contracts, success criteria, security constraints — and the agent generates code that satisfies the specification. Verification is structured: the spec defines what the output must do, and a separate process (test generation, code review, formal verification where applicable) checks whether it has been done.

The practical tooling has consolidated in late 2025 and early 2026. GitHub's Spec Kit ⧉ (released late 2025) formalises intent before code generation. AWS embeds spec-first workflows directly into its Kiro IDE. JetBrains and Cursor have introduced planning modes that structure the AI interaction. Frameworks like BMAD (Breakthrough Method for Agile AI-Driven Development) push further with teams of specialised AI agents that mirror analyst, architect, developer, and QA roles across the SDLC. Constitutional SDD, formalised in an arXiv paper in February 2026, embeds explicit security constraints with CWE vulnerability mappings into the specification itself.

For a bank, the variant that matters is what Augment Code's analysis calls spec-anchored development — specifications come first, AI generates code constrained by them, and additional governance layers (constitutional constraints, supervision checkpoints, human approval gates) sit between generation and merge. This is the only variant that produces the audit trail Article 12 of the EU AI Act expects, the documented validator role SR 11-7 requires, and the change-management discipline DORA demands.

The investment required is real, but it is also tractable. The institutions doing this well have moved engineers' day-to-day from typing characters to producing two artefacts: a spec the agent will satisfy, and a verification harness the output must pass. The cognitive demand on the engineer is higher in some respects (clarity of intent matters more than ever) and lower in others (the mechanical work of writing boilerplate is gone). The institutions that have not yet made this shift are still operating in a mode where the LLM is a faster typist. That position is not survivable in a regulated estate beyond the next twelve months.

The Regulatory Stack That Now Applies #

The 2026 regulatory perimeter around AI in banking is no longer a checklist; it is a stack of overlapping obligations that need to be reasoned about together. The single most consequential date is 2 August 2026, when the EU AI Act's high-risk system obligations become fully enforceable ⧉. Annex III explicitly classifies credit scoring, creditworthiness assessment, risk assessment in life and health insurance, and the evaluation or classification of individuals' financial standing as high-risk. The obligations that flow from that classification include conformity assessments, quality management systems, risk management frameworks, technical documentation, EU database registration, robust data governance, human oversight, and cybersecurity protections. Penalties for breach of high-risk obligations reach €35 million or 7% of global annual turnover, whichever is higher.

Sitting alongside the AI Act:

Three Modes of AI-Assisted Development Compared #

Dimension Vibe Coding Spec-Driven Development Agentic Engineering
Primary input Short prompt Formal specification Specification + agent orchestration plan
Engineer's role Prompt iterator Specification author Orchestrator and verifier
Output discipline Direct code generation Code constrained by spec Multi-agent workflows producing code, tests, docs
Audit trail Chat history (not persisted) Spec + generated code + tests Spec + agent traces + verification artefacts
Defect rate (LLM-only) 10–40% vulnerability rate (literature baseline) Materially reduced by spec constraints Lowest with verification gates
Regulatory traceability Insufficient for high-risk AI Compatible with EU AI Act Article 12 Designed for Article 12 + SR 11-7 + DORA
Suitable for banking? No, for production Yes, with governance Yes, with mature governance
Capability ceiling Bounded by single-shot prompting Bounded by spec quality Bounded by orchestration quality

Source: Synthesis of Karpathy commentary (2026), Augment Code SDD analysis ⧉, CGI Spec-Driven Development analysis ⧉, and the academic literature on LLM code-generation vulnerability rates (Pearce et al., Yan et al., Fu et al., 2023–2025).

Building the Agentic Bank: An Architecture View #

The strategic position behind these workflows is what the C-suite needs to own explicitly. Agentic engineering in banking is not a developer-productivity initiative. It is an institutional capability that touches end-to-end customer journeys, the entire transaction lifecycle, and the cryptographic and audit substrate that underlies both. Four layers of that capability deserve direct executive attention, top-down:

Layer 4 — Agent Control Plane Governance, audit, kill switches, behavioural anomaly detection, human override. HITL and HOTL oversight configurations per agent class.

Layer 3 — Agentic Workflows Customer journeys, internal operations, development pipeline. Spec-driven by default for high-risk flows.

Layer 2 — Data & Model Layer AIBOM (AI Bill of Materials), model registry, retrieval substrate, prompt-template version control, fine-tune lineage.

Layer 1 — Quantum-Safe Foundation ML-KEM, ML-DSA, hybrid PKI, crypto-agility. The substrate every higher layer's integrity claims depend on.

Layer 1 — The Quantum-Safe Foundation. Every layer above this assumes the integrity of the cryptographic substrate. With the G7 roadmap, the NCSC three-phase plan, and BIS Project Leap all on public record, this is no longer a niche concern. Agentic systems whose audit trails are signed under classical ECDSA, or whose key-establishment depends on RSA or ECDH, will see their integrity claims expire alongside the cryptography. The institutions that get this right pull the post-quantum work upstream and treat ML-KEM, ML-DSA, and hybrid PKI as the substrate on which every higher layer's audit and integrity guarantees rest.

Layer 2 — The Data and Model Layer. This is where the AI Bill of Materials (AIBOM) lives. Analogous to the Cryptographic Bill of Materials used in post-quantum migration planning, the AIBOM is the inventory of every model, dataset, prompt template, retrieval index, fine-tune, and third-party AI dependency the institution operates. It is the artefact the EU AI Act Article 49 effectively requires, the inventory that SR 11-7 examinations now request, and the foundation of any credible governance posture. Most institutions do not have one. They will need one by August.

Layer 3 — Agentic Workflows. This is the layer most institutions are currently building, often without sufficient attention to layers 1, 2, and 4. The workflows themselves range from internal (code generation, regulatory document drafting, customer-service triage) to customer-facing (relationship-manager copilots, onboarding, KYC orchestration, transaction monitoring, FX optimisation) to fully autonomous (treasury operations, certain trading and risk-management functions where regulator tolerance permits). The strategic discipline at this layer is to treat it as systems engineering, not application development — orchestration patterns, escalation rules, human-in-the-loop gates, and audit emission are first-class design concerns.

Layer 4 — The Agent Control Plane. This is what Deloitte has characterised as the "agent control room" ⧉: the real-time auditing, action logging, behavioural anomaly detection, kill switches, and human override infrastructure that surrounds every agent in production. The Step Finance loss was not, technically, an AI failure. It was a control-plane failure: the agents had permissions they should not have had, and the behavioural anomaly that should have triggered a halt did not. The institutions that build the control plane first — before scaling agent deployment — are the ones that will not see Step-Finance-class incidents in 2027.

The relevant comparison for the C-suite is not "are we doing more AI than our competitors?" It is whether the institution owns all four layers, or whether one or more layers is being silently delegated to a vendor with no contractual ability to satisfy the EU AI Act's Article 13 documentation requirements. The latter is a position that looks fine until a regulator opens the question.

Human Oversight in Practice: HITL vs HOTL #

The single distinction inside Layer 4 that regulators are most focused on in 2026 is between two oversight models. Both are forms of human supervision; they differ in latency, scale, and the assumption the regulator is willing to grant about agent behaviour.

Human-in-the-Loop (HITL) is the model in which an agent cannot execute a consequential action without explicit human approval. The agent prepares the decision, presents it, and waits. A KYC remediation agent that flags an account for closure but cannot close it without a compliance officer's sign-off is HITL. The trade-off is operational: HITL is safer and produces an unambiguous Article 14 audit trail, but it does not scale to high-volume, low-latency workflows.

Human-on-the-Loop (HOTL) is the model in which an agent executes autonomously within bounded parameters, with humans monitoring telemetry in real time and retaining the authority to halt the agent at any point. A real-time fraud-screening agent that auto-blocks transactions matching specific risk patterns, with a human operations team watching the alert volume and intervening on anomalies, is HOTL. The trade-off is inverse: HOTL scales, but it relies on the agent's parameters being correctly set and on behavioural anomaly detection that catches drift before harm accumulates.

EU AI Act Article 14 does not prescribe HITL versus HOTL; it requires that human oversight be meaningful. The practical implication is that every high-risk agent the bank operates must have an explicit, documented position on which model applies, why, and what the escalation path is when the agent encounters situations outside its bounded parameters. Most banks running pilots in 2025 did not have this documentation. Most banks running production agents by August 2026 will need it.

The decision rule is not complex. For consequential, low-volume, irreversible actions — credit denial on a natural person, account closure, large-value wire authorisation, regulatory filing submission — HITL is the defensible default. For high-volume, reversible, parameter-bounded actions — transaction monitoring alerts, document classification, routine customer-service triage — HOTL is appropriate, provided the behavioural anomaly detection and kill-switch infrastructure is mature. Banks that treat every workflow as HITL will not capture the operational leverage of agentic systems. Banks that treat every workflow as HOTL will eventually have a Step Finance moment.

Buying vs Building: The Third-Party Agent Problem #

The 2026 reality that has crept up on most banks is that they will not, primarily, build agentic capability. They will buy it. The vendor landscape — Oracle's agentic banking platform launched in February 2026, IBM's Watsonx, Microsoft's Copilot suite, AWS Bedrock Agents, Salesforce Agentforce, ServiceNow's NowAssist, and the wave of fintech-specialist agent vendors — is moving faster than internal bank engineering can. The strategic consequence is that most of the agents operating inside a bank in 2027 will have been written by someone else, and the governance question is no longer "can we trust our agents?" but "can we trust the agents we have procured, and can we prove to a regulator that we can?"

This is the loudest under-recognised challenge under DORA. Articles 28–30 of the regulation make ICT third-party risk management an active supervisory area, with explicit requirements covering contractual provisions, ongoing monitoring, concentration-risk assessment, and exit strategies. The European Supervisory Authorities maintain a register of critical ICT third-party providers, with direct oversight powers over those designated as such. The new operational reality is that the AI vendors of 2026 — frontier model providers, agent-platform vendors, AI-enabled SaaS — are, increasingly, the ICT third parties DORA was written to cover.

For a bank in the buying position, three practical disciplines apply:

Demand the AIBOM from the vendor. Any agent product procured for use on high-risk workflows must come with a documented bill of materials covering the underlying models, training data provenance and limitations, fine-tunes applied, retrieval indices accessed, prompt-template versions, and dependency chain to downstream agent components. This is the artefact the bank will need to satisfy Article 13 documentation requirements under the EU AI Act. The bank cannot produce it retrospectively from a vendor that has not contractually committed to providing it.

Test the black box, not the brochure. Vendor procurement evaluations historically focus on feature comparison and reference-customer interviews. For agentic systems, that is not sufficient. The institution must conduct behavioural testing of the agent under conditions analogous to its intended production deployment — including adversarial probing for prompt injection, social-engineering resistance (the Step Finance vector), drift under data-distribution shifts, and the latency and failure modes of the kill-switch and override pathways. Most current vendor contracts do not permit this depth of testing without specific negotiation; that negotiation needs to happen before the contract is signed, not after.

Renegotiate contracts on Article 13 terms. Most existing AI vendor agreements include none of the documentation, audit rights, model-change notification, incident reporting, or sub-processor disclosure requirements that the EU AI Act and DORA together demand. The Regulativ analysis of UK firms ⧉ was explicit on this point: legal review of vendor agreements takes weeks, and most institutions cannot satisfy Article 13 for a model whose inner workings their vendor has never been contractually obliged to disclose. The regulatory obligation sits with the deployer, not the vendor. Procurement teams need to know that before the next renewal cycle, not after a regulatory enquiry.

The board-level summary is that the vendor relationship has moved from procurement to risk transfer — and the risk does not, in fact, transfer. The bank remains the deployer. The bank remains liable. The bank needs the contractual instruments and the testing discipline that make its liability tractable rather than merely formal.

What This Means by Bank Type #

The right response varies. The pattern below is a rough segmentation, not a prescription.

Tier-One Universal Banks #

Institutions with $1T+ balance sheets and global presence are simultaneously the most exposed (broadest regulatory perimeter, largest legacy estate, highest-value target for autonomous adversaries) and the best resourced. The strategic priority is to build the control plane first — Layer 4 of the architecture above — and to bring spec-driven development discipline into the internal engineering function before scaling agent deployment further. The competitive consequence of getting this right is meaningful; the consequence of getting it wrong is existential, given the penalty exposure under the EU AI Act and the operational exposure to GTG-1002-class threat patterns.

Mid-Tier and Regional Banks #

The competitive question for tier-two banks is sharper than for tier-one. They face the same regulatory perimeter without the same governance budget, the same threat surface without the same defensive resources, and a customer base that is increasingly comparing them to AI-native fintechs. The practical answer is to standardise hard on a small set of vetted vendors (with contracts that satisfy Article 13 documentation requirements), to invest in spec-driven development discipline rather than custom platform engineering, and to use agentic tooling to compress the COBOL modernisation timeline that has been a strategic anchor for two decades. The institutions that move early here will close, materially, the technology gap with tier-one banks for the first time in a generation.

Fintechs, PSPs, and Crypto-Adjacent Institutions #

The fintech and payments-institution segment has the inverse problem: agility is high, governance is often lower than peer banks, and the EU AI Act's penalty exposure is, for a mid-sized fintech, potentially existential. The strategic discipline is to treat AI governance as a product-readiness gate rather than a compliance overlay — building the AIBOM, the audit substrate, and the spec-driven workflows into the engineering culture from the outset rather than retrofitting them under regulatory pressure. For institutions whose payment infrastructure intersects with the November 2026 SWIFT CBPR+ structured-address deadline, the agentic-engineering investment is also the natural mechanism for industrialising the structured-address remediation work — the validation rules, the data-quality enforcement, and the CI-pipeline integration are precisely the patterns that spec-driven workflows make tractable.

Internal Engineering Functions #

For the engineers and researchers reading this, the working discipline that matters is the daily one. Move the centre of gravity of work from typing characters to producing specifications and verification harnesses. Treat agent traces, intermediate plans, and approval gates as first-class artefacts in your version control. Invest in tooling — Spec Kit, Kiro, Cursor's plan mode, Claude Code with project-level skill files — that makes the specification the durable artefact and the generated code the disposable one. The ergonomic shift is real. The professional payoff is that the discipline being adopted at the frontier is also the discipline that survives regulatory scrutiny.

The 12-Week Action Plan to August 2026 #

For the executive sponsor running an agentic-engineering programme between now and the EU AI Act enforcement date, the work compresses into a twelve-week sequence. The plan below is not exhaustive; it is the minimum a board should expect a credible programme to have completed by 2 August 2026.

Weeks 1–2 — Produce the AIBOM. Stand up the centralised inventory of every AI system, model, dataset, prompt template, retrieval index, fine-tune, and third-party AI dependency in production or under development. Map each entry to EU AI Act Annex III classification. The deliverable is a single source of truth that the CRO, CCO, CISO, and CTO can each query.

Weeks 3–4 — Classify oversight model per system. For every high-risk and consequential agent, document explicitly whether the oversight model is HITL or HOTL, the rationale, the escalation pathway, and the named human accountable under SM&CR (UK) or the equivalent national regime. Where the answer is unclear, default to HITL until the analysis is complete.

Weeks 5–6 — Build or harden the Agent Control Plane. Real-time action logging, behavioural anomaly detection, kill-switch and override pathways operational on every production agent. Where the control plane does not yet exist for a system, that system goes into restricted-deployment status until it does.

Weeks 7–8 — Vendor contract review. Legal and procurement walk every active AI vendor contract for Article 13 documentation rights, model-change notification, incident reporting, audit rights, and sub-processor disclosure. The output is a tiered list: compliant, remediation required, replacement required. Replacement decisions need to start now to have any chance of completing this year.

Weeks 9–10 — Dry-run the conformity assessment. For each high-risk system under Annex III, complete the conformity-assessment workflow as if a notified body were arriving the following week. This will surface gaps that look minor on paper and are operationally severe under examination. Fix what can be fixed; document the residual.

Weeks 11–12 — Pre-cutover validation and board sign-off. Final review of the AIBOM, the HITL/HOTL classifications, the control-plane evidence, the vendor remediation status, and the conformity-assessment outputs. Named senior-manager accountability confirmed. Board minute the position. Notify the regulator where the framework expects pre-notification.

The institutions that complete this twelve-week sequence will not have solved agentic engineering. They will have established the floor a credible programme requires. The institutions that have not started by the time this article is published are not, as the Regulativ analysis put the same point on the SWIFT side, uniquely negligent. They are the majority. The question every CCO, CRO, and CTO needs to answer in the next fortnight is whether the firm acts in May or scrambles in July.

Conclusion #

The harsh observation that has crystallised across the industry in the last six months is that the old ways of operating at enterprise scale are being surpassed not by a new technology but by a new working pattern. Agentic tools have revealed — sometimes in production, sometimes in incident reports — flaws and gaps in legacy estates that have been quietly compounding for years. The same tools have provided the resources to malicious actors that previously required state-actor backing. The same tools, used internally and with discipline, are the most credible path institutions have to closing the legacy gap, satisfying the August 2026 regulatory deadline, and reaching the operational tempo that customer expectations and competitive realities now require.

The institutions that own this position internally — who treat agentic engineering as a structural capability of the bank rather than a productivity overlay procured from a vendor — will spend the next two years compounding advantage. The institutions that do not will spend the next two years discovering, in incident reports and regulator findings, what they should have built. The choice between those two outcomes is a 2026 board-level decision, not a 2028 technology one.

For prior context on this site, the April 2026 piece on quantum thresholds covered the hardware trajectory that underpins Layer 1 of the architecture above, the May 2026 piece on post-quantum migration for corporate finance covered the cryptographic substrate in depth, the May 2026 analysis of the pacs.008 structured-address deadline covered the regulatory and engineering discipline that spec-driven validation makes tractable, and the Rust open-source work on KyberLib, pain001, and pacs008 sits in the broader effort to put production-grade primitives — quantum-safe, payment-compliant, audit-ready — into the hands of the engineering teams who will build the agentic bank. The connection across these pieces is not coincidental. It is the shape of the work the next two years require.

Questions? Answers.

What is the difference between generative AI, agentic AI, and agentic engineering?

Generative AI produces content in response to a prompt; it is reactive. Agentic AI pursues defined goals autonomously, accessing data, using tools, and taking actions across multi-step workflows without requiring a human prompt at every step. Agentic engineering — the term Karpathy adopted in 2026 ⧉ — is the working discipline of orchestrating agents against detailed specifications with human oversight. For banking, the distinction matters because the regulatory perimeter, the threat model, and the engineering discipline are different for each category. A chat interface and a fully autonomous trading agent are not in the same regulatory class, and treating them as if they are creates exposure at both ends.

Why is the EU AI Act August 2026 deadline so consequential for banks?

Annex III of the AI Act explicitly classifies several core banking AI use cases as high-risk: creditworthiness assessment and credit scoring of natural persons, risk assessment and pricing in life and health insurance, and the evaluation or classification of individuals' financial standing. From 2 August 2026, deployers of these systems must demonstrate compliance with quality management systems, risk management frameworks, technical documentation, conformity assessments, EU database registrations, robust data governance, human oversight, and cybersecurity protections. Article 12 requires automatic logging of inputs and outputs. Article 14 requires meaningful human oversight (HITL or HOTL, as appropriate to the system). Penalties for breach reach €35 million or 7% of global annual turnover. The work to satisfy these obligations is engineering work — not documentation work — and it is the practical reason that spec-driven discipline has accelerated through Q1 2026.

What is the practical difference between HITL and HOTL, and when should each apply?

HITL (Human-in-the-Loop) means the agent cannot execute consequential actions without explicit human approval. HOTL (Human-on-the-Loop) means the agent executes autonomously within bounded parameters, with humans monitoring telemetry and retaining authority to halt at any point. EU AI Act Article 14 requires that oversight be meaningful but does not prescribe which model. The decision rule is to apply HITL where the action is consequential, low-volume, and irreversible (credit denial, account closure, large-value wire authorisation, regulatory filing submission); and HOTL where the action is high-volume, reversible, and parameter-bounded (transaction monitoring alerts, document classification, routine customer-service triage). Both require the kill-switch and override infrastructure to be operational and tested; the difference is whether the human is upstream of execution (HITL) or alongside it (HOTL).

Most of our agents will come from vendors. How do we satisfy DORA and the EU AI Act for systems we did not build?

The regulatory obligation sits with the deployer, not the vendor. The practical answer is three-fold. First, demand a documented AIBOM from the vendor before signing — model lineage, training-data provenance, fine-tunes, prompt templates, retrieval indices, dependency chain. Second, conduct behavioural testing of the agent under conditions analogous to production, including adversarial probing for prompt injection and social-engineering resistance. Third, renegotiate vendor contracts to include Article 13 documentation rights, model-change notification, incident reporting, audit rights, and sub-processor disclosure — most existing contracts have none of these. DORA Articles 28–30 cover ICT third-party risk management and are the relevant regulatory anchor on the European side; FFIEC guidance is the equivalent on the US side. The work is meaningful; it cannot be deferred.

How worried should banks actually be about agentic adversaries?

The honest answer is that the threat is real and is operationally distinct from previous cyber threats. The November 2025 Anthropic disclosure of GTG-1002 is the canonical example: agentic AI handling 80–90% of tactical operations in a state-sponsored espionage campaign across approximately thirty defence, energy, and technology targets, operating at thousands of requests per second. The Step Finance incident in January 2026 — a $27–30 million loss driven by AI trading agents with over-permissioned authority — is the canonical example of how an internal AI deployment can become an attack surface. The Flashpoint 2026 GTIR observed a 1,500% rise in AI-related illicit discussions in a single month. These are not hypothetical scenarios; they are 2025–2026 incident-report material. Banks running classical defensive operations against agentic adversaries are, structurally, asymmetrically exposed, and the correct response is to build AI-on-AI defensive capability rather than to slow the agentic transition on the offensive side.

Is agentic AI just "ChatGPT plus MCP servers"?

No, and this is one of the most consequential misconceptions in the current market. A chat interface augmented with MCP servers is a useful pattern for retrieving and acting on data within a bounded session. Agentic engineering is a structural capability of the institution — the AIBOM, the agent control plane, the spec-driven development pipeline, the audit substrate, the quantum-safe cryptographic foundation, the orchestration patterns across end-to-end customer journeys. These are not features bought from a vendor; they are an institutional ownership position. Banks that treat the question as a procurement decision end up with shallow deployments that fail under examination. Banks that treat it as an engineering and governance ownership question end up with an asset that compounds.

What is the single most important thing a bank should be doing in the next twelve weeks?

Three things, sequenced. First, produce the AI Bill of Materials — the complete inventory of every AI system, model, dataset, prompt template, retrieval index, and third-party AI dependency in production or under development, with each entry classified against EU AI Act Annex III. The institution that cannot produce this when a regulator asks for it is the institution that will receive findings. Second, build the agent control plane for any AI system currently making or materially influencing customer-affecting decisions — audit logging, behavioural anomaly detection, human override, and kill switches as default infrastructure, not as a future roadmap item. Third, move the internal engineering culture from vibe coding to spec-driven development on the work that matters most — high-risk systems, regulated workflows, and the legacy modernisation pipeline. The first two are compliance work; the third is competitive work. The institutions that do all three will be in a materially stronger position than those that do one or none. The full twelve-week sequence is laid out in the action-plan section above.

References #

Last reviewed .