The Agentic AI Index for Banks in 2026: Measuring Autonomy

TL;DR. A six-dimension index for assessing agentic AI readiness in banks: autonomy tier, governance, regulatory evidence, economics, readiness, and global alignment.


Agentic AI in banking has moved past experimentation and into operational infrastructure. In 2026 the question is no longer whether to deploy it (52% of financial institutions already have) but whether the industry can measure what it has built with the same rigour it applies to capital, credit, and liquidity. This index is that measurement framework (Cambridge CCAF, 2026).

Executive Summary / Key Takeaways

Autonomy is the new capital adequacy. Just as Basel established measurable standards for financial resilience, the sector now needs a measurable standard for autonomous decision-making. This index is the first cross-dimensional framework that scores agentic AI readiness by treating governance, technical architecture, regulatory evidence, economic return, and organisational maturity as a single operating model.

52% adoption masks a 14% transformation rate. Cambridge CCAF's 2026 survey of 628 institutions across 151 jurisdictions finds that although four in five financial institutions deploy AI, only 14% describe it as transforming their competitive position. The gap is one of governance, not technology.

OSWorld at 66.3% is the reliability ceiling, not the floor. Stanford HAI's 2026 benchmark shows AI agents completing 66.3% of structured enterprise tasks (Stanford HAI, 2026). At that rate, three chained tool calls in sequence compound to a 29% end-to-end success rate. At this level of reliability, unsupervised execution against live payment systems is not defensible.

The FSB has spoken. On 10 June 2026, the Financial Stability Board published its first operational framework for governing agentic AI in financial services (FSB, 2026): 12 sound practices covering board accountability, lifecycle management, and AI-monitoring-AI architecture. Comments close on 22 July 2026.

The EU AI Act enforcement clock is running. Obligations for high-risk AI systems under Annex III take effect on 2 August 2026 (EU AI Act guidance, 2026). Financial institutions operating agentic AI in the EU without per-agent audit-log identity, documented rollback procedures, and board-level evidence are behind.

JP Morgan has named a year. Chief Analytics Officer Derek Waldron confirmed to CNBC on 9 June 2026 that the bank will deploy long-running autonomous agents, capable of running independently for one to two hours, within 2026 (CNBC, 2026). That disclosure changes the competitive frame for every institution benchmarked against it.

The index scores six dimensions. Autonomy tier, governance architecture, regulatory evidence, economic accountability, organisational readiness, and global regulatory alignment. Together they turn an AI programme from a portfolio of initiatives into a measurable capability.

Why This Index Exists

The Evident AI Index ranks 50 global banks on Talent, Innovation, Leadership, and Transparency using millions of publicly available data points. It is the most trusted external benchmark of AI maturity in financial services. What it does not do, by design, is score the specific engineering and governance architecture that makes agentic AI safely deployable against live banking APIs. The Stanford AI Index tracks research output, technical performance, and societal impact. What it does not do is translate an OSWorld task-completion percentage into an operational instruction set for a treasurer, a chief risk officer, or a model validation team.

This index fills that gap. It takes the measurement discipline of the Stanford framework, the competitive context of the Evident Index, and the regulatory specificity of SR 11-7, SS1/23, the EU AI Act, the FSB sound practices, and Singapore's IMDA Model AI Governance Framework for Agentic AI, and converts them into a six-dimension scoring model that a board can act on.

The practical impetus is that agentic AI has moved from a planning discussion to an audit question. When JP Morgan's Chief Analytics Officer announces same-year deployment of long-running autonomous agents, when DBS builds agent control planes for credit memo preparation and customer service, and when the FSB specifies that agents executing financial transactions need human approval or dual authorisation above a threshold value, limited agent access to payment systems, and an audit trail of every agent transaction, an institution that cannot score its own position will find a regulator scoring it instead.

The 2026 Agentic AI Maturity Landscape

What the Data Shows

The 2026 Cambridge CCAF report, the largest global study of AI in financial services, covering 628 institutions across 151 jurisdictions in partnership with the BIS, IMF, WEF, and World Bank, provides the statistical foundation for this index.

Signal	Finding	Source
Active AI adoption	81% of financial institutions deploy AI at some level	Cambridge CCAF
Agentic AI adoption	52% are already piloting or deploying agentic systems capable of sustained multi-step autonomous action	Cambridge CCAF
Transformation rate	Only 14% describe AI as redefining their competitive advantage	Cambridge CCAF
Measurement difficulty	55% of industry and 63% of regulators struggle to measure the value of AI deployment; among large financial institutions the figure is 76%	Cambridge CCAF
Profitability	Only 40% report increased profit from AI; 43% report no change	Cambridge CCAF
Loss of human oversight	51% cite loss of human oversight as a top risk	Cambridge CCAF
Agentic use cases	31% of new bank AI use cases in Q1 2026 were agentic applications, a record high, up from 15% in Q4 2025	Evident Insights
Governance gap	77% of 2,000 technology leaders say AI adoption is outpacing governance capability; an average of 54 AI agent incidents per enterprise in 2025	IBM
Agent sprawl	Enterprises expect to deploy an average of 1,661 AI agents by 2027; only 11% say they are fully prepared	IBM
McKinsey profit-pool risk	Agentic AI could cut bank operating costs by up to 20% but threatens to erode up to $170 billion of global profit pools by 2030 unless business models adapt	McKinsey

These numbers define the problem precisely: adoption is ahead of governance, productivity gains are visible, transformation is rare, and the measurement gap is widest where regulatory exposure is highest, namely in large financial institutions.

Where Competitors Are Drawing the Line

The Evident AI Index 2025 ranks JP Morgan Chase first (score: 79), followed by Capital One (78.1), RBC (58.4), CommBank Australia (53.9), and Morgan Stanley (52.2). The index measures four capability pillars (talent, innovation, leadership, transparency), not operational agent architecture. That leaves a structural gap: a bank can score highly on innovation disclosure while deploying agents with no kill switch, no WORM audit log, and no OPA policy gate. This index is designed to make that gap visible.

Deloitte's 2026 Tech Trends reports that only 11% of organisations have agentic AI in production. McKinsey finds that while technical capability is advancing quickly, only about one-third of organisations reach governance maturity level three or above in controlling agentic AI. CCG Catalyst's survey data shows 93% of AI-related spending going to technology infrastructure and only 7% to people, talent, training, change management, and governance, a ratio that makes scaling structurally impossible.

The Evident Venture Tracker for Q1 2026 identifies Anthropic as the most-cited vendor, with a long tail of specialised players accounting for 68% of all deployments, mostly targeting workflow-specific use cases in lending, anti-money laundering, and treasury. The supply side is mature. The governance side is not.

The Six-Dimension Index Architecture

This index scores agentic AI readiness across six dimensions. Each dimension has a four-level maturity scale. A bank's index score is the weighted sum of its dimension scores, with the weights set by regulatory materiality. The weighting structure is calibrated against SR 11-7, SS1/23, EU AI Act Annex III obligations, and the FSB sound-practice taxonomy.

Dimension 1: Autonomy Tier Coverage

What it measures: Whether every production agentic workflow is classified on a defined autonomy ladder, whether any workflow operates above its authorised tier without a documented exception, and whether the tier assignment defines not only the task boundary but also the legal accountability boundary.

The autonomy ladder remains the foundational construct. Five tiers, from Level 0 (observe and read-only) through Level 4 (multi-tool orchestration with mandatory checkpoints), define the agent's permission boundary, not the sophistication of the model. The same underlying LLM can sit at any tier; the difference is in the wrapper. Level 5, self-orchestrating execution without checkpoints, should not be in production banking in 2026. At 66.3% task completion, OSWorld reliability compounds: three chained calls in sequence at 66% each give a 29% end-to-end success rate. Five chained calls give 13%.
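
The compounding arithmetic can be checked with a short sketch. It assumes each chained call succeeds independently with the same probability, which is a simplification; the function name `chain_success` is illustrative and not part of any benchmark tooling.

```python
def chain_success(per_step: float, steps: int) -> float:
    """End-to-end success probability for `steps` chained calls.

    Assumes every call succeeds independently with probability `per_step`.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # 0.663 is the OSWorld task-completion rate quoted in the text.
    for n in (1, 3, 5):
        print(f"{n} chained call(s): {chain_success(0.663, n):.0%}")
```

If failures are correlated rather than independent, the end-to-end figure will differ, so the result is best read as an order-of-magnitude guide.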

Singapore's IMDA Model AI Governance Framework for Agentic AI, published at Davos on 22 January 2026 as the world's first governance framework to address autonomous agents explicitly (IMDA, 2026), defines four equivalent concepts: principal hierarchy (who can instruct the agent), task boundary (what the agent is authorised to do), minimal footprint (the agent should not accumulate permissions beyond its immediate need), and explainability (the reasoning path must be traceable). These four map directly onto the autonomy tier model.

The principal-agent problem and the legal attribution of intent. The IMDA framework introduces a dimension that pure engineering specifications underweight: when an AI agent acts as a proxy for a corporate entity (executing a payment, approving a credit limit adjustment, submitting a regulatory filing), it creates a problem of legal attribution of intent. On whose authority did the agent act? Who bears liability when the agent deviates from its prompt constraints? Whose intent is attributed when the agent chooses between two valid but different interpretations of an ambiguous instruction?

For Level 3 and Level 4 workflows, where the agent autonomously executes material actions within defined parameters, the tier definition must specify not only the technical task boundary but also the legal accountability boundary: a named human principal who authorised the workflow, a documented delegation instrument (board resolution, delegation of authority, or signed mandate), the conditions under which the agent's actions bind the institution, and the conditions under which deviation from prompt constraints triggers automatic rollback, escalation, and incident logging. Without this, the autonomy tier classification is merely an engineering artefact that will not survive a legal challenge, a regulatory examination, or a dispute with a counterparty whose funds moved because an agent misread a conditional instruction.

Maturity Level	What It Looks Like	Index Score
Level 1 — Unclassified	No formal classification; agents are described informally as "assistants" or "co-pilots"; no tier documentation	0–24
Level 2 — Classified, unverified	Tier labels are applied; no formal verification that the wrapper enforces the declared tier; Level 5 workflows may exist undetected	25–49
Level 3 — Classified and controlled	All production workflows tagged Level 0–4; Level 5 contractually prohibited; quarterly tier-audit artefacts available for MRM review	50–74
Level 4 — Classified, controlled, and evidence-ready	Complete tier register; continuous drift monitoring; any tier reclassification triggers fresh MRM validation; an auditor can reconstruct the tier assignment of any workflow on demand	75–100
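
One way to picture a tier register entry that carries the legal accountability boundary as well as the technical one is sketched below. The field names, the `TierAssignment` class, and the `register_gaps` helper are all hypothetical; they simply mirror the four accountability items this dimension asks for on Level 3 and Level 4 workflows.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PRODUCTION_TIER = 4  # Level 5 is treated as out of bounds for production

# Accountability items required for Level 3 and Level 4 workflows.
ACCOUNTABILITY_FIELDS = (
    "named_principal",
    "delegation_instrument",
    "binding_conditions",
    "deviation_response",
)


@dataclass(frozen=True)
class TierAssignment:
    workflow: str
    tier: int
    named_principal: Optional[str] = None        # human who authorised the workflow
    delegation_instrument: Optional[str] = None  # e.g. board resolution reference
    binding_conditions: Optional[str] = None     # when agent actions bind the institution
    deviation_response: Optional[str] = None     # rollback / escalation / logging rule


def register_gaps(entry: TierAssignment) -> list[str]:
    """List what is missing before this entry could be shown to an auditor."""
    gaps: list[str] = []
    if not 0 <= entry.tier <= MAX_PRODUCTION_TIER:
        gaps.append(f"tier {entry.tier} is outside Level 0-{MAX_PRODUCTION_TIER}")
    if entry.tier >= 3:
        for name in ACCOUNTABILITY_FIELDS:
            if not getattr(entry, name):
                gaps.append(f"missing {name}")
    return gaps
```

An entry such as `TierAssignment("card-freeze", 3, named_principal="Head of Fraud Ops")` would report three gaps, which is the kind of finding a quarterly tier audit could surface automatically.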

Dimension 2: Governance Architecture

What it measures: Whether the five-component agent control plane is fully engineered and operational in production — not described in a policy document.

The FSB June 2026 consultation explicitly states that existing governance frameworks were not designed for systems that "plan, take multi-step actions, and interact with external systems without step-by-step human oversight". The five-component control plane translates that observation into an engineering checklist:

Component 1: Identity and Permissions. Every agent maps to exactly one service account with OAuth client_credentials tokens scoped to the minimum API surface. The card-freeze agent's token can call POST /accounts/{id}/freeze with an amount ceiling; it cannot call anything in custody, treasury, or trading. Service-account secrets rotate on a defined cycle. Long-lived credentials are the most common control-plane failure in production deployments. The FSB explicitly recommends "least privilege to agents and their sub-agents, and dynamic identity and access management that grants, changes or revokes permissions in real time based on behaviour and context, rather than the static profiles used for human users".
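
A minimal sketch of the least-privilege check described in Component 1, assuming a grant is a list of (method, path pattern) pairs plus an amount ceiling. The `AgentGrant` structure, the service-account name, and the ceiling value are illustrative assumptions; a real deployment would derive this from OAuth scopes issued by its identity provider.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGrant:
    agent_id: str
    service_account: str
    allowed: tuple[tuple[str, str], ...]  # (HTTP method, path regex)
    amount_ceiling: float


# Hypothetical grant for the card-freeze agent mentioned in the text.
CARD_FREEZE = AgentGrant(
    agent_id="card-freeze-agent",
    service_account="svc-card-freeze",
    allowed=(("POST", r"/accounts/[^/]+/freeze"),),
    amount_ceiling=5_000.0,  # illustrative value only
)


def is_permitted(grant: AgentGrant, method: str, path: str, amount: float = 0.0) -> bool:
    """Allow a call only if it matches the grant and stays under the ceiling."""
    if amount > grant.amount_ceiling:
        return False
    return any(method == m and re.fullmatch(p, path) for m, p in grant.allowed)


def shared_service_accounts(grants: list[AgentGrant]) -> set[str]:
    """Service accounts used by more than one agent (should be empty)."""
    owners: dict[str, set[str]] = {}
    for g in grants:
        owners.setdefault(g.service_account, set()).add(g.agent_id)
    return {acct for acct, agents in owners.items() if len(agents) > 1}
```

The second helper reflects the one-agent-one-account rule: any non-empty result means two agents share a credential, which also undermines per-agent attribution in the audit log.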

Component 2: Deterministic Guardrails. Every LLM tool-call passes through a semantic router (NeMo Guardrails, LangChain Guardrails, or equivalent) before it reaches the production API. The router classifies intent against a finite allow-list and rejects calls outside that list. A JSON-schema validator then checks the payload. A pacs.008 with amount: 0 is a model failure, not a legitimate transaction. So is a wire to a country not pre-approved for the originating customer segment.
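
The two-stage check in Component 2 can be sketched in plain Python. This is not NeMo Guardrails or any other named product: the intent is assumed to have been classified upstream, and the allow-list, segment names, and country sets are hypothetical placeholders.

```python
# Hypothetical allow-list and per-segment country approvals.
ALLOWED_INTENTS = {"freeze_card", "send_payment"}
APPROVED_COUNTRIES = {
    "retail": {"GB", "FR"},
    "corporate": {"GB", "FR", "US", "SG"},
}


def validate_tool_call(call: dict) -> list[str]:
    """Return rejection reasons; an empty list means the call may proceed."""
    intent = call.get("intent")
    if intent not in ALLOWED_INTENTS:
        return [f"intent {intent!r} is not on the allow-list"]

    errors: list[str] = []
    if intent == "send_payment":
        payload = call.get("payload", {})
        amount = payload.get("amount")
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        # A zero or negative amount is treated as a model failure.
        if not is_number or amount <= 0:
            errors.append("amount must be a positive number")
        segment, country = payload.get("segment"), payload.get("country")
        if country not in APPROVED_COUNTRIES.get(segment, set()):
            errors.append(f"country {country!r} is not pre-approved for segment {segment!r}")
    return errors
```

Because both stages are deterministic, the same tool call always produces the same decision, which is what makes the rejection log usable as audit evidence.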

Component 3: Policy-as-Code. Open Policy Agent (or equivalent) sits between the validator and the API. Policies are versioned in Git; rejection decisions are logged; the same policy engine that gates microservice-to-microservice calls in the existing platform gates agent tool-calls. The EU AI Office's May 2026 guidance on Article 12 audit logging requires that log entries for high-risk AI systems attribute actions to a specific agent instance, not just a deployment or API credential. Multi-agent deployments sharing a credential fail this test.

Component 4: Audit Completeness. Immutable WORM storage — S3 Object Lock, Azure Blob immutability, or a ledgered database. Every invocation captures: timestamp, agent ID, service-account ID, system-prompt hash, retrieved context, LLM provider plus model plus version, raw LLM output, parsed tool-call, OPA decision, API response, downstream effect, and approver UID where applicable. Records are cryptographically signed at write time. The EU AI Act Article 12 clarification published May 2026 names per-agent identity as a specific gap; institutions running multiple agent instances sharing a credential are explicitly out of compliance.
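
A rough sketch of a signed, chained audit record follows. It uses an HMAC from the Python standard library as a stand-in for whatever signing scheme an institution actually adopts (an asymmetric signature backed by an HSM would be more typical); the key, the field names, and the chaining approach are assumptions made for illustration.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; real keys belong in an HSM/KMS

# Fields listed in the text; approver UID is optional ("where applicable").
REQUIRED_FIELDS = (
    "timestamp", "agent_id", "service_account_id", "system_prompt_hash",
    "retrieved_context", "model", "raw_output", "tool_call",
    "opa_decision", "api_response", "downstream_effect",
)


def _mac(record: dict, prev_signature: str) -> str:
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hmac.new(SIGNING_KEY, (prev_signature + body).encode(), hashlib.sha256).hexdigest()


def sign_record(record: dict, prev_signature: str = "") -> dict:
    """Refuse incomplete records, then sign and chain to the previous entry."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"incomplete audit record, missing: {missing}")
    return {**record, "prev_signature": prev_signature,
            "signature": _mac(record, prev_signature)}


def verify_record(signed: dict) -> bool:
    """Recompute the MAC over the original fields and compare in constant time."""
    record = {k: v for k, v in signed.items() if k not in ("prev_signature", "signature")}
    return hmac.compare_digest(_mac(record, signed["prev_signature"]), signed["signature"])


example_record = {field: f"<{field}>" for field in REQUIRED_FIELDS}
```

Chaining each signature to the previous one means a deleted or reordered entry breaks verification of everything after it, which complements the immutability that WORM storage provides.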

Component 5: Kill Switch and AI-Monitoring-AI. A tested red-button API that cancels all in-flight agent invocations within a permission class in under 60 seconds. The word tested is load-bearing. An untested kill switch is a policy aspiration.
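
The kill-switch idea, and the drill that makes it "tested", might look like the following in-process sketch. It assumes agent workers cooperate by polling a cancellation flag; the `KillSwitch` class and `run_drill` helper are hypothetical, and a production version would sit behind an authenticated API and span multiple services.

```python
import threading
import time


class KillSwitch:
    """Tracks in-flight agent invocations and cancels them by permission class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: dict[str, tuple[str, threading.Event]] = {}

    def register(self, invocation_id: str, permission_class: str) -> threading.Event:
        """Record an invocation and return the flag its worker must poll."""
        cancelled = threading.Event()
        with self._lock:
            self._in_flight[invocation_id] = (permission_class, cancelled)
        return cancelled

    def complete(self, invocation_id: str) -> None:
        """Remove a finished invocation from the registry."""
        with self._lock:
            self._in_flight.pop(invocation_id, None)

    def cancel_class(self, permission_class: str) -> int:
        """Signal cancellation to every in-flight invocation in the class."""
        with self._lock:
            flags = [flag for cls, flag in self._in_flight.values() if cls == permission_class]
        for flag in flags:
            flag.set()
        return len(flags)


def run_drill(switch: KillSwitch, permission_class: str, deadline_seconds: float = 60.0) -> dict:
    """Fire the switch and record how long signalling took."""
    started = time.monotonic()
    cancelled = switch.cancel_class(permission_class)
    elapsed = time.monotonic() - started
    return {"cancelled": cancelled, "elapsed_seconds": elapsed,
            "passed": elapsed < deadline_seconds}
```

The drill here only times the signalling step. A fuller test would also confirm that each worker actually stopped and that no downstream API call completed after the signal, and would store the result as an audit artefact.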

Beyond the kill switch, Dimension 2 at the highest maturity level must mandate AI-monitoring-AI (AMI) architecture — and the reason is arithmetic. IBM's data puts the average enterprise agent population at 1,661 by 2027 (IBM, 2026). The FSB explicitly accepts that continuous human monitoring of individual agent decisions becomes physically impossible at scale, and recommends supplementing human oversight with AI systems that alert humans when performance metrics are breached or agent behaviour drifts. A human compliance officer cannot monitor 1,661 concurrent agents executing decisions at machine speed. The control model that assumes they can will fail the first time an agent population undergoes a correlated behavioural shift — a model update silently changing output distributions across dozens of workflows simultaneously.

The AMI layer is not a replacement for human oversight; it is the detection mechanism that makes human oversight actionable at scale. Its three mandatory functions are: drift detection (statistical monitoring of output distribution across agents of the same tier and type, flagging deviations beyond a defined sigma threshold before a human could notice them); cross-agent correlation alerting (identifying when multiple agents begin executing in a directionally consistent pattern that was not present yesterday — the early signal of the herding dynamic described in Dimension 6); and anomaly pre-escalation (generating a structured alert, with context and reversibility assessment, to a human decision-maker before the kill switch is the only remaining option). The FSB explicitly recommends AMI architectures in Sound Practice 9. An institution that reaches Maturity Level 4 in Dimension 2 without an operational AMI layer is not at Level 4.
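
The drift-detection function can be illustrated with a simple z-score check per agent. This is a deliberately basic sketch: the metric (for example a daily approval rate), the three-sigma default, and the `drift_alerts` name are assumptions, and a production monitor would likely use richer distributional tests.

```python
from statistics import mean, stdev


def drift_alerts(
    baseline: dict[str, list[float]],
    today: dict[str, float],
    sigma: float = 3.0,
) -> list[str]:
    """Return agents whose metric today deviates more than `sigma` from baseline."""
    alerts: list[str] = []
    for agent, history in baseline.items():
        if agent not in today or len(history) < 2:
            continue  # not enough data to estimate a spread
        mu, sd = mean(history), stdev(history)
        deviation = abs(today[agent] - mu)
        # With zero historical spread, any change at all counts as drift.
        if (sd == 0 and deviation > 0) or (sd > 0 and deviation / sd > sigma):
            alerts.append(agent)
    return alerts
```

Running the same check across all agents of one tier and type, and alerting when several fire on the same day, gives a crude version of the cross-agent correlation signal described above.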

Maturity Level	What It Looks Like	Index Score
Level 1 — Ad hoc	Some components present but undocumented; no formal control-plane owner; no kill-switch test record	0–24
Level 2 — Documented	All five components documented; implementation gaps exist; kill switch exists but untested; WORM logs incomplete	25–49
Level 3 — Operational	All five components operational in production; kill switch tested quarterly; WORM logs complete for Level-3+ workflows; OPA policies version-controlled	50–74
Level 4 — Evidence-ready	Control plane generates continuous, cryptographically signed evidence; per-agent identity satisfies EU AI Act Article 12; kill-switch test results are audit artefacts; drift detection is automated	75–100

Dimension 3: Regulatory Evidence Completeness

What it measures: Whether the institution can produce a complete, per-workflow regulatory evidence package on demand for SR 11-7, SS1/23, EU AI Act, DORA, FSB, and applicable national frameworks.

The Federal Reserve has repeatedly clarified that SR 11-7 applies to any input-to-output decisioning system, regardless of whether the institution classifies the underlying LLM as a model. The PRA's SS1/23 is broader still. The EU AI Act's Annex III high-risk classification covers most financial-services LLM use cases: credit scoring, fraud detection, customer suitability, insurance pricing. Full compliance for EU-scope systems is required by 2 August 2026, with Germany, France, and the Netherlands confirmed for Q3 2026 supervisory reviews. The IOSCO Supervisory Toolkit for AI Use in Capital Markets, finalised 25 May 2026, covers the full AI lifecycle from traditional ML through GenAI and agentic AI, and explicitly identifies that planning capabilities, long-term memory, and external tool access create risks of emergent behaviour and cascading failures across interconnected systems.

The three-lines-of-defence model, applied to agents:

First line (model owner): Documents intended use, training and eval data lineage, system-prompt schema, tool-call allow-list, kill-switch test results. Owns drift monitoring in production. Owns the bank-specific held-out evaluation set — the work most institutions underinvest in.
Second line (MRM team): Validates the agent before production. The validation report covers vendor eval scores (MMLU, HumanEval — useful but not sufficient), bank-specific eval scores, prompt-injection red-team results, bias and fairness analysis, and a quantified residual-risk statement.
Third line (internal audit): Tests control-plane gates and audit-log completeness against a sample of production decisions. The 2027 audit cycle will look substantially different from 2025; budget accordingly.

The Singapore Model AI Governance Framework for Agentic AI (MGF) requires financial institutions to assess agents across four dimensions: bounding agent autonomy and access, establishing human accountability at defined checkpoints, implementing technical controls including baseline testing, and enabling end-user responsibility through transparency. MAS's March 2026 AI Risk Management Toolkit, developed under Project MindForge with 24 institutions, represents the most operationally detailed national-level guidance available.

Maturity Level	What It Looks Like	Index Score
Level 1 — Compliance awareness	Regulatory obligations identified; no workflow-level evidence produced; SR 11-7 model cards absent or incomplete	0–24
Level 2 — Point-in-time validation	Pre-deployment validation completed; evidence exists at deployment date; no continuous monitoring; no per-workflow evidence cadence	25–49
Level 3 — Continuous evidence	Model cards maintained per workflow; continuous eval suites re-run weekly; EU AI Act Article 12 per-agent logging operational; FSB Sound Practice categories mapped to internal controls	50–74
Level 4 — Examiner-ready	Complete regulatory evidence package retrievable on demand per workflow; three-lines-of-defence validation records current; bank-specific eval suite catches model-update regressions faster than vendor release cycles; MAS MGF four-dimension mapping completed	75–100

Dimension 4: Economic Accountability

What it measures: Whether the institution measures agentic AI return using workflow-level unit economics rather than programme-level productivity claims.

McKinsey's analysis identifies that agentic AI could lower bank operational costs by 15–20% (McKinsey, 2026) — equivalent to 9–15% of operating profits — but that most of these gains will be competed away. The more durable competitive advantage is in institutions that build the measurement infrastructure to act faster than competitors when model and workflow improvements become available. The Cambridge CCAF finding that 76% of large financial institutions cannot measure the value of AI deployment is not a data-quality problem. It is an accountability-architecture problem: programmes are budgeted and reported at the portfolio level, making it impossible to trace value or failure to individual workflows.

The four unit-economic metrics that survive a CFO conversation:

Cost per completed decision, inclusive of the reversal and repair cost of failed decisions. A SAR-drafting agent that cuts BSA-officer time by 40% but generates 12% false-positive filings has destroyed value, not created it. This is the metric that the spending split cited earlier (93% of AI spending to infrastructure, only 7% to people and governance) makes unmeasurable: institutions cannot calculate the reversal cost of a governance failure they have not instrumented to detect.

Manual touches avoided, counted net of new touches created by control-plane oversight and exception handling. The point is not to minimise human attention; it is to redirect it to higher-leverage decisions.

Reversal rate — the percentage of agent-executed actions rolled back within 24 hours. A Level-3 workflow with a reversal rate above 2% is a reliability problem. Above 5% is a control-plane problem. This number should be tracked per workflow, not per programme. A portfolio average conceals the outlier that will generate the next audit finding.

Audit-trace completeness — the percentage of decisions with full provenance reconstructable from the WORM log. Should be 100% on Level-3 and Level-4 workflows. Anything less is a policy failure.
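
The four metrics can be computed per workflow along the following lines. The sketch assumes a "completed decision" is one that was not reversed within 24 hours, which is one reading of the text rather than a stated definition; the `WorkflowPeriod` fields are hypothetical inputs an institution would need to instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowPeriod:
    decisions: int            # agent-executed actions in the period
    reversed_within_24h: int  # actions rolled back within 24 hours
    run_cost: float           # inference, infrastructure, and oversight cost
    repair_cost: float        # cost of reversing and repairing failed decisions
    touches_removed: int      # manual touches the agent eliminated
    touches_added: int        # new touches from oversight and exceptions
    fully_traceable: int      # decisions with full provenance in the WORM log


def unit_economics(p: WorkflowPeriod) -> dict[str, float]:
    """Compute the four workflow-level metrics for one reporting period."""
    if p.decisions <= 0:
        raise ValueError("a period needs at least one decision")
    completed = p.decisions - p.reversed_within_24h
    total_cost = p.run_cost + p.repair_cost
    return {
        "cost_per_completed_decision": total_cost / completed if completed else float("inf"),
        "net_manual_touches_avoided": float(p.touches_removed - p.touches_added),
        "reversal_rate": p.reversed_within_24h / p.decisions,
        "audit_trace_completeness": p.fully_traceable / p.decisions,
    }


def reversal_flag(rate: float) -> str:
    """Apply the 2% and 5% thresholds the text gives for Level-3 workflows."""
    if rate > 0.05:
        return "control-plane problem"
    if rate > 0.02:
        return "reliability problem"
    return "within tolerance"
```

Keeping the calculation per workflow is the point: averaging the same inputs across a programme would hide exactly the outlier these thresholds are meant to catch.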

The agentic AI market in banking is growing at a rate that makes this measurement infrastructure urgent. Newgen's 2026 Banking Trends report forecasts the agentic AI market growing from $2.1 billion to $81 billion by 2034. McKinsey's scenario modelling indicates that the most likely outcome — a 30% probability scenario — involves AI agents achieving an agent-to-human ratio of approximately 20:1 and generating 15–20% cost reduction. Pioneers could open a gap of 4 percentage points of ROTE relative to slow movers. That margin is real, but it is only measurable and defensible if the unit economics are tracked at the workflow level.

Maturity Level	What It Looks Like	Index Score
Level 1 — Budget-level reporting	AI spend tracked; no workflow-level unit economics; productivity claims not validated against operational baselines	0–24
Level 2 — Aggregate metrics	Programme-level productivity and cost metrics available; reversal rate not tracked per workflow; CFO reporting relies on headcount avoided	25–49
Level 3 — Workflow-level tracking	Cost per completed decision tracked per workflow; reversal rate monitored; manual touches avoided calculated net of control-plane overhead	50–74
Level 4 — Full economic accountability	All four unit-economic metrics tracked per workflow; reversal rates above 2% trigger automatic workflow review; audit-trace completeness is a dashboard metric reported to the board quarterly	75–100

Dimension 5: Organisational Readiness

What it measures: Whether the institution has the talent, cross-functional governance, board-level reporting, and culture to deploy and sustain agentic AI at scale — not just to pilot it.

The Cambridge CCAF finding is precise: workforce preparedness is four times more predictive of AI profitability than technology procurement. Firms where the workforce is highly prepared report 23% AI profitability; firms where it is not report 6%. Only 10% of all firms describe their workforce as ready. Fintechs reach the transforming stage three times more often than traditional financial institutions — 19% versus 6% — despite many spending less than $10,000 annually on AI. The architecture is the differentiator, not the budget.

McKinsey describes three strategic postures for banks facing agentic AI: wait and see, adapt by becoming a product supplier behind agent interfaces, or compete to own the direct customer relationship. Most banks default to the first posture while representing themselves as pursuing the third. The strategic conversation has to be explicit, and the board is where it must land.

The FSB Sound Practice 1 directly addresses board accountability: boards bear ultimate responsibility for AI governance, setting risk appetite, and ensuring that accountability structures are clear. The EU AI Act Article 5 enforcement and DORA Article 5 board-liability provisions translate that principle into personal liability. IOSCO's May 2026 Supervisory Toolkit states that "AI systems are no longer isolated projects. They are core operational infrastructure requiring continuous validation, board-level governance, and supervisory evidence ready for inspection".

The board reporting framework for agentic AI should cover four numbers per workflow: autonomy tier, audit-trace completeness, reversal rate, and net cost per decision. Plus a top-five residual-risk list. Policy document slideware is not a substitute.

Maturity Level	What It Looks Like	Index Score
Level 1 — Awareness	Board aware of AI programme; no agent-specific governance; Chief AI Officer role absent; cross-functional governance committee not formed	0–24
Level 2 — Structure forming	Dedicated AI governance function established; accountability structure defined; risk appetite statement for AI drafted; workforce AI literacy programme nascent	25–49
Level 3 — Operational governance	Board receives quarterly agentic AI dashboard with per-workflow metrics; cross-functional model risk committee covers agents; workforce preparedness tracked against benchmarks; MRM bench scaled to validate 20+ agents per quarter	50–74
Level 4 — Governance as competitive advantage	Board evidence package satisfies FSB Sound Practices 1–4 and DORA Article 5 personal-liability requirements; MRM bench validates 50+ agents per quarter; culture of continuous governance improvement documented in annual report; institution responds to FSB consultation	75–100

Dimension 6: Global Regulatory Alignment

What it measures: Whether the institution's agentic AI operating model is aligned to the four major regulatory frameworks that apply in its principal operating jurisdictions — and whether that alignment is evidenced, not asserted.

The regulatory landscape for agentic AI has crystallised in the first half of 2026. Four jurisdictional regimes, together with the FSB's cross-jurisdictional framework, are now operationally material:

United States (SR 11-7 / OCC Bulletin 2025-26). The Federal Reserve's model risk management guidance applies to any LLM-based decisioning workflow. The OCC has published specific model risk management guidance for community banks emphasising proportionality — "proportionate does not mean absent". The three-lines-of-defence model applies in full.

United Kingdom (PRA SS1/23 / FCA). The PRA's SS1/23 model-risk-management principles are broad enough to capture all LLM-based agents. UK supervisory authorities are developing specific agentic AI expectations. The FCA is among the national authorities issuing supplementary guidance on AI governance in financial services.

European Union (EU AI Act / DORA). Annex III high-risk AI system obligations are in effect from 2 August 2026. Requirements include structured risk management (Article 9), data governance (Article 10), per-agent audit logging (Article 12), transparency (Article 13), and human oversight (Article 14). DORA Article 5 board-liability provisions apply to operational resilience including agentic AI. The EU AI Office's May 2026 guidance mandates per-agent cryptographic identity in audit logs. Non-compliance carries fines up to EUR 35 million or 7% of global turnover.

Asia-Pacific (MAS / IMDA / regional regulators). Singapore's IMDA published the world's first Model AI Governance Framework for Agentic AI at Davos on 22 January 2026. MAS published its AI Risk Management Toolkit in March 2026 under Project MindForge, developed with 24 financial institutions. The framework covers scope and AI oversight, AI risk management, AI lifecycle management, and organisational enablers. MAS's proposed formal Guidelines on AI Risk Management are expected to be finalised in 2026, moving from voluntary FEAT principles to supervisory expectations with compliance implications. Australia's ASIC issued an open letter in May 2026 demanding cyber uplift in response to frontier AI threats.

FSB (Global, cross-jurisdictional). The FSB June 2026 consultation, the first global framework to treat agentic AI as operationally distinct, identifies six oversight models for agentic systems and recommends human-in-command for high-autonomy workflows, AI-in-the-loop monitoring as agent populations grow, and human approval or dual authorisation for agents executing financial transactions above threshold values. Comments close 22 July 2026; the final report goes to G20 finance ministers in October 2026.

Maturity Level	What It Looks Like	Index Score
Level 1 — Jurisdictional inventory	Applicable frameworks identified per jurisdiction; no workflow-level mapping; "compliance by analogy" to pre-AI frameworks	0–24
Level 2 — Framework mapping	Each production agentic workflow mapped to applicable frameworks; gaps identified; remediation plans drafted	25–49
Level 3 — Evidenced compliance	Per-workflow evidence packages produced against applicable frameworks; EU AI Act Article 12 per-agent logging complete; FSB Sound Practices 5–10 mapped to internal controls; Singapore MGF four-dimension mapping completed	50–74
Level 4 — Proactive regulatory engagement	Institution participates in FSB, IOSCO, and national regulator consultations; regulatory intelligence integrated into agent deployment lifecycle; supervisory evidence generated automatically by operational pipelines, not assembled post-hoc	75–100

Composite Index Score

The six dimensional scores combine into a composite index using the following regulatory-materiality weighting:

Dimension	Weight	Rationale
Governance Architecture	25%	Highest weight: the control plane is the only thing that fails safely when the model fails
Regulatory Evidence Completeness	20%	Vital for the 2 August 2026 EU AI Act deadline and continuous supervisory readiness
Autonomy Tier Coverage	15%	Slightly reduced to reflect that tier classification, while foundational, is now a threshold expectation rather than a differentiator
Economic Accountability	15%	Critical for CFO/ROI alignment against McKinsey's profit-pool and ROTE-gap scenarios
Organisational Readiness	10%	Streamlined: structural governance is necessary but increasingly table-stakes at Tier 1 institutions
Global Regulatory Alignment	15%	Increased: must actively account for DORA third-party ICT concentration risk, cross-border agent execution, and systemic herding risk scoring

A composite score below 50 means the institution cannot defend its current agentic AI posture to an SR 11-7 examiner, a PRA on-site review, or an EU AI Act supervisory assessment. A score of 50–74 means controls exist but are not yet continuous or evidence-ready. A score of 75–100 means governance is a competitive asset, not a compliance cost.
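As a rough illustration of how the weighting table and score bands fit together, the sketch below computes the composite as a weighted arithmetic mean of six 0–100 dimension scores and maps the result to a band. The weights and band boundaries come from the text; the aggregation rule (weighted mean), the dictionary keys, the function names, and the example scores are assumptions made for illustration only.

```python
# Weights in percent, taken from the table above. Keys are illustrative names.
WEIGHTS = {
    "governance_architecture": 25,
    "regulatory_evidence_completeness": 20,
    "autonomy_tier_coverage": 15,
    "economic_accountability": 15,
    "organisational_readiness": 10,
    "global_regulatory_alignment": 15,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted arithmetic mean of 0-100 dimension scores (assumed aggregation)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    total_weight = sum(WEIGHTS.values())  # 100
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / total_weight


def band(score: float) -> str:
    """Map a composite score to the three bands described in the text."""
    if score < 50:
        return "posture not defensible to supervisors"
    if score < 75:
        return "controls exist, not yet continuous or evidence-ready"
    return "governance is a competitive asset"


# Hypothetical bank: strong control plane, weak cross-border alignment.
example = {
    "governance_architecture": 80,
    "regulatory_evidence_completeness": 60,
    "autonomy_tier_coverage": 70,
    "economic_accountability": 50,
    "organisational_readiness": 90,
    "global_regulatory_alignment": 40,
}
score = composite_score(example)
print(score, "->", band(score))  # 65.0 -> controls exist, ...
```

In this hypothetical case a high Governance Architecture score cannot lift the institution into the top band on its own, because the three 15% dimensions together outweigh it.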

Current Signals to Track

Signal	What It Means for Banks	Source
52% agentic AI adoption	Governance is overdue; institutions at scaling or transforming stages need a control plane, not another pilot	Cambridge CCAF
66.3% OSWorld task success	One-in-three failure rate on structured tool-use; unsupervised execution against customer-funds APIs is unsupportable	Stanford HAI
31% of new bank AI use cases are agentic	The fastest-growing category in Q1 2026; governance infrastructure is falling further behind deployment	Evident Insights
FSB June 2026 sound practices	First global framework treating agentic AI as operationally distinct; non-binding now, G20 deliverable October 2026	FSB
EU AI Act 2 August 2026 deadline	Full Annex III obligations in force; Germany, France, Netherlands supervisory reviews confirmed for Q3 2026	EU AI Office
JP Morgan long-running agents: 2026	Same-year deployment of 1–2 hour autonomous agents changes the competitive benchmark for every G-SIB and regional bank	CNBC
IBM: 1,661 agents by 2027	Enterprise agent sprawl is the governance challenge of 2027 if unaddressed in 2026; only 11% say they are prepared	IBM
Singapore MGF agentic AI: January 2026	World's first agentic-AI-specific governance framework; four concepts (principal hierarchy, task boundary, minimal footprint, explainability) apply universally	IMDA
IOSCO Supervisory Toolkit: May 2026	Full AI lifecycle coverage including agentic AI; emergent behaviour and cascading failure risks named explicitly	IOSCO
McKinsey: 4pp ROTE gap	AI pioneers could open a 4 percentage point ROTE advantage over laggards; the measurement infrastructure for capturing that gap is workflow-level unit economics	McKinsey

What This Means by Institution Type

Global Systemically Important Banks (G-SIBs)

G-SIBs face the hardest governance challenge — not because the technology is more complex, but because scale and jurisdiction compound every gap. A G-SIB with 200 production agents across 30 business lines in 15 regulatory jurisdictions has 200 potential SR 11-7 findings, 200 potential EU AI Act audit-log failures, and 200 potential FSB Sound Practice gaps — simultaneously. The investment priority is not another pilot. It is the central control plane, the unified audit-log infrastructure, and an MRM bench capable of validating 50-plus agents per quarter.

Three developments are the competitive data points against which every G-SIB board should be benchmarking: JP Morgan's announcement of long-running autonomous agents in 2026, DBS's agent control planes in credit memo preparation and customer servicing, and BNP Paribas meeting its 2025 AI targets and beginning quarterly ROI reporting. The institutional question is not whether to deploy; it is whether the control plane can scale at the same rate as the agent population.

The FSB explicitly warns against concentration risk from reliance on a few cloud, hardware, and foundation-model providers — and notes that shared models and data could push institutions towards correlated behaviour that amplifies herding and procyclicality in a downturn. G-SIBs that source 80% of their agentic infrastructure from two foundation-model vendors are building a systemic correlation they will have to explain to both their own risk teams and their supervisors.

Systemic Herding and Procyclicality: The Architectural Risk No Single Bank Can Solve Alone. The Evident Insights Q1 2026 use-case tracker identifies that 68% of bank agentic deployments now use a long-tail of specialised vendors — the majority of which are built on identical underlying frontier models, predominantly Anthropic's Claude. This creates a structural herding vulnerability that is materially different from the concentration risks banks already manage in cloud infrastructure or payment rails.

The mechanism is as follows. A bank's trading agent, liquidity agent, and credit-tightening agent are built on different vendor platforms. They have different system prompts, different tool-call schemas, different OPA policy gates. But they share an identical underlying model — the same weights, the same training distribution, the same emergent behavioural patterns under distributional stress. When a significant market event occurs — a sovereign credit event, a Fed communication that differs from consensus, a large-bank failure — every agent built on the same underlying model will process the event through the same implicit feature weightings. If those weightings produce a directional bias toward risk-off behaviour, multiple banks' trading, liquidity, and credit agents may execute correlated sell-offs, credit-tightening cycles, or liquidity withdrawals simultaneously — not because any individual bank's agent is malfunctioning, but because they are all functioning correctly on top of the same model.

IOSCO named this dynamic explicitly in the May 2026 Supervisory Toolkit, warning that planning capabilities, long-term memory, and external tool access create risks of emergent behaviours and cascading failures across interconnected systems. The FSB's June 2026 consultation addresses procyclicality directly — noting that if AI agents are trained on the same data and use similar models, their behaviour is likely to be correlated, potentially amplifying market movements.

Scoring systemic herding resilience in Dimension 6 requires three disclosures and one architectural control. The disclosures: what is the underlying foundation model for each production agentic workflow; what is the vendor dependency map across the agent portfolio; and what is the institution's assessment of its contribution to cross-institutional correlated behaviour under a defined stress scenario. The architectural control: at least one of the primary agents in high-risk asset classes (trading, liquidity management, credit) must use a different underlying model or a significantly different fine-tuned variant, so that a single model's distributional response to a stress event cannot produce a fully correlated outcome across all agentic workflows simultaneously. This is model diversity as systemic-risk management — the agentic equivalent of counterparty diversification.
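The architectural control and the first two disclosures can be checked mechanically once an agent inventory exists. The sketch below is one possible shape for that check, not a prescribed method: the `Workflow` record, the asset-class labels, the vendor and model names, and the rule that a portfolio with fewer than two high-risk workflows passes trivially are all assumptions for illustration.

```python
from dataclasses import dataclass

# Asset classes the text names as high-risk; labels are illustrative.
HIGH_RISK_CLASSES = {"trading", "liquidity", "credit"}


@dataclass(frozen=True)
class Workflow:
    name: str
    asset_class: str
    vendor: str
    foundation_model: str  # underlying model or a distinct fine-tuned variant


def vendor_dependency_map(workflows: list[Workflow]) -> dict[str, set[str]]:
    """Group vendors by the foundation model they are built on."""
    dependency: dict[str, set[str]] = {}
    for wf in workflows:
        dependency.setdefault(wf.foundation_model, set()).add(wf.vendor)
    return dependency


def meets_diversity_control(workflows: list[Workflow]) -> bool:
    """True if high-risk workflows do not all share one foundation model."""
    high_risk = [wf for wf in workflows if wf.asset_class in HIGH_RISK_CLASSES]
    if len(high_risk) < 2:
        return True  # assumption: nothing to correlate inside this portfolio
    return len({wf.foundation_model for wf in high_risk}) >= 2


# Hypothetical portfolio: three vendors, one shared underlying model.
portfolio = [
    Workflow("fx-hedging", "trading", "vendor-a", "model-x"),
    Workflow("intraday-liquidity", "liquidity", "vendor-b", "model-x"),
    Workflow("sme-credit-limits", "credit", "vendor-c", "model-x"),
]
print(vendor_dependency_map(portfolio))    # one model behind all three vendors
print(meets_diversity_control(portfolio))  # False

# Moving the credit agent to a different model satisfies the control.
diversified = portfolio[:2] + [
    Workflow("sme-credit-limits", "credit", "vendor-c", "model-y")
]
print(meets_diversity_control(diversified))  # True
```

The first portfolio looks diversified at the vendor level and fails anyway, which is the point of mapping dependencies down to the underlying model.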

Transaction and Corporate Banks

The highest-ROI agentic workflows are payment repair, KYC document extraction, treasury services, reconciliation breaks, and corporate client FAQ deflection. All are Level-2 or bounded Level-3 on the autonomy ladder. The corporate client does not care that an agent executed the payment repair; they care that SLA improved and dispute rate stayed flat. Lead with the four unit-economic metrics, not with technology capability claims.

The Autonomous Treasury framework — observe → detect → forecast → prepare → request human approval → submit signed payload — is the right architecture for corporate treasury agents in 2026. The agent's prepared pain.001 payload routes through the same schema validation, fraud scoring, and sanctions engines as a corporate ERP submission. The conditionality layer (threshold, collateral eligibility, buffer floor) gates whether the pain.001 is sent, not what shape it takes. Treasury platforms that invent bespoke payloads to express conditions will fall out of the bank-consumable path.
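The separation described above, where conditions decide whether a payload is sent and never what it contains, can be sketched as a small gate function. This is a minimal sketch under stated assumptions: the text does not define how threshold, collateral eligibility, or buffer floor are evaluated, so the `Conditions` fields, the reading of "threshold" as a maximum amount, and the three return values are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Conditions:
    max_amount: Decimal        # assumed meaning of "threshold"
    buffer_floor: Decimal      # minimum balance that must remain after payment
    collateral_eligible: bool  # result of an upstream eligibility check


def gate(amount: Decimal, balance: Decimal, cond: Conditions,
         human_approved: bool) -> str:
    """Decide whether a prepared pain.001 may be sent; never edits the payload."""
    conditions_met = (
        amount <= cond.max_amount
        and balance - amount >= cond.buffer_floor
        and cond.collateral_eligible
    )
    if not conditions_met:
        return "hold"
    # Human approval precedes submission, as in the framework's sequence.
    return "submit" if human_approved else "request_approval"


cond = Conditions(Decimal("250000"), Decimal("1000000"), True)
print(gate(Decimal("200000"), Decimal("1500000"), cond, human_approved=False))
# request_approval
print(gate(Decimal("200000"), Decimal("1500000"), cond, human_approved=True))
# submit
print(gate(Decimal("200000"), Decimal("1100000"), cond, human_approved=True))
# hold (payment would breach the buffer floor)
```

Because the gate only returns a decision, the pain.001 that reaches the bank is an ordinary payload and goes through the same validation path as an ERP submission.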

Regional Banks and Community Banks

McKinsey's scenario analysis identifies three viable positions: wait and see, adapt as a product supplier behind agent interfaces, or compete for the direct customer relationship. Regional banks that fail to make this choice explicitly will drift into the wait-and-see posture by default — and find that the governance debt accumulated during that drift is the primary obstacle when competitive pressure forces action.

The OCC's proportionality principle — "proportionate does not mean absent" — is the operational frame for regional governance. A regional bank does not need to validate 50 agents per quarter. It needs one model risk officer who understands the autonomy ladder, one implementation of a vendor agent platform that ships with OAuth scoping, OPA integration, and WORM audit logging out of the box, and one board reporting template that covers the four unit-economic metrics. The investment is in workflow design and operator UX, not bespoke control-plane engineering.

CSI's 2026 Banking Priorities survey found that 85% of community banking respondents believe AI adoption will provide a significant competitive advantage and 50% named it the top technology trend for 2026. The governance infrastructure is what separates the 85% of believers from the small fraction that will capture the value.

Fintechs, PSPs, and Infrastructure Providers

The product question for agentic AI vendors in 2026 is not "does your platform perform better than humans?" It is "does your platform produce an SR 11-7-compliant audit trace, an EU AI Act Article 12-compliant per-agent log, and an FSB Sound Practice 10-compliant oversight model — out of the box?" Vendors who can answer that with a documented, testable yes will close enterprise deals. Vendors who cannot will cycle through proof-of-concept loops while bank MRM teams find reasons to fail validation.

Oracle launched an enterprise agentic AI platform for banking in February 2026. FIS partnered with Mastercard and Visa to enable agent-initiated commerce. Microsoft published a banking-specific blueprint for agentic customer experience. Accenture has outlined the workforce implications across front and back office. The supply side is ready. The differentiation is in regulatory evidence as a product feature, not a post-hoc compliance bolt-on.

The long-tail vendor dynamic identified by Evident — 68% of agentic AI deployments at banks now use specialised vendors beyond the hyperscalers — means third-party AI vendor risk is accelerating faster than most bank procurement frameworks can assess it. DORA requires documented due diligence on every ICT third-party provider. The EU AI Act layers additional requirements for vendors whose systems are used in high-risk categories. Banks that outsource governance to their vendor are outsourcing accountability — and the supervisory record will reflect that.

Enterprise and SME Businesses (Non-Bank Financial Services)

The governance burden is proportionate to the risk materiality of agentic AI use, but the measurement framework applies universally. An enterprise deploying agents in accounts payable, working capital optimisation, or financial planning and analysis needs the same unit-economic accountability framework — cost per completed decision, reversal rate, audit-trace completeness — even if the regulatory obligations are lighter than those on a systemically important bank. The FSB Sound Practices are framed as non-binding guidance applicable to financial institutions of all types and sizes. IBM's finding that enterprises average 54 AI agent incidents per year, including data breaches and cascading system failures, applies across the enterprise landscape.

For SMEs accessing banking services through agentic interfaces — the scenario McKinsey describes as consumers using AI agents as a new banking channel — the governance obligation falls upstream on the bank or PSP providing the agentic layer. But the SME's own data and operational integrity depends on that governance being real. Understanding the index score of the institutions managing your financial workflows is rapidly becoming a vendor-selection criterion.

Board-Level Scorecard

A useful board scorecard for agentic AI should track six metrics — the minimum set that distinguishes a governed programme from an ungoverned one:

Autonomy Tier Distribution: The count of production workflows by tier (Level 0–4), updated quarterly. Any Level-5 workflow is a reportable finding.
Control-Plane Completeness: The percentage of production workflows with all five control-plane components operational (identity, guardrails, policy-as-code, WORM logging, kill switch).
Audit-Trace Completeness: The percentage of Level-3+ workflow invocations with full provenance reconstructable from the immutable log. Target: 100%.
Reversal Rate by Workflow: The percentage of agent-executed actions rolled back within 24 hours, tracked per workflow. Alert threshold: 2%. Escalation threshold: 5%.
Net Cost per Decision: Workflow-level unit cost inclusive of reversal and repair costs, compared to the manual baseline. Tracked against the programme economics case.
Regulatory Evidence Currency: The date of the most recent per-workflow regulatory evidence update across applicable frameworks (SR 11-7, SS1/23, EU AI Act, MAS MGF). Any workflow more than 90 days out of evidence cadence is a risk finding.

These six numbers convert agentic AI from a slide deck into an operating model. They are also the numbers an SR 11-7 examiner, a PRA on-site reviewer, or an EU supervisory authority will ask for first.
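Two of the scorecard metrics carry explicit thresholds (reversal rate at 2% and 5%, evidence currency at 90 days), so they lend themselves to automated checks. The following is a minimal sketch of such checks; the function names, the status labels, and the choice to treat the 2% and 5% thresholds as inclusive are assumptions, not part of the scorecard definition.

```python
from datetime import date


def reversal_status(executed: int, rolled_back_24h: int) -> str:
    """Classify a workflow's 24-hour reversal rate against the thresholds."""
    if executed <= 0:
        raise ValueError("executed must be positive")
    if not 0 <= rolled_back_24h <= executed:
        raise ValueError("rolled_back_24h must be between 0 and executed")
    # Integer comparison avoids floating-point edge cases at exactly 2% / 5%.
    if rolled_back_24h * 100 >= 5 * executed:
        return "escalate"
    if rolled_back_24h * 100 >= 2 * executed:
        return "alert"
    return "ok"


def evidence_is_stale(last_update: date, today: date,
                      max_age_days: int = 90) -> bool:
    """True if the evidence package is more than max_age_days old."""
    return (today - last_update).days > max_age_days


print(reversal_status(1000, 19))  # ok
print(reversal_status(1000, 20))  # alert
print(reversal_status(1000, 50))  # escalate
print(evidence_is_stale(date(2026, 3, 1), date(2026, 6, 30)))  # True
print(evidence_is_stale(date(2026, 4, 1), date(2026, 6, 30)))  # False
```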

Gaps This Index Fills

Three structural gaps distinguish this index from existing frameworks:

Gap 1: Existing indexes measure AI maturity, not agentic-AI-specific governance. The Evident AI Index measures Talent, Innovation, Leadership, and Transparency across 50 banks using publicly available data. It does not — and is not designed to — assess whether a bank's production agentic workflows have operational kill switches, per-agent WORM audit logs, or OPA policy gates. A bank can rank first on the Evident Index while failing an EU AI Act Article 12 audit.

Gap 2: Existing regulatory frameworks address what is required, not how to score readiness. SR 11-7, SS1/23, the EU AI Act, the FSB Sound Practices, and the Singapore MGF each define governance obligations. None provides a cross-dimensional scoring framework that lets an institution benchmark its posture against peers or measure improvement over time. This index provides that scoring framework, using the existing regulatory frameworks as the evidence base.

Gap 3: Programme-level economics mask workflow-level failure. The industry standard of reporting AI value at the programme level — "AI saved X hours of compliance work" — makes it structurally impossible to trace a reversal, a false-positive SAR filing, or an unexplained agent action to the workflow that produced it. The unit-economic dimension of this index requires workflow-level accountability. This is the measurement architecture that makes a CFO conversation defensible and an audit conversation survivable.

Conclusion

Agentic AI in banks in 2026 is an engineering problem wearing the clothes of a strategy conversation. The model is interchangeable. The control plane — OAuth scoping, deterministic semantic routing, OPA policy gates, immutable WORM audit logs, and a tested kill switch — is not. The governance architecture — three-lines-of-defence validation, continuous bank-specific eval suites, board-level unit economics reporting — is not. The regulatory evidence package — per-workflow SR 11-7 model cards, EU AI Act Article 12 per-agent logs, FSB Sound Practice mappings — is not.

The institutions that will be credible to regulators in 2027 are the ones scoring above 75 across all six index dimensions today: classifying every production agent on the autonomy ladder, engineering the full five-component control plane, producing continuous regulatory evidence, tracking workflow-level unit economics, investing in organisational readiness, and engaging proactively with the FSB, IOSCO, and national regulator consultations that are shaping the binding standards of 2028.

OSWorld at 66.3% is the reliability ceiling. Three linked tool-calls at that rate produce a 29% end-to-end success rate. Plan accordingly. The institutions that measure agents the way they measure any other operational risk — by evidence, not aspiration — will find that governance is not the constraint on agentic AI. It is the only thing that makes agentic AI competitive.
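The 29% figure follows from multiplying the per-step success rate across linked steps. The short sketch below reproduces that arithmetic; it assumes each tool-call succeeds independently at the same 66.3% rate, which is a simplification, and the function name is illustrative.

```python
def chained_success(per_step: float, steps: int) -> float:
    """End-to-end success rate for sequential steps that must all succeed."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


for n in (1, 2, 3):
    print(n, f"{chained_success(0.663, n):.1%}")
# 1 66.3%
# 2 44.0%
# 3 29.1%
```

Under the same assumption, end-to-end success already falls below 50% at two linked calls, which is why human approval points matter most in multi-step workflows.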

Frequently Asked Questions

What is the difference between this index and the Evident AI Index? The Evident AI Index benchmarks AI maturity across 50 global banks using publicly available data across Talent, Innovation, Leadership, and Transparency. This index scores the specific engineering and governance architecture — the control plane, the audit log, the autonomy tier classification, the regulatory evidence package — that makes agentic AI safe to deploy against live banking APIs. The two indexes are complementary: Evident measures the strategic posture; this index measures operational readiness.

Who should use this index? Chief Operating Officers, Chief Risk Officers, Chief AI Officers, heads of model risk management, and board risk committees at global banks, regional banks, corporate banking entities, and financial institutions deploying agentic AI. Also relevant for fintechs, PSPs, and infrastructure vendors selling into bank procurement processes where regulatory evidence is a selection criterion.

What is the minimum viable governance posture for 2026? Full five-component control plane operational in production; all production workflows classified Level 0–4; Level-5 workflows contractually prohibited; WORM audit logs complete for Level-3+ workflows; EU AI Act Article 12 per-agent logging in place before 2 August 2026; FSB Sound Practices 1–4 mapped to board accountability structures; bank-specific eval suite running continuously.

What does JP Morgan's announcement mean for my institution? It means the competitive benchmark for autonomous agent deployment has a named timeline in 2026 from a systemically important bank. It does not mean every institution should match that timeline. It means every institution should know its current index score, know the gap between that score and the deployment posture JP Morgan is describing, and have a board-approved view of the governance investment required to close that gap safely.

How should agentic AI risk be reported to the board? Six metrics per workflow: autonomy tier, control-plane completeness, audit-trace completeness, reversal rate, net cost per decision, and regulatory evidence currency. Plus a top-five residual-risk list. Skip the model-card slideware and the programme-level productivity summaries.

Does the FSB consultation create binding obligations now? No. The FSB explicitly states the 12 Sound Practices are not binding standards. However, the consultation closes 22 July 2026 and the final report goes to G20 finance ministers in October 2026. National regulators (the Fed, PRA, BaFin, DNB, ACPR, MAS) are free to incorporate the Sound Practices into binding supervisory expectations on their own timelines. The institutions that respond to the consultation now are the ones shaping what binding looks like.

Last reviewed 2026-06-30.

This article is licensed under Creative Commons Attribution 4.0 International. Republication requires attribution to the original URL.

Originally published at https://sebastienrousseau.com/bn/2026-06-30-agentic-ai-index-banks-measuring-autonomy-2026/ by Sebastien Rousseau.
Licensed under CC-BY-4.0.

SEBASTIEN ROUSSEAU FOUNDER · ENGINEER