What I Got Wrong, What's Next¶

Closing post in the series. The honest ledger: things that took longer than planned, things the agent swarm got wrong, what is still open, and the gap between a reference implementation and a regulated production system.

Published 2026-08-04. Code references are pinned to commit cceaee4.

This is the eighth and final post in the series on building this UK Basel 3.1 RWA calculator. Posts 1–7 made the case for what the calculator does and how it works. This post is the ledger: what got built, what didn't, what nearly didn't, and what to expect of a reference implementation versus a regulated production system.

The series began with this claim: "This is a reference implementation — useful for understanding how the rules behave, not a regulated production system. The gap between the two matters; I'll come back to it in post 8." This is post 8. The gap is the substance of half this post.

Things that took longer than I planned¶

Four areas where the published changelog under-counts the effort that landed.

The CRM allocator. The credit risk mitigation processor went through three rewrites. The first version was a single-allocation pass: each piece of collateral is assigned to one exposure, full stop. That ignored counterparty-level pools (a parent guarantee covering many drawdowns) and facility-level pledges (a master collateral basket). The second version was multi-level: direct → facility-level pro-rata → counterparty-level pro-rata, with overcollateralisation thresholds (1.4x for non-financial RE/other, 1.25x for receivables, 1.0x for financial). The third version, shipped in 0.2.0, added pool-awareness for AIRB own-LGD anti-double-counting — the war story from post 6. Each rewrite was three to four weeks of work and produced numbers that looked correct under the previous test set. Each successive set of acceptance scenarios revealed why the previous version was structurally insufficient.

Rating inheritance. Resolving "the counterparty's effective rating" turned out to be two parallel resolutions, not one. Internal ratings (which carry a PD) and external ratings (which carry a CQS) inherit from different parent chains under different rules, with Art. 138 multi-rating selection layered on top of the external chain. The classifier gates IRB on internal_pd is not null, not on rating presence. Sovereigns with external CQS but no internal PD always land on SA even when F-IRB is permitted. The whole story took six weeks across engine/hierarchy.py, the classifier, and the cross-approach guarantor routing — most of which I underestimated as "should be a couple of joins."

The output floor's OF-ADJ. The headline max(U-TREA, x × S-TREA) formula from post 5 is straightforward. The OF-ADJ reconciliation between IRB provision treatment (EL shortfall to CET1, excess to T2 capped at 0.6% of IRB RWA) and SA provision treatment (general credit risk adjustments to T2 directly) was not. The first cut conflated GCRA and SCRA, which produced floor numbers that were quietly wrong on any portfolio with non-zero general provisions. Sorting out the GCRA / SCRA boundary against Reg (EU) 183/2014, the 1.25% S-TREA cap on GCRA, and the 12.5x own-funds-to-RWA conversion took longer than the rest of the floor implementation combined.

RGLA / PSE classification routing. Regional governments and public sector entities can be treated as sovereign-derived or institution-treated; the same counterparty therefore has two possible exposure classes. I knew about this in principle. I learned the consequences from a version 0.1.63 bug where rgla_institution and pse_institution counterparties carrying internal ratings were silently routed to SA regardless of IRB permissions, because the permission expressions were keyed on the SA exposure class while the IRB permission table only listed CGCB and INSTITUTION. The fix touched the classifier, the model-permissions resolver, the entity-type maps, and a substantial set of regression tests. The lesson — that "what class is this exposure?" is genuinely a two-answer question — would have been much cheaper to learn from the rule book than from the bug.

Things the agent swarm got wrong¶

The pre-commit gate from post 4 catches architectural drift. It does not catch regulatory misreading. Three categories where the gates passed, the test went green, and the result was wrong on a careful read of the regulation.

Reasonable-looking arithmetic on the wrong unit. The SME supporting factor evaluating per-counterparty instead of group-of-connected-clients (covered in post 6) is the cleanest example. The agent's hand-derivation showed the EUR 1.5m threshold being checked against the loan's EAD, the test asserted the resulting RWA, the implementation made the test pass, the gate ran clean. Nothing about that loop ever consulted Art. 4(1)(39) on connected clients. The bug was caught only when a later acceptance scenario combined a small loan with a large parent group and produced a number that disagreed with the spec doc the architect agent had cited.

Multi-rating "most recent wins." The hierarchy resolver's external rating logic for Art. 138 originally collapsed multiple ratings to "the most recent assessment" — a defensible default if you have not read Art. 138 carefully. The actual rule is per-agency dedup (most recent per ECAI), then 1-rating / 2-rating-higher-RW / ≥3-rating-second-best selection. The first implementation passed every acceptance scenario where each counterparty had ratings from at most one agency, which was every scenario at the time the implementation landed. The bug was caught when the next batch of scenarios introduced multi-agency ratings — and the agents that wrote those scenarios did not notice that the hand-derivation should have followed Art. 138 because the rule had not been salient on previous iterations.

Defaulted retail with non-financial collateral. The defaulted SA treatment was originally blending the base class RW with the provision-coverage RW pro-rata by the secured/unsecured split. For defaulted retail with RE collateral, this produced 75% (the base retail RW) instead of the Art. 127(1) 100%/150%. The agent that built the original implementation followed the structure of the non-defaulted SA path (which does blend collateral) and assumed defaulted should look the same. The reading of Art. 127(2) — that the unsecured portion is what the CRM method produces, and no secondary split inside the defaulted override is needed — required someone to notice that the assumed parallelism between defaulted and non-defaulted didn't survive contact with the specific text.

The pattern across all three: agents are good at applying regulatory text once the right text is identified, mediocre at deciding which text is the right one when two articles overlap or when a default assumption is unstated. The validation gate enforces grammatical regularity (no print(), no inline scalars, frozen bundles); it does not enforce regulatory adequacy. That part stayed with me. It was not displaceable.

What is still open¶

IMPLEMENTATION_PLAN.md lists roughly 35 open items in Tier 1 (calculation correctness) at the time of writing. A representative sample, by P-code:

P1.93 — FCSM Art. 222(4) SFT zero-haircut and Art. 222(6) same-currency cash / 0%-RW sovereign exception. Effort: M.
P1.95 — B31 SCRA grades for unrated institution guarantors (currently a flat 40% under B31; should be SCRA grades A→40%, B→75%, C→150%). Effort: M.
P1.97 / P1.117 — B31 slotting non-HVCRE and HVCRE short-maturity subgrade differentiation (Art. 153(5)(d)). The current code ignores is_short for B31 slotting entirely. Effort: S each.
P1.109 / P1.124 — Art. 237–238 maturity mismatch and ineligibility for unfunded credit protection (guarantees, CDS). Currently no maturity adjustment is applied. Effort: M.
P1.151 — Purchased receivables / dilution-risk LGD (Art. 161(1)(e)–(g)) — the F-IRB LGD lookup has no entries for purchased_receivables_senior, _subordinated, or dilution_risk; everything defaults to senior 45%. Effort: M.
P1.153 — Art. 155(3) PD/LGD equity approach. CRR-only — removed under Basel 3.1 — but firms reporting under CRR through 2026-12-31 still need it. No EquityApproach.PD_LGD enum, no Art. 165 floor table (0.09% / 0.40% / 1.25%), no equity M=5y handling. Effort: L.

Add to that the structural items I am most aware of:

The oracle suite has 3 of an audit-recommended ~50 exposures — the load-bearing claim of post 7. Closing the gap is a several-week roadmap item, not a quick patch.
Stress testing has scenario coverage but the surrounding workbook integration is partial.
COREP and Pillar 3 templates have unimplemented rows acknowledged as out-of-scope (the only remaining not implemented markers in the codebase are CCR rows in the COREP generator).
Performance has been benchmarked against ~10M-exposure portfolios but not optimised end-to-end.
DOCS_IMPLEMENTATION_PLAN.md still has open items, mostly small documentation gaps the doc agent has not gotten to yet.

This is not a small backlog. It is also not a hopeless one. The agent workflow throughput on /next-items 3 runs is roughly three closed Tier 1 items per build iteration when items are non-conflicting, which means the open backlog is closeable in a few months of consistent effort. Whether it gets closed before PS1/26 goes live on 1 January 2027 is a question of priority, not feasibility.

The gap between reference and regulated¶

The series-opening claim was that this is a reference implementation, not a regulated production system, and that the gap matters. Here is what the gap actually contains.

The calculator is not validated against the PRA's SS1/23 Model Risk Management Principles. It has no independent validation function, no model inventory, no governance committee, no annual revalidation cycle, no signed-off model risk policy. A firm using this calculator inside a regulated capital-reporting process would owe the PRA every one of those things, and producing them is not engineering work — it is governance work that surrounds the engineering and outweighs it on cost.

The calculator is not under firm change control. Every commit lands when it lands; there is no sign-off path, no model committee, no segregation of duties between the developer and the reviewer (because there is one of me). Under SS1/23 expectations, model changes are tracked, justified, validated, and approved by named individuals with documented authority. None of that infrastructure exists here.

The calculator is not running on production data with full data-quality assurance. The acceptance suite covers ~500 scenarios with hand-derived inputs; real bank portfolios contain millions of exposures with combinations of attributes I have not enumerated. The test discipline gives statistical confidence about correctness on the scenarios tested, not absolute confidence on portfolios I have not seen. Real production deployment requires a parallel-run discipline against an existing system, signed-off reconciliation thresholds, and a rollback plan I have not built.

The calculator is not signed off by an external auditor. It has not been through a model validation engagement, an internal-audit review, or an independent quantitative testing effort. The hash-locked oracle suite from post 7 is the closest thing it has to independent validation; it is a useful start and it is not a substitute.

What it is useful for: a public reference for understanding how the rules behave; a comparator engine that a firm's in-house team can run alongside their own to triangulate disagreements; an educational artefact for engineers learning the regulation; and a demonstration that PS1/26 is implementable to roughly 5,300 tests of discipline by one person directing an agent pipeline. None of that is small. It is also not the same thing as a regulated production system. A firm that adopts this in earnest owes itself everything in the four paragraphs above.

What's next¶

Three things on the roadmap, before PS1/26 lands on 1 January 2027.

Closing the Tier 1 backlog. Roughly 35 items, the bulk of which are S-effort and addressable by the agent pipeline at current throughput. The blockers are not technical; they are the regulatory-judgement calls that gate each item — the decision of which interpretation to implement when the rule book is genuinely ambiguous. Those decisions cannot be delegated to the agents.

Expanding the oracle suite. Three exposures today, ten on the prioritised roadmap, ~50 on the audit's recommendation. Each addition is a four-file lockstep commit (markdown, derive script, JSON, test) and takes a measured day to a measured week depending on the calculation pathway. This is not work the agents can usefully accelerate — the derivations have to be auditable by a regulatory reader, which means a human writes them.

Documentation completeness. The Zensical site has comprehensive coverage of the architecture and major specifications, but DOCS_IMPLEMENTATION_PLAN.md still names gaps the doc agent has not closed. The agent runs in loop.sh docs_build mode in parallel with the main build; the doc backlog will close at roughly the same rate the calculation backlog does.

The framing question is not "can the calculator be made fully correct before PS1/26" — it can, with a few months of consistent effort. The framing question is whether the right ambition for an open-source reference implementation is full compliance or transparent partial compliance with a clear backlog and a credible path. I am increasingly convinced the second is more useful than the first. A reference implementation that closes its backlog in public, with each closed item carrying a regulatory citation and a hand-derived test, is more credible than a closed-source production system that says "trust us." The series has been an attempt to make that argument by demonstrating it.

Closing the series¶

Eight posts. About 22,000 words. One repository, one calculator, one solo developer with a Claude Code agent pipeline. The argument across the posts has been a single one in three parts:

Architecture is downstream of audit demands. Frozen bundles, structural protocols, lazy graphs, error accumulators, and a strict data/engine split are not stylistic preferences. They are what regulation does to engineering when you take the regulation seriously.
Regulation is harder than it looks because the routing is the work. SA appears simple because its lookup tables are short. The classification logic that decides which lookup table applies — exposure class, approach, default state, group-of-connected-clients aggregation, real-estate split, ECRA versus SCRA, defaulted-with-collateral routing — is where most of the regulatory judgement lives. SA exposes this; IRB hides it inside models. Both have it.
AI-assisted engineering at this scale is real, finite, and not autonomous. The agents do the typing. The architecture, the prompts, the validation gates, and the regulatory reading are the work that makes the typing produce correct code. Those parts stayed with me through every iteration of the build, and they are the parts that did not displace.

The calculator continues. The agent loop will run again tomorrow. The PS1/26 effective date is sixteen months away. The backlog gets shorter, then it doesn't, then it does. If you are reading this from a UK firm preparing for the same deadline, I hope some of what is here is useful. If you are reading it from somewhere else entirely, I hope at least one of the eight posts taught you something about regulation, software architecture, or how AI-assisted engineering looks when the rule book is unforgiving.

The repository is at github.com/OpenAfterHours/rwa_calculator. The series ends here.

The series so far:

Building a UK Basel 3.1 RWA Calculator in Public
The Pipeline: Why Regulation Forced an Immutable Design
Risk Weights Are Not a Lookup Table
Building With an Agent Swarm
The Output Floor and Why Basel 3.1 Bites
CRM, MOFs, and Other Edge-Case Archaeology
Testing a Regulatory Engine
What I Got Wrong, What's Next (this post)