Design Principles¶
This document explains the key architectural decisions and design principles underlying the RWA calculator.
Core Principles¶
1. Single Codebase, Dual Framework¶
Decision: Support both CRR and Basel 3.1 in a single codebase.
Rationale: - Avoids code duplication - Enables easy comparison between frameworks - Simplifies maintenance - Supports transition planning
Implementation: Regime is data. The user chooses a regime via the
.crr() / .basel_3_1() config factories (which set regime_id); each run
resolves a frozen, content-hashed ResolvedRulepack from
(regime_id, reporting_date), and regime-divergent values and behaviour are
read from that pack — the engine never branches on a regime boolean.
# Resolve the rulepack once per run from (regime_id, reporting_date)
pack = resolve(config.regime_id, config.reporting_date)
# Scaling factor and output-floor parameters come from the pack, not config
rwa = rwa * pack.scalar("scaling_factor") # 1.06 under CRR, 1.0 under Basel 3.1
if pack.feature("output_floor"):
rwa = apply_output_floor(rwa, sa_rwa, pack)
2. Pure LazyFrame Operations¶
Decision: Use Polars LazyFrames exclusively for all data operations.
Rationale: - Enables query optimization by Polars - Reduces memory usage through lazy evaluation - Allows automatic parallelization - Avoids row-by-row iteration (major performance gain)
Anti-pattern (Avoided):
# BAD: Row-by-row processing
for row in dataframe.iter_rows():
result = calculate_rwa(row)
results.append(result)
Pattern (Used):
# GOOD: Vectorized operation
result = df.with_columns(
rwa=pl.col("ead") * pl.col("risk_weight") * pl.col("factor")
)
3. Protocol-Based Interfaces¶
Decision: Define component interfaces using Python Protocols.
Rationale: - Enables dependency injection - Supports testing with mocks - Allows multiple implementations - Documents expected behavior
Implementation:
from typing import Protocol
class LoaderProtocol(Protocol):
def load(self, path: Path) -> RawDataBundle:
"""Load raw data from source."""
...
# Any class with matching signature satisfies protocol
class ParquetLoader:
def load(self, path: Path) -> RawDataBundle:
# Implementation
...
class MockLoader:
def load(self, path: Path) -> RawDataBundle:
# Test implementation
...
4. Immutable Data Contracts¶
Decision: All data transfer objects (bundles) are frozen dataclasses.
Rationale: - Prevents accidental mutation - Ensures thread safety - Supports lazy evaluation - Makes data flow predictable
Implementation:
from dataclasses import dataclass
import polars as pl
@dataclass(frozen=True)
class RawDataBundle:
counterparties: pl.LazyFrame
facilities: pl.LazyFrame
loans: pl.LazyFrame
collateral: pl.LazyFrame | None = None
guarantees: pl.LazyFrame | None = None
5. Error Accumulation¶
Decision: Accumulate errors rather than fail fast.
Rationale: - Reports all validation issues at once - Supports audit requirements - Allows partial results - Better user experience
Implementation: every stage either carries an errors list on its output
bundle (crm_errors, classification_errors, …) or accepts a caller-supplied
accumulator (the calculators' errors= keyword); the pipeline merges all of
them into AggregatedResultBundle.errors with their original codes.
# Stage bundle channel
bundle = processor.get_crm_unified_bundle(data, config)
for error in bundle.crm_errors:
logger.warning("%s: %s", error.exposure_reference, error.message)
# Branch-path accumulator channel
errors: list[CalculationError] = []
result = sa_calculator.calculate_branch(sa_rows, config, errors=errors)
6. Factory Methods for Configuration¶
Decision: Use factory methods rather than direct construction.
Rationale: - Self-documenting code - Encapsulates complex initialization - Ensures valid combinations - Easier to use correctly
Implementation:
The factories set the regime_id (the carrier of regime choice) plus firm
inputs / elections. They carry no regulatory values — the scaling factor,
PD/LGD floors, supporting factors and monetary thresholds all resolve from the
rulepack at read time.
class CalculationConfig:
@classmethod
def crr(cls, reporting_date: date, **kwargs) -> "CalculationConfig":
"""Create CRR configuration with appropriate defaults."""
return cls(
regime_id="crr",
reporting_date=reporting_date,
# firm inputs + elections only — regulatory values live in the pack
**kwargs
)
@classmethod
def basel_3_1(cls, reporting_date: date, **kwargs) -> "CalculationConfig":
"""Create Basel 3.1 configuration with appropriate defaults."""
return cls(
regime_id="b31",
reporting_date=reporting_date,
# firm inputs + elections only — regulatory values live in the pack
**kwargs
)
7. Hierarchical Join Pattern¶
Decision: Use iterative joins for hierarchy resolution instead of Python dictionaries.
Rationale: - 50-100x performance improvement - Stays within Polars optimization - Handles deep hierarchies efficiently - Avoids Python GIL limitations
Anti-pattern (Avoided):
# BAD: Python dictionary lookup
parent_dict = {row['id']: row['parent'] for row in df}
results = [parent_dict.get(x) for x in ids]
Pattern (Used):
# GOOD: Polars join
result = (
df
.join(parent_df, left_on="parent_id", right_on="id")
.select(["id", "resolved_parent"])
)
Module Organization¶
Main Entry Point First¶
Decision: Place main entry points at the top of modules.
Rationale: - Reads like a book - Easy to find key functionality - Follows "newspaper" style - Better developer experience
Implementation:
# module.py
# Main entry point at top
def calculate_rwa(exposures: Bundle, config: Config) -> Result:
"""Calculate RWA for all exposures."""
validated = _validate_exposures(exposures)
enriched = _apply_crm(validated, config)
return _compute_rwa(enriched, config)
# Supporting functions below
def _validate_exposures(exposures: Bundle) -> Bundle:
...
def _apply_crm(exposures: Bundle, config: Config) -> Bundle:
...
def _compute_rwa(exposures: Bundle, config: Config) -> Result:
...
Clean Imports¶
Decision: Organize imports clearly with separation.
# Standard library
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
# Third-party
import polars as pl
from polars_normal_stats import normal_cdf, normal_ppf
# Local - contracts first
from rwa_calc.contracts.bundles import ResultBundle
from rwa_calc.contracts.config import CalculationConfig
# Local - engine components
from rwa_calc.engine.irb.formulas import calculate_k
Error Handling Strategy¶
Validation Errors¶
Collected and reported (the PD floor is read from the resolved rulepack, not from config):
errors = []
pd_floor = pack.formula("pd_floors")["minimum"]
if pd < pd_floor:
errors.append(ValidationError(
field="pd",
message=f"PD {pd} below floor {pd_floor}"
))
Calculation Errors¶
Logged with context:
try:
k = calculate_k(pd, lgd, correlation)
except ValueError as e:
errors.append(CalculationError(
exposure_id=exposure.id,
stage="IRB",
message=str(e)
))
System Errors¶
Raised immediately:
Testing Philosophy¶
Test-Driven Development¶
- Write acceptance test (what should happen)
- Write unit tests (how it should work)
- Implement to pass tests
- Refactor with confidence
Test Organization¶
tests/
├── acceptance/ # End-to-end scenarios
│ ├── crr/ # CRR scenarios
│ └── basel31/ # Basel 3.1 scenarios
├── contracts/ # Interface compliance
├── unit/ # Component tests
│ ├── test_loader.py
│ ├── test_hierarchy.py
│ └── crr/ # Framework-specific
└── fixtures/ # Test data generation
Test Naming¶
def test_sa_corporate_rated_cqs2_returns_50_percent_risk_weight():
"""Clear description of what is being tested."""
...
def test_sme_factor_tiered_calculation_above_threshold():
"""Documents the specific scenario."""
...
Documentation Philosophy¶
Code as Documentation¶
Self-documenting names and types:
def calculate_maturity_adjustment(
pd: float,
effective_maturity_years: float
) -> float:
"""Calculate maturity adjustment factor for IRB."""
...
Regulatory References¶
Link to articles:
def calculate_sme_supporting_factor(
total_exposure: Decimal,
threshold: Decimal = EUR_2_5M
) -> Decimal:
"""
Calculate SME supporting factor per CRR Article 501.
The factor provides capital relief for SME exposures using
a tiered approach based on total exposure amount.
"""
...
Next Steps¶
- Pipeline Architecture - Detailed pipeline documentation
- Data Flow - How data moves through the system
- Component Overview - Individual components