CASE STUDY
BankStatementParser — unified transaction intelligence for treasury
Problem
Corporate treasury teams receive bank statements and payment files in CAMT, PAIN.001, MT940, OFX, CSV, and scanned PDF form from dozens of banks. Each format carries its own field semantics, encodings, and ambiguities. Most teams hand-build brittle per-bank parsers, which blocks real-time cash forecasting, fraud detection, and audit-ready reconciliation.
What I built
An open-source Python toolkit that unifies the common bank statement formats into a single, normalised transaction stream. It combines schema-validated CAMT, MT940, and PAIN parsers, an OCR fallback for scanned PDFs, deterministic field mapping, and SR 11-7-grade audit evidence for every transformation step.
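To make the idea of a single normalised record concrete, here is a minimal sketch that maps one camt.053 entry into a flat record. The `Transaction` dataclass, its field names, and `parse_camt053` are hypothetical and not taken from the toolkit; only the XML element names and the namespace come from the ISO 20022 camt.053.001.02 schema. It assumes each entry carries a `BookgDt/Dt` date rather than a `DtTm` timestamp.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
import xml.etree.ElementTree as ET

# Namespace for schema version camt.053.001.02; other versions use a different suffix.
NS = {"c": "urn:iso:std:iso:20022:tech:xsd:camt.053.001.02"}


@dataclass(frozen=True)
class Transaction:
    """Hypothetical unified record; the toolkit's real schema may differ."""
    booking_date: date
    amount: Decimal      # signed: credits positive, debits negative
    currency: str
    source_format: str


def parse_camt053(xml_text: str) -> list[Transaction]:
    """Extract statement entries (Ntry) from a camt.053 document."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("./c:BkToCstmrStmt/c:Stmt/c:Ntry", NS):
        amt = entry.find("c:Amt", NS)
        # CdtDbtInd is either CRDT or DBIT; debits become negative amounts.
        sign = -1 if entry.findtext("c:CdtDbtInd", namespaces=NS) == "DBIT" else 1
        booked = entry.findtext("c:BookgDt/c:Dt", namespaces=NS)
        records.append(Transaction(
            booking_date=date.fromisoformat(booked),
            amount=sign * Decimal(amt.text),
            currency=amt.get("Ccy"),
            source_format="camt.053",
        ))
    return records


# Invented sample data for illustration only.
SAMPLE = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
  <BkToCstmrStmt><Stmt>
    <Ntry>
      <Amt Ccy="EUR">125.50</Amt>
      <CdtDbtInd>DBIT</CdtDbtInd>
      <BookgDt><Dt>2024-03-01</Dt></BookgDt>
    </Ntry>
  </Stmt></BkToCstmrStmt>
</Document>"""

records = parse_camt053(SAMPLE)
```

Amounts are kept as `Decimal` rather than `float` so that values survive normalisation without binary rounding, which matters for reconciliation.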
| Signal | Evidence |
|---|---|
| Formats supported | CAMT (camt.052, camt.053, camt.054), MT940, OFX, CSV, scanned PDF (OCR) |
| Normalisation target | Single unified transaction record schema |
| Audit trail | Per-field provenance — source format + parser version logged per row |
| License | Apache-2.0 / MIT |
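
The per-field provenance listed in the table could be recorded along the following lines. This is a sketch under assumptions: `FieldValue`, `PARSER_VERSION`, `CSV_MAPPING`, `normalise_csv`, and the CSV column names are all illustrative and are not the toolkit's actual API.

```python
import csv
import io
from dataclasses import dataclass

PARSER_VERSION = "0.0.0-example"  # placeholder, not a real release number

# Hypothetical per-bank mapping: unified field name -> source CSV column.
CSV_MAPPING = {"booking_date": "Date", "amount": "Amount", "description": "Details"}


@dataclass(frozen=True)
class FieldValue:
    """A normalised value together with where it came from."""
    value: str
    source_format: str
    source_field: str
    parser_version: str


def normalise_csv(text: str) -> list[dict[str, FieldValue]]:
    """Map each CSV row to unified fields, attaching provenance to every field."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Sorted iteration keeps field order stable from run to run.
        rows.append({
            target: FieldValue(raw[source].strip(), "csv", source, PARSER_VERSION)
            for target, source in sorted(CSV_MAPPING.items())
        })
    return rows


# Invented sample data for illustration only.
rows = normalise_csv("Date,Amount,Details\n2024-03-01,-125.50, Rent \n")
```

Storing provenance next to each value, rather than in a separate log, means any single field in the output can be traced back to its source column and parser version without a join.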
External validation
- Featured in the 2026-06-14 article: From Bank Statements to Unified Transaction Intelligence
- Designed to satisfy BCBS 239 risk-data aggregation requirements
Standards
- ISO 20022 CAMT
- SWIFT MT940
- OFX
- BCBS 239