# Data Workflow

## Overview
The application processes bank statements through a defined pipeline to categorize and visualize transactions.
## Pipeline Steps (`the_pipeline.py`)
```mermaid
graph TD
    A[Input Data] --> B[Normalize]
    B --> C[Rules]
    C --> D[ML Fallback]
    D --> E[Review Enqueue]
    E --> F[Result]
```
1. Normalize (`step_normalize`)
   - Input: Raw transaction data (payee, description, etc.).
   - Merchant Matching: Scans all text fields against the Merchant Catalog.
   - Strategy: Uses `word`, `contains`, or `regex` aliases to find the best match (see the matching sketch after this list).
   - Output: A canonical `merchant` name (e.g., "REWE 123" -> "Rewe").
2. Rules (`step_rules`)
   - User Rules: Checks user-specific overrides first; the precedence order is sketched after this list.
   - Merchant Defaults: If a merchant was found in step 1, applies default categories (e.g., Rewe -> Groceries).
   - System Rules: Runs general rules (regex/text matching) if no user rule matched.
   - Recurring Detection: Analyzes transaction history to flag recurring payments (monthly/quarterly).
3. ML Fallback (`step_ml`)
   - Optional: If no rule matched, a machine learning model predicts the category based on description and amount (see the fallback sketch after this list).
4. Review Enqueue (`step_review`)
   - Transactions with low confidence or no category are flagged for manual review in the UI.
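The three alias strategies in the Normalize step can be illustrated with a small sketch. The catalog layout and the `match_merchant` helper below are hypothetical; only the strategy types (`word`, `contains`, `regex`) and the "REWE 123" -> "Rewe" example come from the description above.

```python
import re

# Hypothetical catalog shape: each merchant lists aliases with a strategy.
CATALOG = {
    "Rewe": [
        {"type": "word", "value": "rewe"},          # whole-word match
        {"type": "contains", "value": "rewe markt"},  # substring match
        {"type": "regex", "value": r"^rewe \d+"},   # pattern match
    ],
}

def match_merchant(text: str) -> str | None:
    """Return the canonical merchant name for a raw text field, if any."""
    haystack = text.lower()
    for merchant, aliases in CATALOG.items():
        for alias in aliases:
            value = alias["value"]
            if alias["type"] == "word" and re.search(rf"\b{re.escape(value)}\b", haystack):
                return merchant
            if alias["type"] == "contains" and value in haystack:
                return merchant
            if alias["type"] == "regex" and re.search(value, haystack):
                return merchant
    return None

assert match_merchant("REWE 123 BERLIN") == "Rewe"
```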
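The precedence in `step_rules` (user rules win over merchant defaults, which win over system rules) might look roughly like this; the `Rule` class and `categorize` helper are illustrative assumptions, not the actual implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str   # regex tested against the transaction description
    category: str

    def matches(self, txn: dict) -> bool:
        return re.search(self.pattern, txn.get("description", ""), re.I) is not None

def categorize(txn: dict, user_rules: list[Rule],
               merchant_defaults: dict[str, str],
               system_rules: list[Rule]) -> str | None:
    # 1. User-specific overrides always win.
    for rule in user_rules:
        if rule.matches(txn):
            return rule.category
    # 2. Merchant default from the catalog (merchant was set in step 1).
    merchant = txn.get("merchant")
    if merchant in merchant_defaults:
        return merchant_defaults[merchant]  # e.g., "Rewe" -> "Groceries"
    # 3. General system rules (regex/text matching).
    for rule in system_rules:
        if rule.matches(txn):
            return rule.category
    return None  # no match: fall through to the ML step
```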
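The ML fallback and the review gate can be sketched together, since review is driven by the confidence the earlier steps produce. The model interface, the `finalize` helper, and the 0.7 threshold are assumptions, not values from `the_pipeline.py`:

```python
REVIEW_THRESHOLD = 0.7  # hypothetical cutoff; the real value lives in the pipeline

def finalize(txn: dict, model, review_queue: list) -> dict:
    # ML fallback: only runs when no rule assigned a category.
    if txn.get("category") is None and model is not None:
        # Assume the model predicts from description and amount and
        # returns a (label, confidence) pair.
        label, confidence = model.predict(txn["description"], txn["amount"])
        txn["category"], txn["confidence"] = label, confidence
    # Review enqueue: low confidence or still uncategorized.
    if txn.get("category") is None or txn.get("confidence", 1.0) < REVIEW_THRESHOLD:
        review_queue.append(txn)
    return txn
```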
## Development & Troubleshooting
- Adding Rules: Add YAML files to `data/rules/` (a sample rule follows this list).
- Testing: Use the "Inspect Rules" UI route to test patterns against existing data.
- Common Errors:
  - Polars Schema Error: If `Analysis failed: could not append value...` appears, it usually means a column has mixed data types. Check the input CSVs (see the sketch below).
  - Legacy Rules: `ValueError: Legacy rule format` means `when`/`then` syntax is used. Update to `match`/`set` (see the migration sketch below).
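As a sketch of what a `match`/`set` rule in `data/rules/` might look like, here is a hypothetical rule parsed with PyYAML; the inner keys (`description`, `category`) are assumptions about the rule schema, not verbatim from the repository:

```python
import yaml  # PyYAML

RULE_YAML = """
match:
  description: "(?i)netflix"   # regex tested against the description
set:
  category: "Subscriptions"
"""

rule = yaml.safe_load(RULE_YAML)
assert set(rule) == {"match", "set"}
```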
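For the Polars schema error, one common workaround is to make Polars infer dtypes from the whole file instead of a prefix, or to pin the offending column's dtype when reading. The file and column names below are examples, and `schema_overrides` is the parameter name in newer Polars releases (older versions call it `dtypes`):

```python
import polars as pl

df = pl.read_csv(
    "input.csv",
    infer_schema_length=None,                  # infer dtypes from the whole file
    schema_overrides={"amount": pl.Float64},   # pin a known-numeric column
)
```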
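A legacy rule can be migrated mechanically. This hypothetical helper just renames the top-level keys, assuming `when`/`then` map one-to-one onto `match`/`set`:

```python
def migrate_rule(rule: dict) -> dict:
    """Return a new-format rule dict; pass through rules already migrated."""
    if "when" in rule or "then" in rule:
        migrated = {k: v for k, v in rule.items() if k not in ("when", "then")}
        migrated["match"] = rule.get("when", {})
        migrated["set"] = rule.get("then", {})
        return migrated
    return rule  # already uses match/set

# Example: {"when": {"description": "REWE"}, "then": {"category": "Groceries"}}
# becomes {"match": {"description": "REWE"}, "set": {"category": "Groceries"}}.
```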