# Data Workflow

## Overview
The application processes bank statements through a defined pipeline to categorize and visualize transactions.
## Pipeline Steps (`the_pipeline.py`)
```mermaid
graph TD
    A[Input Data] --> B[Normalize]
    B --> C[Rules]
    C --> D[ML Fallback]
    D --> E[Review Enqueue]
    E --> F[Result]
```
1. Normalize (`step_normalize`)
   - Input: Raw transaction data (payee, description, etc.).
   - Merchant Matching: Scans all text fields against the Merchant Catalog.
   - Strategy: Uses `word`, `contains`, or `regex` aliases to find the best match (see the matching sketch after this list).
   - Output: A canonical `merchant` name (e.g., "REWE 123" -> "Rewe").
2. Rules (`step_rules`)
   - User Rules: Checks user-specific overrides first; the precedence order is sketched after this list.
   - Merchant Defaults: If a merchant was found in step 1, applies default categories (e.g., Rewe -> Groceries).
   - System Rules: Runs general rules (regex/text matching) if no user rule matched.
   - Recurring Detection: Analyzes transaction history to flag recurring payments (monthly/quarterly).
3. ML Fallback (`step_ml`)
   - Optional: If no rule matched, a machine learning model predicts the category based on description and amount (see the fallback sketch after this list).
4. Review Enqueue (`step_review`)
   - Transactions with low confidence or no category are flagged for manual review in the UI.
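The three alias strategies in the Normalize step can be illustrated with a small sketch. The catalog layout and the `match_merchant` helper below are hypothetical; only the strategy types (`word`, `contains`, `regex`) and the "REWE 123" -> "Rewe" example come from the description above.

```python
import re

# Hypothetical catalog shape: each merchant lists aliases with a strategy.
CATALOG = {
    "Rewe": [
        {"type": "word", "value": "rewe"},          # whole-word match
        {"type": "contains", "value": "rewe markt"},  # substring match
        {"type": "regex", "value": r"^rewe \d+"},   # pattern match
    ],
}

def match_merchant(text: str) -> str | None:
    """Return the canonical merchant name for a raw text field, if any."""
    haystack = text.lower()
    for merchant, aliases in CATALOG.items():
        for alias in aliases:
            value = alias["value"]
            if alias["type"] == "word" and re.search(rf"\b{re.escape(value)}\b", haystack):
                return merchant
            if alias["type"] == "contains" and value in haystack:
                return merchant
            if alias["type"] == "regex" and re.search(value, haystack):
                return merchant
    return None

assert match_merchant("REWE 123 BERLIN") == "Rewe"
```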
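The precedence in `step_rules` (user rules win over merchant defaults, which win over system rules) might look roughly like this; the `Rule` class and `categorize` helper are illustrative assumptions, not the actual implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str   # regex tested against the transaction description
    category: str

    def matches(self, txn: dict) -> bool:
        return re.search(self.pattern, txn.get("description", ""), re.I) is not None

def categorize(txn: dict, user_rules: list[Rule],
               merchant_defaults: dict[str, str],
               system_rules: list[Rule]) -> str | None:
    # 1. User-specific overrides always win.
    for rule in user_rules:
        if rule.matches(txn):
            return rule.category
    # 2. Merchant default from the catalog (merchant was set in step 1).
    merchant = txn.get("merchant")
    if merchant in merchant_defaults:
        return merchant_defaults[merchant]  # e.g., "Rewe" -> "Groceries"
    # 3. General system rules (regex/text matching).
    for rule in system_rules:
        if rule.matches(txn):
            return rule.category
    return None  # no match: fall through to the ML step
```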
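The ML fallback and the review gate can be sketched together, since review is driven by the confidence the earlier steps produce. The model interface, the `finalize` helper, and the 0.7 threshold are assumptions, not values from `the_pipeline.py`:

```python
REVIEW_THRESHOLD = 0.7  # hypothetical cutoff; the real value lives in the pipeline

def finalize(txn: dict, model, review_queue: list) -> dict:
    # ML fallback: only runs when no rule assigned a category.
    if txn.get("category") is None and model is not None:
        # Assume the model predicts from description and amount and
        # returns a (label, confidence) pair.
        label, confidence = model.predict(txn["description"], txn["amount"])
        txn["category"], txn["confidence"] = label, confidence
    # Review enqueue: low confidence or still uncategorized.
    if txn.get("category") is None or txn.get("confidence", 1.0) < REVIEW_THRESHOLD:
        review_queue.append(txn)
    return txn
```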
## Development & Troubleshooting
- Adding Rules: Add YAML files to `data/rules/` (a sample rule follows this list).
- Testing: Use the "Inspect Rules" UI route to test patterns against existing data.
- Common Errors:
  - Polars Schema Error: If `Analysis failed: could not append value...` appears, it usually means a column has mixed data types. Check the input CSVs (see the sketch below).
  - Legacy Rules: `ValueError: Legacy rule format` means `when`/`then` syntax is used. Update to `match`/`set` (see the migration sketch below).
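As a sketch of what a `match`/`set` rule in `data/rules/` might look like, here is a hypothetical rule parsed with PyYAML; the inner keys (`description`, `category`) are assumptions about the rule schema, not verbatim from the repository:

```python
import yaml  # PyYAML

RULE_YAML = """
match:
  description: "(?i)netflix"   # regex tested against the description
set:
  category: "Subscriptions"
"""

rule = yaml.safe_load(RULE_YAML)
assert set(rule) == {"match", "set"}
```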
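For the Polars schema error, one common workaround is to make Polars infer dtypes from the whole file instead of a prefix, or to pin the offending column's dtype when reading. The file and column names below are examples, and `schema_overrides` is the parameter name in newer Polars releases (older versions call it `dtypes`):

```python
import polars as pl

df = pl.read_csv(
    "input.csv",
    infer_schema_length=None,                  # infer dtypes from the whole file
    schema_overrides={"amount": pl.Float64},   # pin a known-numeric column
)
```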
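A legacy rule can be migrated mechanically. This hypothetical helper just renames the top-level keys, assuming `when`/`then` map one-to-one onto `match`/`set`:

```python
def migrate_rule(rule: dict) -> dict:
    """Return a new-format rule dict; pass through rules already migrated."""
    if "when" in rule or "then" in rule:
        migrated = {k: v for k, v in rule.items() if k not in ("when", "then")}
        migrated["match"] = rule.get("when", {})
        migrated["set"] = rule.get("then", {})
        return migrated
    return rule  # already uses match/set

# Example: {"when": {"description": "REWE"}, "then": {"category": "Groceries"}}
# becomes {"match": {"description": "REWE"}, "set": {"category": "Groceries"}}.
```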