Data Workflow

Overview

The application processes bank statements through a defined pipeline to categorize and visualize transactions.

Pipeline Steps (the_pipeline.py)

```mermaid
graph TD
    A[Input Data] --> B[Normalize]
    B --> C[Rules]
    C --> D[ML Fallback]
    D --> E[Review Enqueue]
    E --> F[Result]
```

1. Normalize (step_normalize)

  • Input: Raw transaction data (payee, description, etc.).
  • Merchant Matching: Scans all text fields against the Merchant Catalog.
  • Strategy: Uses word, contains, or regex aliases to find the best match.
  • Output: A canonical merchant name (e.g., "REWE 123" -> "Rewe").

2. Rules (step_rules)

  • User Rules: Checks user-specific overrides.
  • Merchant Defaults: If a merchant was found in step 1, applies default categories (e.g., Rewe -> Groceries).
  • System Rules: Runs general rules (regex/text matching) if no user rule matched.
  • Recurring Detection: Analyzes history to flag recurring payments (monthly/quarterly).

3. ML Fallback (step_ml)

  • Optional: If no rule matched, a machine learning model predicts the category based on description and amount.

4. Review Enqueue (step_review)

  • Transactions with low confidence or no category are flagged for manual review in the UI.
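
A minimal sketch of that gate, assuming a confidence threshold (the actual cutoff is not stated in this document):

```python
REVIEW_THRESHOLD = 0.6  # assumed cutoff; the real value is configuration-specific

def step_review(txn: dict) -> dict:
    """Flag a transaction for manual review if it is uncategorized or uncertain."""
    needs_review = (
        txn.get("category") is None
        or txn.get("confidence", 0.0) < REVIEW_THRESHOLD
    )
    return {**txn, "needs_review": needs_review}
```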

Development & Troubleshooting

  • Adding Rules: Add YAML files to data/rules/.
  • Testing: Use the "Inspect Rules" UI route to test patterns against existing data.
  • Common Errors:
    • Polars Schema Error: If the message Analysis failed: could not append value... appears, a column likely contains mixed data types. Check the input CSVs.
    • Legacy Rules: ValueError: Legacy rule format indicates the deprecated when/then syntax is still in use. Update the rule to match/set.
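
A rule file in the current match/set format might look like the following. The filename and field names inside match/set are illustrative; only the match/set vs. when/then distinction comes from this document:

```yaml
# data/rules/groceries.yaml — current format
- match:
    description: "(?i)supermarkt|rewe|edeka"
  set:
    category: Groceries

# Deprecated format that raises "ValueError: Legacy rule format":
# - when:
#     description: "(?i)supermarkt"
#   then:
#     category: Groceries
```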