Merchant Normalization

Maps raw transaction text to canonical merchant names.

How It Works

graph LR
    Raw["REWE SAGT DANKE 1234"] --> Normalize[Text Normalization]
    Normalize --> Match[Alias Matching]
    Match --> Merchant["REWE"]

1. Build text source: combine the payee, description, and payer fields.
2. Match aliases: check each merchant's aliases against the text.
3. Select best match: prioritize by mode rank, then match length.
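
The first step can be sketched as follows. This is a minimal illustration, not the code in `normalize.py`: the function name `build_text_source` and the exact normalization (lowercasing, collapsing whitespace) are assumptions, since the source only states that the payee, description, and payer fields are combined.

```python
import re

# Hypothetical helper: the real pipeline may normalize differently.
def build_text_source(txn: dict, fields=("payee", "description", "payer")) -> str:
    """Join the configured fields into one lowercase, whitespace-collapsed string."""
    # Missing or None fields are treated as empty and skipped.
    parts = [str(txn.get(field) or "") for field in fields]
    joined = " ".join(part for part in parts if part)
    return re.sub(r"\s+", " ", joined).strip().lower()


txn = {"payee": "REWE SAGT DANKE 1234", "description": "  Kartenzahlung ", "payer": None}
text = build_text_source(txn)
print(text)  # rewe sagt danke 1234 kartenzahlung
```

Lowercasing the text once up front would let aliases be written in lowercase, as they are in the catalog examples.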

Alias Modes

Mode	Priority	Description	Example
`word`	3 (best)	Exact word boundary	`{ word: "rewe" }`
`contains`	2	Substring match	`{ contains: "aldi sagt danke" }`
`regex`	1	Pattern match	`{ regex: "e\\.?(center|deka)" }`
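
A rough sketch of how the three modes could be evaluated against already-normalized (lowercase) text. The names `MODE_RANK` and `match_alias` are hypothetical, and the sketch assumes each alias is a single-key mapping, as in the examples above; the actual matching logic lives in `normalize.py` and may differ.

```python
import re

# Rank values taken from the Priority column above.
MODE_RANK = {"word": 3, "contains": 2, "regex": 1}

def match_alias(alias: dict, text: str):
    """Return (mode_rank, match_length) if the alias matches, else None."""
    # Assumes exactly one mode key per alias, e.g. {"word": "rewe"}.
    ((mode, pattern),) = alias.items()
    if mode == "word":
        # The literal must be delimited by word boundaries.
        found = re.search(rf"\b{re.escape(pattern)}\b", text)
    elif mode == "contains":
        # Plain substring search; the pattern is escaped so it is not a regex.
        found = re.search(re.escape(pattern), text)
    elif mode == "regex":
        found = re.search(pattern, text)
    else:
        raise ValueError(f"unknown alias mode: {mode}")
    if found is None:
        return None
    return MODE_RANK[mode], found.end() - found.start()


print(match_alias({"word": "rewe"}, "rewe sagt danke 1234"))        # (3, 4)
print(match_alias({"word": "dm"}, "admin fee"))                     # None
print(match_alias({"regex": r"e\.?(center|deka)"}, "edeka markt"))  # (1, 5)
```

The `word` mode is why a short alias such as `dm` can be used without matching inside longer words like "admin".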

Merchant File Format

slug: merchants.germany
version: 2
normalize:
  fields: [payee, description, payer]
merchants:
  - name: "REWE"
    aliases:
      - { word: "rewe" }

  - name: "EDEKA"
    aliases:
      - { regex: "e\\.?(center|deka)" }

  - name: "DM"
    aliases:
      - { word: "dm" }
      - { contains: "dm drogerie" }

Catalog Files

data/rules/merchants/
├── base.yaml           # Global merchants (IKEA, Amazon)
├── germany.yaml        # German retailers (REWE, Aldi, DM)
├── international.yaml  # International (Netflix, Spotify)
└── saas.yaml           # SaaS services

Adding a Merchant

1. Find the right catalog file (or create a new one).
2. Add an entry with a unique name.
3. Add aliases, most specific mode first.

- name: "My Store"
  aliases:
    - { word: "mystore" }           # exact match first
    - { contains: "my store gmbh" } # fallback
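
Before committing a new entry, a quick sanity check on the parsed catalog can catch duplicate names or malformed aliases. The sketch below is a hypothetical helper, not part of the project: `validate_merchants` and its messages are assumptions, and it operates on the already-parsed `merchants` list rather than on the YAML file itself.

```python
KNOWN_MODES = {"word", "contains", "regex"}

# Hypothetical check; the project's loader may validate differently or not at all.
def validate_merchants(merchants: list) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seen = set()
    for entry in merchants:
        name = entry.get("name")
        if not name:
            problems.append("entry without a name")
            continue
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        aliases = entry.get("aliases") or []
        if not aliases:
            problems.append(f"{name}: no aliases")
        for alias in aliases:
            # Each alias is expected to carry exactly one known mode key.
            if len(alias) != 1 or next(iter(alias)) not in KNOWN_MODES:
                problems.append(f"{name}: invalid alias {alias}")
    return problems


entry = {"name": "My Store", "aliases": [{"word": "mystore"}, {"contains": "my store gmbh"}]}
print(validate_merchants([entry]))         # []
print(validate_merchants([entry, entry]))  # ['duplicate name: My Store']
```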

Matching Priority

When multiple aliases match:

1. Higher mode rank wins (word > contains > regex).
2. Longer match length wins.
3. Earlier alias order wins.
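
The tie-breaking rules map naturally onto a tuple sort key. The following is a sketch under assumptions: the `Candidate` type and `select_best` function are illustrative names, and `order` is assumed to be the position at which the alias appears in the loaded catalog (lower means earlier).

```python
from typing import NamedTuple, Optional

class Candidate(NamedTuple):
    merchant: str
    rank: int    # mode rank: word=3, contains=2, regex=1
    length: int  # number of characters matched
    order: int   # assumed: position of the alias in the catalog

def select_best(candidates: list) -> Optional[Candidate]:
    """Pick the winner by rank, then match length, then earliest alias order."""
    if not candidates:
        return None
    # Negating order makes earlier aliases compare as larger.
    return max(candidates, key=lambda c: (c.rank, c.length, -c.order))


candidates = [
    Candidate("EDEKA", rank=1, length=8, order=3),  # regex: long match, lowest rank
    Candidate("DM", rank=2, length=11, order=2),    # contains
    Candidate("DM", rank=3, length=2, order=1),     # word, short
    Candidate("REWE", rank=3, length=4, order=0),   # word, longer
]
print(select_best(candidates).merchant)  # REWE
```

Note that a long `regex` or `contains` match never outranks a `word` match, because rank is compared before length.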

Code Location

File	Purpose
`app/ingest/importer/pipeline/normalize.py`	Matching logic
`app/ingest/importer/pipeline/rules_loader.py`	YAML parsing
`app/ingest/importer/pipeline/ruleset_loader.py`	Catalog loading
