
Mining engines

Four engines ship today. Three are deterministic miners that work directly on your order history; the fourth is an LLM-driven catalog miner that solves the cold-start problem. All four return rules in the same shape, so the rest of the pipeline (scoring, opportunity ranking, proposals) doesn't care which engine produced them.

The shared contract

Every engine implements MiningEngineInterface and returns an EngineRuleResult, a list of association rules plus a metrics block. A rule is a tuple of:

  • antecedent, one or more item IDs (the “if” side)
  • consequent, one item ID (the “then” side)
  • support, the fraction of all transactions that contain the full itemset (antecedent plus consequent)
  • confidence, P(consequent | antecedent)
  • lift, confidence divided by the consequent's own support: how much more likely the consequent is given the antecedent than at baseline
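Here is a minimal Python sketch that computes the three numbers for one rule over an in-memory list of orders. The helper names (support, rule_metrics) are ours and are only illustrative. They are not part of MiningEngineInterface, and the engines compute these values internally (sql_pairs, for example, does it in SQL).

```python
from typing import List, Set, Tuple


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(antecedent: Set[str], consequent: str,
                 transactions: List[Set[str]]) -> Tuple[float, float, float]:
    """(support, confidence, lift) for the rule antecedent -> consequent."""
    s_full = support(antecedent | {consequent}, transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support({consequent}, transactions)
    confidence = s_full / s_ante if s_ante else 0.0  # P(consequent | antecedent)
    lift = confidence / s_cons if s_cons else 0.0    # relative to the consequent's baseline rate
    return s_full, confidence, lift


orders = [
    {"tent", "stakes", "lantern"},
    {"tent", "stakes"},
    {"lantern"},
    {"mug"},
]
print(rule_metrics({"tent"}, "stakes", orders))  # (0.5, 1.0, 2.0)
```

Stakes appear in half of all orders but in every tent order. The confidence is therefore 1.0 and the lift is 2.0: buying a tent doubles the baseline likelihood of buying stakes.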

The downstream scorer applies your profit / margin / discount rules to those numbers to produce ranked Opportunities. Engine choice affects which rules show up, not how they're ranked.

1. sql_pairs, the SQL pair miner

Pure SQL pair counting over the transaction table. Builds a temporary table of ordered (item_a, item_b) pairs with their co-occurrence counts via a single JSON_TABLE self-join, then derives support / confidence / lift from the item-frequency cache.
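The counting that sql_pairs pushes into the database can be reproduced in a few lines of Python. This minimal sketch assumes each transaction is a set of item IDs. It also assumes the pair table keys each unordered pair once as (item_a, item_b) with item_a < item_b, which is what a self-join filtered on a.item < b.item would produce. mine_pairs is a hypothetical name, and the engine's actual SQL may differ.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

# (antecedent, consequent, support, confidence, lift)
Rule = Tuple[str, str, float, float, float]


def mine_pairs(transactions: List[Set[str]], min_support: float = 0.0) -> List[Rule]:
    n = len(transactions)
    if n == 0:
        return []
    item_counts: Counter = Counter()  # plays the role of the item-frequency cache
    pair_counts: Counter = Counter()  # plays the role of the temporary pair table
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        # Each unordered pair once per transaction, keyed (item_a, item_b) with item_a < item_b.
        pair_counts.update(combinations(items, 2))

    rules: List[Rule] = []
    for (a, b), together in pair_counts.items():
        sup = together / n
        if sup < min_support:
            continue
        # One co-occurrence count yields two rules; they share support but differ in confidence.
        for ante, cons in ((a, b), (b, a)):
            conf = together / item_counts[ante]
            lift = conf / (item_counts[cons] / n)
            rules.append((ante, cons, sup, conf, lift))
    return rules


orders = [{"tent", "stakes"}, {"tent", "stakes", "lantern"}, {"lantern"}, {"mug"}]
for rule in mine_pairs(orders, min_support=0.5):
    print(rule)
```

The sketch only ever generates 2-element combinations, so it shares the engine's limitation: it produces no 3- or 4-item rules, whatever max_itemset_size is set to.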

When to use it: you're short on time, have a large transaction store, and only need 2-item rules. Execution is SQL-native, so nothing is materialized in PHP memory.

When not to use it: when you want 3- or 4-item kits. sql_pairs emits 2-item rules only, regardless of max_itemset_size.

2. fp_growth, the default

FP-Growth (Frequent Pattern Growth). Builds a compact prefix tree (FP-tree) of frequent items, then mines frequent itemsets recursively by following conditional pattern bases. Handles large catalogs without the candidate-explosion problem of classic generate-and-test miners.
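Here is a compact Python sketch of textbook FP-Growth. It makes one pass to count items and a second to insert each transaction into the prefix tree in descending-frequency order, then recurses over conditional pattern bases. For simplicity it takes an absolute min_count rather than the min_support ratio. It returns frequent itemsets with their counts; turning those into rules is a separate step. This illustrates the technique and is not the shipped engine.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Weighted = List[Tuple[List[str], int]]  # (items, how many transactions share them)


class Node:
    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_tree(weighted: Weighted, min_count: int) -> Tuple[Dict[str, int], Dict[str, List[Node]]]:
    """Build an FP-tree; return frequent-item counts and a header table (item -> its nodes)."""
    counts: Dict[str, int] = defaultdict(int)
    for items, w in weighted:
        for item in set(items):
            counts[item] += w
    freq = {item: c for item, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)
    for items, w in weighted:
        # Most frequent items first (ties by name) so transactions share common prefixes.
        path = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += w
    return freq, header


def _mine(weighted: Weighted, min_count: int, max_size: int,
          suffix: FrozenSet[str], out: Dict[FrozenSet[str], int]) -> None:
    freq, header = build_tree(weighted, min_count)
    for item, count in freq.items():
        itemset = suffix | {item}
        out[itemset] = count
        if len(itemset) >= max_size:
            continue
        # Conditional pattern base: the prefix path above each node holding `item`,
        # weighted by that node's count.
        base: Weighted = []
        for node in header[item]:
            path: List[str] = []
            p = node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        _mine(base, min_count, max_size, itemset, out)


def fp_growth(transactions: Iterable[Set[str]], min_count: int,
              max_size: int = 4) -> Dict[FrozenSet[str], int]:
    """Every itemset of size <= max_size found in at least min_count transactions."""
    out: Dict[FrozenSet[str], int] = {}
    _mine([(list(t), 1) for t in transactions], min_count, max_size, frozenset(), out)
    return out


orders = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b", "c"}]
frequent = fp_growth(orders, min_count=2, max_size=3)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

No candidate itemsets are generated and tested against the data. Each recursion only sees the prefix paths that actually occur with the current suffix, which is what lets the method avoid candidate explosion.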

Itemset size: emits 2-, 3-, and 4-item rules (capped by the job's max_itemset_size).

When to use it: the default for most stores. Handles 50k–500k transactions and catalogs of any size comfortably. The recommended engine for finding real bundles (3+ items) rather than just cross-sell pairs.

How to tune it:

  • Lower min_support to discover more rules; raise it to keep runtime down on big catalogs.
  • max_itemset_size = 2 for cross-sell only; 3-4 for bundles.
  • Bound the output with max_rules so a wide-net job doesn't flood the Opportunities grid.
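The tuning advice above maps onto a simple post-processing step. This minimal sketch assumes a hypothetical rules_from_itemsets helper. It takes frequent-itemset counts, such as an FP-tree miner produces, and applies max_itemset_size, min_confidence, min_lift and max_rules. Sorting by lift before truncating is our assumption about how max_rules picks the survivors, not documented behavior.

```python
from typing import Dict, FrozenSet, List, Tuple

# (antecedent, consequent, support, confidence, lift)
Rule = Tuple[FrozenSet[str], str, float, float, float]


def rules_from_itemsets(counts: Dict[FrozenSet[str], int], n: int,
                        max_itemset_size: int = 3, min_confidence: float = 0.30,
                        min_lift: float = 1.5, max_rules: int = 5000) -> List[Rule]:
    """Derive single-consequent rules from frequent-itemset counts, then apply the caps."""
    rules: List[Rule] = []
    for itemset, together in counts.items():
        if not 2 <= len(itemset) <= max_itemset_size:
            continue
        for consequent in itemset:
            ante = itemset - {consequent}
            cons = frozenset([consequent])
            if ante not in counts or cons not in counts:
                continue  # subsets of a frequent itemset are frequent; this only guards bad input
            conf = together / counts[ante]
            lift = conf / (counts[cons] / n)
            if conf >= min_confidence and lift >= min_lift:
                rules.append((ante, consequent, together / n, conf, lift))
    # Strongest associations first, so max_rules trims the weakest tail.
    rules.sort(key=lambda r: (r[4], r[3]), reverse=True)
    return rules[:max_rules]


counts = {
    frozenset({"a"}): 4, frozenset({"b"}): 3, frozenset({"c"}): 5,
    frozenset({"a", "b"}): 3, frozenset({"a", "c"}): 2,
}
print(len(rules_from_itemsets(counts, n=10)))               # 2
print(len(rules_from_itemsets(counts, n=10, max_rules=1)))  # 1
```

The {a, c} itemset is frequent, but both of its rules have a lift of exactly 1.0 (no better than chance), so the default min_lift of 1.5 filters them out.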

3. hui, profit-aware mining

High-Utility Itemset mining. Where the support-based engines find itemsets that appear together often, HUI finds itemsets that generate the most profit together. The miner ranks by utility (margin contribution) instead of raw frequency, so a low-volume but high-margin combination can outscore a frequent but thin-margin one.
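To see why the rankings differ, compare the quantity HUI maximizes with plain support. This minimal sketch assumes utility is defined as in the HUI literature: quantity × unit margin, summed over the orders that contain the whole itemset. It also assumes per-item margins are available. The function name is hypothetical, and real HUI miners add upper-bound pruning that this scoring function omits.

```python
from typing import Dict, List, Set

Order = Dict[str, int]  # item ID -> quantity in the order


def itemset_utility(itemset: Set[str], orders: List[Order], unit_margin: Dict[str, float]) -> float:
    """Summed margin of the itemset's items across orders that contain the whole itemset."""
    total = 0.0
    for order in orders:
        if itemset.issubset(order):  # iterating a dict yields its keys
            total += sum(order[item] * unit_margin[item] for item in itemset)
    return total


orders = [
    {"cable": 1, "adapter": 1},
    {"cable": 2, "adapter": 1},
    {"cable": 1, "adapter": 1},
    {"camera": 1, "lens": 1},
]
unit_margin = {"cable": 1.0, "adapter": 1.5, "camera": 80.0, "lens": 40.0}

for pair in ({"cable", "adapter"}, {"camera", "lens"}):
    print(sorted(pair), itemset_utility(pair, orders, unit_margin))
```

The cable + adapter pair appears in three of four orders (support 0.75) but contributes only 8.5 of margin. Camera + lens appears once (support 0.25) yet contributes 120, so a utility-ranked list puts it first.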

When to use it: when margin dispersion across the catalog is wide and you want bundle candidates ranked by gross profit contribution rather than co-occurrence count. Especially useful for catalogs with a long tail of high-margin SKUs that frequency-based mining underweights.

When not to use it: when item-level margin data is not configured, or when you want a pure observation-based view of what customers buy together regardless of profit. Use fp_growth for that.

4. ai_catalog, the cold-start fix

The only engine that doesn't need order history. It sends your catalog (SKU + name + category path + price) to your configured LLM provider (Claude, or an OpenAI model) with a structured prompt. The prompt asks for natural complement pairs, with a 0–1 score and a one-sentence rationale for each pair. The engine then synthesizes a well-formed support / confidence / lift triplet for each pair, so the downstream ranking and proposal generator handle these rules exactly as they handle mined ones.
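Because the engine asks for structured output, the reply can be validated mechanically before any rule is synthesized. This minimal sketch assumes a hypothetical reply of the form {"pairs": [{"a": ..., "b": ..., "score": ..., "rationale": ...}]}. This page documents neither the real prompt schema nor how the engine turns a score into support / confidence / lift. The sketch therefore stops at validation and the synthetic flag.

```python
import json
from typing import Dict, List, Set


def parse_complement_pairs(raw: str, catalog_skus: Set[str]) -> List[Dict]:
    """Keep only well-formed pairs from the model's JSON reply; drop everything else."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    pairs = payload.get("pairs") if isinstance(payload, dict) else None
    if not isinstance(pairs, list):
        return []
    kept: List[Dict] = []
    for p in pairs:
        if not isinstance(p, dict):
            continue
        a, b, score, why = p.get("a"), p.get("b"), p.get("score"), p.get("rationale")
        # Both SKUs must be distinct, real catalog entries (guards against invented SKUs).
        if not (isinstance(a, str) and isinstance(b, str)) or a == b:
            continue
        if a not in catalog_skus or b not in catalog_skus:
            continue
        # Score must be a number in [0, 1]; bool is excluded because it subclasses int.
        if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            continue
        if not isinstance(why, str) or not why.strip():
            continue
        kept.append({"antecedent": a, "consequent": b, "score": float(score),
                     "rationale": why.strip(), "synthetic": True})
    return kept


raw = json.dumps({"pairs": [
    {"a": "TENT-2P", "b": "STAKE-10", "score": 0.9, "rationale": "Tents need stakes."},
    {"a": "TENT-2P", "b": "NOT-A-SKU", "score": 0.8, "rationale": "Invented SKU."},
]})
print(parse_complement_pairs(raw, {"TENT-2P", "STAKE-10"}))
```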

When to use it:

  • Fresh install with zero completed orders: get to a populated Opportunities grid within minutes.
  • Brand-new product launches where there's no co-purchase history yet.
  • Seasonal categories (gifting, holiday) where last year's data may not match this year's assortment.
  • As a complement to FP-Growth: run both, dedupe, and you get coverage for both established and brand-new SKUs.

Requirements: a configured Anthropic or OpenAI API key in store config. The BYO-key setup is the same on Magento and WooCommerce (see the AI features guide). The engine walks only a bounded slice of the catalog: visible, enabled products, limited to configurable and simple parents (variants are merged via their configurable parent). That keeps even a 100k-SKU catalog from burning unreasonable amounts of tokens, and your provider bills the tokens you do use directly.
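The slicing rule just described can be expressed as a filter. This is a minimal Python sketch. The Product fields, the type_id values, the " | " line format and the optional limit cap are all hypothetical. We read "configurable + simple parents only" as: configurable products, plus simple products that are not variants of a configurable.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Product:
    sku: str
    name: str
    category_path: str
    price: float
    type_id: str                      # "simple" or "configurable" in this sketch
    visible: bool = True
    enabled: bool = True
    parent_sku: Optional[str] = None  # set when a simple product is a variant of a configurable


def catalog_slice(products: Iterable[Product], limit: Optional[int] = None) -> List[str]:
    """Prompt lines (SKU, name, category path, price) for the products worth sending."""
    lines: List[str] = []
    for p in products:
        if not (p.visible and p.enabled):
            continue
        if p.type_id not in ("simple", "configurable"):
            continue
        if p.parent_sku is not None:
            continue  # variant: represented by its configurable parent instead
        lines.append(f"{p.sku} | {p.name} | {p.category_path} | {p.price:.2f}")
        if limit is not None and len(lines) >= limit:
            break  # hypothetical hard cap on prompt size
    return lines


catalog = [
    Product("TENT-2P", "2-person tent", "Camping/Tents", 149.0, "configurable"),
    Product("TENT-2P-GRN", "2-person tent, green", "Camping/Tents", 149.0, "simple", parent_sku="TENT-2P"),
    Product("STAKE-10", "Tent stakes (10)", "Camping/Accessories", 12.5, "simple"),
    Product("OLD-STOVE", "Retired stove", "Camping/Cooking", 39.0, "simple", enabled=False),
]
for line in catalog_slice(catalog):
    print(line)
```

The green tent variant and the disabled stove never reach the prompt. Collapsing variants this way is what keeps token use proportional to distinct products rather than to SKUs.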

Caveat: the rules are reasoned, not observed. We mark them as synthetic in the metrics block; the Opportunities grid badges them so reviewers know which rules came from data vs. reasoning. We strongly recommend re-running with fp_growth once you have 30+ days of orders so observed rules supersede inferred ones.
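Running both engines and deduplicating, with observed rules superseding inferred ones, reduces to a keyed merge. This minimal sketch assumes rules are plain dicts with antecedent, consequent and synthetic fields (hypothetical field names). The key is directional, so tent → stakes and stakes → tent remain separate rules.

```python
from typing import Dict, FrozenSet, List, Tuple

Key = Tuple[FrozenSet[str], str]  # (antecedent items, consequent)


def merge_rules(observed: List[Dict], synthetic: List[Dict]) -> List[Dict]:
    """Union of both rule sets; on a duplicate (antecedent, consequent), the observed rule wins."""
    merged: Dict[Key, Dict] = {}
    for rule in synthetic:
        merged[(frozenset(rule["antecedent"]), rule["consequent"])] = rule
    for rule in observed:
        # Written second, so an observed rule overwrites any synthetic duplicate.
        merged[(frozenset(rule["antecedent"]), rule["consequent"])] = rule
    return list(merged.values())


observed = [{"antecedent": ["tent"], "consequent": "stakes", "synthetic": False}]
synthetic = [
    {"antecedent": ["tent"], "consequent": "stakes", "synthetic": True},
    {"antecedent": ["new-stove"], "consequent": "fuel", "synthetic": True},
]
for rule in merge_rules(observed, synthetic):
    print(rule)
```

The established tent → stakes rule now comes from order data, while the new stove keeps its synthetic rule until it has co-purchase history of its own.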

Which engine should I pick?

Situation | Engine | Why
First job ever, no order history yet | ai_catalog | Skips the cold-start gap
Standard ongoing mining (≥ 1k orders, ≥ 200 SKUs) | fp_growth | Best general-purpose default; multi-item rules
Cross-sell only, very large transaction store | sql_pairs | SQL-native; fastest at scale for pairs
Profit-aware ranking, wide margin spread | hui | Ranks bundle candidates by gross profit contribution
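If you script job creation, the table can be encoded as a small helper. In this minimal sketch, StoreProfile and pick_engine are hypothetical, and the order of the checks is our assumption. The table doesn't say what to pick below 1k orders or 200 SKUs, so the sketch doesn't encode those numbers and falls back to fp_growth.

```python
from dataclasses import dataclass


@dataclass
class StoreProfile:
    completed_orders: int
    pairs_only: bool = False          # you only want cross-sell pairs, not bundles
    very_large_store: bool = False    # your own call; the table gives no threshold
    has_margin_data: bool = False     # item-level margins configured (hui needs them)
    wide_margin_spread: bool = False


def pick_engine(p: StoreProfile) -> str:
    """Encode the decision table; the order of the checks is an assumption."""
    if p.completed_orders == 0:
        return "ai_catalog"  # nothing to mine yet
    if p.has_margin_data and p.wide_margin_spread:
        return "hui"         # rank by profit contribution
    if p.pairs_only and p.very_large_store:
        return "sql_pairs"   # SQL-native pair counting
    return "fp_growth"       # general-purpose default


print(pick_engine(StoreProfile(completed_orders=0)))
print(pick_engine(StoreProfile(completed_orders=40_000, pairs_only=True, very_large_store=True)))
```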

Shared engine knobs

All four engines respect the same per-job MiningConfig. Magento exposes them under Stores → Configuration → MBA Reports → Job Defaults; WooCommerce uses identical names in MBA → Settings → Mining.

Setting | Type | Default | What it does
max_itemset_size | int (2–4) | 3 | Maximum items in a rule, including the consequent. 2 for pairs only; 3–4 for true bundles.
min_support | float (0.0–1.0) | 0.01 | Fraction of transactions that must contain the itemset. Lower → more rules, slower mine.
min_confidence | float (0.0–1.0) | 0.30 | Minimum confidence, i.e. P(consequent given antecedent). Higher → more selective recommendations.
min_lift | float (≥ 0.0) | 1.5 | Minimum co-occurrence relative to chance. Lift < 1 means anti-correlated, 1 means independent, > 1 means positive association.
max_rules | int (> 0) | 5000 | Hard cap on rules emitted per job; a guardrail against flooding the Opportunities grid.
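For scripted jobs or tests, the settings translate directly into a validated config object. This is a minimal Python sketch. The real MiningConfig lives in the PHP codebase, and this mirror reproduces only the names, defaults and ranges listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningConfig:
    """Per-job knobs with the documented defaults; ranges are checked on construction."""
    max_itemset_size: int = 3
    min_support: float = 0.01
    min_confidence: float = 0.30
    min_lift: float = 1.5
    max_rules: int = 5000

    def __post_init__(self) -> None:
        if not 2 <= self.max_itemset_size <= 4:
            raise ValueError("max_itemset_size must be between 2 and 4")
        for name in ("min_support", "min_confidence"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0]")
        if self.min_lift < 0.0:
            raise ValueError("min_lift must be >= 0.0")
        if self.max_rules <= 0:
            raise ValueError("max_rules must be > 0")


cross_sell = MiningConfig(max_itemset_size=2)               # pairs only
wide_net = MiningConfig(min_support=0.002, max_rules=1000)  # more rules, bounded output
```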

Where this lives in the code

Each engine is registered in EngineFactory via DI (Magento: etc/di.xml; WooCommerce: includes/Mining/EngineFactory.php). Adding a custom engine means implementing the interface and adding one DI entry; no core changes are needed. Hosted-tier customers can request a private engine in their plan; see Hosted Engine.
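The shape of a plug-in engine can be sketched in Python. This is an analogue, not the real contract: MiningEngineInterface is PHP, and this page doesn't reproduce its method signatures. The mine() signature, the register decorator standing in for the DI entry, and the toy best_seller engine are therefore all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol, Set, Tuple

# (antecedent items, consequent, support, confidence, lift)
Rule = Tuple[Tuple[str, ...], str, float, float, float]


@dataclass
class EngineRuleResult:
    rules: List[Rule]
    metrics: Dict[str, object] = field(default_factory=dict)


class MiningEngine(Protocol):
    def mine(self, transactions: List[Set[str]]) -> EngineRuleResult: ...


# Stand-in for the DI wiring: engine code -> factory.
ENGINES: Dict[str, Callable[[], MiningEngine]] = {}


def register(code: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        ENGINES[code] = cls
        return cls
    return decorator


@register("best_seller")
class BestSellerEngine:
    """Toy engine: proposes the store's top seller as the consequent for every other item."""

    def mine(self, transactions: List[Set[str]]) -> EngineRuleResult:
        n = len(transactions)
        counts = Counter(item for t in transactions for item in t)
        if not counts:
            return EngineRuleResult(rules=[], metrics={"transactions": n})
        best, best_count = counts.most_common(1)[0]
        rules: List[Rule] = []
        for item, count in counts.items():
            if item == best:
                continue
            together = sum(1 for t in transactions if item in t and best in t)
            if together:
                conf = together / count
                rules.append(((item,), best, together / n, conf, conf / (best_count / n)))
        return EngineRuleResult(rules=rules, metrics={"transactions": n, "synthetic": False})


def create_engine(code: str) -> MiningEngine:
    return ENGINES[code]()


result = create_engine("best_seller").mine([{"a", "b"}, {"a", "c"}, {"a"}, {"b"}])
print(result.rules)
```

Because the result carries the same rule fields as every other engine's, the scorer downstream needs no changes, which is the point of the shared contract.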

Roadmap

Seasonal cohort mining and return-aware engines are next.

