Affinity analysis finds items that tend to appear together within the same transaction. In retail it is called market-basket analysis, but the idea applies to any event that can be represented as a set of items: an order, a user session, a claim, or a support ticket. The outcome is a set of association rules that can support bundling, recommendations, and prioritisation decisions based on observed co-occurrence rather than assumptions. Many learners first meet the topic in a data analytics course in Bangalore because it connects clean data preparation with practical business decision-making.
The metrics that make association rules useful
Association rules are usually written as X → Y, meaning “when X appears, Y is more likely to appear.” The arrow denotes co-occurrence, not causation: a rule says nothing about why the two sides appear together.
Support
Support is the percentage of all transactions that contain both X and Y. If 900 out of 45,000 orders include {wireless mouse, laptop sleeve}, the support is 2%. Support helps you ignore patterns that are too rare to be stable or commercially meaningful. It also prevents overreacting to one-off combinations that may not repeat in future data.
Confidence
Confidence measures how often Y occurs among transactions that contain X. If 3,000 orders include “wireless mouse” and 900 of those also include “laptop sleeve”, then confidence for {mouse} → {sleeve} is 900/3,000 = 30%. Confidence is intuitive, but it can still overvalue rules where Y is very common. That is why confidence should rarely be used alone.
Lift
Lift adjusts for popularity by comparing confidence with the baseline rate of Y: lift(X → Y) = confidence(X → Y) ÷ support(Y). If laptop sleeves appear in 10% of all orders, then lift = 0.30/0.10 = 3.0. Lift above 1 indicates co-occurrence stronger than chance and is often the most practical screening metric when you want rules that genuinely add signal.
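The three metrics can be reproduced with a few lines of arithmetic. A minimal Python sketch, using the counts from the examples above (the 4,500 sleeve orders are implied by the stated 10% baseline, not given directly in the text):

```python
def rule_metrics(n_total, n_x, n_y, n_xy):
    """Return (support, confidence, lift) for the rule X -> Y."""
    support = n_xy / n_total        # share of all transactions containing both
    confidence = n_xy / n_x         # P(Y | X)
    baseline_y = n_y / n_total      # P(Y), the popularity of Y on its own
    lift = confidence / baseline_y  # > 1 means stronger than chance
    return support, confidence, lift

# Figures from the article: 45,000 orders, 3,000 with a wireless mouse,
# 4,500 with a laptop sleeve (the 10% baseline), 900 with both.
s, c, l = rule_metrics(45_000, 3_000, 4_500, 900)
print(f"support={s:.2%} confidence={c:.0%} lift={l:.1f}")
# support=2.00% confidence=30% lift=3.0
```

The same helper works for any rule direction; only the roles of X and Y change.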
Building a reliable affinity analysis workflow
Good rules depend more on definitions and validation than on a single algorithm run.
Define the transaction and item granularity
Choose a transaction unit that matches the business question. For checkout add-ons, use orders; for browsing patterns, sessions may fit better. Decide the item level too: SKU-level rules can be noisy, while category-level rules are usually more stable and easier to action. This single choice can change which associations appear “strong.”
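The effect of item granularity is easy to see in a tiny sketch. The SKU codes and category mapping below are hypothetical, purely to illustrate the rollup:

```python
# Hypothetical SKU-to-category mapping: variants of the same product
# collapse into one category-level item.
sku_to_category = {
    "MOUSE-BLK-01": "mouse",
    "MOUSE-WHT-02": "mouse",
    "SLEEVE-13IN": "laptop sleeve",
    "SLEEVE-15IN": "laptop sleeve",
}

orders = [
    {"MOUSE-BLK-01", "SLEEVE-13IN"},
    {"MOUSE-WHT-02", "SLEEVE-15IN"},
]

# At SKU level these two orders share no items, so no pair repeats.
# At category level both become {mouse, laptop sleeve}, and the
# association is visible with twice the support.
category_orders = [{sku_to_category[s] for s in order} for order in orders]
print(category_orders)
```

This is why the same dataset can yield “no strong rules” at SKU level and a clear rule at category level.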
Prepare data and set thresholds
Standardise identifiers, handle returns/cancellations, and decide how to treat variants (size, colour, multipacks). Then set minimum support and minimum lift to control noise. Threshold tuning determines whether you get a small, actionable rule set or thousands of weak associations, which is why it is stressed in a data analytics course in Bangalore. In practice, teams tune thresholds iteratively until the rules are both credible and implementable.
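Threshold tuning can be sketched as a two-stage filter: prune rare pairs by minimum support first, then keep only directions whose lift clears the bar. The transactions and thresholds below are illustrative, not from the article:

```python
from collections import Counter
from itertools import combinations

# Illustrative baskets and thresholds.
transactions = [
    {"mouse", "sleeve"}, {"mouse", "sleeve"}, {"mouse", "sleeve"},
    {"sleeve", "stand"}, {"hub"}, {"stand"},
]
min_support, min_lift = 0.3, 1.2

n = len(transactions)
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(frozenset(p) for t in transactions
                      for p in combinations(sorted(t), 2))

rules = []
for pair, n_xy in pair_counts.items():
    support = n_xy / n
    if support < min_support:      # prune rare pairs before anything else
        continue
    for x in pair:                 # each pair yields two rule directions
        (y,) = pair - {x}
        confidence = n_xy / item_counts[x]
        lift = n_xy * n / (item_counts[x] * item_counts[y])
        if lift >= min_lift:       # keep only above-chance associations
            rules.append((x, y, support, confidence, lift))

for x, y, s, c, l in sorted(rules):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# mouse -> sleeve: support=0.50 confidence=1.00 lift=1.50
# sleeve -> mouse: support=0.50 confidence=0.75 lift=1.50
```

Raising either threshold shrinks the rule set; in practice both are tuned iteratively, as described above.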
Choose a method and validate over time
Apriori is simple but can slow down when there are many unique items. FP-Growth is commonly used at scale because it reduces candidate generation. Whatever you use, validate rules on a later holdout window so you can see whether associations persist beyond a promotion or season. This step reduces the risk of deploying rules that work only during one campaign.
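Apriori's level-wise idea can be shown in a compact sketch: grow candidate itemsets one size at a time and prune by minimum support at each level. This is a teaching illustration, not production code; real workloads typically use library implementations (e.g. mlxtend or Spark MLlib) that include FP-Growth:

```python
from itertools import combinations  # not needed here, but typical in variants

def frequent_itemsets(transactions, min_support):
    """Minimal Apriori-style miner: returns all frequent itemsets."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # Candidates of size k are unions of frequent (k-1)-itemsets;
        # any candidate with an infrequent subset never forms this way.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
        frequent |= current
        k += 1
    return frequent

# Illustrative baskets: {mouse, sleeve} and {mouse, hub} are frequent
# pairs at 50% support; {sleeve, hub} is not.
baskets = [{"mouse", "sleeve", "hub"}, {"mouse", "sleeve"},
           {"mouse", "hub"}, {"sleeve"}]
fs = frequent_itemsets(baskets, min_support=0.5)
```

The same validation logic applies regardless of miner: rerun the counts on a later time window and discard rules whose support or lift collapses.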
Where affinity analysis creates value
Affinity analysis is most useful when it triggers a measurable change.
E-commerce recommendations and merchandising
Rules can power “frequently bought together” suggestions, in-cart add-ons, or bundle packs. Measure impact with an experiment: track attachment rate (how often the suggested item is added) and changes in average order value, while monitoring returns so recommendations remain relevant. Even a high-lift rule can be a poor recommendation if it leads to unwanted add-ons or higher returns.
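The two measurements named above are simple ratios once the experiment is logged. A sketch with entirely hypothetical numbers:

```python
# Hypothetical experiment logs for a "frequently bought together" widget.
shown, accepted = 10_000, 1_800
attachment_rate = accepted / shown   # how often the suggested item is added

# Hypothetical order values with and without the suggestions shown.
aov_control = [52.0, 48.0, 61.0]
aov_treatment = [55.0, 50.0, 66.0]
aov_change = (sum(aov_treatment) / len(aov_treatment)
              - sum(aov_control) / len(aov_control))

print(f"attachment rate {attachment_rate:.1%}, AOV change {aov_change:+.2f}")
# attachment rate 18.0%, AOV change +3.33
```

A returns metric would be tracked the same way, so a rule that lifts attachment but also lifts returns can be retired.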
Operations and support ticket insights
In service data, “items” can be issue tags, error codes, or complaint categories. If {OTP delay} and {login failure} repeatedly co-occur within the same ticket or within a short customer window, that pattern can suggest shared upstream dependencies. This applied use case is a good reminder that affinity analysis is not limited to shopping baskets; it is also a practical technique discussed in a data analytics course in Bangalore for operational analytics and incident triage.
Common pitfalls to avoid
- Popularity traps: high confidence can be driven by a universally common item; always inspect lift and baseline rates.
- Spurious rules: promotions, placement, or default bundles can create misleading co-occurrence; use holdout validation.
- Sparse catalogues: huge SKU counts push support down; group to category level where appropriate.
- Rule overload: keep only rules tied to a clear action, owner, and measurement plan.
Conclusion
Affinity analysis turns transaction history into evidence-based co-occurrence relationships. Support ensures patterns have enough volume, confidence describes conditional likelihood, and lift highlights associations beyond simple popularity. With careful transaction definitions, sensible thresholds, and time-aware validation, the results can improve recommendations and operational prioritisation in a measurable way. If you revisit the topic through a data analytics course in Bangalore, focus on how each rule maps to a specific action and a clear way to test whether that action improves outcomes.