Association rule learning

by Irina 29. May 2007 04:40

In data mining and treatment learning, association rule learners are used to discover elements that co-occur frequently within a data set consisting of multiple independent selections of elements (such as purchasing transactions), and to discover rules, such as implication or correlation, which relate co-occurring elements. Questions such as "if a customer purchases product A, how likely is he to purchase product B?" and "What products will a customer buy if he buys products C and D?" are answered by association-finding algorithms.
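As a rough illustration of the second question above, the sketch below looks at every basket that contains a given set of products and reports how often each other product appears alongside them. The function name `likely_additions` and the toy baskets are hypothetical, made up for this example. It is a plain frequency count, not a full association-finding algorithm.

```python
from collections import Counter

def likely_additions(transactions, given):
    """For baskets containing every item in `given`, return the share
    of those baskets in which each other item also appears."""
    given = set(given)
    # Keep only the baskets that contain all of the given items.
    matching = [set(t) for t in transactions if given <= set(t)]
    if not matching:
        return {}
    # Count the remaining items across the matching baskets.
    counts = Counter(item for t in matching for item in t - given)
    return {item: c / len(matching) for item, c in counts.items()}

# Hypothetical toy data: each set is one purchasing transaction.
baskets = [
    {"C", "D", "E"},
    {"C", "D", "E", "F"},
    {"C", "D"},
    {"C", "E"},
    {"D", "F"},
]

result = likely_additions(baskets, {"C", "D"})
for item, share in sorted(result.items()):
    print(f"{item}: {share:.2f}")
```

Here three baskets contain both C and D, so each share is a fraction of those three baskets rather than of the whole data set.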

This application of association rule learners is also known as market basket analysis. As with most data mining techniques, the task is to reduce a potentially huge amount of information to a small, understandable set of statistically supported statements. Market basket analysis is often promoted as a means of obtaining product associations on which to base a retailer’s promotion strategy.

The idea is that associated products with a high lift/interest can be promoted effectively by discounting just one of the two products. This rests on the implicit assumption that market basket analysis automatically identifies complements. Academics, however, have shown that one should be careful with this conclusion: the implicit assumption does not hold. Their empirical analysis reveals that market basket analysis identifies as many substitutes as complements. Therefore, market basket analysis should not be used to build a promotion expert system for retailers unless it is supplemented by other, more empirical, methods of determining product relationships.

RULES

After reviewing the straight frequencies, clicking back to the Rules tab provides information about the relationships between the items. By default, the rules are expressed as “item A implies item B” and are listed with the following measures:

Expected confidence is the percentage of records in which item B occurs, regardless of item A.
Confidence is the percentage of records containing item A that also contain item B.
Support is the percentage of records containing both item A and item B.
Lift is how much more likely item B is when item A is present, that is, the ratio of confidence to expected confidence. A rule has lift (a ratio above 1) when its confidence is higher than its expected confidence.
Count is the number of records in which item A and item B occur together.
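The measures above can be computed directly from a list of transactions. The following is a minimal sketch for a single rule “item A implies item B”. The function name `rule_measures` and the toy transactions are assumptions made for illustration, and the values are returned as fractions between 0 and 1 rather than percentages.

```python
def rule_measures(transactions, a, b):
    """Compute the measures for the rule 'a implies b'.

    Fractions are returned in the range 0..1; multiply by 100 for percentages.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)

    expected_confidence = count_b / n                     # share of records with B
    confidence = count_ab / count_a if count_a else 0.0   # share of A-records with B
    support = count_ab / n                                # share of records with A and B
    # Lift compares confidence with expected confidence; above 1 means
    # B is more likely when A is present.
    lift = confidence / expected_confidence if expected_confidence else 0.0
    return {
        "expected_confidence": expected_confidence,
        "confidence": confidence,
        "support": support,
        "lift": lift,
        "count": count_ab,
    }

# Hypothetical toy data: each set is one record (one basket).
records = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

measures = rule_measures(records, "bread", "butter")
for name, value in measures.items():
    print(name, round(value, 3))
```

In this toy data butter appears in 3 of 5 records overall but in 2 of the 3 records that contain bread, so confidence exceeds expected confidence and the lift is slightly above 1.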

About the author

Irina Spivak
Team Leader at G-Stat.