What are Association Rules?
Association rule mining finds interesting relationships in large data sets. Association rules describe conditions that frequently occur together in a given data set. A typical and widely used example of association rule mining is Market Basket Analysis. At the end of this post I have attached a few links and resources for trying out association rules yourself with R.
For example, data collected in stores, and even more so in app or game stores, consists of a large number of transaction records. Each record lists all items bought by a customer in a single purchase transaction. It can be interesting to learn whether certain items are consistently purchased together. This information can be used for cross-selling and up-selling, for promotions, and for identifying customer segments based on their buying patterns.
Association Rules Components
Association rules express purchasing patterns of this type as "if-then" statements and are probabilistic in nature (e.g., customers who purchased cake also tended to buy ice cream).
In association analysis the “if” and “then” are sets of items that are disjoint (do not have any items in common).
For the remainder of this post I will use the technical terms antecedent and consequent to represent the “if” and “then” sections, respectively.
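As a rough illustration of these components, a rule can be modelled as a pair of item sets. The sketch below is in Python rather than R, and the helper name is_valid_rule and the item names are hypothetical, chosen only for this example.

```python
def is_valid_rule(antecedent, consequent):
    """Return True when both sides are non-empty and share no items."""
    antecedent, consequent = set(antecedent), set(consequent)
    return bool(antecedent) and bool(consequent) and antecedent.isdisjoint(consequent)


# "If cake is purchased, then ice cream is purchased."
ok = is_valid_rule({"cake"}, {"ice cream"})

# Not a valid rule: "cake" appears on both sides.
not_ok = is_valid_rule({"cake"}, {"cake", "ice cream"})

print(ok, not_ok)
```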
Support and Confidence
In addition to the antecedent and consequent, an association rule has two numbers that express the degree of uncertainty about the rule. The first is called the support of the rule. The support is simply the number of transactions that include all items in both the antecedent and the consequent. (The support is sometimes expressed as a percentage of the total number of records in the database.)
The other number is known as the confidence of the rule. Confidence is the ratio of the number of transactions that include all items in both the antecedent and the consequent (namely, the support) to the number of transactions that include all items in the antecedent.
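To make the two definitions concrete, here is a minimal Python sketch that counts them over a small list of transactions. The function names and the five toy transactions are assumptions made up for illustration, not data from a real store.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of the antecedent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical toy data: each set is one purchase transaction.
transactions = [
    {"cake", "ice cream", "candles"},
    {"cake", "ice cream"},
    {"cake", "soda"},
    {"ice cream"},
    {"cake", "ice cream", "soda"},
]

rule_support = support_count(transactions, {"cake", "ice cream"})
rule_confidence = confidence(transactions, {"cake"}, {"ice cream"})

print(rule_support)      # 3 transactions contain both cake and ice cream
print(rule_confidence)   # 0.75, since 3 of the 4 cake transactions include ice cream
```

Dividing rule_support by len(transactions) gives the support as a fraction of all records instead of a raw count.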
Implementation Examples for Games or Apps:
For example, if an online game database has 100,000 online transactions, out of which 1,500 include both items A and B and 600 of these include item C, the association rule "If A and B are purchased then C is purchased on the same online transaction" has a support of 600 transactions (alternatively 0.6% = 600/100,000) and a confidence of 40% (=600/1,500). The same measures can be calculated at the customer level, for example by counting customers who bought items A and B in the last month and also bought C.
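The arithmetic in this example can be checked with a few lines of Python. The variable names are my own; the counts are the ones given in the example (100,000 transactions, 1,500 with A and B, 600 of those with C).

```python
total_transactions = 100_000
with_a_and_b = 1_500      # transactions containing the antecedent {A, B}
with_a_b_and_c = 600      # transactions containing A, B and C

rule_support = with_a_b_and_c                           # support as a count
support_fraction = with_a_b_and_c / total_transactions  # support as a share of all records
rule_confidence = with_a_b_and_c / with_a_and_b

print(f"support: {rule_support} transactions ({support_fraction:.1%})")
print(f"confidence: {rule_confidence:.0%}")
```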
One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent, given that the transaction includes all the items in the antecedent.
Lift builds on the expected confidence, which is the number of transactions that include the consequent divided by the total number of transactions. Suppose the total number of transactions that include C is 8,000. Then the expected confidence is 8,000/100,000 = 8%.
In the context of our online game example:
Lift = Confidence/Expected Confidence = 40%/8% = 5
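Lift therefore tells us how much more likely the consequent is when the antecedent is present than it is overall. Here, a transaction that contains A and B is five times as likely to include C as a randomly selected transaction.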
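The lift calculation can be sketched the same way in Python. The function name lift and its parameter names are hypothetical; the inputs assume 8,000 transactions containing C, which is the figure behind the 8% expected confidence.

```python
def lift(n_total, n_antecedent, n_consequent, n_both):
    """Confidence of the rule divided by its expected confidence."""
    confidence = n_both / n_antecedent
    expected_confidence = n_consequent / n_total
    return confidence / expected_confidence


# Counts from the online game example; 8,000 is the assumed count for item C.
game_lift = lift(n_total=100_000, n_antecedent=1_500, n_consequent=8_000, n_both=600)

print(f"lift: {game_lift:.1f}")
```

A lift close to 1 would suggest that the antecedent tells us little about the consequent, while values well above 1, as here, point to items that are bought together more often than chance alone would explain.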
Links to simple R code fragments you can experiment with:
(Many thanks to Micky Daniels for his linguistic and editorial advice)