The Improvement of PHP Algorithm for Association Rules

ASSOCIATION RULES MINING

Association rule mining is a common data mining technique for discovering interesting associations or correlations in data. It proceeds in two steps. The first step finds all sets of items whose support count reaches a minimum threshold. A set of items is called an itemset, an itemset containing k items is called a k-itemset, and an itemset that meets the minimum support count is called a frequent itemset. The second step generates association rules from the frequent itemsets: for each frequent itemset l and each non-empty proper subset s of l, if support_count(l)/support_count(s) >= min_conf, the rule s ⇒ (l − s) is output. Here support_count(X) is the number of transactions in which itemset X appears, and min_conf is the minimum confidence threshold.
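The two steps can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names frequent_itemsets and generate_rules, the toy transactions, and the thresholds are all hypothetical, and the first step is done by brute-force enumeration purely to keep the example short.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Step 1 (brute force, for illustration only): map every itemset whose
    support count is at least min_sup to that support count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup:
                frequent[candidate] = count
                found = True
        if not found:
            # No frequent k-itemset, so no larger itemset can be frequent.
            break
    return frequent


def generate_rules(frequent, min_conf):
    """Step 2: for each frequent itemset l and each non-empty proper subset s,
    keep the rule s => (l - s) if support_count(l) / support_count(s) >= min_conf."""
    rules = []
    for l_set, count_l in frequent.items():
        for size in range(1, len(l_set)):
            for subset in combinations(sorted(l_set), size):
                s = frozenset(subset)
                # Every subset of a frequent itemset is frequent, so s is a key.
                conf = count_l / frequent[s]
                if conf >= min_conf:
                    rules.append((s, l_set - s, conf))
    return rules


# Hypothetical toy database of four transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = frequent_itemsets(transactions, min_sup=2)
rules = generate_rules(frequent, min_conf=0.7)
for s, rest, conf in rules:
    print(sorted(s), "=>", sorted(rest), conf)  # ['butter'] => ['bread'] 1.0
```

In this toy database, butter appears in two transactions and both of them also contain bread, so the rule butter ⇒ bread has confidence 2/2 = 1.0. The rules between bread and milk have confidence 2/3 and fall below the assumed threshold of 0.7.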

PHP ALGORITHM BASED ON ASSOCIATION RULES

The PHP algorithm is derived from the Apriori and DHP algorithms, and it addresses a shortcoming of DHP. In DHP, different candidate itemsets may fall into the same bucket because they have the same hash value. A bucket's count can therefore reach the minimum support count even though the support count of an individual candidate in that bucket is below it. Such candidates are called false positives. Because of them, DHP must scan the database again to count the support of each candidate and delete the false positives before the real frequent itemsets are known.

PHP instead defines a hash table large enough to map each candidate to a different bucket. Every candidate then has its own bucket count, no false positives exist, and the bucket counts in the hash table can be used directly as the itemset counts.

PHP also gains efficiency by pruning the database. Before the candidate k-itemsets of a transaction are generated, the items that cannot belong to a frequent itemset are deleted from the transaction; if no items remain, the transaction itself is deleted. The database therefore shrinks from pass to pass, and so does the number of candidates generated per transaction. This is most effective when the database is large and the frequent itemsets are few.

Finally, before a candidate k-itemset of a transaction is added to the hash table, all of its (k-1)-subsets are checked. Only if every one of them is frequent can the candidate possibly be frequent, and only then is it put into the hash table. This pre-pruning keeps the size of the hash table under control.
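The three ideas described above (one bucket per candidate, database pruning, and pre-pruning of candidates) can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function name php_sketch and the toy data are hypothetical, and a Python dict keyed by frozenset stands in for the perfect hash table, since it keeps a separate count for every distinct candidate.

```python
from itertools import combinations


def php_sketch(transactions, min_sup):
    """Return {itemset: support_count} for all frequent itemsets."""
    database = [frozenset(t) for t in transactions]

    # Pass 1: count single items. The dict plays the role of the hash table;
    # each distinct key has its own count, so no false positives arise.
    counts = {}
    for t in database:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    prev = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(prev)

    k = 2
    while prev:
        # Items that occur in at least one frequent (k-1)-itemset.
        useful_items = set().union(*prev)
        counts = {}
        pruned = []
        for t in database:
            # Database pruning: trim items that cannot be in a frequent itemset.
            t = t & useful_items
            if not t:
                continue  # nothing left, so the transaction is deleted
            pruned.append(t)
            for combo in combinations(sorted(t), k):
                # Pre-pruning: count the candidate only if every
                # (k-1)-subset is frequent.
                if all(frozenset(sub) in prev
                       for sub in combinations(combo, k - 1)):
                    key = frozenset(combo)
                    counts[key] = counts.get(key, 0) + 1
        database = pruned
        # Counts are exact, so no extra scan is needed to remove false positives.
        prev = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(prev)
        k += 1
    return frequent


# Hypothetical toy database; "jam" is infrequent, so the last transaction
# is removed by database pruning before the second pass.
toy_db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"jam"},
]
result = php_sketch(toy_db, min_sup=2)
```

In the third pass of this example, the candidate {bread, butter, milk} is never inserted into the table, because its subset {butter, milk} is not frequent. That is the pre-pruning step at work.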