Sunday, August 30, 2015

Random Intersection Trees

Found an interesting paper with a new idea on how to build interpretable decision rules. The paper
describes an algorithm to find a set of binary features that are together indicative of a class. For example, a set of binary features might be:

  • Humidity = high: true / false
  • Temperature > 10 C: true / false
  • Sunny = true / false
  • Rain = true false
A feature is said to be active whenever the result evaluates to true. For two classes C1 and C2, the goal is to find a subset of active features S  such that P(S|C1) > O2 and P(S|C2) < O1 for two thresholds 0 < O1 < O2 < 1

For example, let the class be good weather and bad weather. Furthermore, I specified the thresholds
as O1 = 0.5 and O2 = 0.25. A good result might then be 

  • P(S = {temperature > 10, sunny} | good weather)  > .5
  • P(S = {temperature > 10, sunny} | bad weather)    < .25
Such a result would be indeed interpretable and we could agree with the results that a higher temperature and sun would result in higher probabilities for good weather.

The algorithm works in two steps: 
  1. Candidate generation for subsets S.
  2. Check subsets for constraint: P(S|C1) > O2 and P(S|C2) < O1 
The main algorithm in the paper is an efficient candidate generation algorithm.
The main idea is to organise the subset search into a random tree of fixed depth and breadth.
At every node in the tree we pick a random example from the dataset with a positive label and then
compute the intersection of that example with the active feature set of the parent node.

By intersecting with positive examples only, the set of features gets more and more refined at each level. All remaining sets at the specified depth are candidates. Afterwards, we only have to check gif the subset holds to the constraints. 

A very interesting method to find rules in a dataset which is an interesting alternative to decision tree learning and apriori rule discovery.