Hoeffding Tree Classifier
Hoeffding Tree or Very Fast Decision Tree classifier. A Hoeffding Tree [1] is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute).
A theoretically appealing feature of Hoeffding Trees, not shared by other incremental decision tree learners, is that they have sound guarantees of performance. Using the Hoeffding bound, one can show that their output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. Implementation based on MOA [2].
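As a minimal sketch of the bound itself (a generic illustration, not TurboML's internal implementation): after n independent observations of a statistic with range R, the true mean lies within epsilon = sqrt(R^2 * ln(1/delta) / (2n)) of the observed mean with probability 1 - delta. The tree splits a leaf once the difference in split-criterion gain between the two best attributes exceeds this epsilon.
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # Epsilon such that, with probability 1 - delta, the true mean of a
    # statistic with the given range lies within epsilon of the mean
    # observed over n examples.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# With the default delta of 1e-07 and a criterion ranging over [0, 1],
# roughly 8,000 examples shrink the bound to about 0.03.
print(hoeffding_bound(1.0, 1e-07, 8000))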
Parameters
- n_classes (int) → The number of classes for the classifier.
- grace_period (int, Default: 200) → Number of instances a leaf should observe between split attempts.
- split_method (str, Default: gini) → Split criterion to use. Options: gini - Gini, info_gain - Information Gain, hellinger - Hellinger Distance.
- delta (float, Default: 1e-07) → Significance level to calculate the Hoeffding bound. The significance level is given by 1 - delta. Values closer to zero imply longer split decision delays.
- tau (float, Default: 0.05) → Threshold below which a split will be forced to break ties.
- leaf_pred_method (str, Default: mc) → Prediction mechanism used at leaves. For now only Majority Class (mc) is supported.
Example Usage
We can create an instance of the HoeffdingTreeClassifier model like this.
import turboml as tb
htc_model = tb.HoeffdingTreeClassifier(n_classes=2)
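The constructor also accepts the other parameters listed above. A sketch passing a few of them explicitly, with values chosen purely for illustration rather than as recommendations:
htc_model = tb.HoeffdingTreeClassifier(
    n_classes=2,
    grace_period=300,          # illustrative: check for a split every 300 instances at a leaf
    split_method="info_gain",  # illustrative: information gain instead of the default gini
    delta=1e-07,               # significance level for the Hoeffding bound
    tau=0.05,                  # tie-breaking threshold
)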