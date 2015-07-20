ZeroR gives us a 64.2% base-level learning accuracy in this example, but it'd be nice to do a bit better than that. That's where the OneR algorithm comes in. It's called a 'classification rule learner', in that, given what it learns from a training dataset, it generates rules that allow us to determine or 'classify' the result of a future instance.

If you look at the OneR table above, you can see how it works – each weather dataset attribute has a small number of possible values. For Outlook, they are 'sunny', 'overcast' and 'rainy'; for Temperature, they are 'hot', 'mild' and 'cool', and so on. For each attribute value, we go through the instances where that value occurs and count how many give a 'yes' result and how many give a 'no'.

For example, going through the 14 instances, you can see there are five where the outlook is sunny, giving us two 'yes' and three 'no' results. Likewise, 'outlook = overcast' gets four 'yes' results and zero 'no' results. We then do likewise for all of the other attributes.
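The counting step can be sketched in a few lines of Python. The table itself isn't reproduced in this text, so the sketch assumes the standard 14-instance nominal weather dataset; the names WEATHER, ATTRIBUTES and count_classes are made up for illustration.

```python
from collections import Counter, defaultdict

# Assumption: the standard 14-instance nominal weather dataset.
# Columns: outlook, temperature, humidity, windy, play.
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def count_classes(rows, attributes):
    """Return counts[attribute][value], a Counter of 'yes'/'no' results."""
    counts = {name: defaultdict(Counter) for name in attributes}
    for *values, label in rows:
        for name, value in zip(attributes, values):
            counts[name][value][label] += 1
    return counts


counts = count_classes(WEATHER, ATTRIBUTES)
for value, tally in counts["outlook"].items():
    print(f"outlook = {value}: yes={tally['yes']}, no={tally['no']}")
```

A Counter returns zero for a class it has never seen, which is why 'outlook = overcast' can report a 'no' count of zero without any special handling.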

Next, we count up the errors – these are the smaller counts for each attribute value. So again, for 'outlook = sunny', the error count is two (the 'yes' count); for 'outlook = overcast', it's zero (the 'no' count); for 'outlook = rainy', it's two, and so on. The red boxes on the table show the most popular class values for each attribute value and it's from these that we make our first set of 'Outlook' rules:

Outlook = sunny -> Play = no

Outlook = overcast -> Play = yes

Outlook = rainy -> Play = yes
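As a rough sketch of how rules like these fall out of the counts, the snippet below picks the majority class for each Outlook value and records the minority count as the error. The sunny and overcast counts come from the text; the rainy split of three 'yes' to two 'no' is an assumption taken from the standard weather dataset, and make_rules is a hypothetical helper name.

```python
# Outlook counts per value. The rainy split (3 yes / 2 no) is an assumption
# from the standard weather dataset; the text only gives its error count of 2.
outlook_counts = {
    "sunny": {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rainy": {"yes": 3, "no": 2},
}


def make_rules(value_counts):
    """Map each attribute value to its majority class and its error count."""
    rules, errors = {}, {}
    for value, tally in value_counts.items():
        majority = max(tally, key=tally.get)  # most popular class value
        rules[value] = majority
        errors[value] = sum(tally.values()) - tally[majority]
    return rules, errors


rules, errors = make_rules(outlook_counts)
for value, play in rules.items():
    print(f"Outlook = {value} -> Play = {play} (errors: {errors[value]})")
```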

What we're doing is taking the most popular class value for each attribute value and assigning it to that attribute-value pair to make a rule, so for this example, outlook being 'sunny' leads to play being 'no' and so on. We repeat this for each of the other three attributes. After that, we add up the 'error' counts across each attribute's values, so Outlook is 2 + 0 + 2, totaling 4 out of 14 (4/14). For Temperature, we get 5/14, for Humidity 4/14 and for Windy 5/14.
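To check those totals, here is a minimal sketch that adds up the minority counts for every attribute. It again assumes the standard 14-instance nominal weather dataset, since the table isn't reproduced in this text, and total_errors is an illustrative name rather than anything from a library.

```python
from collections import Counter, defaultdict

# Assumption: the standard 14-instance nominal weather dataset.
# Columns: outlook, temperature, humidity, windy, play.
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def total_errors(rows, attributes):
    """Sum, per attribute, the instances its majority-class rules get wrong."""
    tallies = {name: defaultdict(Counter) for name in attributes}
    for *values, label in rows:
        for name, value in zip(attributes, values):
            tallies[name][value][label] += 1
    return {
        name: sum(sum(t.values()) - max(t.values()) for t in by_value.values())
        for name, by_value in tallies.items()
    }


error_totals = total_errors(WEATHER, ATTRIBUTES)
for name, wrong in error_totals.items():
    print(f"{name}: {wrong}/{len(WEATHER)}")
```

Where an attribute value has a tied count, the error is the same whichever class the rule picks, so the totals don't depend on how ties are broken.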

Now, we choose the attribute with the smallest error count. Since in this example we have two attributes with an error count of 4 out of 14 (Outlook and Humidity), you can choose either - we've gone with the first one, the 'Outlook' attribute ruleset above.
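The selection step, including the 'take the first one' tie-break, might look like this in Python. The error totals are the ones given above; the variable names are illustrative.

```python
# Error totals out of 14, in the order the attributes appear in the dataset.
error_totals = {"outlook": 4, "temperature": 5, "humidity": 4, "windy": 5}

# min() returns the first key with the smallest value, and a dict keeps its
# insertion order, so a tie goes to the attribute listed first.
best_attribute = min(error_totals, key=error_totals.get)

# List every attribute that shares the smallest error count.
tied = [name for name, wrong in error_totals.items()
        if wrong == error_totals[best_attribute]]

print("chosen:", best_attribute)
print("tied for smallest error:", tied)
```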

This now becomes our 'OneR' (one-rule) classification rule set. Applied to the training dataset, this rule correctly predicts 10 out of 14 instances or just under 71.5%. Remember, ZeroR gave us 64.2%, so OneR gains us greater accuracy, which is what we want.
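The comparison can be sketched by scoring both approaches on the training data. Only the Outlook and Play columns are needed here, and they are assumed to match the standard weather dataset; ONE_R and PAIRS are hypothetical names.

```python
from collections import Counter

# Assumption: (outlook, play) pairs from the standard weather dataset.
PAIRS = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

# The OneR rule set: the majority class for each Outlook value.
ONE_R = {"sunny": "no", "overcast": "yes", "rainy": "yes"}

# ZeroR ignores every attribute and always predicts the most common class.
labels = [play for _, play in PAIRS]
majority_class, zero_r_correct = Counter(labels).most_common(1)[0]

# OneR predicts from the Outlook value alone.
one_r_correct = sum(ONE_R[outlook] == play for outlook, play in PAIRS)

total = len(PAIRS)
print(f"ZeroR (always '{majority_class}'): {zero_r_correct}/{total}")
print(f"OneR (Outlook rules): {one_r_correct}/{total}")
```

Both figures are measured on the same 14 instances the rules were learned from, so they describe fit to the training data rather than performance on new instances.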