Baseline Performance on Your Data, Exercises of Data Mining

When you start evaluating multiple machine learning algorithms on your dataset, you need a baseline for comparison. A baseline result gives you a point of reference to know whether the results for a given algorithm are good or bad, and by how much. In this lesson you will learn about the ZeroR algorithm that you can use as a baseline for classification and regression algorithms.

Typology: Exercises

2019/2020

Available from 08/20/2023

taruk

Lesson 08: Baseline Performance on Your Data
1. Open the Weka GUI Chooser and then the Weka Explorer.
2. Load the data/diabetes.arff dataset.
3. Click the Classify tab. The ZeroR algorithm is chosen by default.
4. Click the Start button.
This will run the ZeroR algorithm using 10-fold cross-validation on your dataset. The ZeroR algorithm, also called the Zero Rule, gives you a baseline of performance for all other algorithms on your dataset. It sets the minimum result to beat: any algorithm that performs better has some skill on your problem.
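
The evaluation described above can be sketched in plain Python. This is a minimal illustration and not Weka's implementation: the helper names (`k_fold_indices`, `cross_validated_baseline_accuracy`), the seed, and the toy labels are all assumptions made for this example, and Weka's own fold assignment may differ (for example, it can stratify folds by class). The baseline "model" here is simply the most frequent label in each training split.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_baseline_accuracy(labels, k=10, seed=1):
    """Estimate majority-class accuracy with k-fold cross-validation."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        # The baseline "model" is just the most frequent training label.
        majority = Counter(train_labels).most_common(1)[0][0]
        correct += sum(1 for i in test_fold if labels[i] == majority)
    # Every instance is tested exactly once, so divide by the dataset size.
    return correct / len(labels)


# Hypothetical toy labels, not taken from any real dataset.
toy_labels = ["no"] * 70 + ["yes"] * 30
accuracy = cross_validated_baseline_accuracy(toy_labels)
print(f"baseline accuracy: {accuracy:.2f}")  # baseline accuracy: 0.70
```

Any candidate algorithm evaluated with the same folds would need to score above this figure to show skill on the toy data.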
On a classification problem, the ZeroR algorithm always predicts the most abundant category. If two or more categories are equally abundant, it predicts the first category value. On the diabetes dataset, this results in a classification accuracy of about 65%. For regression problems, the ZeroR algorithm always predicts the mean output value.
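
Both rules are small enough to sketch directly. The following Python is an illustration under stated assumptions, not Weka's code: the function names are hypothetical, the class counts (500 tested_negative, 268 tested_positive) are assumed to match the diabetes.arff file distributed with Weka, and the regression targets are made-up values. The accuracy here is computed on the full dataset rather than by cross-validation.

```python
from collections import Counter
from statistics import mean


def zero_r_classify(train_labels, class_order):
    """Return the most frequent label; ties go to the earliest entry in class_order."""
    counts = Counter(train_labels)
    # max() returns the first maximal element, so class_order decides ties.
    return max(class_order, key=lambda label: counts[label])


def zero_r_regress(train_targets):
    """Return the mean of the training targets as the constant prediction."""
    return mean(train_targets)


# Assumed class counts for diabetes.arff: 500 tested_negative, 268 tested_positive.
labels = ["tested_negative"] * 500 + ["tested_positive"] * 268
prediction = zero_r_classify(labels, ["tested_negative", "tested_positive"])
accuracy = sum(1 for y in labels if y == prediction) / len(labels)
print(prediction, f"{accuracy:.3f}")  # tested_negative 0.651

# Tie: both classes appear twice, so the first listed class wins.
tie_prediction = zero_r_classify(["b", "a", "a", "b"], ["a", "b"])

# Regression: hypothetical target values; the constant prediction is their mean.
constant = zero_r_regress([2.0, 4.0, 9.0])
```

Passing the class order explicitly keeps the tie-breaking rule visible instead of relying on the order in which labels happen to appear in the data.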
Experiment with the ZeroR algorithm on at least 5 other datasets. It is the algorithm you should always run first, before all others, to establish a baseline.

ZeroR Algorithm