A study on diversity in classifier ensembles

Shipp, Catherine A. (2004) A study on diversity in classifier ensembles. PhD thesis, Bangor University.

Abstract

In this thesis we carry out a series of investigations into the relationship between diversity and combination methods, and between diversity and AdaBoost. In our first investigation we study the relationships between nine combination methods, using two data sets. We consider the overall accuracies of the combination methods, their improvement over the single best classifier, and the correlation between the ensemble outputs produced by the different combination methods. Next we introduce ten diversity measures. Using the same two data sets, we study the relationships between the diversity measures, and then their relationship to the combination methods previously studied. The ranges of the ten diversity measures for three classifiers are derived, compared with the theoretical ranges, and their implications for the accuracy of the ensemble are studied. We then investigate the diversity of classifier ensembles built using the AdaBoost algorithm. We carry out experiments with two data sets using ten-fold cross-validation, building 100 classifiers each time from linear classifiers, quadratic classifiers or neural networks. We study how diversity varies as the classifier ensemble grows and how the different types of classifier compare. Next we consider ways of improving AdaBoost's performance, investigating how modifying the size of the training sets and the complexity of the individual classifiers alters the ensemble's performance; these experiments use three data sets. Lastly we consider using Pareto optimality to determine which of the classifiers built by AdaBoost to add to the ensemble. We carry out experiments with ten data sets, comparing standard AdaBoost to AdaBoost with two versions of the Pareto-optimality method, called Pareto 5 and Pareto 10, to see whether we can reduce the ensemble size without harming the ensemble accuracy.
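
For readers unfamiliar with how pairwise diversity is quantified, the following is a minimal sketch, not the thesis code, of two standard pairwise measures from the ensemble literature (the disagreement measure and Yule's Q statistic), averaged over all classifier pairs. They operate on "oracle" outputs, i.e. whether each classifier is correct on each sample. The oracle matrix, function names and the 80% accuracy figure are illustrative assumptions only; the ten measures studied in the thesis are not necessarily these two.

```python
import numpy as np

def pairwise_counts(oracle_i, oracle_j):
    """Return N11, N10, N01, N00 for two classifiers' correct(1)/incorrect(0) outputs."""
    n11 = np.sum((oracle_i == 1) & (oracle_j == 1))
    n10 = np.sum((oracle_i == 1) & (oracle_j == 0))
    n01 = np.sum((oracle_i == 0) & (oracle_j == 1))
    n00 = np.sum((oracle_i == 0) & (oracle_j == 0))
    return n11, n10, n01, n00

def disagreement(oracle_i, oracle_j):
    """Proportion of samples on which exactly one of the two classifiers is correct."""
    n11, n10, n01, n00 = pairwise_counts(oracle_i, oracle_j)
    return (n10 + n01) / (n11 + n10 + n01 + n00)

def q_statistic(oracle_i, oracle_j):
    """Yule's Q statistic: +1 for identical behaviour, negative values for diverse pairs."""
    n11, n10, n01, n00 = pairwise_counts(oracle_i, oracle_j)
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

def average_pairwise(oracle, measure):
    """Average a pairwise measure over all pairs in an L x N oracle matrix."""
    L = oracle.shape[0]
    vals = [measure(oracle[i], oracle[j]) for i in range(L) for j in range(i + 1, L)]
    return float(np.mean(vals))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical oracle matrix: 5 classifiers, 100 samples, roughly 80% accuracy each.
    oracle = (rng.random((5, 100)) < 0.8).astype(int)
    print("mean disagreement:", average_pairwise(oracle, disagreement))
    print("mean Q statistic:", average_pairwise(oracle, q_statistic))
```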

Item Type: Thesis (PhD)
Subjects: Degree Thesis
Departments: College of Physical and Applied Sciences > School of Computer Science
Date Deposited: 14 May 2015 04:48
Last Modified: 31 Mar 2017 16:36
URI: http://e.bangor.ac.uk/id/eprint/4337