eBangor

Feature selection and classification of non-traditional data : examples from veterinary medicine

Hoare, Zoe Susannah Jane (2007) Feature selection and classification of non-traditional data : examples from veterinary medicine. PhD thesis, Prifysgol Bangor University.


Abstract

Early diagnosis of notifiable diseases in the veterinary domain is important with regard to agriculture, the health sector and the economy. With no diagnostic test in the live animal for either BSE or Scrapie, many cases may be misdiagnosed. Traditionally, data for pattern recognition is stored as recorded cases of interest, either labelled with their outcome (suitable for supervised classification) or unlabelled. Each case is described by a collection of symptoms, each recorded as present or absent. These are called "binary features". In the case of medical data, the number of cases recorded in this way may be limited for many reasons. To overcome this lack of data, expert-estimated probability tables have been proposed as a substitute. These "non-traditional" tables contain the estimated percentage frequencies of clinical symptoms in various diseases. The construction of the tables assumed that the clinical signs (features) were independent given the diseases (classes). Given the "non-traditional" data, various feature selection techniques were applied and compared in this study in order to select a reduced subset of features (symptoms). The potential, limitations and stability of Sequential Forward Selection (SFS), in particular, were investigated. Decision trees and Naive Bayes classifier models were applied to the diagnosis task. The apparent success and stability of Naive Bayes in the medical domain led to an in-depth investigation of the effects of this type of data, and its inherent assumptions, on the model. Naive Bayes is known to be optimal in the case of independent features, which is the condition assumed by the estimated probability tables in the "non-traditional" data. Various proposed adaptations to the Naive Bayes model were investigated with regard to their optimality when the independence assumption is violated. Finally, the performance of Naive Bayes on traditionally stored medical data with binary features was assessed.
Naive Bayes and its adaptations performed well with the traditional data. Since the effect of assuming independence when it does not hold is minimal, using the "non-traditional" data with the Naive Bayes classifier can be a practical solution for veterinary diagnosis.
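
As a rough illustration of the idea described in the abstract, the sketch below shows how a Naive Bayes classifier might be driven directly by a table of estimated symptom frequencies rather than by recorded cases. It is not taken from the thesis: the disease names, clinical signs, frequencies, priors and the function name `posterior` are all hypothetical, and the code simply assumes, as the tables do, that signs are independent given the disease.

```python
import math

# Hypothetical expert-estimated table: P(sign present | disease) as fractions.
# All names and numbers are invented for illustration only.
TABLE = {
    "disease_A": {"ataxia": 0.90, "pruritus": 0.10, "weight_loss": 0.60},
    "disease_B": {"ataxia": 0.70, "pruritus": 0.85, "weight_loss": 0.50},
}
# Assumed equal prior probabilities for the two diseases.
PRIORS = {"disease_A": 0.5, "disease_B": 0.5}


def posterior(observed, table=TABLE, priors=PRIORS, eps=1e-6):
    """Return P(disease | observed signs) under the independence assumption.

    `observed` maps each recorded sign to True (present) or False (absent).
    Signs that were not recorded are simply left out of the product.
    """
    log_scores = {}
    for disease, freqs in table.items():
        log_p = math.log(priors[disease])
        for sign, present in observed.items():
            # Clip to avoid log(0) when an expert estimate is exactly 0 or 1.
            p = min(max(freqs[sign], eps), 1.0 - eps)
            log_p += math.log(p if present else 1.0 - p)
        log_scores[disease] = log_p

    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {d: math.exp(s - top) for d, s in log_scores.items()}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}


result = posterior({"ataxia": True, "pruritus": True, "weight_loss": False})
diagnosis = max(result, key=result.get)
```

With these invented numbers the animal showing ataxia and pruritus but no weight loss is assigned to `disease_B`. Working in log space keeps the product of many small frequencies from underflowing when the table lists a large number of signs.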

Item Type: Thesis (PhD)
Subjects: Degree Thesis
Departments: College of Physical and Applied Sciences > School of Computer Science
Degree Thesis
Date Deposited: 14 May 2015 04:54
Last Modified: 16 Aug 2016 08:36
URI: http://e.bangor.ac.uk/id/eprint/4386
