Bayes-optimal linear discriminant analysis under heteroscedasticity

  • Sarfo Gyamfi

    Student thesis: Doctoral Thesis, Doctor of Philosophy


    Linear discriminant analysis (LDA) has been applied to many machine learning problems, such as medical diagnosis, face and object detection, handwriting recognition, spam filtering and credit card fraud prediction. LDA is used either for supervised linear dimensionality reduction of high-dimensional datasets or for statistical classification. Under the assumptions of normally-distributed classes and equal covariance matrices among the classes, LDA is known to be optimal in that it attains the Bayes error, the minimum error rate achievable by a classifier whose predictions are based on knowledge of the stochastic process generating the data. The widespread use of LDA in the application areas listed above is not because the datasets necessarily satisfy these two assumptions, but mainly because LDA is robust to moderate violations of them. Nonetheless, for many other applications, the performance of LDA can be unsatisfactory if the assumptions of normally-distributed classes and equal covariance are not met.
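
    For reference, the standard two-class form of the homoscedastic rule can be written out as follows. The notation is an assumption made here and is not taken from the thesis: the class means are \mu_1 and \mu_2, the shared covariance matrix is \Sigma, and the prior probabilities are \pi_1 and \pi_2.

```latex
\[
\text{assign } x \text{ to class 1 if } w^{\top}x + b > 0, \qquad
w = \Sigma^{-1}(\mu_1 - \mu_2), \qquad
b = -\tfrac{1}{2}(\mu_1 + \mu_2)^{\top}\Sigma^{-1}(\mu_1 - \mu_2) + \ln\frac{\pi_1}{\pi_2}
\]
```

    When both classes are Gaussian with the same \Sigma, this linear rule coincides with the Bayes rule; when the covariances differ, the Bayes boundary is quadratic and the rule above is only an approximation to it.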

    This thesis primarily addresses the violation of the assumption of equal covariance, also known as homoscedasticity.

    For statistical classification, accounting for heteroscedasticity has led to a number of heteroscedastic extensions of LDA, the most natural extension being quadratic discriminant analysis (QDA). However, QDA tends to over-fit on many real-world datasets, especially if the normal distribution assumption is also violated. Thus, heteroscedastic LDA (HLDA) procedures have involved finding a linear approximation to the quadratic boundary in QDA. However, most of these HLDA procedures have no principled optimisation procedure, as they are obtained via trial and error. As a result, they tend to be computationally intractable for high-dimensional datasets. Other HLDA approaches constrain the search space in an attempt to reduce the computational complexity; this, however, leads to poor performance under class imbalance in terms of both classification accuracy and the area under the receiver operating characteristic curve (AUC). Using first- and second-order optimality conditions for the minimisation of the Bayes error, this thesis derives a dynamic Bayes-optimal linear classifier for heteroscedastic LDA that is robust against class imbalance and is optimised via a computationally efficient iterative procedure. The proposed model, referred to as the Gaussian linear discriminant (GLD), is also formulated as a kernel classifier in order to learn non-linear decision boundaries.
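
    The gap between classical LDA and the best linear rule under unequal covariances can be illustrated with a minimal sketch. It is not the GLD procedure derived in the thesis: it swaps in a brute-force grid search over two-dimensional linear rules, and the means, covariances, priors, grid resolution and all function names are hypothetical choices made for illustration. The sketch relies on the standard closed-form error of a linear rule when both classes are Gaussian.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def quad_form(w, S):
    """w^T S w for a 2x2 matrix S."""
    return sum(w[i] * S[i][j] * w[j] for i in range(2) for j in range(2))

def linear_error(w, b, mu1, S1, mu2, S2, p1=0.5):
    """Error rate of the rule 'class 1 if w.x + b > 0' for Gaussian classes."""
    e1 = norm_cdf(-(dot(w, mu1) + b) / math.sqrt(quad_form(w, S1)))
    e2 = norm_cdf((dot(w, mu2) + b) / math.sqrt(quad_form(w, S2)))
    return p1 * e1 + (1.0 - p1) * e2

def lda_rule(mu1, S1, mu2, S2, p1=0.5):
    """Classical LDA rule built from the prior-weighted pooled covariance."""
    P = [[p1 * S1[i][j] + (1.0 - p1) * S2[i][j] for j in range(2)]
         for i in range(2)]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    inv = [[P[1][1] / det, -P[0][1] / det],
           [-P[1][0] / det, P[0][0] / det]]
    d = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    w = [dot(inv[0], d), dot(inv[1], d)]
    mid = [(mu1[0] + mu2[0]) / 2.0, (mu1[1] + mu2[1]) / 2.0]
    b = -dot(w, mid) + math.log(p1 / (1.0 - p1))
    return w, b

def grid_search(mu1, S1, mu2, S2, start, p1=0.5):
    """Brute-force search over unit directions and offsets (illustration only).

    'start' is an (error, w, b) tuple; it is returned unchanged unless the
    grid contains a rule with strictly lower error.
    """
    best = start
    for deg in range(360):
        t = math.radians(deg)
        w = [math.cos(t), math.sin(t)]
        for k in range(-100, 101):
            b = 0.05 * k  # offsets from -5 to 5
            err = linear_error(w, b, mu1, S1, mu2, S2, p1)
            if err < best[0]:
                best = (err, w, b)
    return best

# Hypothetical heteroscedastic example with equal priors.
mu1, S1 = [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
mu2, S2 = [-1.0, 0.0], [[4.0, 1.0], [1.0, 2.0]]

w_lda, b_lda = lda_rule(mu1, S1, mu2, S2)
lda_err = linear_error(w_lda, b_lda, mu1, S1, mu2, S2)
best_err, best_w, best_b = grid_search(mu1, S1, mu2, S2,
                                       start=(lda_err, w_lda, b_lda))

print("LDA error:", round(lda_err, 4))
print("Best linear rule found by grid search:", round(best_err, 4))
```

    Because the search starts from the LDA rule, the reported error can never exceed that of LDA. An exhaustive grid of this kind scales poorly with dimension, which is the kind of cost a principled iterative optimisation is meant to avoid.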

    For the purpose of linear dimensionality reduction (LDR), existing heteroscedastic LDA approaches involve the minimisation of upper bounds on the Bayes error, or the maximisation of measures of class separation. These procedures are often reformulated as eigenvalue decomposition or singular value decomposition (SVD) problems, after which a desired dimensionality q is chosen by taking the first q independent vectors from the decomposition. However, these procedures provide no optimal dimensionality to which to reduce the data and, consequently, they do not preserve the classification information in the original data after the dimensionality reduction. This thesis presents a novel LDR technique that reduces the dimensionality of the original data to K − 1 for a K-class problem, such that the linearly-reduced data is well-primed for Bayesian classification. This technique is referred to as the multiclass Gaussian linear discriminant (M-GLD), and it involves sequentially constructing GLD classifiers that minimise the Bayes error via a gradient descent procedure, under an assumption of within-class normality.

    Experimental validation carried out on several artificial datasets and on real-world datasets from the University of California, Irvine (UCI) machine learning repository highlights the scenarios under which the proposed algorithms achieve superior performance to the original LDA and existing HLDA approaches.

    Finally, the utility of the proposed algorithms is demonstrated by applying them to flow meter fault diagnosis. Using data from four liquid ultrasonic flow meters, the proposed M-GLD dimensionality reduction procedure and GLD classifier are used to achieve diagnostic accuracies of between 97.2% and 100%; this far exceeds the performance of existing LDA procedures, as well as that of support vector machines (SVMs). High diagnostic accuracies promise significant cost benefits in oil and gas operations.
    Date of Award: Apr 2018
    Original language: English
    Awarding Institution: Coventry University
    Supervisors: James Brusey, Andrew Hunt & Elena Gaura
