The Digested Read: Every PhD in Health Informatics

By Paul M Taylor, on 10 November 2010

No one has time to read other people’s PhD theses. So, inspired by the Guardian’s Digested Read column, we’ve decided to publish succinct – very succinct – summaries of the major categories of Health Informatics PhD.

Today, in issue 1 of the series, we deal with classification problems.

Using a Fashionable New Computer Science Paradigm to Classify Cases of an Important Disease

This disease is a major health problem and one where diagnostic or management errors can occur. Previous attempts to apply computers to classify cases of this disease have achieved promising results (classification accuracy of over 70% is commonly claimed) but haven’t yet led to clinical take-up. A fashionable new computer science paradigm seems to offer an exciting and novel approach to this problem.

An innovative algorithm based on the fashionable new computer science paradigm was implemented by the candidate. Applying the algorithm to a publicly available dataset with questionable ground truth achieved a classification accuracy of barely more than 60%. Successive, increasingly desperate and, in the end, almost random modifications allowed a classification accuracy of nearly 70% to be claimed.

A new, but disappointingly small, dataset was collected in collaboration with the local clinical team. On this dataset, the modified algorithm achieved an accuracy of just under 60% when tested against the consensus opinion of clinicians. Closer analysis identified a subset of interesting cases where junior clinicians achieved agreement levels of only 58% when assessed against each other.

The application of the fashionable new computer science paradigm to this important disease seems highly promising.