SGEM Xtra: It’s All About the Bayes, ’Bout the Bayes, No Fisher

Guest Skeptic: Dr. Dan Lane has a Master's in Health Services Research from the University of Calgary and a Doctor of Philosophy in Clinical Epidemiology from the University of Toronto, and is currently a medical student at the University of Calgary.

Dan is naturally a contrarian: he strives to understand the first principles behind conventions in medical research in order to identify and challenge poor practices that have become dogma. He is passionate about statistics and epidemiology and wants to share that passion by making these topics more practical and approachable for clinicians. Believing that proper interpretation of medical research does not begin with memorizing an arbitrary threshold for statistical significance, Dan hopes to contribute to the SGEM by sharing an understanding of the story the numbers are actually telling about the data. Dan has no funding whatsoever and no associations with industry.

Dan has some pet peeves when it comes to how statistics are used in critical appraisals. We will do more in-depth SGEM Xtras on each of these issues.

Thomas Bayes

Absolute vs. Relative Estimates
Effect Estimates and Not P-Values
All Models are Wrong
Prediction vs. Classification
Bayes, Not Frequentists

The purpose of this SGEM Xtra, besides introducing a new SGEM faculty member, is to announce that we are adding a new segment to the SGEM. It is going to be called Statistically Significant.

We want to make the SGEM even better and address some of the criticisms from the ClinEpi world about clinicians trying to do critical appraisal. To do that, we now have Dr. Dan Lane, PhD, who will be commenting on each of the SGEM episodes.

The first instalment of the Statistically Significant segment will be on this week’s SGEMHOP, looking at troponin testing in elderly patients presenting with non-specific complaints (SGEM#280). Let me know what you think of this idea. We have a few more lined up, and feedback is always appreciated. Send me an email at TheSGEM@gmail.com.

Statistically Significant #280: Sensitivity and Specificity

Despite their dogmatic use in the literature, sensitivity and specificity have a number of limitations that are rarely considered or addressed in diagnostic test studies.

Sensitivity and specificity are crude metrics, meaning they only look at the relationship between a single test and a single outcome. As crude measures, they fail to incorporate any other information into their estimates, including potential confounders of the relationship between the test result and the outcome.
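To make "crude" concrete, here is a minimal Python sketch of how both metrics come from nothing more than a single 2x2 table of test result against outcome. The counts are made up for illustration and are not from the SGEM#280 study; note that no other patient information (age included) enters the calculation.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Crude sensitivity and specificity from a single 2x2 table.

    tp: test positive, outcome present    fn: test negative, outcome present
    fp: test positive, outcome absent     tn: test negative, outcome absent
    """
    sensitivity = tp / (tp + fn)  # P(test positive | outcome present)
    specificity = tn / (tn + fp)  # P(test negative | outcome absent)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, fp=120, tn=330)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Everything the metrics "know" is in those four cells, which is exactly why they cannot account for a confounder.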

In this particular study, age is part of the primary objective (geriatric patients), but it is also a confounder of the relationship between troponin level (which may increase with age) and the risk of acute coronary syndrome (ACS), which also increases with age. When confounders like age are present, crude measures will be influenced by how common the confounder is in each of the groups. For example, if the patients with ACS in a sample were older on average, age alone would push more of their troponin values above the threshold, and the estimate of sensitivity may be inflated.
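A small worked example shows how the age mix alone can move a crude estimate. The sketch below assumes, purely hypothetically, that troponin is positive in 90% of patients with ACS aged 80 or older and in 70% of those under 80, in every sample. Only the proportion of older patients changes between the two samples, yet the pooled (crude) sensitivity differs.

```python
def pooled_sensitivity(strata):
    """Crude sensitivity pooled over age strata.

    strata: list of (n_with_acs, n_with_acs_and_positive_troponin) per stratum.
    """
    total = sum(n for n, _ in strata)
    positives = sum(pos for _, pos in strata)
    return positives / total


# Hypothetical counts among patients WITH ACS, ordered (80+ stratum, <80 stratum).
# Stratum-specific sensitivity is 0.90 (80+) and 0.70 (<80) in both samples.
mostly_older = [(80, 72), (20, 14)]
mostly_younger = [(20, 18), (80, 56)]

print(pooled_sensitivity(mostly_older))
print(pooled_sensitivity(mostly_younger))
```

The test performs identically within each age group, but the crude sensitivity is 0.86 in the older sample and 0.74 in the younger one, because the crude estimate is a weighted average over whatever age mix happened to be enrolled.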

Another limitation of sensitivity and specificity is that they require a test result to be classified as positive or negative. This is problematic when the underlying measurement is continuous, as troponin is. In the current study, the test was considered “positive” if the troponin level was above the 99th percentile for that assay. But this arbitrarily treats patients above or below the 99th percentile as homogeneous groups: the statistics consider everyone above the threshold to be the same, and everyone below the threshold to be the same.

Consider a patient with a troponin just below the threshold and another patient just above it: surely these two patients have almost identical risk of having ACS. But by inserting an arbitrary break into the measure, the statistics treat them as different, resulting in more misclassification simply because a threshold for positive or negative was selected.
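Here is a minimal Python sketch of what dichotomizing does to four patients. The 14 ng/L cut-off is a hypothetical stand-in for the 99th percentile, used only for illustration. Patients B and C, who are nearly identical, end up in different groups, while C and D, whose troponin values differ more than thirty-fold, are treated as the same.

```python
THRESHOLD = 14.0  # hypothetical 99th-percentile cut-off in ng/L, for illustration only


def classify(troponin):
    """Dichotomize a continuous troponin value into 'positive' or 'negative'."""
    return "positive" if troponin > THRESHOLD else "negative"


# Hypothetical patients and troponin values (ng/L).
patients = {"A": 2.0, "B": 13.9, "C": 14.1, "D": 500.0}
for name, value in patients.items():
    print(name, value, classify(value))
```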

Instead of these binary classifications, researchers could focus directly on the patient’s risk of the outcome. This can be represented as a smooth curve showing the probability of ACS at each exact troponin value. Using simple statistical models such as logistic regression, these probability estimates can be adjusted for confounders like age, giving easily interpretable probabilities across the entire range of troponin values: no classification required!
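As a rough sketch of what such a model produces, the Python below assumes a logistic regression on log troponin and age. The coefficients are invented for illustration and were not estimated from any study; in practice they would be fitted to the study data. Each patient gets a probability rather than a label, troponins of 13.9 and 14.1 get almost the same risk, and age enters the model directly instead of silently distorting a crude estimate.

```python
import math

# Hypothetical coefficients for illustration only; NOT estimated from any study.
INTERCEPT = -4.0
BETA_LOG_TROPONIN = 1.2  # per unit of natural-log troponin (ng/L)
BETA_AGE = 0.3           # per 10 years above age 80


def prob_acs(troponin, age):
    """Predicted probability of ACS from a logistic model with troponin and age."""
    linear = (INTERCEPT
              + BETA_LOG_TROPONIN * math.log(troponin)
              + BETA_AGE * (age - 80) / 10)
    return 1 / (1 + math.exp(-linear))


# A smooth risk curve over troponin for an 85-year-old patient.
for trop in (2.0, 13.9, 14.1, 50.0, 500.0):
    print(trop, round(prob_acs(trop, age=85), 3))
```

Log-transforming troponin is a modelling choice assumed here because the values are strongly right-skewed; splines or other flexible terms would serve the same purpose.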
