Statistical learning models for the estimation of disease

Although Coronavirus Disease 2019 (COVID-19) is of immediate, pressing interest worldwide, the underlying principles and statistical learning models used for estimating disease undercounts apply to many other diseases as well. An accurate estimate of a disease’s infection count is vital for allocating medical resources, planning mitigation steps, and setting health-related policies and guidelines. Statistical learning is a branch of statistical science that focuses on fitting models to training data in order to make accurate predictions on future, unseen datasets.

This summer research project has the following three objectives:

• To use currently available models to predict COVID-19 undercounts in specific geographic areas of the US, and to compare the models’ relative accuracy.
• To adjust these models and propose improvements to them, based on their relative accuracy.
• To take a Bayesian approach to this prediction problem, in which real-time updates of emerging COVID-19 counts continually revise the prior information about the detected case count. The ability to update prior information about a process in real time is at the heart of the Bayesian statistical paradigm, and this project seeks to streamline that approach into a software package routine that can be readily applied.
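
As a rough illustration of the real-time Bayesian updating described above, the sketch below uses a conjugate Gamma-Poisson model, in which each day's posterior becomes the next day's prior. The model choice, the names `GammaBelief` and `update`, and all numbers are assumptions made for illustration only; they are not the project's actual method or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about the mean daily detected case count."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate


def update(belief: GammaBelief, daily_count: int) -> GammaBelief:
    """Conjugate update after one day's count, assumed Poisson-distributed.

    Observing a count y turns Gamma(a, b) into Gamma(a + y, b + 1).
    """
    return GammaBelief(belief.shape + daily_count, belief.rate + 1.0)


# Hypothetical prior (mean of 20 cases per day) and made-up daily counts.
belief = GammaBelief(shape=2.0, rate=0.1)
for count in [18, 25, 31]:
    belief = update(belief, count)  # yesterday's posterior is today's prior

print(f"posterior mean daily count: {belief.mean:.2f}")
```

Because the update is conjugate, processing the counts one day at a time gives the same posterior as processing them all at once, which is what makes this kind of model convenient for streaming data.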

