Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on COVID-19. Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements but for which controlled terminology terms were published after the tests were already in clinical use. We developed a simple but effective rule-based tool, COVID-19 TestNorm, to automatically normalize local COVID-19 test names to standard LOINC codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from eight healthcare systems, and it achieved an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online web application for end users (https://clamp.uth.edu/covid/loinc.php). We believe it will be a useful tool to support the secondary use of EHRs for research on COVID-19.

Keywords: COVID-19; COVID-19 TestNorm; LOINC; Natural Language Processing; Testing name normalization.
