






This document evaluates the performance of a Bayesian classifier called CoCo, which assigns patients to one of seven syndromic categories based on free-text triage chief complaints. The study compares CoCo's classifications with a criterion syndromic classification based on ICD-9 discharge diagnoses and tests its applicability to chief complaints from a second location. It also discusses the importance of syndromic surveillance systems for early outbreak detection and the challenges in evaluating their accuracy.







Wendy W. Chapman, PhD; John N. Dowling, MD, MS; Michael M. Wagner, MD, PhD
From the Real-time Outbreak and Disease Surveillance Laboratory, Center for Biomedical Informatics, Department of Medicine, University of Pittsburgh, Pittsburgh, PA.
Study objective: Electronic surveillance systems often monitor triage chief complaints in hopes of detecting an outbreak earlier than can be accomplished with traditional reporting methods. We measured the accuracy of a Bayesian chief complaint classifier called CoCo that assigns patients 1 of 7 syndromic categories (respiratory, botulinic, gastrointestinal, neurologic, rash, constitutional, or hemorrhagic) based on free-text triage chief complaints.
Methods: We compared CoCo’s classifications with criterion syndromic classification based on International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnoses. We assigned the criterion classification to a patient based on whether the patient’s primary diagnosis was a member of a set of ICD-9 codes associated with CoCo’s 7 syndromes. We tested CoCo’s performance on a set of 527,228 chief complaints from patients registered at the University of Pittsburgh Medical Center emergency department (ED) between 1990 and 2003. We performed a sensitivity analysis by varying the ICD-9 codes in the criterion standard. We also tested CoCo on chief complaints from EDs in a second location (Utah).
Results: Approximately 16% (85,569/527,228) of the patients were classified according to the criterion standard into 1 of the 7 syndromes. CoCo’s classification performance (number of cases by criterion standard, sensitivity [95% confidence interval (CI)], and specificity [95% CI]) was respiratory (34,916, 63.1 [62.6 to 63.6], 94.3 [94.3 to 94.4]); botulinic (1,961, 30.1 [28.2 to 32.2], 99.3 [99. to 99.3]); gastrointestinal (20,431, 69.0 [68.4 to 69.6], 95.6 [95.6 to 95.7]); neurologic (7,393, 67.6 [66.6 to 68.7], 92.7 [92.6 to 92.8]); rash (2,232, 46.8 [44.8 to 48.9], 99.3 [99.3 to 99.3]); constitutional (10,603, 45.8 [44.9 to 46.8], 96.6 [96.6 to 96.7]); and hemorrhagic (8,033, 75.2 [74.3 to 76.2], 98.5 [98.4 to 98.5]). The sensitivity analysis showed that the results were not affected by the choice of ICD-9 codes in the criterion standard. Classification accuracy did not differ on chief complaints from the second location.
Conclusion: Our results suggest that, for most syndromes, our chief complaint classification system can identify about half of the patients with relevant syndromic presentations, with specificities higher than 90% and positive predictive values ranging from 12% to 44%. [Ann Emerg Med. 2005;46:445-455.]
0196-0644/$-see front matter Copyright © 2005 by the American College of Emergency Physicians. doi:10.1016/j.annemergmed.2005.04.
Since 1999, electronic syndromic surveillance systems have been deployed across the country. 1-13 Emergency department (ED) data are the foundation of many syndromic surveillance systems, and researchers have shown that common outbreaks can be detected 1 to 2 weeks earlier with ED data than through conventional disease reporting methods. 14 Earlier detection of outbreaks may save many lives. 15 Some surveillance systems require manual classification of patients into relevant syndromes by triage nurses or emergency physicians, 1-3 whereas others use preexisting electronic ED data 4,5 that typically include date of admission, sex, age, address, coded discharge diagnosis, 6-8 and free-text triage chief complaint. 9-

Evaluating the ability of syndromic surveillance systems to detect outbreaks is difficult because outbreaks are rare, and outbreaks of diseases potentially induced by bioterrorism are virtually nonexistent. Successful outbreak detection from syndromic surveillance entails accurately identifying cases of concern and determining when the number of relevant cases has exceeded the number expected for a certain period or geographic region. 11,
Volume 46, no. 5 : November 2005 Annals of Emergency Medicine 445
Editor’s Capsule Summary

What is already known on this topic
In theory, syndromic surveillance systems based on chief complaint can detect outbreaks sooner than diagnosis-based systems because chief complaints are immediately available, whereas diagnoses require coding. This advantage must be balanced against the possibility that chief complaints are inadequate to accurately identify patients with bioterrorism syndromes.

What question this study addressed
This study assessed the accuracy of a classifier that used Bayesian analysis of free-text chief complaints to identify patients with features of 1 or more of 7 bioterrorism syndromes. The criterion standard was patients assigned to these 7 syndromes on the basis of their International Classification of Diseases, Ninth Revision (ICD-9) diagnostic codes. The study was conducted using patients from the locale where the classifier was developed and repeated using data from a different region of the United States.

What this study adds to our knowledge
This study demonstrates that this chief complaint classifier had only moderate sensitivity (30% to 75%) for the 7 syndromes as identified by ICD-9 code. Results were similar in the local and distant data sets. Although some discrepancies were due to errors in the chief complaint classifier that could be improved with design refinements, most were due to a mismatch between the chief complaint and the diagnosis. These discrepancies would be inherent in any chief complaint–based classifier.

How this might change clinical practice
Although these data suggest that chief complaint classifiers may not be sensitive or accurate for individual patients, they do not necessarily imply that chief complaints are unable to perform adequate surveillance in populations. Further study is needed to determine the best ways to screen routine emergency department data to identify bioterrorism outbreaks as quickly and accurately as possible.
This article addresses the first point: syndromic case classification.
Background
It is unclear what types of ED data are most useful for syndromic surveillance. Coded ED diagnoses are attractive because of the specificity of information but are not available at all hospitals or are only available several hours or days after admission. Free-text triage chief complaints have the advantage of being nearly ubiquitously available in the United States and are usually available electronically as soon as the patient is registered. However, to be useful, the chief complaints must first be classified into syndromic categories or into some other type of coded representation that can be manipulated by a computer.

In the Real-time Outbreak and Disease Surveillance system (RODS), 10,17 chief complaints are classified into syndromic categories by a naive Bayesian classifier called CoCo. 18 CoCo assigns every patient a syndromic category based on the patient’s chief complaint. The number of classifications in every syndromic category is monitored by time-series detection algorithms 14,16 and shown in graphic form on the RODS user interface. If the number of patients presenting with gastrointestinal complaints, for instance, exceeds the number expected, RODS sends an electronic alarm to a team of researchers and public health physicians.

RODS is an open-source 19 biosurveillance system, the development of which began in 1999. RODS collects ED registration data in real time, including age, sex, zip code, and triage chief complaint from more than 100 emergency care facilities in Pennsylvania, Utah, Ohio, and New Jersey.

In this study, we measured CoCo’s accuracy at identifying individual cases of concern to public health for 7 early presentations of disease (syndromes): respiratory, gastrointestinal, neurologic, hemorrhagic, rash, constitutional, and botulinic. We measured the performance of syndromic case classification from free-text triage chief complaints in a single ED using primary International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnoses as the criterion standard classification for 527,228 patients during a 13-year period.
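To make the idea of a naive Bayesian chief complaint classifier concrete, the following is a minimal sketch and not CoCo itself. It assumes a bag-of-words model with add-one (Laplace) smoothing, and the training pairs in `TOY_TRAINING` are hypothetical examples invented for illustration; CoCo's actual features, smoothing, and training data are not described in this section.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled chief complaints (illustration only, not CoCo's training set).
TOY_TRAINING = [
    ("cough fever", "respiratory"),
    ("shortness of breath", "respiratory"),
    ("vomiting diarrhea", "gastrointestinal"),
    ("abd pain nausea", "gastrointestinal"),
    ("headache", "neurologic"),
    ("rt ankle inj", "other"),
]

def train(examples):
    """Count syndromes and per-syndrome word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, syndrome in examples:
        class_counts[syndrome] += 1
        for word in text.lower().split():
            word_counts[syndrome][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the syndrome with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for syndrome, n in class_counts.items():
        # log prior + sum of log likelihoods with add-one smoothing
        score = math.log(n / total)
        denom = sum(word_counts[syndrome].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[syndrome][word] + 1) / denom)
        if score > best_score:
            best, best_score = syndrome, score
    return best

model = train(TOY_TRAINING)
print(classify("nausea vomiting", model))
```

With this toy model, words seen only in gastrointestinal training complaints pull the posterior toward that category, which is the same mechanism by which an unseen abbreviation can leave a complaint effectively unclassifiable.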
Our evaluation had 2 objectives: (1) determine how accurately CoCo classifies patients into syndromic categories and (2) determine whether CoCo can be applied to chief complaints from geographic locations different from the locality where the chief complaints in CoCo’s training set were generated.
Study Design
This observational study examined the performance of a Bayesian classifier at categorizing patients into 1 of 7 syndromes based on triage chief complaints. The study used retrospective data collected throughout 13 years at a single ED.
Setting
The study was conducted on data collected from the University of Pittsburgh Medical Center (UPMC) Presbyterian Hospital ED from December 1990 to September 2003. The ED at the UPMC Presbyterian Hospital admits approximately 40,000 adult patients a year (48% women, 52% men in 2004), and patient visit data have been stored in the Medical Archival System database since 1990, including free-text triage chief complaints, dictated and transcribed ED reports, and coded ICD-9 discharge diagnoses.
Emergency Department Chief Complaints Chapman, Dowling & Wagner
been evaluated for syndromic case classification performance on single syndromes, such as gastrointestinal syndrome, 23 and for 5 syndromes on a medium-sized test set. 21 The aim of objective 1 was to evaluate CoCo’s performance on a large database of patients for all 7 syndromes monitored by RODS. Monitoring CoCo’s classifications for unusual spatial or temporal patterns acts as a screening test for potential outbreaks that could provide an earlier signal than other types of data, such as laboratory-confirmed diagnoses.

In tests designed to screen individuals, high sensitivity, even at the expense of extra false positives, is desirable. Outbreak detection algorithms are not designed to screen individuals but to screen a population of individuals. A high sensitivity would enable surveillance systems to detect smaller outbreaks sooner. However, a low sensitivity may not mean that the outbreak cannot be detected but that the size of the outbreak must be larger to be detected and that detection will be less timely than if case classification performance were perfect. Retrospective studies have shown that chief complaint classification can detect respiratory, gastrointestinal, and influenza outbreaks earlier than traditional reporting methods, despite the imperfect sensitivity of chief complaint classification. 13,24 An acceptable level of sensitivity is dependent on the size of an outbreak, but to provide earlier detection than traditional reporting methods, a syndromic case classifier probably needs to perform with at least 35% sensitivity.

We performed an error analysis of CoCo’s syndromic case classifications. For every syndrome, we randomly selected 100 false-negative and 100 false-positive classifications. JND manually classified the 1,400 chief complaints into syndromic categories, and we calculated the proportion of CoCo’s classifications that did not match JND’s.
A high proportion of errors by CoCo may mean that misclassifications in syndromic case classification in the primary data analysis were directly due to CoCo’s errors, which could be the result of a misclassification (eg, classifying “viral meningitis” as constitutional instead of neurologic) or classification into 1 category instead of 2 (eg, classifying “headache nausea” as neurologic when it should be neurologic and gastrointestinal). Half credit was given for the second type of error.

JND also examined the chief complaint and correlating primary ICD-9 diagnosis for each of the 1,400 patients. He recorded whether the syndromic misclassification was (1) due to CoCo’s error (eg, classifying “stomach flu” as constitutional instead of gastrointestinal) or (2) due to a mismatch between the syndrome indicated by the chief complaint and that indicated by the ICD-9 diagnosis (eg, if CoCo correctly classified “mi” as “other” but the ICD-9 discharge diagnosis was 486 [Pneumonia], which provides a criterion standard classification of respiratory). Syndromic misclassifications caused by a mismatch between the chief complaint and the ICD-9 code are not an indication that CoCo erred but that the chief complaint did not adequately reflect the patient presentation.

Objective 1 evaluated CoCo’s case classification performance on patients at a single hospital in Pennsylvania (UPMC test set). The set of chief complaints used to train CoCo consisted of more than 10,000 chief complaints, approximately half of which came from Pennsylvania and half from Utah. RODS uses CoCo to assign syndromic categories to chief complaints from more than a hundred hospitals in 4 states, and an important practical question is how well CoCo performs on chief complaints from a geographic location not represented in CoCo’s training set. If CoCo does not perform well for other geographic locations, CoCo would require substantial retraining for every new location monitored by RODS. We evaluated CoCo’s performance when trained on chief complaints originating from the same location as those in the test set (local training set) and on chief complaints originating from a different location than those in the test set (nonlocal training set).
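The half-credit rule used in the error analysis above can be sketched as a small scoring function. This is an illustrative reading, not the authors' code: it assumes that an error earns half credit whenever CoCo's categories are a proper, non-empty subset of the physician's categories, which generalizes the paper's "1 category instead of 2" case. The function names and the third and fourth sample pairs are hypothetical.

```python
def error_credit(coco_categories, physician_categories):
    """Return 0.0 for a match, 0.5 for a partial match, 1.0 for a full error."""
    coco = set(coco_categories)
    physician = set(physician_categories)
    if coco == physician:
        return 0.0
    # Assumption: "1 category instead of 2" means CoCo found part of the answer.
    if coco and coco < physician:
        return 0.5
    return 1.0

def error_proportion(pairs):
    """Average error credit over (CoCo, physician) classification pairs."""
    return sum(error_credit(c, p) for c, p in pairs) / len(pairs)

sample = [
    ({"neurologic"}, {"neurologic", "gastrointestinal"}),  # "headache nausea": half credit
    ({"constitutional"}, {"neurologic"}),                  # "viral meningitis": full error
    ({"respiratory"}, {"respiratory"}),                    # hypothetical exact match
    ({"constitutional"}, {"gastrointestinal"}),            # "stomach flu": full error
]
print(error_proportion(sample))
```

Fractional totals such as 369.5 errors out of 1,400 arise naturally under this scheme, because each partial match contributes 0.5 to the numerator.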
We divided the chief complaints used to train CoCo into 2 sets: the 5,474 chief complaints that originated from UPMC (UPMC training) and the 5,323 chief complaints that originated from hospitals in Utah (Utah training). All training cases were previously classified into syndromic
Table 1. Examples of ICD-9 codes in criterion standard classification lists.

| Syndrome (Number of Codes) | Examples of ICD-9 Codes |
|---|---|
| Respiratory (n=287) | 020.3 (Primary pneumonic plague), 021.2 (pulmonary tularemia), 011 (pulmonary tuberculosis), 480 (viral pneumonia), 487 (influenza), 033 (whooping cough), 511 (pleurisy), 786.05 (shortness of breath) |
| Botulinic (n=60) | 005.1 (Botulism), 045.0 (acute paralytic poliomyelitis, bulbar), 357 (acute infective polyneuritis), 351.0 (Bell’s palsy), 368.2 (diplopia), 374.3 (ptosis of eyelid), 787.2 (dysphagia) |
| Gastrointestinal (n=119) | 001 (Cholera), 003.0 (salmonella gastroenteritis), 005 (food poisoning, other bacterial), 007. (cryptosporidiosis), 787.91 (diarrhea) |
| Neurologic (n=111) | 066.4 (West Nile fever), 331.81 (Reye’s syndrome), 323 (encephalitis, myelitis, and encephalomyelitis), 094.2 (syphilitic meningitis), 320 (bacterial meningitis), 780.01 (coma), 784.0 (headache) |
| Rash (n=99) | 022.0 (Cutaneous anthrax), 050 (smallpox), 034.1 (scarlet fever), 053 (herpes zoster), 055 (measles), 684 (impetigo) |
| Constitutional (n=66) | 020.0 (Bubonic plague), 002.0 (typhoid fever), 075 (infectious mononucleosis), 079.9 (viral infection nos), 780.6 (fever), 780.7 (malaise and fatigue) |
| Hemorrhagic (n=89) | 065 (Arthropod hemorrhagic fever), 530.82 (esophageal hemorrhage), 535.01 (acute gastritis w/hemorrhage), 578.0 (hematemesis), 599.7 (hematuria) |
categories by a single physician blinded to the origin of the complaints. We trained a separate version of CoCo for each training set. Each version of CoCo was tested on 2 test sets: (1) the UPMC test set evaluated in objective 1 and (2) 30,094 patients from various hospitals in Utah, described in Gesteland et al 21 (Utah test set). For each test set, we calculated outcome metrics described above for every syndrome and compared CoCo trained on local and nonlocal chief complaints by calculating the differences in the percentage of records correctly classified, along with 95% confidence intervals for the differences. 25,

Because the Utah test set had been classified into syndromes by Gesteland et al 21 for a previous evaluation, we applied the Gesteland criterion standard to the UPMC test set for the evaluation of objective 2. The Gesteland criterion standard ICD-9 list consists of 455 codes that are a subset of the criterion standard used for objective 1 and apply to only 5 of the 7 syndromes monitored by RODS. Therefore, the evaluations for objective 2 included only respiratory, botulinic, gastrointestinal, neurologic, and rash syndromes.

To better understand the differences in the 2 training sets, we calculated how many terms were unique to either the UPMC or Utah training set and how many terms were in common.
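The comparison of terms across the two training sets can be sketched with word counts and set operations. This is a minimal illustration under stated assumptions: complaints are tokenized on whitespace, shared-term frequencies are summed across both sets, and the two short complaint lists are hypothetical stand-ins for the UPMC and Utah training sets. The paper does not describe its tokenization or how shared frequencies were tallied.

```python
from collections import Counter

def term_counts(complaints):
    """Count lowercase whitespace-separated terms across chief complaints."""
    counts = Counter()
    for complaint in complaints:
        counts.update(complaint.lower().split())
    return counts

def compare_vocabularies(counts_a, counts_b):
    """Split terms into shared, unique-to-a, and unique-to-b, with frequencies."""
    shared = {w: counts_a[w] + counts_b[w] for w in counts_a.keys() & counts_b.keys()}
    only_a = {w: counts_a[w] for w in counts_a.keys() - counts_b.keys()}
    only_b = {w: counts_b[w] for w in counts_b.keys() - counts_a.keys()}
    return shared, only_a, only_b

# Hypothetical miniature training sets (illustration only).
upmc = ["chest pain", "abd pain", "liver failure"]
utah = ["skiing inj", "abd pain", "walkin fup"]

shared, only_upmc, only_utah = compare_vocabularies(term_counts(upmc), term_counts(utah))
print(sorted(shared), sorted(only_upmc), sorted(only_utah))
```

Sorting each dictionary by frequency would then yield a most-frequent-terms listing of the kind the paper tabulates.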
Sensitivity Analysis
In the primary data analysis, we made the assumption that a group of ICD-9 codes representing diseases and findings consistent with the syndromes is a valid criterion standard for syndromic classification. However, the results of the evaluation may be sensitive to the choice of codes we included in the criterion standard. We therefore compared CoCo’s performance on 4 groupings of ICD-9 codes, ranging from the broad grouping used for objective 1 to a narrow grouping containing only bioterrorism diseases. To do this, JND classified all criterion standard codes into 1 of the following 4 classes (indicated in Table E1):

(1) Bioterrorism Diseases (n=98): diseases that could have been caused by a bioterrorist threat (eg, botulism, typhoid fever, cholera, hemorrhagic fever, histoplasmosis meningitis, smallpox, pneumonic plague)
(2) Analog Diseases (n=178): diseases that present similarly to bioterrorism diseases but are most likely caused by nonbioterrorist threats (eg, myasthenia gravis, toxoplasmosis, food poisoning, acute gastritis with hemorrhage, tuberculosis meningitis, chicken pox, pneumonia caused by streptococcus)
(3) Related Disorders (n=298): diseases and disorders that may share the same syndromic presentation in the early stages but become distinguishable from bioterrorism diseases as they develop (eg, acute poliomyelitis, Coxsackie virus infection, unspecified protozoal intestinal disease, acute gastritis with hemorrhage, enteroviral meningitis, herpes simplex, whooping cough)
(4) Signs and Symptoms (n=257): signs and symptoms that commonly occur with bioterrorism diseases (eg, diplopia, fever, diarrhea, unspecified hemorrhage, acute delirium, shortness of breath)

We calculated outcome metrics for CoCo on 4 groupings of ICD-9 codes: the broadest group included all 4 types of codes, a narrower group included all but the signs and symptoms, a still narrower group included only analog diseases and bioterrorism diseases, and the narrowest group included only bioterrorism diseases. We used computer programs written in the Python language by WWC to perform all analyses.
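The four nested groupings can be built by accumulating the code classes from narrowest to broadest. The sketch below is an illustration, not the authors' program: the class sizes and per-syndrome counts come from the text and Table 1, while the identifiers and the one-code-per-class assignments in `EXAMPLE_CODES` are assumptions chosen from diseases the text names as examples (the full assignment is in Table E1, which is not reproduced here).

```python
CLASS_ORDER = ["bioterrorism", "analog", "related", "signs_symptoms"]

# Class sizes as reported in the text.
CLASS_SIZES = {"bioterrorism": 98, "analog": 178, "related": 298, "signs_symptoms": 257}

# Codes per syndrome as reported in Table 1.
SYNDROME_SIZES = {
    "respiratory": 287, "botulinic": 60, "gastrointestinal": 119, "neurologic": 111,
    "rash": 99, "constitutional": 66, "hemorrhagic": 89,
}

# Illustrative assignments only; the real assignment is in Table E1.
EXAMPLE_CODES = {
    "bioterrorism": {"005.1"},    # botulism
    "analog": {"358.0"},          # myasthenia gravis
    "related": {"045.0"},         # acute paralytic poliomyelitis, bulbar
    "signs_symptoms": {"368.2"},  # diplopia
}

def nested_groupings(codes_by_class):
    """Return 4 code sets, narrowest (bioterrorism only) to broadest (all classes)."""
    groupings, current = [], set()
    for name in CLASS_ORDER:
        current = current | set(codes_by_class[name])
        groupings.append(current)
    return groupings

def cumulative_sizes(sizes):
    """Return the number of codes in each nested grouping."""
    total, out = 0, []
    for name in CLASS_ORDER:
        total += sizes[name]
        out.append(total)
    return out

print(cumulative_sizes(CLASS_SIZES))
print([sorted(g) for g in nested_groupings(EXAMPLE_CODES)])
```

As a consistency check, the four class sizes and the seven per-syndrome counts in Table 1 both total 831 codes.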
Of 577,522 patients admitted during the study period, 527,228 were included in the study. We excluded approximately 19,000 patients because of missing chief complaints or discharge diagnoses and 31,000 because of an error in the computer script that retrieved only one third of the patients admitted in 1995. Of the 527,228 patients in the study, 85,569 (16.2%) were classified into 1 of the 7 syndromes by criterion standard classification. The most frequent syndromic classification was respiratory syndrome (34,916), and the least frequent was botulinic (1,961).

Table 2 shows CoCo’s syndromic classification performance on the test patients. Accuracy ranged from 92% for neurologic and respiratory to 99% for botulinic. CoCo’s sensitivity ranged from 30% for botulinic to 75% for hemorrhagic, with a median of 63%. Specificity and negative predictive value were high for all syndromes, whereas positive predictive value ranged from only 12% for neurologic to 44% for respiratory.

From a random sample of 1,400 false negatives and positives, we estimated the proportion of chief complaints incorrectly classified by CoCo by comparing CoCo’s classifications to those of a criterion standard physician. Overall, 26% (369.5/1,400) of the sample chief complaints were incorrectly classified (31% [216/700] of false negatives and 22% [153.5/700] of false positives). These results demonstrate that some of the false positives and negatives were caused by CoCo’s errors in chief complaint classification but also that the majority of misclassifications were not due to CoCo’s errors. By examining the chief complaint and the correlating ICD-9 code for each of the 1,400 patients, we calculated the percentage of false-positive and false-negative cases caused by CoCo’s errors versus a mismatch between the ICD-9 code and the chief complaint. CoCo’s errors accounted for 14% (202/1,400) of misclassifications (25% [177/700] of false-negative and 3.6% [25/700] of false-positive classifications).
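The low positive predictive values reported above follow from the low prevalence of each syndrome. As a worked check, the sketch below recomputes positive predictive value from the reported case counts, sensitivities, and specificities. It uses the rounded percentages from the paper, so it approximates the published figures rather than reproducing the authors' calculation; the function and variable names are illustrative.

```python
TOTAL_PATIENTS = 527_228  # patients included in the study

def positive_predictive_value(cases, sensitivity, specificity, total=TOTAL_PATIENTS):
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sensitivity * cases
    false_pos = (1 - specificity) * (total - cases)
    return true_pos / (true_pos + false_pos)

# Reported values: cases by criterion standard, sensitivity, specificity.
respiratory_ppv = positive_predictive_value(34_916, 0.631, 0.943)
neurologic_ppv = positive_predictive_value(7_393, 0.676, 0.927)

print(f"respiratory PPV: {respiratory_ppv:.2f}")
print(f"neurologic PPV: {neurologic_ppv:.2f}")
print(f"share of patients with any syndrome: {85_569 / TOTAL_PATIENTS:.1%}")
```

Even with specificity above 92%, the non-cases so outnumber the cases that false positives dominate the classifier's positive calls, which is why the neurologic value is the lowest despite a sensitivity near the median.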
The remaining 86% of misclassifications were due to a mismatch between the syndrome indicated by the chief complaint and that indicated by the criterion-standard ICD-9 code. Results from training CoCo on local and nonlocal chief complaints are shown in Table 3. The effect size of training with local data was small. Although there were a few statistically significant differences in the percentage of records correctly classified between local and nonlocal training sets, the largest
threats (ie, influenza and tuberculosis). Because these diseases were listed as threats by 2 agencies in the compilation by Wagner et al, 20 their corresponding ICD-9 codes were classified in this project as bioterrorism diseases.

As expected, specificity was robust across the different criterion standards. Sensitivity was fairly robust across the different criterion standards, with a few exceptions. Sensitivity for botulinic syndrome was very low until the signs and symptoms were added to the criterion standard. Three patients with botulism (005.1) as a primary diagnosis occurred in the test set, and CoCo did not detect any of them. The chief complaints for the patients were “myasthenia gravis,” “botulism,” and “stroke.” CoCo had not been trained on the former 2 complaints and classified the third as neurologic. Some of the chief complaints for patients with analog diseases and related disorders, including 358.0 (myasthenia gravis), 358. (myoneural disorders), and 351.0 (Bell’s palsy), described signs and symptoms CoCo was trained to call botulinic (eg, “slurred speech” and “double vision”). However, the majority of the complaints were neurologic (eg, “poss tia,” “poss cva,” “rt sided facial droop,” “l sided numbness,” “stroke”).

Many of the ICD-9 codes may represent findings consistent with more than 1 syndrome, but we allowed a code to represent only a single syndrome in the criterion standard. The sensitivity analysis revealed the ambiguity, particularly between the constitutional and respiratory categories. For instance, sensitivity in the constitutional category dropped from 62.5% when only bioterrorism codes were used to 16.8% after analog diseases such as 075 (infectious mononucleosis) and 078.5 (cytomegaloviral disease) were added, which typically present with constitutional and respiratory complaints, such as sore throat and pneumonia, respectively. Also, as shown in Table 5, the majority of cases with
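Table 4. Twenty most frequent terms shared by UPMC and Utah test sets (Shared) and unique to each set.

| Shared: Word | Shared: Frequency | Unique to UPMC Training: Word | Unique to UPMC Training: Frequency | Unique to Utah Training: Word | Unique to Utah Training: Frequency |
|---|---|---|---|---|---|
| Pain | 1,231 | Injured | 25 | Walkin | 292 |
| Inj | 587 | S | 25 | Fup | 244 |
| R | 361 | Liver | 20 | Gx | 128 |
| L | 358 | Mi | 16 | Fu | 118 |
| Lac | 356 | Passenger | 14 | U | 41 |
| Abd | 290 | Abnormal | 13 | Glf | 34 |
| Rt | 271 | Known | 12 | Mult | 34 |
| Back | 254 | Packing | 12 | Skiing | 27 |
| In | 237 | Patient | 12 | Preg | 25 |
| Fever | 228 | Return | 12 | Lac | 24 |
| Lt | 205 | Ambulate | 9 | Snowboarding | 20 |
| Vomiting | 197 | Angina | 9 | Pn | 16 |
| Cough | 190 | Per | 9 | Ce | 14 |
| Chest | 162 | Vomit | 9 | Prob | 14 |
| Walk | 156 | Ca | 8 | Complaint | 13 |
| Head | 155 | Sensation | 8 | Nki | 13 |
| Nausea | 155 | Cirrhosis | 7 | Other | 13 |
| Finger | 153 | Os | 7 | Pw | 13 |
| Arm | 147 | Sepsis | 7 | Runny | 13 |
| Injury | 147 | Third | 7 | Fing | 12 |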
Table 5. ICD-9 bioterrorism codes that occurred at least 5 times in the set of 527,228 patients.

| ICD-9 Code for Bioterrorism Disease | Frequency | Correct, No. | Correct Classification | Sensitivity 95% CI |
|---|---|---|---|---|
| 487.1 Influenza with other respiratory manifestations | 434 | 96 | Respiratory | 18.5–26. |
| 487.8 Influenza with other manifestations | 15 | 9 | Constitutional | 35.8–80. |
| 487.0 Influenza with pneumonia | 12 | 5 | Respiratory | 19.3–68. |
| 011.90 Unspecified pulmonary tuberculosis, unspecified examination | 10 | 4 | Respiratory | 16.8–68. |
| 011.93 Unspecified pulmonary tuberculosis, tubercle bacilli found (in sputum) by microscopy | 6 | 1 | Respiratory | 3.0–56. |
| 003.0 Salmonella gastroenteritis | 5 | 4 | Gastrointestinal | 37.6–96. |
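For small counts like those in Table 5, a confidence interval for sensitivity can be computed directly from the number correct and the frequency. The sketch below uses the Wilson score interval, which is an assumption: the method the authors used is not stated in the surviving text, although the lower bounds printed in Table 5 agree with the Wilson interval to within rounding. The function name and the choice of z = 1.96 are illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# (ICD-9 code, number correct, frequency) taken from Table 5.
rows = [("487.0", 5, 12), ("011.90", 4, 10), ("011.93", 1, 6), ("003.0", 4, 5)]

for code, correct, freq in rows:
    low, high = wilson_interval(correct, freq)
    print(f"{code}: sensitivity {correct / freq:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The width of these intervals shows how little can be concluded about per-disease sensitivity from 5 to 15 cases, even in a data set of more than half a million patients.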
a respiratory bioterrorism disease (434) had a primary diagnosis of 487.1 (influenza with other respiratory manifestations). Patients with this ICD-9 diagnosis may be as likely to have a constitutional complaint as a respiratory one, and, accordingly, CoCo’s sensitivity at respiratory classification of patients with bioterrorism codes was only 24%.
The main limitation of our study was the use of ICD-9 codes for the criterion standard. The majority of misclassified cases were not due to CoCo’s errors but to a mismatch between the patients’ ICD-9 diagnoses and their chief complaints. Others have measured a lack of correlation between the syndrome implied by the chief complaints and ICD-9 discharge diagnoses. 27,28 Evidence suggests that ICD-9 discharge diagnoses are more accurate than chief complaints at predicting a patient’s syndromic classification. Beitel et al 27 showed that ICD-9 discharge diagnoses were more accurate at identifying pediatric respiratory illness than correlating chief complaints. A study by Fleischauer et al 29 indicated that ICD-9 classification of patients was more accurate than chief complaint classification for 10 syndromes when compared with manual classification.

ICD-9 codes have been used with various degrees of success as diagnostic criterion standards. 30-32 ICD-9 codes are often assigned to optimize the amount billed for a visit and therefore do not always accurately represent clinical information. 33 However, because we used ICD-9 codes to select patients with a general syndromic presentation, the selection method should result in fewer false positives than if we were using the codes to select patients with specific diseases. We are in the process of evaluating syndromic case detection performance with a subset of 1,600 cases from the UPMC test set in which the criterion standard will be physician judgment based on reading the ED reports.

Another potential limitation in our design was having only 1 physician compile the criterion standard codes, which may have been more complete if we had combined codes from several physicians’ compilations. It should be noted that a large number of the codes the physician included (455) were from the study by Gesteland et al 21 in which multiple physicians verified inclusion of the codes.

A single physician also acted as the criterion standard in the error analysis. Generating a reliable criterion standard for classification of chief complaints into syndromic categories not previously defined or validated is not straightforward. Because CoCo’s training cases were manually classified by JND, we believed his classifications could be considered the optimal criterion standard for evaluating CoCo’s classifications. However, if we had used multiple physicians for the error analysis, we could have quantified the reliability of the criterion standard.
This paper presents a detailed evaluation of the ability to accurately classify patients into syndromic categories based on their chief complaints by testing on all ED admissions at UPMC Presbyterian Hospital during a 13-year period and by assessing performance on 7 syndromes, including syndromes that are rare and difficult to characterize. Approximately 16% of the patients in the study were classified into 1 of 7 syndromic classifications by the criterion standard ICD-9 discharge diagnosis. Prevalence of the individual syndromes ranged from 0.4% for botulinic to 6.6% for respiratory. Syndromic surveillance applications attempt to identify potential outbreaks in a screened population before the outbreak has been confirmed. Because surveillance tools generally monitor chief complaints for thousands of patients in a given geographic region, perfect sensitivity is not required for detection of an outbreak. For example, if only 1 of 2 patients presenting to the ED with respiratory symptoms is correctly classified as respiratory, an increase in the number of respiratory patients could still be detected if the increase were statistically higher than the expected number of respiratory patients. CoCo identified 61% of all criterion standard syndromic cases (52, true-positive cases for all 7 syndromes divided by 85,569 criterion-standard-positive cases), with sensitivity ranging from 30% to 75%, depending on the syndrome. These results suggest that despite the fact that triage chief complaints are brief and are entered into the computer before a patient has been treated by a physician, the complaints contain information
Table 6. CoCo’s performance when the criterion standard set is grouped by type of ICD-9 code, ranging from only codes representing bioterrorism threats to all relevant codes.*

Criterion standard groupings:
(1) Bioterrorism Codes
(2) Analog Disease and Bioterrorism Codes
(3) Related Disorder, Analog Disease, and Bioterrorism Codes
(4) Signs and Symptoms, Related Disorder, Analog Disease, and Bioterrorism Codes

| Syndrome | (1) Num | (1) Sens | (1) Spec | (2) Num | (2) Sens | (2) Spec | (3) Num | (3) Sens | (3) Spec | (4) Num | (4) Sens | (4) Spec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Respiratory | 481 | 24.1 | 90.6 | 8,057 | 61.5 | 91.3 | 29,336 | 67.2 | 93.9 | 34,916 | 63.1 | 94.3 |
| Botulinic | 3 | 0.0 | 99.2 | 81 | 3.7 | 99.2 | 394 | 2.0 | 99.2 | 1,961 | 30.1 | 99.3 |
| Gastrointestinal | 7 | 85.7 | 93.1 | 2,587 | 53.3 | 93.3 | 6,412 | 66.2 | 93.8 | 20,431 | 69.0 | 95.6 |
| Neurologic | 0 | N/A | 91.8 | 179 | 46.3 | 91.9 | 624 | 55.3 | 91.9 | 7,393 | 67.6 | 92.7 |
| Rash | 0 | N/A | 99.1 | 1,877 | 51.9 | 99.3 | 2,232 | 46.8 | 99.3 | 2,232 | 46.8 | 99.3 |
| Constitutional | 16 | 62.5 | 95.8 | 358 | 16.8 | 95.8 | 4,303 | 31.1 | 96.0 | 10,603 | 45.8 | 96.6 |
| Hemorrhagic | 0 | N/A | 97.3 | 1,693 | 75.1 | 97.6 | 2,393 | 69.0 | 97.6 | 8,033 | 75.2 | 98.5 |

Num, Number; Sens, sensitivity; Spec, specificity; N/A, not applicable.
*The broadest group (farthest right columns) is the criterion standard used for the primary data analysis.
surveillance should be applied requires more thorough evaluation of its effectiveness. 39

We have performed a detailed evaluation of syndromic case classification into 7 syndromes for 13 years of ED registrations. This study helps define the range of accuracy that is obtainable for syndromic classification of patients using only their ED chief complaints. The results show that for the syndromes currently used by RODS, which are similar to syndromes used by other syndromic surveillance systems, automated chief complaint classification can identify more than half of the patients with the syndromes. The accuracy is not sufficient for individual patient classification, but other research has demonstrated that it is adequate for retrospective detection of large outbreaks. 5,13,24 Automated chief complaint classification also has potential for contributing to detection of smaller outbreaks, should they be geographically or sociodemographically confined. This research does not directly measure whether detection will occur earlier than current best practices. Nevertheless, this study brings us closer to answering the critical question of the effectiveness of syndromic surveillance from ED data.
We would like to thank Jagan Dara, MS, for his help in retraining CoCo.
Supervising editor: Jonathan M. Teich, MD, PhD
Author contributions: WWC, JND, and MMW conceived and designed the study. MMW obtained research funding. JND was the medical consultant who designed the criterion standard and performed the error analysis. WWC collected and analyzed the data. WWC performed the statistical analysis of the data with input from JND and MMW. WWC drafted the manuscript, and all authors contributed substantially to its revision. WWC takes responsibility for the paper as a whole.
Publication dates: Received for publication August 25, 2004. Revisions received November 4, 2004, and March 4, 2005. Accepted for publication April 14, 2005. Available online July 14, 2005.
Address for reprints: Wendy W. Chapman, PhD, Center for Biomedical Informatics, Suite 8084 Forbes Tower, 200 Lothrop Street, Pittsburgh, PA 15213; 412-647-7113, fax 412-647-7190.
REFERENCES
www.nyam.org/events/syndromicconference/2002/posterpdf/dembek_poster.pdf. Accessed April 16, 2003.