Kocher Criteria

David Stewart, MD

January 8, 2024

Other researchers have not been able to reproduce the predictive claims of the Kocher Criteria, likely because of methodological issues in the initial study.

This material is presented for educational purposes only and does not constitute medical advice. Consult with a board-certified orthopedic surgeon for specific recommendations for your child.

Introduction

The Kocher Criteria are four indicators proposed in 1999 by Dr. Mininder Kocher and colleagues at Boston Children's Hospital in an article in the Journal of Bone and Joint Surgery[1]. They were intended to distinguish septic hip, a bacterial infection requiring emergency surgery, from transient synovitis, a self-limited condition that resolves spontaneously, in children with hip pain. Since that time, the Kocher Criteria have been widely used in teaching and clinical practice. They have at times been extrapolated more broadly, being represented on some review sites as predictive of septic arthritis of other joints or of any joint, even though the initial research examined only the hip.

The Claim

Kocher and colleagues wrote: “By combining variables, we were able to construct a set of independent multivariate predictors that, together, had excellent diagnostic performance in differentiating between septic arthritis and transient synovitis of the hip in children.” They presented variables ("predictors") as follows:

1. Non-weight bearing
2. History of fever ≥38.5° C
3. Erythrocyte sedimentation rate ≥ 40 mm/h
4. Serum white blood cell count ≥ 12 × 10^9 cells/L

They reported the probability of septic arthritis according to the number of "predictors" present:

0 criteria: 0.2%, 1 criterion: 3%, 2 criteria: 40%, 3 criteria: 93.1%, 4 criteria: 99.6%
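As a minimal sketch of how the claimed algorithm is applied, the Python below counts the four predictors using the thresholds listed above and looks up the probability reported in the original paper. The function and variable names are hypothetical, chosen for illustration only, and the lookup table simply restates the published figures quoted in this article; it is not a validated clinical tool.

```python
# Probabilities as quoted above from Kocher et al. (1999). The validation
# studies discussed in this article did not reproduce these figures.
REPORTED_PROBABILITY = {0: 0.002, 1: 0.03, 2: 0.40, 3: 0.931, 4: 0.996}


def count_predictors(non_weight_bearing, fever_c, esr_mm_h, wbc_1e9_per_l):
    """Count how many of the four listed predictors are present."""
    flags = [
        bool(non_weight_bearing),  # 1. non-weight bearing
        fever_c >= 38.5,           # 2. history of fever >= 38.5 C
        esr_mm_h >= 40,            # 3. ESR >= 40 mm/h
        wbc_1e9_per_l >= 12.0,     # 4. serum WBC >= 12 x 10^9 cells/L
    ]
    return sum(flags)


def reported_probability(n_predictors):
    """Return the probability of septic arthritis claimed for this count."""
    return REPORTED_PROBABILITY[n_predictors]


# Hypothetical child: refuses to bear weight, fever 38.7 C, ESR 25, WBC 13.1
n = count_predictors(True, 38.7, 25, 13.1)
print(n, reported_probability(n))
```

The jump from 40% at two predictors to 93.1% at three is what made the scheme appear so decisive.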

The clear cut-offs made the Kocher Criteria attractive for clinical use and teaching. With zero or one factor present, the risk of septic arthritis was ostensibly remote, whereas three or four factors purportedly made septic arthritis overwhelmingly likely. Only in the mid-range, with two of four factors present, was there commonly interpreted to be significant uncertainty requiring clinical judgment and workup.

Less Predictive Than Claimed

Attempts by other researchers to validate the Kocher Criteria, however, found that these variables had substantially less predictive power than claimed in Kocher's paper.

Luhmann and colleagues at Washington University in St. Louis found only 59% predictive accuracy with all four factors present.[2] They found three factors to be most correlated in their patients: (1) a history of fever, (2) a serum white blood-cell count of >12,000/mm^3 (>12.0 × 10^9/L), and (3) a previous health-care visit, with a 71% rate of septic arthritis in patients with all three factors.

Michelle Caird and colleagues at the University of Michigan found that a fifth variable, C-reactive protein (CRP) > 2.0 mg/dL, had to be added.[3] Even then, the correlation was substantially lower than reported in Kocher's study. Some variables were more correlated with septic arthritis than others: fever was the most correlated, followed by elevated CRP and elevated ESR; refusal to bear weight and elevated WBC were the least correlated.

Sultan and Hughes in the United Kingdom found that the Caird Criteria performed better than the Kocher Criteria in all populations, but even then, septic arthritis of the hip was present in only 59.9% of patients with all five indicators present.[4]

Nickel and colleagues in Minnesota found that the Caird algorithm (Kocher Criteria plus CRP) was superior to the original Kocher Criteria alone in all groups, and that it was helpful for distinguishing septic arthritis from other causes of single-joint pain in all joints, with an AUC (area under the curve) of 0.80 for non-hip joints.[5] In contrast, Obey and colleagues at Washington University in St. Louis reported that the Kocher Criteria were poorly sensitive for septic arthritis of the knee, missing 52% of cases; adding CRP did not improve predictive power.[6]

The Kocher Criteria nonetheless continue to be widely cited, especially in teaching materials. This is despite abundant subsequent research, of a similar or higher level of evidence and more robust methodological design than the original paper, demonstrating far less predictive power than claimed.

Research Shortcomings

It was already generally known that patients with bacterial joint infections are more likely to demonstrate indicators of inflammation and discomfort (elevated temperature, erythrocyte sedimentation rate, white blood cell count, and non-weightbearing) than patients without. The Kocher Criteria do not deliver what was represented: a highly predictive indicator of septic joint with clear cutoffs between high- and low-risk patients and a narrow zone of uncertainty, especially for patients with multiple factors present.

Research to validate even well-designed models generally finds correlation somewhat below that cited in the original report. That is because best-fit models are optimized to the data points used to formulate them. Because of randomness or stochasticity, the model is less likely to fit external data as well.
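The optimism described above can be sketched with a small simulation. This is not a reanalysis of any study: the patients, the single inflammatory "marker," the sample sizes, and all function names are invented for illustration. A cut-off is chosen to fit a small derivation sample as well as possible, then scored on fresh patients drawn from the same simulated population.

```python
import random


def accuracy(threshold, data):
    """Fraction of cases where (marker >= threshold) matches true status."""
    hits = sum((marker >= threshold) == infected for marker, infected in data)
    return hits / len(data)


def draw_cohort(rng, n):
    """Simulated patients: the marker averages 1.0 if infected, else 0.0."""
    cohort = []
    for _ in range(n):
        infected = rng.random() < 0.5
        marker = rng.gauss(1.0 if infected else 0.0, 1.0)
        cohort.append((marker, infected))
    return cohort


def mean_apparent_and_validated(replicates=200, n_derive=40,
                                n_validate=1000, seed=0):
    """Average in-sample vs. out-of-sample accuracy of a fitted cut-off."""
    rng = random.Random(seed)
    apparent_total = 0.0
    validated_total = 0.0
    for _ in range(replicates):
        derive = draw_cohort(rng, n_derive)
        validate = draw_cohort(rng, n_validate)
        # Pick the cut-off that best fits the derivation sample.
        best = max((m for m, _ in derive), key=lambda t: accuracy(t, derive))
        apparent_total += accuracy(best, derive)      # same data used to fit
        validated_total += accuracy(best, validate)   # new patients
    return apparent_total / replicates, validated_total / replicates


apparent, validated = mean_apparent_and_validated()
print(f"apparent {apparent:.3f} vs validated {validated:.3f}")
```

Under these assumptions the cut-off looks better on the sample it was tuned to than on new patients, even though nothing about the population has changed. Some shortfall on validation is therefore expected of any model; the question is how large it is.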

The Kocher article demonstrates a major methodological issue which has received little critique to date: the arbitrary exclusion of more than 40% of the study population. 114/282 children (40.4%) presenting with hip pain were retrospectively excluded.

The authors wrote:

“Excluded patients (114) included those in atypical groups, such as those with immunocompromise (fourteen patients), renal failure (six patients), neonatal sepsis (six patients), postoperative infection of the hip (four patients), later development of rheumatological disease (four patients), later development of Legg-Perthes disease (one patient), or associated proximal femoral osteomyelitis (six patients) confirmed with bone aspiration in patients who had either radiographic changes in the proximal aspect of the femur or failure to appropriately respond to arthrotomy and antibiotics given intravenously.
“A patient was excluded if joint-fluid aspirate had not been obtained for a cell count, gram stain, or culture (fifty-seven patients); if peripheral blood had not been obtained for culture or a cell count (eight); if the white blood-cell count in the joint fluid was less than 50,000 cells per cubic millimeter with negative cultures but the patient was managed with an arthrotomy and intravenous administration of antibiotics (six); or if the white blood-cell count in the joint fluid was less than 50,000 cells per cubic millimeter with negative cultures but the patient was managed with intravenous administration of antibiotics alone on the medical service (two).”
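The counts in the quoted passage can be tallied to confirm that they account for all 114 exclusions and for the 40.4% figure. The short check below is a sketch; the dictionary keys are paraphrased labels for the categories in the quotation, not the authors' wording.

```python
# Exclusion counts as listed in the quoted passage from Kocher et al. (1999).
exclusions = {
    "immunocompromise": 14,
    "renal failure": 6,
    "neonatal sepsis": 6,
    "postoperative hip infection": 4,
    "later rheumatological disease": 4,
    "later Legg-Perthes disease": 1,
    "proximal femoral osteomyelitis": 6,
    "no joint-fluid aspirate": 57,
    "no peripheral blood sample": 8,
    "low joint WBC, negative culture, arthrotomy + IV antibiotics": 6,
    "low joint WBC, negative culture, IV antibiotics alone": 2,
}

presented = 282  # children presenting with hip pain, per the article
total_excluded = sum(exclusions.values())
analyzed = presented - total_excluded
percent_excluded = round(100 * total_excluded / presented, 1)

print(total_excluded, analyzed, percent_excluded)
```

The largest single category by far is the 57 children for whom no joint-fluid aspirate was obtained, which is half of all exclusions.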

Most of these exclusions were based on information not available to the clinician at presentation, or on subsequent treatment choices by clinicians, rather than on known patient factors. There is no way to know in advance, for instance, which patients will eventually be diagnosed with a rheumatologic condition or Perthes' Disease. A predictive algorithm, by definition, must apply to patients who present to the clinician on the basis of information available to the clinician at the time.

A proposed predictive model that discards a large portion of cases, not on the basis of information available at the time but retrospectively on the basis of arbitrary factors, cannot be applied to all comers. Arbitrary exclusions based on information only subsequently available represent a form of data manipulation ("cherry-picking") that results in exaggeration of the predictivity of designated factors. How would one view a model that claimed to reliably predict whether a team would win or lose a sporting competition, but retrospectively excluded matches in which there was inclement weather, key players were injured, or the referees called multiple penalties?
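The arithmetic of this inflation can be shown with a toy example. Every number below is hypothetical and chosen only for illustration; none comes from the Kocher study or any other cited paper. Suppose 100 children present with three or more positive indicators: 50 prove to have a septic hip, 10 prove to have transient synovitis, and 40 end up with another or an indeterminate diagnosis.

```python
def septic_rate(confirmed_septic, confirmed_not_septic,
                other_or_indeterminate, exclude_other):
    """Share of septic hips among high-scoring patients.

    When exclude_other is True, patients with other or indeterminate
    diagnoses are dropped from the denominator after the fact.
    """
    denominator = confirmed_septic + confirmed_not_septic
    if not exclude_other:
        denominator += other_or_indeterminate
    return confirmed_septic / denominator


# Hypothetical counts, for illustration only.
after_exclusion = septic_rate(50, 10, 40, exclude_other=True)
all_comers = septic_rate(50, 10, 40, exclude_other=False)

print(round(after_exclusion, 3), round(all_comers, 3))
```

In this invented cohort the same indicators appear 83.3% predictive after retrospective exclusion but only 50% predictive among the patients a clinician would actually see at the door.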

That the "predictive factors" were not applied to over 40% of the patients who presented to the authors should substantially curb expectations of applicability to other populations. At a minimum, the paper required a forthright discussion of its exclusions and how these may substantially limit the model's applicability.

Results for these patients should have been presented. Simply removing outliers because they complicate the narrative or weaken the claim is problematic. Disclosures such as those below would have better delineated the model's limitations:

“For ___ patients, we were not able to definitively determine whether an infection was present or not.”
“These indicators were poorly predictive in patients with medical comorbidities including _____…”
“n patients with 3 or more positive indicators were subsequently determined to have [osteomyelitis/myositis/other conditions] rather than a septic hip.”

Proper exclusions can remove patients who don't qualify for the study because of specific factors known at presentation. Lack of available follow-up or disqualifying deviations from study protocol can also warrant separate analysis, but require disclosure. Retrospective exclusion of outliers or patients who don't fit the authors' hypothesis is a form of data manipulation and is inconsistent with claims of prospective predictive value. Prospective studies are less prone to such data manipulation.

Because exclusion was based on factors not known upon presentation, it is unclear to whom exactly the Kocher Criteria apply. The algorithm certainly does not apply, as the authors' abstract suggests and as it has been widely used, to every child with hip pain who comes through the door. This is a major issue limiting the study's relevance and generalizability.

Dispassionate Analysis or PR Claims?

The authors use rhetorical flourish more than compelling evidence to make their case. The article's title is of doubtful accuracy, claiming to be "an evidence-based clinical prediction algorithm" when it is in fact a retrospective study excluding over 40% of children who presented with hip pain. The authors present the arbitrary exclusion in noble-sounding terms, stating that it was done “to avoid information bias associated with incomplete data analysis and to avoid selection bias associated with inclusion of patients who had presumptive and inconsistent diagnoses,” when in fact it introduces significant biases and raises questions about their findings.

The abstract makes no mention of the number of participants or exclusions. Listing the numbers of participants and their outcomes is standard in scholarly abstracts. The omission presents an incomplete and misleading picture to readers without access to the full text behind the journal's paywall, while deemphasizing these data for those who do read the full text.

The article makes sweeping claims of predictive value, while failing to acknowledge major limitations. The authors wrote:

“By combining variables, we were able to construct a set of independent multivariate predictors that, together, had excellent diagnostic performance in differentiating between septic arthritis and transient synovitis of the hip in children.”

Yet the study is retrospective, not prospective or prognostic. Factors identified are correlates, not predictors. This correlation cannot be excellent with over 40% of the study population excluded. The use of language not supported by the evidence presented ("evidence-based medicine," "predictors," "excellent diagnostic performance," etc.) obfuscates shortcomings.

The Kocher article identifies itself as a Level of Evidence (LOE) III retrospective study. Yet "poor or nonindependent reference standards," here the arbitrary post-hoc exclusions, would appear to reduce the level of evidence at least to Level IV. Nor is it clear precisely what a level of evidence would mean without a formulation of which patients the study would apply to at presentation. The JBJS is not clear on who is to designate a study's level of evidence. Levels of evidence would be more impartially assigned by an independent panel of methodology reviewers following consistent methodological standards rather than by the study authors, who have a vested interest in claiming the most robust level of evidence possible.

Based on the authors' methodology, one would not expect independent researchers conducting a truly prospective study to come anywhere close to reproducing the original authors' claims of predictive value for the factors they identified. This is what has occurred. Multiple studies over more than two decades have demonstrated far lower predictive value of the Kocher Criteria than claimed by the original authors.

Conclusion

Outside research has consistently found the predictive accuracy of the Kocher Criteria to be lower than claimed in the original article. The exclusion of over 40% of children presenting with hip pain in the original study undermines its predictive claims. Fever, weightbearing status, white blood cell count, erythrocyte sedimentation rate, and (in the Caird modification of the Kocher criteria) the C-reactive protein provide valuable clinical information. Limitations of the model should be kept in mind and all relevant clinical factors considered.

References

[1] Kocher MS, Zurakowski D, Kasser JR. Differentiating between septic arthritis and transient synovitis of the hip in children: an evidence-based clinical prediction algorithm. J Bone Joint Surg Am. 1999;81(12):1662-1670. Level III retrospective study [claimed]

[2] Luhmann SJ, Jones A, Schootman M, Gordon JE, Schoenecker PL, Luhmann JD. Differentiation between septic arthritis and transient synovitis of the hip in children with clinical prediction algorithms. J Bone Joint Surg Am. 2004;86(5):956-962. doi:10.2106/00004623-200405000-00011.

[3] Caird MS, Flynn JM, Leung YL, Millman JE, D'Italia JG, Dormans JP. Factors distinguishing septic arthritis from transient synovitis of the hip in children. A prospective study. J Bone Joint Surg Am. 2006;88(6):1251-1257. doi:10.2106/JBJS.E.00216. Level II prospective cohort study

[4] Sultan J, Hughes PJ. Septic arthritis or transient synovitis of the hip in children: the value of clinical prediction algorithms. J Bone Joint Surg Br. 2010;92(9):1289-1293. doi:10.1302/0301-620X.92B9.24286

[5] Nickel AJ, Bretscher BS, Truong WH, Laine JC, Kharbanda AB. Novel Uses of Traditional Algorithms for Septic Arthritis. J Pediatr Orthop. 2022;42(2):e212-e217. doi:10.1097/BPO.0000000000002024. Level III retrospective study

[6] Obey MR, Minaie A, Schipper JA, Hosseinzadeh P. Pediatric Septic Arthritis of the Knee: Predictors of Septic Hip Do Not Apply. J Pediatr Orthop. 2019;39(10):e769-e772. doi:10.1097/BPO.0000000000001377. Level III retrospective study

Any opinions expressed are solely those of the author and not those of Cure 4 The Kids Foundation.