Measuring Outcomes in Hand Surgery




Methods for measuring outcomes after hand and upper extremity surgery continue to evolve, but remain inconsistent in quality. This article reviews the use of patient-reported outcomes measures in patients having upper extremity surgery, and provides a practical guide to questionnaire selection, assessment, and use. It also presents the future direction of health services research, and how it will drive changes in measuring outcomes in hand surgery.


Key points








  • A shift towards value-based insurance models has begun with programs designed to reduce patient costs and increase access and use of high-value treatments, while discouraging low-value treatments.



  • It is important to choose the questionnaire(s) that will suit the study needs.



  • In addition, the investigator must be careful to avoid overburdening subjects with numerous tests and questionnaires.



  • Finding the right combination of outcomes metrics without compromising study quality can be facilitated in part by thoughtful selection of robust and appropriate instruments.



  • As the use of item response theory and computerized adaptive testing continues to mold the future of health services and outcomes research, measuring outcomes in hand surgery will again require a shift in technique, metric design, and study execution.






Overview


The upper extremity is a highly specialized functional, sensory, and aesthetic unit, and it can suffer a unique range of insults. In 2010, the United States Bureau of Labor Statistics reported an annual incidence of hand injuries of 25.1 per 10,000 workers, and the most frequently injured group was young, active workers. When the costs of medical care, rehabilitation, and productivity loss are computed for this younger population of patients with trauma, as well as for the often older population with arthritis, neuropathies, and other sources of pain and functional loss, the burden of hand disorders is massive. How providers evaluate and manage hand disorders is critical to individuals and to society.


On the national level, as health care delivery and reimbursement in the United States undergo rapid and substantial change, the focus on quality and value of care continues to increase. A shift towards value-based insurance models has begun. These programs aim to reduce patient costs and increase access to and use of high-value treatments, while discouraging low-value treatments. Choosing Wisely and other similar campaigns are also emphasizing appropriate and evidence-based surgical interventions. Fee-for-service reimbursement is changing, and quality of care will play an increasing role in provider compensation. These developments have resulted in a renewed focus on the need for high-quality evidence to support provider decision making and delivery of care.


In the United States, Canada, the United Kingdom, and many other nations, health services research has been a substantial area of focus for more than 15 years. This field analyzes how patients access health care, what care costs, and what outcomes the patients experience as a result of this care. As work in this area continues to increase, the volume of literature addressing challenging issues in treatment quality, value, effectiveness, and appropriateness is growing. However, the quality of this literature is inconsistent. Interpreting the various results and their potential impact also continues to be a challenge. Considering the volume of hand disorders in the United States and worldwide, providing high-quality, sustainable, effective, and cost-conscious care is paramount. Especially when preparing for the changing landscape of health care, an awareness of the various factors affecting outcomes after hand surgery is critical for continued improvement and success.




Patient-reported outcomes in hand surgery


When evaluating the quality of care in hand surgery, standard functional metrics have traditionally been measured: fracture healing, range of motion, strength, sensation, and others. However, in many cases, what providers consider substantial improvement does not align with the perceptions and experiences of patients. That is not to say that traditional objective metrics cannot show significant differences in outcomes; rather, what is measured by these functional tests often does not translate to the outcomes desired by the patient, provider, or society. For example, fracture union on radiograph does not always equate with a patient having high satisfaction with their outcome or with returning to activities of daily living (ADL). A growing appreciation of this dichotomy has driven the use of patient-reported outcome (PRO) metrics in the assessment of upper extremity disease. PRO questionnaires allow providers to assess function, health-related quality-of-life (HRQL), and satisfaction from the patient’s perspective.


Understanding a patient’s HRQL requires an appreciation of physical, mental, and social well-being. How satisfaction, function, pain control, and other components affect HRQL has a substantial impact on treatment decisions and outcomes. In addition, the degree to which expenditures can be justified is guided by the expected improvement in HRQL. Improving the way these components are measured has formed the basis for design, application, and evaluation of numerous PRO instruments. The design and refinement of a PRO instrument is a difficult task, requiring a mix of qualitative and quantitative assessments. The instrument must be tested with pilot patient cohorts, and complex statistical analysis is needed to determine reliability and consistency. It must then be evaluated for validity and responsiveness for the disease state in question, which requires that each new metric be examined for each specific subset of patients the investigators intend to evaluate. The details of this process, and the various statistical measurements that are used, have been described by numerous investigators and are not covered in detail here. Table 1 contains a list of key quality domains and definitions.



Table 1

Definitions for key measurement properties used in evaluating the quality of patient-reported outcomes instruments

Measurement Property | Definition
Content validity | The degree to which the content of an instrument is an adequate reflection of the construct to be measured
Criterion validity | Strength of relationship between questionnaire scores and a measurable external criterion (the gold standard)
Construct validity | The degree to which the scores of a questionnaire are consistent with the theoretic construct (hypothesis) that is being measured
Face validity | The degree to which items in an instrument look as though they are an adequate reflection of the construct being measured
Internal consistency | The extent to which the items are interrelated, and thus measure the same construct
Reliability | The extent to which patients can be distinguished from each other despite measurement errors
Test-retest reliability | The extent to which scores for patients who have not changed are the same in repeated measurements over time
Inter-rater reliability | The extent to which scores for patients who have not changed are the same over repeated measurements by different examiners during the same visit
Responsiveness | The ability to detect clinically meaningful change over time in the construct being measured
Interpretability | The degree to which quantitative scores can be given qualitative meaning. Identifying clinically important differences in results
Cross-cultural equivalence | The same measurement instrument used in different cultures measures the same construct without additional external cultural influences on results




Choosing an outcomes instrument


Even when properly vetted, validated PRO metrics do not all perform at the same level. For example, the Michigan Hand Questionnaire (MHQ) and the Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire have both been validated for patients with carpal tunnel syndrome (CTS); however, with additional subdomains geared towards more than just functional aspects of disease, the MHQ is better able to evaluate the symptomatic components of CTS. Another example is the Short Form-36 (SF-36), which has been validated for rheumatoid arthritis (RA). For patients with RA, DASH scores were highly correlated with SF-36 for pain, but DASH was only moderately correlated for physical and mental function. In contrast, for patients after distal radius fracture fixation, MHQ and DASH are significantly more responsive than SF-36. The challenge in hand surgery is deciding which metrics should be used for each patient population. Although the number of available PROs continues to grow, valid and robust outcomes measures remain few and are inconsistently used. Appropriate selection of PRO metrics governs the value of any study results.




PRO instruments


PRO questionnaires are classified as general, system specific, and disease specific. General PRO measures evaluate qualitative and quantitative aspects of the patient’s life without focusing on any specific disease or organ system. They ascertain general well-being, including components of pain, vitality, emotional and mental health, and self-assessment of ability to perform daily functions and activities. The SF-36 and Arthritis Impact Measurement Scales 2 (AIMS2) are frequently used general PRO measures in hand surgery outcomes research.


System-specific, or domain-specific, instruments focus on an organ system or functional unit. These PRO metrics are geared toward better understanding of how the specific system of interest is affected by a disease state, what effects this has on the patient, and how these problems improve after intervention, which makes domain-specific instruments more valuable in intervention trials, but less likely to detect broader features of health states. The most commonly used instruments in upper extremity studies are the MHQ, DASH, and Patient-rated Wrist Evaluation (PRWE) outcomes questionnaire.


Disease-specific instruments are geared toward a population grouped by a particular disorder. These metrics are used in evaluating treatment of the specific disease. The focused nature of the questionnaire often results in high responsiveness when used in the appropriate patient population. However, the design often limits use in evaluating other diseases, even within the same system, which restricts how the results from a disease-specific instrument are used. The Carpal Tunnel Questionnaire (CTQ) is a commonly used disease-specific instrument.


For PRO metrics of all types, it is important to consider cross-cultural applications as well. Validity and responsiveness are population dependent, and this is an even greater issue when the populations of interest do not speak the same language or share similar cultural norms. The process of translating and subsequently validating quantitative and PRO instruments is challenging. It requires not only language conversion but also assurance that subtle nuances and organizational aspects of the translated questionnaire do not adversely affect the way patients understand and answer questions. This can be something as clear as Korean patients showing limited understanding of questions related to self-feeding with a spoon rather than using chopsticks. It can also be more complex, such as loss of idiomatic quality in translation from English to Spanish resulting in patients perceiving the questionnaire as less serious. The details of these concepts are beyond the scope of this article. However, as health care delivery and research become increasingly global, instruments with adequate cross-cultural equivalence will have broader usability in patient care and health services research.




Understanding the literature on PRO metric quality


Understanding the classification scheme discussed earlier is only a small part of the decision tree in selecting outcomes tools. Adequate consistency, reliability, validity, and responsiveness of the instrument are a large component of this decision process as well. Although the volume of literature evaluating these quality measures of the different PRO metrics continues to increase, understanding these studies and the quality of their results remains challenging for most readers. To compound the problem, definitions and usage of terms are inconsistent across studies, which complicates the planning of a PRO-focused study and limits the quality of methodology and content of systematic reviews.


A common concern when using PRO measures is how to interpret the scores. For example, what does a 10-point difference in the MHQ after treatment really mean? It is statistically significant, but is it clinically significant? Interpretability indicates how well the quantitative data can be translated into qualitatively (clinically) relevant results. This is most often done by determining the minimal clinically important difference (MCID). In patients with CTS, the MCID of the MHQ pain subdomain is 23, whereas the MCID for the function subdomain is 13. For patients with RA, the MHQ subdomain MCID for pain is 11 and for function it is 13. Although the MCID is useful when available, its applicability is limited because meaningful clinical change varies between patient groups. However, having the MCID for a questionnaire in the population being evaluated gives an indication as to the clinical relevance of study results.
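The MCID comparison described above can be sketched in a few lines of Python. This is only an illustration: the lookup table holds the four MHQ subdomain MCID values quoted in the text, while the function name, the use of the absolute change, and the "meets or exceeds" rule are assumptions made for the example, not a published scoring procedure.

```python
# MCID values for MHQ subdomains as quoted in the text above.
# Keys are (patient population, subdomain).
MHQ_MCID = {
    ("CTS", "pain"): 23,
    ("CTS", "function"): 13,
    ("RA", "pain"): 11,
    ("RA", "function"): 13,
}


def is_clinically_important(population, subdomain, score_change):
    """Return True if the size of the change reaches the MCID.

    Assumption for this sketch: only the magnitude of the change is
    compared, and a change equal to the MCID counts as important.
    """
    mcid = MHQ_MCID[(population, subdomain)]
    return abs(score_change) >= mcid


# A hypothetical 12-point change in the MHQ pain subdomain is judged
# differently depending on the patient population.
print(is_clinically_important("RA", "pain", 12))   # True  (12 >= 11)
print(is_clinically_important("CTS", "pain", 12))  # False (12 < 23)
```

The same 12-point change clears the threshold in one population and not the other, which is the reason an MCID should be taken from the population actually being studied.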


An additional approach to addressing the challenges in PRO metric evaluation has been to set guidelines and quality standards. Terwee and colleagues published quality criteria for measurement properties (Table 2), and provided guidelines as to how readers can critically evaluate published results. The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) study group has presented results of a 4-round Delphi study, releasing additional guidelines on evaluating the methodological quality of studies on health status measurement instruments. These guidelines include a taxonomy of relationships of measurement properties (Fig. 1), and a thorough analysis of what properties and methods must be used and reported for the study to be of adequate quality. The COSMIN group has challenged some of the traditional tools and methods used in vetting these studies, and developed a series of checklists that guide thorough analysis of published results. These sets of standards and checklists are not intended to rate the specific instruments; rather, they provide a systematic approach to evaluating the studies that report on instrument quality, regardless of the study’s conclusion. Based on these standards, numerous systematic reviews have assessed the measurement properties and clinimetrics of available PRO metrics. One such study evaluated the clinimetric properties of instruments used to assess patients with hand injuries. The authors concluded that most functional and patient-reported measures have been inadequately evaluated. MHQ, DASH, and CTQ are 3 of only 5 questionnaires to receive strong ratings, in that well-executed studies properly report reliability, validity, and responsiveness of these metrics.



Table 2

Quality criteria for the key measurement properties used in evaluating patient-reported outcomes instruments

Measurement Property | Quality Criteria: Positive Rating
Content validity | A clear description is provided of the measurement aim, the target population, the concepts that are being measured, and the item selection; and target population and investigators and/or experts were involved in item selection
Internal consistency | Factor analyses performed on adequate sample size (7 × # items and ≥100), and Cronbach α calculated per dimension, and Cronbach α between 0.70 and 0.95
Criterion validity | Convincing argument that gold standard is gold, and correlation with gold standard ≥0.70
Construct validity | Specific hypotheses were formulated, and at least 75% of the results are in accordance with these hypotheses
Reliability | Intraclass correlation coefficient (for continuous measures) or weighted κ (for ordinal measures) ≥0.70
Responsiveness | SDC or SDC < MCID or MCID outside the limits of agreement or Guyatt responsiveness ratio >1.96 or area under the receiver operating curve ≥0.70
Floor and ceiling effects | ≤15% of the respondents achieved the highest or lowest possible scores
Interpretability | Mean and standard deviation scores presented for at least 4 relevant subgroups of patients, and MCID defined

Abbreviation: SDC, smallest detectable change.

Adapted from Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34–42; with permission.
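Two of the positive-rating criteria in Table 2 are simple numerical checks, and a minimal Python sketch may make them concrete. The thresholds (Cronbach α between 0.70 and 0.95, a sample of at least 7 × the number of items and at least 100, and no more than 15% of respondents at an extreme score) come from the table. The function names and the toy data are hypothetical, the factor analysis that the criterion also requires is not performed, and checking floor and ceiling separately is an assumption about how the 15% rule is applied.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondent rows (one column per item)."""
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # transpose to per-item scores
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def internal_consistency_positive(alpha, n_respondents, n_items):
    """Sample-size and alpha-range parts of the Table 2 criterion only."""
    adequate_sample = n_respondents >= 100 and n_respondents >= 7 * n_items
    return adequate_sample and 0.70 <= alpha <= 0.95


def floor_ceiling_positive(scores, min_score, max_score):
    """True if neither extreme score is reached by more than 15% of respondents.

    Assumption: floor and ceiling are evaluated separately.
    """
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return floor_share <= 0.15 and ceiling_share <= 0.15


# Toy data: three respondents answering two perfectly related items.
toy = [[1, 2], [2, 3], [3, 4]]
print(cronbach_alpha(toy))

# Hypothetical study: alpha of 0.85 from 120 respondents on a 10-item scale.
print(internal_consistency_positive(0.85, 120, 10))
```

Note that an α above 0.95 fails the check as well; under the Table 2 criterion a very high value does not earn a positive rating.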



Fig. 1


Consensus-based standards for the selection of health measurement instruments (COSMIN) taxonomy of relationships of measurement properties. HR-PRO, health-related patient-reported outcome.

(From Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737–45; with permission.)


Posted Nov 20, 2017, in General Surgery
