39 Outcomes and Quality-of-Life Measures in Pelvic Floor Research

Outcome measures are the tools used to determine the efficacy, safety, and side effects of a treatment. Researchers assess outcome measures before and after treatments in order to determine their relative efficacy. Clinicians can use outcome measures to track the success of their treatments and to follow individual patients longitudinally. Urinary incontinence, fecal incontinence, pelvic organ prolapse, and other pelvic floor disorders are multidimensional phenomena that can affect a patient in a wide variety of ways. They rarely result in severe morbidity or mortality; rather, they cause symptoms that can interfere with a woman’s daily activities and negatively affect her quality of life. No single measure can fully characterize the outcome of an intervention for these conditions. Therefore, outcomes of treatment should be evaluated in multiple areas or domains. A number of organizations, including the International Continence Society, the National Institutes of Health (NIH), and the World Health Organization’s International Consultation on Incontinence, have made recommendations to standardize outcome measures in studies of pelvic floor disorders. In general, all agree on several basic principles: (1) outcome assessments should be made using the same measures before and after the intervention, (2) both subjective and objective measures should be included, incorporating improvements and deterioration in function as well as complications of the intervention, and (3) pelvic floor disorders should be assessed from multiple domains including some or all of the following:

• The subject’s observations (symptoms)

• Quantification of symptoms

• The clinician’s observations (anatomic and functional)

• Quality of life

• Socioeconomic measures

A good outcome measure should be valid, reliable, simple to implement, easy to interpret, and able to detect clinically meaningful change. When planning a study, outcome measures should be selected within the context of the specific study’s hypothesis or goal. In general, they should be chosen so that they will be clinically relevant and so the results may be incorporated into practice at the end of the study. For a clinical trial, researchers will typically define a primary outcome and several secondary outcomes. The primary outcome is of central interest and should be tied directly to the study’s primary hypothesis. It is the primary outcome that is used for sample size determination. Secondary outcomes are the remaining outcome measures being assessed in the study. They are not the focus of the main study objective but provide additional data that are complementary to the primary outcome measure. Because of the desire to assess pelvic floor disorders in multiple domains, there are typically several secondary outcome measures used in studies of these conditions.

In contrast to trials evaluating treatment of cancer or cardiovascular disease, where mortality is an obvious clinically relevant primary outcome measure, choosing the appropriate primary outcome measure for clinical trials of pelvic floor disorders can be challenging because “success” of treatment is often difficult to define. For a condition such as urinary incontinence, it might seem obvious that the primary outcome measure for a therapeutic trial should be “continence,” but unfortunately it is not that simple. No clear consensus exists on how best to define continence. Should a therapy be considered “successful” if the patient reports that she no longer leaks urine but has developed voiding dysfunction or new-onset urinary urgency? Similarly, should a patient be considered “continent” if she reports no urinary leakage on a voiding diary after treatment but leaks urine on multiple occasions during a urodynamic evaluation or has a positive pad test? The outcome or outcomes chosen as the primary outcome measure of a study can have a profound impact on the study’s results. This is nicely illustrated by the UK TVT RCT, a multicenter randomized trial comparing tension-free vaginal tape (TVT) with Burch colposuspension for treatment of stress urinary incontinence (Ward and Hilton, 2002). The authors of this study defined “cure” as the absence of stress incontinence at urodynamics and a negative 1-hour pad test (both criteria had to be satisfied to be considered a cure). Under this definition, 66% of subjects receiving TVT and 57% of those receiving colposuspension were cured. Figure 39-1 illustrates how varying the definition of cure would have resulted in different cure rates. Using the absence of stress incontinence at urodynamic testing alone as the definition of cure would have resulted in cure rates of 81% and 65% for TVT and colposuspension, respectively, consistent with previous reports in the literature.
In contrast, the cure rate would have been less than 40% for both procedures if the authors had chosen a patient report of “no urinary leakage” on a symptom questionnaire as the primary outcome measure. This example illustrates not only the impact of choice of outcome measure on study results, but also emphasizes the importance of choosing the outcome measure prior to the onset of the study. To wait until completion of a study to define “success” or “cure” would allow for considerable manipulation of results.
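To make the effect of the definition concrete, the short Python sketch below applies three definitions of cure to the same small, entirely hypothetical cohort (these are not the UK TVT RCT data). The `Subject` fields, the `cure_rate` helper, and the definition labels are illustrative names, not part of any published protocol. Note that a composite definition requiring both criteria can never yield a higher cure rate than either of those criteria alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subject:
    """Hypothetical follow-up results for one trial participant."""
    urodynamic_si: bool      # stress incontinence demonstrated at urodynamics
    pad_test_positive: bool  # positive 1-hour pad test
    reports_leakage: bool    # any leakage reported on a symptom questionnaire


def cure_rate(subjects: List[Subject], is_cured: Callable[[Subject], bool]) -> float:
    """Percentage of subjects who meet a given definition of cure."""
    cured = sum(1 for s in subjects if is_cured(s))
    return 100.0 * cured / len(subjects)


# Three candidate definitions of "cure" applied to the same subjects.
DEFINITIONS: Dict[str, Callable[[Subject], bool]] = {
    "no SI at urodynamics": lambda s: not s.urodynamic_si,
    "no SI at urodynamics AND negative pad test":
        lambda s: not s.urodynamic_si and not s.pad_test_positive,
    "no leakage on questionnaire": lambda s: not s.reports_leakage,
}

# Entirely hypothetical cohort of five subjects.
cohort = [
    Subject(urodynamic_si=False, pad_test_positive=False, reports_leakage=False),
    Subject(urodynamic_si=False, pad_test_positive=False, reports_leakage=True),
    Subject(urodynamic_si=False, pad_test_positive=True, reports_leakage=True),
    Subject(urodynamic_si=True, pad_test_positive=True, reports_leakage=True),
    Subject(urodynamic_si=False, pad_test_positive=False, reports_leakage=False),
]

for name, rule in DEFINITIONS.items():
    print(f"{name}: {cure_rate(cohort, rule):.0f}%")
```

With this cohort the same five subjects yield cure rates of 80%, 60%, and 40% depending on the definition, which is why the definition has to be fixed before any data are collected.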

Figure 39-1 Cure rates of the UK TVT RCT, a multicenter randomized trial of tension-free vaginal tape (TVT) versus Burch colposuspension for stress urinary incontinence, based on various definitions of cure, for TVT (dark grey) and colposuspension (light grey). CMG, Cystometrogram.

(From Hilton P. Trials of surgery for stress incontinence-thoughts on the ‘Humpty Dumpty principle.’ Br J Obstet Gynecol 2002;109:1091, with permission.)

In spite of the recent effort by national and international organizations to standardize outcomes in studies of pelvic floor disorders, there is currently no clearly established single best outcome measure or group of measures for any of these conditions, hence the emphasis on assessing outcomes in many domains. Traditionally, there has been a tendency to use objective physician- or test-based measures, such as urodynamic outcomes for studies of urinary incontinence and physical examination for studies of pelvic organ prolapse, as the primary outcome variables in intervention trials of pelvic floor disorders. However, in the last decade there has been increasing emphasis on patient-based outcomes such as symptom questionnaires and quality-of-life assessments. In fact, a survey of physicians, nurses, and patients by Tincello and Alfirevic (2002) found that subjective measures and improvement in quality of life were regarded by all groups as the most important outcomes in urogynecology studies. In spite of this, subjective outcomes alone are inadequate to accurately characterize the effects of treatment on disorders of the pelvic floor. Both subjective and objective outcome measures should be assessed and the primary outcome should be chosen based on the study’s goal.

Clinical trials can be separated broadly into explanatory trials and pragmatic trials. In explanatory trials, the intent is to determine not only if one treatment is superior to another, but why it is superior and to determine the mechanism of success or failure. In explanatory trials the inclusion/exclusion criteria tend to be strict in order to capture an “ideal” patient population. There is an emphasis on objective outcome measures that study mechanism of disease and treatment; these might include physiologic tests (e.g., urodynamics, anal manometry), radiologic evaluation, biochemical measures, etc. The goal of a pragmatic trial is to determine which treatment is superior in a real-world setting, without consideration of why it is superior or how it works. The inclusion/exclusion criteria of these trials tend to be liberal in order to mimic clinical practice, and the emphasis is on subjective patient-driven outcomes such as questionnaires or symptom diaries. When choosing outcome measures during the planning of a trial, a good first step is to determine whether the intent of the trial is to be explanatory or pragmatic. The next step should be to identify valid, reliable outcome measures that will appropriately meet the study’s objectives.

In this chapter, we review many of the outcome measures currently available to clinicians and researchers to assess the outcomes of treatment for pelvic floor disorders, including symptom diaries, pad tests, physical examination, physiologic tests such as urodynamics, symptom severity or bother questionnaires, quality-of-life questionnaires, and socioeconomic measures.

SYMPTOM DIARIES

Physicians often attempt to determine the presence and severity of a patient’s symptoms by history-taking or administering a questionnaire. These methods depend on the patient’s ability to accurately recall and report her recent health experiences. Research has shown, however, that recall is often unreliable and can result in inaccuracies and bias. The use of symptom diaries has been advocated in order to limit recall bias by capturing experiences prospectively close to or at the time of occurrence.

The bladder or urinary diary is perhaps the most common outcome measure used in studies of urinary incontinence and other forms of lower urinary tract dysfunction. It is also a useful clinical tool (see Chapters 6 and 14). In its simplest form, a patient is asked to prospectively record the time and number of voluntary voids and incontinence episodes over a specified period of time, usually 1 to 7 days. In more complex forms, a subject may be asked to record pad usage, type and amount of fluid intake, voided volumes, frequency and severity of urinary urgency, and/or activities that occur in relation to her lower urinary tract symptoms. Subjects are also often asked to record the time that they go to bed and the time they awaken in order to distinguish daytime from nighttime symptoms. For research studies of urinary incontinence, the National Institutes of Health recommends a 3-day bladder diary that records and reports, at a minimum, pad usage, urinary incontinence episodes, and voiding frequency. Diaries in which patients are asked to report fluid intake and record voided volume using a graduated toilet insert are often called frequency-volume charts. Although more cumbersome for patients, frequency-volume charts provide a significant amount of additional data about lower urinary tract function not available in simpler bladder diaries, including average daily fluid intake, total daily voided volume, mean voided volume, largest single void (functional bladder capacity), and daytime and nighttime voided volumes. Although diaries have been used primarily as an outcome measure for studies of urinary symptoms in the gynecology and urology literature, symptom diaries are used extensively in many other areas of clinical research as well. For instance, they have been used frequently in studies of bowel dysfunction to record the frequency of bowel movements, fecal incontinence episodes, etc.
Similarly, pain diaries are a standard outcome measure in studies of acute and chronic pain management.
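The summary variables of a frequency-volume chart listed above are simple to derive once the entries are recorded. The Python sketch below assumes a hypothetical record layout (a `Void` with a clock time and a volume in milliliters, plus a list of fluid intakes) and classifies a void as nighttime if it falls between the recorded bedtime and waking time. Formal terminology documents define nighttime volumes more precisely (for example, how to treat the first void after waking), and this sketch does not attempt to follow them.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, List


@dataclass
class Void:
    """One voluntary void on a frequency-volume chart (hypothetical format)."""
    at: time        # clock time of the void
    volume_ml: int  # volume measured with a graduated toilet insert


def summarize_fv_chart(voids: List[Void], intake_ml: List[int],
                       bedtime: time, wake_time: time, days: int) -> Dict[str, float]:
    """Derive summary variables from a frequency-volume chart.

    Assumes an evening bedtime and a morning waking time, so the nighttime
    window wraps past midnight; a void exactly at waking time counts as daytime.
    """
    def is_night(t: time) -> bool:
        return t >= bedtime or t < wake_time

    volumes = [v.volume_ml for v in voids]
    total = sum(volumes)
    night = sum(v.volume_ml for v in voids if is_night(v.at))
    return {
        "mean_daily_intake_ml": sum(intake_ml) / days,
        "mean_daily_voided_ml": total / days,
        "mean_voided_volume_ml": total / len(volumes),
        "largest_single_void_ml": max(volumes),  # functional bladder capacity
        # The two totals below cover the whole recording period.
        "daytime_voided_ml": total - night,
        "nighttime_voided_ml": night,
    }


# One hypothetical day: bed at 23:00, up at 06:30.
voids = [Void(time(7, 0), 400), Void(time(11, 30), 250), Void(time(15, 0), 300),
         Void(time(19, 0), 250), Void(time(2, 30), 300)]
summary = summarize_fv_chart(voids, intake_ml=[500, 400, 600, 300, 200],
                             bedtime=time(23, 0), wake_time=time(6, 30), days=1)
print(summary)
```

For a 3-day chart, the same function would be called with all three days of entries and `days=3`, so the daily means are averaged over the recording period.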

The accuracy of symptom diaries depends on the subject’s ability to follow instructions. The circumstances under which a diary is kept should approximate everyday life and should be similar before and after the intervention to allow meaningful comparison. Reproducibility depends on the nature of the diary and the parameters being measured. In general, the reproducibility of symptom diaries improves as the duration of self-reporting increases; however, as diary duration increases, patient compliance tends to decrease. The most appropriate duration for bladder diaries has not been established. Although some have advocated the use of a single 24-hour diary, the reliability of diaries of this short duration is poor, limiting their use in research. The most commonly used duration is 7 days. Studies have demonstrated high reliability of a 7-day diary for incontinence episodes, urinary frequency, urgency, and episodes of nocturia in both men and women with either stress incontinence or overactive bladder. In women with stress incontinence, a 3-day diary appears to have reproducibility similar to that of a 7-day diary with regard to the number of incontinence episodes and voiding frequency. Similarly, in patients with overactive bladder, the reliability of a 3-day diary for urge incontinence episodes, urgency episodes, and daytime and nighttime frequency was adequate, although not as good as that of a 7-day diary. As mentioned, the NIH recommends a diary duration of at least 3 days for the evaluation of lower urinary tract symptoms.
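Diary reproducibility is usually reported as a test-retest reliability coefficient. The text does not state which statistic the cited studies used; the Python sketch below assumes a one-way random-effects intraclass correlation, ICC(1,1), one common choice, applied to hypothetical counts of incontinence episodes recorded in two consecutive diary weeks. The function name `icc_oneway` is illustrative.

```python
from typing import List


def icc_oneway(measurements: List[List[float]]) -> float:
    """One-way random-effects intraclass correlation, ICC(1,1).

    `measurements` holds one row per subject and one column per repeated
    diary period (for example, episodes counted in week 1 and week 2).
    """
    n = len(measurements)      # number of subjects
    k = len(measurements[0])   # repeated measurements per subject
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    subject_means = [sum(row) / k for row in measurements]

    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(measurements, subject_means)
                    for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical incontinence episodes per week, two consecutive diary weeks.
weekly_episodes = [[10, 12], [3, 2], [7, 8], [0, 1]]
print(f"ICC = {icc_oneway(weekly_episodes):.2f}")
```

Values near 1 indicate that most of the variation lies between subjects rather than between repeated diary periods in the same subject, which is what “high reliability” means in this context.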

The primary strength of symptom diaries, at least in theory, is that they avoid the biases and inaccuracies of memory recall and record a subject’s symptoms in their normal day-to-day environment. There is evidence, however, that many patients may not actually complete their diary prospectively. In a study of adults with chronic pain by Stone et al. (2003), subjects were asked to complete a pain diary for 21 consecutive days. Each of these “paper and pen” diaries was fitted with an unobtrusive photosensor that detected light and recorded when the diary was opened and closed. Subjects were asked to record their pain at three set times each day and were not informed of the presence of the photosensor. At the end of the study, subjects reported greater than 90% compliance with the diary; however, the photosensor revealed that only 11% of subjects had filled out their diaries at the prescribed times. For most subjects, the sensor record showed long periods, from days to weeks, during which the diary was never opened, yet the diary contained entries for those days when it was turned in at the end of the study, suggesting that subjects frequently backfilled their diaries to complete missing days. Such backfilling is particularly subject to retrospective biases. In this study, a parallel group of patients was given computer diaries that prompted them to make entries at the specified times, and 94% true compliance was noted in this group, suggesting an advantage of computer diaries over paper ones. However, such computer prompting would not be useful for recording spontaneous events like voiding or incontinence episodes. In spite of this concern about retrospective completion of symptom diaries, they remain an important outcome measure in the study of pelvic floor disorders because of their widespread use, general acceptance, and proven reproducibility, particularly for studying lower urinary tract dysfunction.
When a bladder diary or similar symptom diary is used in a study, time should be spent instructing the subject in the proper use of the diary and the importance of completing the diary in a prospective manner.
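The photosensor comparison described above amounts to checking each prescribed entry time against an independent log of when the diary was actually opened. The Python sketch below shows that check under stated assumptions: a list of sensor timestamps and a tolerance window of 30 minutes chosen purely for illustration (the criterion actually used by Stone et al. is not given here). The function name `verified_compliance` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def verified_compliance(prescribed: List[datetime], opened: List[datetime],
                        window: timedelta = timedelta(minutes=30)) -> float:
    """Fraction of prescribed entry times matched by a logged diary opening.

    A prescribed time counts as honored only if the sensor log shows the
    diary was opened within +/- `window` of it; the 30-minute default is an
    arbitrary illustrative choice.
    """
    honored = sum(
        1 for p in prescribed
        if any(abs(o - p) <= window for o in opened)
    )
    return honored / len(prescribed)


# Hypothetical day: three prescribed entries, two logged openings.
day = datetime(2024, 1, 1)
prescribed = [day.replace(hour=9), day.replace(hour=14), day.replace(hour=20)]
opened = [day.replace(hour=9, minute=10), day.replace(hour=21, minute=15)]
print(f"{verified_compliance(prescribed, opened):.0%}")
```

Here only the 9:00 entry is supported by a nearby opening, so verified compliance is one in three, even if the diary itself contains entries for all three times.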

Another strength of symptom diaries is that they provide information on symptom frequency and, in some cases, severity, in a way that is quantifiable. This is particularly useful in studies of an intervention in which patients are not commonly “cured,” but may show an improvement in symptoms, such as studies of medical or behavioral therapy for urinary incontinence. In fact, bladder diaries are the most common primary outcome measure used in studies of this type. Furthermore, outcome measures that are continuous variables tend to provide greater statistical power than dichotomous variables, so using a variable from a bladder diary such as number of incontinence episodes per week rather than a dichotomous outcome such as “cure/failure” will usually allow for a smaller study sample size. An additional strength of bladder diaries in particular is that normal population values for variables like voiding frequency, mean voided volume, and daytime and nighttime urine output have been published, providing useful reference values for defining study populations and estimating treatment goals.
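The power advantage of a continuous outcome can be seen with the standard normal-approximation sample-size formulas for comparing two means and two proportions. In the Python sketch below, the effect sizes (a difference of 3 incontinence episodes per week with a standard deviation of 6, versus cure rates of 30% and 45%) are hypothetical planning assumptions; the two scenarios are not derived from the same data, so the numbers illustrate the mechanics rather than a general ratio. Two-sided α = 0.05 and 80% power are assumed defaults.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """z_(1 - alpha/2) + z_(power) for a two-sided test."""
    z = NormalDist()  # standard normal distribution
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def n_per_group_continuous(delta: float, sd: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect a difference in means of `delta`."""
    return math.ceil(2 * (_z_sum(alpha, power) * sd / delta) ** 2)


def n_per_group_binary(p1: float, p2: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect proportions p1 versus p2."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_sum(alpha, power) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical planning assumptions, not drawn from any specific trial.
print(n_per_group_continuous(delta=3, sd=6))  # difference in weekly episodes
print(n_per_group_binary(p1=0.30, p2=0.45))   # difference in "cure" rates
```

Under these assumptions, about 63 subjects per group suffice for the continuous outcome versus about 160 per group for the dichotomous one; a real trial would then inflate both figures for expected dropout and missing data.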

In addition to the possibility that symptom diaries may not always be completed contemporaneously, another potential weakness of this outcome tool is lower patient compliance compared with simpler measures like questionnaires. In large pharmaceutical trials in which subjects are carefully selected and often financially compensated to participate, compliance with symptom diaries is typically high, often over 90%. In smaller, less-funded studies, patient compliance with symptom diaries can be poor. Singh et al. (2004) prospectively studied 107 women who underwent pubovaginal slings. They used the Simplified Urinary Incontinence Outcome Score (SUIOS), a composite outcome that combines the results of a questionnaire, a 24-hour pad test, and a 24-hour bladder diary into a single score, as their primary outcome. Although all patients completed the questionnaire postoperatively, only 52% completed the symptom diary and/or the pad test even after repeated telephone contacts, reducing the number of subjects in whom the primary outcome was available to about half the original study population. When considering using a symptom diary as an outcome measure, the advantages of this tool must be weighed against the possibility of poor patient compliance.

Another important consideration when using a bladder diary as an outcome measure is the therapeutic effect that diary completion in itself may have on lower urinary tract function. As described in Chapter 14, bladder retraining using diaries is an effective intervention for both stress and urge urinary incontinence. Several authors have suggested that the high improvement rates seen in the placebo groups of pharmaceutical trials for overactive bladder and other similar trials (often greater than 40%) are due in part to a “bladder retraining effect” that occurs simply from keeping diaries throughout the trial. Some studies suggest that this effect can occur as early as 4 days after starting diary use. When a bladder diary is used to evaluate the effect of an intervention, the only certain way of accounting for the therapeutic effect of the diary itself is to include a control group in the study.

PAD TESTING

Pad testing attempts to objectively quantify the volume of urine loss by weighing a perineal pad before and after a specified time and/or group of activities. It is currently the only incontinence severity measure that captures the actual volume of leakage. Pad testing has also been used to attempt to distinguish continent from incontinent women. Numerous pad test protocols have been described, but in general they can be divided into short-term and long-term tests.

Short-term pad tests ask subjects to perform a set of standardized provocative maneuvers in the office that, depending on the protocol, can last from 10 minutes to 2 hours. In an attempt to standardize bladder volumes, most short-term pad tests specify that subjects start the test with a symptomatically full bladder, drink a standardized volume of liquid, or have a standard volume of fluid instilled into the bladder before the test. A preweighed pad is then worn while the subject performs a predefined group of activities over a specified period, typically including walking, climbing stairs, jumping, bending, coughing, and washing hands. The volume of urine loss is obtained by weighing the pad at the completion of the test. For short-term tests, a change in pad weight of greater than 1 g is considered positive. When a short-term pad test is used as a study outcome, the specific protocol used should be described. In 1983, the International Continence Society (ICS) recommended the 1-hour pad test (described in Box 39-1) in an attempt to standardize this outcome measure across studies. Compared with long-term tests, short-term pad tests are quick and easy, and patient compliance can be directly monitored; for these reasons, they are used frequently in clinical trials. A significant disadvantage of short-term pad tests, however, is that they lack authenticity: these office tests do not necessarily reproduce the activities or situations that result in urine loss in a patient’s everyday life. In fact, some patients may not be physically capable of completing all of the prescribed activities in the protocol. Another limitation of short-term pad tests is their poor test-retest reliability. Although some studies have demonstrated good correlation between short-term pad tests performed in the same subject on two separate occasions, many have found poor repeatability. Lose et al. (1988) demonstrated differences of up to 24 g between two test results in the same subject 1 to 15 days apart using the ICS 1-hour pad test, and concluded that this test is not precise enough to allow reliable quantitation of urinary incontinence. This within-subject variation is largely attributable to differences in bladder volume at the time of the test, and protocols that standardize pretest bladder volumes tend to have higher reliability.
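The scoring of a short-term pad test reduces to a weight difference compared against the 1 g threshold given above. The Python sketch below applies that rule to two hypothetical repeat tests in the same subject; the pad weights are invented, and rounding to 0.1 g is an assumed scale precision. It also shows why classification and quantitation can disagree: both tests are positive, yet the measured losses differ widely.

```python
from typing import Tuple

POSITIVE_GAIN_G = 1.0  # short-term test is positive if the pad gains more than 1 g


def short_term_pad_test(dry_g: float, wet_g: float) -> Tuple[float, bool]:
    """Return (weight gain in grams, positive?) for one preweighed pad."""
    gain = round(wet_g - dry_g, 1)  # round to an assumed 0.1 g scale precision
    return gain, gain > POSITIVE_GAIN_G


# Hypothetical repeat 1-hour tests in the same subject on two occasions.
first_gain, first_positive = short_term_pad_test(dry_g=10.0, wet_g=28.5)
second_gain, second_positive = short_term_pad_test(dry_g=10.2, wet_g=12.0)
print(first_gain, first_positive)
print(second_gain, second_positive)
print(f"test-retest difference: {abs(first_gain - second_gain):.1f} g")
```

Both results are classified as positive, but the 16.7 g difference between them is the kind of within-subject variation that limits the test as a quantitative outcome.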

BOX 39-1 STEPS OF THE 1-HOUR PAD TEST RECOMMENDED BY THE INTERNATIONAL CONTINENCE SOCIETY

• Test is started without the patient voiding.

• Preweighed pad is put on and the first 1-hour test period begins.

• Subject drinks 500 mL sodium-free liquid within a short period (max. 15 minutes), then sits or rests.

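• Half-hour period: subject walks, including stair-climbing equivalent to one flight up and down.

• During the remaining period, subject performs the following activities: standing up from sitting (10 times), coughing vigorously (10 times), running on the spot (1 minute), bending to pick up a small object from the floor (5 times), and washing hands in running water (1 minute).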

• At the end of the 1-hour test the pad is removed and weighed.

• If the test is regarded as representative, the subject voids and the volume is recorded.

• Otherwise, the test is repeated, preferably without voiding.

(From: Abrams P, Blaivas JG, Stanton SL, Andersen JT: The standardisation of terminology of lower urinary tract function. The International Continence Society Committee on Standardisation of Terminology. Scand J Urol Nephrol Suppl 1988;114:5.)

Long-term pad tests are performed by giving a patient several preweighed pads to take home and wear for 24 to 48 hours. Patients are encouraged to mimic their regular daily activities and change the pads as they wish during the study period. Subjects should be instructed to place the pads in a sealed plastic bag after use in order to avoid evaporation. Afterwards, the pads are mailed to the clinic to be weighed on a precision scale to determine the total urine loss over the specified period. Studies have shown that, as long as sealed bags are used, evaporation loss is minimal for up to 2 weeks. A bladder diary is often completed concurrently with the pad test to provide a comprehensive lower urinary tract evaluation. Changes in pad weight of up to 4 g/24 hours can be seen in healthy continent women, so values less than this should be considered insignificant. The primary advantage of long-term pad tests is that their results reflect everyday life. Also, the reproducibility of long-term tests is generally higher than that of short-term pad tests. Increasing the duration of the test from 24 hours to 48 or 72 hours increases reliability further but decreases patient compliance. Not surprisingly, compliance with long-term pad tests varies considerably from study to study. As with bladder diaries, compliance with long-term pad tests tends to be high in large trials in which patients are carefully selected and compensated for study participation; in smaller, less-funded studies, compliance is often lower.
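The arithmetic of a long-term pad test is a sum of weight gains over all returned pads, compared with the 4 g/24 hours seen in continent women. The Python sketch below assumes that a 48- or 72-hour total is averaged to a per-24-hour figure before applying that threshold; this normalization is an assumption of the sketch, not a rule stated in the text. The `Pad` record and pad weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

CONTINENT_RANGE_G_PER_24H = 4.0  # gains up to this level occur in continent women


@dataclass
class Pad:
    dry_g: float  # weight before use
    wet_g: float  # weight after use, returned in a sealed plastic bag


def long_term_pad_test(pads: List[Pad], hours: float) -> Tuple[float, bool]:
    """Return (urine loss in g per 24 h, above the continent range?)."""
    total_gain = sum(p.wet_g - p.dry_g for p in pads)
    # Assumption: average the total over the test period to a 24-hour figure.
    per_24h = round(total_gain * 24.0 / hours, 1)
    return per_24h, per_24h > CONTINENT_RANGE_G_PER_24H


# Hypothetical 48-hour test with four pads.
pads = [Pad(12.0, 15.0), Pad(12.0, 17.0), Pad(12.0, 14.5), Pad(12.0, 16.5)]
print(long_term_pad_test(pads, hours=48))
```

The four pads gain 15 g in total, or 7.5 g per 24 hours, which exceeds the range seen in continent women.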