Objective Measurement of Outcomes in Facial Palsy

Acknowledgement

We thank Fabian Bachl and Jakob Hochreiter for recording and editing the EMG videos. Parts of the presented work were supported by the German Federal Ministry of Education and Research (BMBF; IRESTRA grant 16SV7209), the Deutsche Forschungsgemeinschaft (DFG; grants DE 735/15-1 and GU-463/12-1), and the DEGUM (German Society for Medical Ultrasound).

Current Standards for the Classification of Motor Function Disorder and Synkinesis in Facial Nerve Palsy

The usual standard in clinical routine practice is the subjective, examiner-based evaluation of the severity of the motor disorder by the therapist using a graduated scoring scale. There is no national or international standard, no optimal observer-dependent procedure, and certainly no objective observer-independent one. There are at least 19 classification schemes in use worldwide, most of which have been designed for the assessment of acute idiopathic facial palsy (Bell’s palsy). If one analyzes the current systems based on criteria for an ideal classification system, only the Sunnybrook Facial Grading System and, in part, the Facial Nerve Grading System 2.0 meet these criteria. Experts worldwide nevertheless continue to use the very unreliable House-Brackmann grading scale, which is still recommended in guidelines on acute facial nerve palsy. This is certainly due to the degree of familiarity with the House-Brackmann scale. A particular advantage is that it allows for fast classification in clinical routine practice without further aids. However, the House-Brackmann grading scale does not classify synkinesis, which would be important for patients in the post-paretic phase of the disease. Speed combined with standardization is thus also significant for any emerging method, as this will be an important feature for its acceptance in clinical routine practice.

Due to observer dependence and the resulting lack of objectivity, all evaluation schemes inevitably have limited intraobserver and interobserver reliability. Most evaluation schemes are designed to be used face-to-face, but in reality, more or less well standardized photographs or video sequences are often rated post hoc. The accompanying figure gives an impression of a photo session for documenting facial deficits in the photo lab of the Facial-Nerve-Center Jena, Germany. However, because of the lack of alternatives, these schemes are also used as primary outcome criteria in large multicenter clinical trials. To minimize variability created by the photographer during video recordings, the following videos provide standardized instructions to guide the patient through the whole documentation of facial movements (https://vimeo.com/203921699) (Fig. 8.1). Due to the defects inherent in these subjective schemes, the choice of method can have a considerable influence on the study results. Recently, a new subjective method has been widely publicized, a clinician-graded Facial Function Scale (eFACE). Its attractiveness is less due to the items that are queried than to the possibility of electronic use with a smartphone or tablet. The answers are given on visual analogue scales and, when used electronically, the results are presented to the user in graphic form, thus generating a pseudo-objectivity. However, the correlation of the results of the eFACE with the Sunnybrook Facial Grading System is good.

Visual assessment does not examine the facial muscles directly, but only the resulting facial movement on the skin. Electromyography (EMG) is a quantitative, partly objective, but also user-dependent examination method of the facial musculature. In clinical routine practice, EMG, as either needle EMG or surface EMG, is generally only used to roughly estimate the severity of the injury and the prognosis. Theoretically, needle EMG can be used to assess the voluntary activity of the facial muscles as well as changes over the course of the disease, although this is not routinely established due to the effort and invasiveness involved. The synchronous recording of many facial muscles with multichannel surface or needle EMG is also possible, but so far too time-consuming for routine use. (See section on Electromyography for more details on how to perform and interpret a clinical needle EMG.)

The differences between the therapist’s and the patient’s perception also need to be considered: the therapist’s assessment of the severity of the disease may differ significantly from the self-assessment of the patients. Therefore, in addition to the above-mentioned instruments for therapists, two instruments have been established for self-evaluation by patients with facial palsy: the Facial Clinimetric Evaluation (FaCE) scale and the Facial Disability Index (FDI). Both instruments are so-called patient-reported outcome measures (PROMs) and are used for an integrative description of the quality of life of patients. These PROMs are also important for the quantification of the symptoms of chronic facial palsy with aberrant reinnervation and synkinesis because they not only ask questions about motor function, but also about psychological, social, and to some extent, communicative restrictions. Nevertheless, for the evaluation of patients with chronic facial palsy and synkinesis, FaCE and FDI provide only a few synkinesis-specific items. These deficits are targeted by the Synkinesis Assessment Questionnaire (SAQ). All these instruments are discussed in more detail in the next section.

Classification of Synkinesis in Facial Palsy with PROMs

Self-assessment by patients by means of questionnaires gives an impression of the influence of the disease on their quality of life. Both disease-specific and non–disease-specific questionnaires are available for this purpose.

Non–disease-specific questionnaires such as the Short Form 36 (SF-36) questionnaire or the International Quality of Life Assessment (IQOLA) allow a general assessment of physical and mental health. They make it possible to compare quality of life across different diseases. However, they do not allow for an assessment of disease-specific stress factors and accordingly contain no questions that specifically address symptoms of facial palsy such as synkinesis.

In contrast, disease-specific questionnaires are tailored to the stress factors of the target group of patients with the specific disease. Whereas they do not allow a comparison of quality of life with other patient groups, they do allow for a more specific assessment of the underlying disease and disease-specific symptoms.

Perhaps the most popular facial palsy–specific PROM is the FaCE scale. The FaCE scale, developed by Kahn et al., covers functional aspects of facial palsy and psychosocial stress factors alike. It was first described in 2001 and has since been translated into and validated in several languages. It is regarded by experts as reliable and valid. The questionnaire contains 15 questions in six categories (facial movement, facial well-being, oral function, eye well-being, tear function, and social function), which are answered on a five-point Likert scale. The total FaCE score is determined as the sum of the individual results. Between 0 and 100 points can be achieved, with a higher number of points indicating a better result. There are no questions that directly address synkinetic facial movements. Only three questions indirectly target the presence of synkinesis on the face. These are questions 4 (“Parts of my face feel tense, exhausted and uncomfortable”), 6 (“When I try to move my face, I feel tension, pain and cramps”), and 13 (“My face feels tired or I feel tension, pain or cramps, when I try to move it”).

The FDI was first published five years before the FaCE scale. The FDI contains a total of ten questions, five on physical function and five on social function, which are answered on a Likert scale. For physical function, results between −25% (worst result) and 100% (best result) are possible, whereas for social function, results between 0% (worst result) and 100% (best result) can be achieved. There are no questions that specifically target synkinesis in chronic facial paresis. This lack of synkinesis-specific items is addressed by the SAQ (Table 8.1).

TABLE 8.1

Synkinesis Assessment Questionnaire (SAQ).

Modified from Mehta RP, Wernick-Robinson M, Hadlock TA. Validation of the Synkinesis Assessment Questionnaire. Laryngoscope. 2007;117(5):923-926. doi: 10.1097/MLG.0b013e3180412460. PMID: 17473697.

Please answer the following questions regarding facial function on a scale from 1 to 5, according to the following scale:

1: seldom or not at all, 2: occasionally or very mildly, 3: sometimes or mildly, 4: most of the time or moderately, 5: all the time or severely

Question	Score
1. When I smile, my eye closes
2. When I speak, my eye closes
3. When I whistle or pucker my lips, my eye closes
4. When I smile, my neck tightens
5. When I close my eyes, my face gets tight
6. When I close my eye, the corner of my mouth moves
7. When I close my eyes, my neck tightens
8. When I eat, my eye waters
9. When I move my face, my chin develops a dimpled area
Total Synkinesis Score:	(Sum of Scores 1 to 9) / 45 × 100

The SAQ is a specific instrument for the self-assessment of synkinesis and was developed in Boston in 2007. It consists of nine questions that exclusively evaluate the synkinetic dysfunctions of facial palsy. Low scores indicate little or no facial synkinesis (with the formula in Table 8.1, the minimum is 20, reached when every item is rated 1), whereas the maximum score of 100 indicates the strongest and lasting facial synkinesis. The SAQ thereby enables synkinesis to be evaluated in the course of facial paralysis or after therapeutic measures. It is available in three languages.
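
The scoring rule in Table 8.1 can be sketched in a few lines of Python. This is only an illustration of the formula as printed in the table; the function name `saq_total_score` and the input checks are assumptions made for this example and are not part of the published questionnaire. Note that with item scores of 1 to 5, this formula gives values between 20 and 100.

```python
def saq_total_score(item_scores):
    """Total synkinesis score from the nine SAQ item scores (each 1 to 5).

    Implements the formula of Table 8.1: (sum of scores 1 to 9) / 45 * 100.
    The validation rules below are assumptions for this sketch.
    """
    if len(item_scores) != 9:
        raise ValueError("the SAQ has exactly nine items")
    if any(score not in (1, 2, 3, 4, 5) for score in item_scores):
        raise ValueError("each item must be scored from 1 to 5")
    # Multiply before dividing to keep the arithmetic exact for integer input.
    return sum(item_scores) * 100 / 45


print(saq_total_score([1] * 9))  # every item "seldom or not at all"
print(saq_total_score([5] * 9))  # every item "all the time or severely"
```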

PROMs are by definition very dependent on the self-perception of the patients. Therefore, they are not objective in the sense of an automatic measurement without human influence. But in contrast to expert-based rating systems like the Sunnybrook Facial Grading System or the eFACE, they depend only on the patient. In addition, they can be easily integrated into a clinical setting and allow a detailed follow-up of patients with facial palsy with minimal cost and time. However, there are often large discrepancies between the PROMs and expert-based ratings, which are still not fully understood and are a topic of ongoing research. In summary, before using PROMs, it must be decided which aspects of the disease are of interest, as the SAQ only records functional/motor targets, whereas the FaCE scale and the FDI also map emotional stress factors. Another criterion of note is the availability of the PROM in the patient’s mother tongue.

Automatic Image Analysis of Facial Expression in Facial Nerve Palsy

Currently, there are no automated procedures for clinical routine or standards for clinical trials. The main reasons for this are the individually chosen hardware and software solutions, the high acquisition costs, the huge amount of time spent creating the videos, the need to attach markers to the face before recording, the complex evaluation after the medical check-up allowing only semi-quantitative evaluation (e.g. the examiner has to place measuring points in the images), or simply the inadequate conversion of a laboratory workstation into a setting suitable for clinical routine examination. No automated system has been validated in a clinical setting on a large representative group of patients with facial nerve palsy.

The large number of approaches initiated by clinicians only illustrates that the problem has not yet been solved satisfactorily. The first method to use modern computer vision methods was introduced in 2010 to automate the subjective investigator-based House-Brackmann grading scale. Anatomical landmarks defined around the eyes, mouth, and ears were automatically localized without markers in two-dimensional (2D) color images, which were subsequently used for data enrichment and synthesis of “virtual” faces with facial nerve palsy. Based on these synthetic data and heuristically determined distance thresholds of corresponding landmarks of the two halves of the face, the House-Brackmann Index was estimated.

A hybrid approach presented in a 2016 publication is rule-based and uses learned classifiers as a control instance. It first distinguishes between volunteers and patients, then recognizes the type of facial paresis, and finally indicates the House-Brackmann grading. 2D grayscale images of five different facial poses are used; the facial landmarks are localized without markers, and distances between these landmarks are then used as facial features. Both approaches only use single images of very few patients.

Heuristically determined distance thresholds do not generalize well, because 2D landmarks cannot be assumed to be normalized against affine transformations such as scaling, rotation, or translation of the face in the image.
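
The following minimal sketch illustrates why a fixed pixel threshold is fragile. It is not taken from any of the cited systems: the landmark names, the coordinates, and the choice of the interocular distance as a normalizer are assumptions made for this example. The raw distance feature doubles when the same face is imaged at twice the scale, whereas the ratio to the interocular distance stays the same. Such a ratio is only invariant to translation, in-plane rotation, and uniform scaling, not to general affine distortions such as shear.

```python
import math


def side_difference(lm):
    """Raw feature: difference of the eye-to-mouth-corner distances of both sides."""
    left = math.dist(lm["eye_left"], lm["mouth_left"])
    right = math.dist(lm["eye_right"], lm["mouth_right"])
    return abs(left - right)


def normalized_side_difference(lm):
    """Same feature divided by the interocular distance (a scale-free ratio)."""
    return side_difference(lm) / math.dist(lm["eye_left"], lm["eye_right"])


def scaled(lm, factor):
    """Simulate the same face recorded at a different image scale."""
    return {name: (x * factor, y * factor) for name, (x, y) in lm.items()}


# Hypothetical 2D landmarks (pixels) of a face with a drooping right mouth corner.
face = {
    "eye_left": (30.0, 40.0),
    "eye_right": (70.0, 40.0),
    "mouth_left": (35.0, 80.0),
    "mouth_right": (65.0, 90.0),
}
bigger = scaled(face, 2.0)

print(side_difference(face), side_difference(bigger))            # raw values differ
print(normalized_side_difference(face), normalized_side_difference(bigger))
```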

Even in a recently introduced smartphone-based diagnostic system, facial features are determined exclusively by distances and angles between 2D landmarks. However, image series in the form of videos of three movement exercises are used. The facial features are extracted from the single images and the House-Brackmann grading scale is learned by means of a support vector machine (SVM). In the work of Peterson et al., action units (AUs) are used as facial features to evaluate patients with blepharospasm in videos. The computer expression recognition toolbox (CERT) was used to calculate the AUs. In both studies, time series of facial movements in the form of image sequences are used for the automatic evaluation of facial movement dysfunctions.

In order to get closer to the three-dimensional (3D) movement of the face, the first automatic 3D analysis systems for patients with facial nerve palsy were introduced. Facegram works with red-green-blue-depth (RGB-D) cameras that combine conventional 2D color images with depth information (e.g., the Microsoft Kinect system). With Facegram, however, special markers must be applied to the face, which limited its adoption. Endpoints for trajectories were drawn on the face and the change in trajectories during movement was analyzed. The Kinect v2 was used as a prototype to automatically calculate, without landmarks, an asymmetry index for the healthy side in patients with facial paresis, or was integrated into a feedback system to give patients undergoing exercise therapy feedback on successful completion of the task.

The works mentioned all used anatomical 3D landmarks of the face, which were projected from 2D into the 3D point cloud, and used individual 3D point clouds. Often, 68 defined landmarks around the eyes, eyebrows, nose, mouth, and cheeks are used to describe expression in faces (Fig. 8.9). However, landmarks can alternatively be defined based on facial curvatures, which should be more representative, especially for 3D images, from a biological and clinical perspective.

Corresponding curves were extracted from two faces in a comparative manner in order to analyze an improvement in facial nerve palsy after treatment with botulinum toxin. Sequences of 3D images were combined into four-dimensional (4D) images. The 4D measurement enabled a static and a dynamic evaluation. Sixteen patients were asked to perform eight facial expressions before and after treatment. The recorded facial point clouds were mirrored and registered frame by frame, and the resulting point cloud pairs were used to create dense scalar fields, which reflect and visualize the level of asymmetry.

Although the work uses 4D measurements and curvature describing features for asymmetry analysis, only the asymmetry is determined automatically in the form of visualization by two registered point clouds as an objective tool to measure dysfunction in patients with facial palsy.

For facial action coding system (FACS) analyses for emotion recognition, valid automatic video-based AU detection algorithms are now also used. In addition to the use in classic psychological experiments, automatic FACS analyses are now also used for neurological or psychiatric diseases. Using automatic FACS analyses, patients with chronic facial paresis were also examined whereby the AU analyses themselves were not used at all, but eye closure and smiles were analyzed on the basis of pixel shifts. Subsequently, we were able to show that it is possible to automatically detect AUs in standard FACS datasets with an active appearance model (AAM) approach with very high quality results. And finally, using this method we were able to classify the AUs on both the paralyzed and opposite side in photographs of 299 patients with acute facial paresis. In the following section, more technical details for automatic quantification of facial palsy are presented.

Automatic Facial Landmark Localization

Facial landmarks are descriptive points in the face which characterize the shape of a face and serve to identify specific facial features accurately. They are located at the eyes, eyebrows, nose, mouth, chin, and cheek. However, the number and position of landmarks are not standardized. Therefore, in practice, numerous facial datasets with different numbers of annotated landmarks are available. A list of available datasets is shown in Table 8.2.

TABLE 8.2

List of Datasets with Different Number of Facial Landmarks.

From Johnston B, de Chazal P. A review of image-based automatic facial landmark identification techniques. J Image Video Proc. 2018;2018:86.

Name	Images	Subjects	Landmarks	Year
XM2VTS	2360	295	68	1999
BioID	1521	23	20	2001
LFW	13233	5749	68	2007
Caltech	7092	Unknown	4	2007
PUT	9971	100	60	2008
MULTI-PIE	755370	337	68	2008
MUCT	3755	276	68 (+4)	2010
AFLW	2330	Unknown	97	2012
HELEN	2330	Unknown	97	2012
300W	600	Unknown	68	2013
Menpo Benchmark	8979	Unknown	68 (profile: 39)	2017

The most widely used landmark shape consists of 68 landmarks and is shown in Fig. 8.9.

Fortunately, the available annotated datasets of Table 8.2 can be exploited to train machine-learning algorithms that automatically localize landmarks in nonannotated images. This reduces the effort that experts would otherwise need for manual annotation. In addition, the trained models can be applied to face images of facial palsy patients where only the landmarks of one facial hemisphere are used for model training as, by mirroring the images horizontally (flipping them about the vertical midline), all landmarks can still be used.
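
Mirroring an annotated face can be sketched as follows. This is a minimal illustration, not the procedure of any cited work: the function name, the three-landmark toy layout, and the list of left/right index pairs are hypothetical. For a real annotation scheme such as the 68-landmark shape, the index pairs would have to be taken from that scheme's definition. The key point is that mirroring the x-coordinates alone is not enough; the indices of corresponding left and right landmarks must be swapped as well.

```python
def mirror_landmarks(landmarks, image_width, swap_pairs):
    """Mirror (x, y) pixel landmarks about the vertical midline of the image.

    landmarks   : list of (x, y) tuples in pixel coordinates
    image_width : width of the image in pixels
    swap_pairs  : (i, j) index pairs of corresponding left/right landmarks
                  (hypothetical; depends on the annotation scheme in use)
    """
    mirrored = [(image_width - 1 - x, y) for x, y in landmarks]
    # After mirroring, the former left eye lies where a right eye is expected,
    # so the indices of each left/right pair are exchanged.
    for i, j in swap_pairs:
        mirrored[i], mirrored[j] = mirrored[j], mirrored[i]
    return mirrored


# Toy layout: index 0 = left eye, 1 = right eye, 2 = nose tip (midline).
landmarks = [(30, 40), (70, 40), (50, 60)]
flipped = mirror_landmarks(landmarks, image_width=100, swap_pairs=[(0, 1)])
print(flipped)
```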

A first approach for automatic landmark localization is the AAM introduced by Cootes et al. An AAM combines the shape information and texture information of a face in a holistic generative manner. It models the visual appearance of the face in a global manner by considering typical interrelationships of the face shape and the face texture. The generative model can be fitted to unseen new images to automatically localize a face shape (landmarks). The result is a vector of real numbers (the model parameters). These vectors can be used as features, for example, to train a model to automatically grade facial palsy. Additionally, the shape and texture information of both face hemispheres can be used separately to train single AAM models.
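
To make the idea of a parameter vector concrete, the following sketch builds only the shape part of such a statistical model with a principal component analysis in NumPy. It is a strong simplification under stated assumptions: a full AAM also models texture and includes an iterative fitting procedure, neither of which is shown, and shapes are assumed to be already aligned. The function names and the synthetic training shapes are hypothetical.

```python
import numpy as np


def fit_shape_model(shapes, n_components):
    """Learn a linear shape model from aligned training shapes.

    shapes: array of shape (n_samples, n_landmarks, 2)
    Returns the mean shape vector and the leading principal directions.
    """
    flat = shapes.reshape(len(shapes), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered shape vectors.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def shape_parameters(shape, mean, components):
    """Project one shape onto the model; the result is the feature vector."""
    return components @ (shape.reshape(-1) - mean)


def reconstruct(params, mean, components):
    """Generate a shape (n_landmarks, 2) from a parameter vector."""
    return (mean + components.T @ params).reshape(-1, 2)


# Synthetic training set: one base shape deformed along a single direction.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 2))
direction = rng.normal(size=(5, 2))
shapes = np.stack([base + t * direction for t in np.linspace(-1.0, 1.0, 7)])

mean, components = fit_shape_model(shapes, n_components=1)
params = shape_parameters(shapes[0], mean, components)
print(params.shape)  # one real-valued parameter per retained component
```

In this toy case a single parameter is enough to reproduce each training shape, because the synthetic data vary along only one direction; real face data require many more components.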

Another method to localize facial landmarks automatically is based on regression. In facial landmark regression, the single landmarks are located independently of all other landmarks. Compared to AAMs, this is an advantage when not all landmarks of the face are visible in the image. Convolutional neural networks (CNNs), a special variant of artificial neural networks, can be used as regressors. CNNs have a multilayered architecture and the capability to learn specific features of the input data by themselves in order to make better use of them for the regression task. The first layers use convolutional operations to learn a number of filters based on the appearance of the input data. The fully connected layer at the end of the CNN is used to learn the locations of landmarks in the form of a multioutput regression task. Fig. 8.2 illustrates an example of CNN architecture for facial landmark regression.
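
The data flow of such a network can be sketched with plain NumPy. This is an untrained toy forward pass under stated assumptions, not the architecture of Fig. 8.2: it has a single convolutional layer, random weights, no pooling, and no training step, and all names and sizes are chosen for illustration only. It shows how convolutional filters produce feature maps and how a fully connected layer maps them to 68 (x, y) coordinates in one multioutput regression.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Apply K filters of shape (kh, kw) to a (H, W) image without padding."""
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Response of every filter at this image position.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def landmark_regressor(image, kernels, dense_weights, dense_bias, n_landmarks=68):
    """Forward pass: convolution -> ReLU -> flatten -> dense regression layer."""
    feature_maps = np.maximum(conv2d_valid(image, kernels), 0.0)
    features = feature_maps.reshape(-1)
    coords = dense_weights @ features + dense_bias
    return coords.reshape(n_landmarks, 2), features


rng = np.random.default_rng(0)
image = rng.random((32, 32))                 # placeholder grayscale face crop
kernels = rng.normal(size=(4, 5, 5))         # 4 random (untrained) filters
n_features = 4 * 28 * 28                     # 4 feature maps of size 28 x 28
dense_weights = rng.normal(size=(68 * 2, n_features)) * 0.01
dense_bias = np.zeros(68 * 2)

landmarks, features = landmark_regressor(image, kernels, dense_weights, dense_bias)
print(landmarks.shape)  # (68, 2)
```

In a real system the filters and dense weights would be learned from annotated face images, typically with a deep-learning framework rather than hand-written loops.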

Complex Facial Features

For automatic grading of facial palsy, suitable and powerful facial features that describe the shape (and texture) of a face are necessary. In the following sections, we introduce facial features based on statistical models and features extracted by a deep learning model.

AAM Features

As described in the previous section, AAMs provide powerful facial features that characterize the shape and the texture of important parts of the face in the compact form of a vector of real numbers.

AU Features

Facial AUs are defined in the FACS by Ekman and Friesen. The FACS attempts to deconstruct all facial expressions into single facial muscle group activations called AUs, which encode a facial expression. In some cases, the AU corresponds to a single muscle, but in other cases the action seen is the result of multiple muscles working together. Combined activations of facial muscles express emotions such as happiness, sadness, surprise, fear, anger, disgust, or contempt. These basic emotions are coded as combinations of AUs, for example, AU6 + AU12 for happiness. Public annotated datasets (e.g., CK+) can be used to train a model for AU parameter prediction, and the AUs can later be used as facial features.

CNN Features

In the previous section, CNNs were introduced for the automatic facial landmark regression task. Because CNNs autonomously learn powerful image features from the input data in their initial layers, they can also be used as feature extractors. After training a CNN with facial input images for an auxiliary task such as facial landmark localization, the activations of certain layers for a given image can be extracted and used as a feature vector.

Automatic Facial Palsy Assessment

Landmark Relations

Facial landmarks describe the shape of a face, which can be exploited to analyze asymmetries by comparing corresponding landmarks of both face hemispheres. For a detailed asymmetry analysis, distances between pairs of landmarks on each side of the face are well suited. Some of these distances are illustrated in Fig. 8.3.
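
A distance-based comparison of the two hemispheres might look like the sketch below. It is not the measure used in the cited studies: the landmark names, the coordinates, and the particular asymmetry ratio (absolute difference divided by the larger distance) are assumptions chosen for illustration. Because `math.dist` accepts points of any dimension, the same function works for 2D and 3D landmarks.

```python
import math


def distance_asymmetry(landmarks, left_pair, right_pair):
    """Relative difference between a distance on the left and on the right side.

    landmarks  : dict mapping landmark names to 2D or 3D coordinate tuples
    left_pair  : names of the two landmarks spanning the left-side distance
    right_pair : names of the two corresponding right-side landmarks
    Returns 0.0 for perfect symmetry and values approaching 1.0 for strong asymmetry.
    """
    d_left = math.dist(landmarks[left_pair[0]], landmarks[left_pair[1]])
    d_right = math.dist(landmarks[right_pair[0]], landmarks[right_pair[1]])
    longest = max(d_left, d_right)
    if longest == 0.0:
        return 0.0
    return abs(d_left - d_right) / longest


# Hypothetical 2D landmarks (pixels); the right mouth corner is displaced.
face = {
    "eye_corner_left": (35.0, 40.0),
    "mouth_corner_left": (20.0, 60.0),
    "eye_corner_right": (65.0, 40.0),
    "mouth_corner_right": (95.0, 80.0),
}
asymmetry = distance_asymmetry(
    face,
    left_pair=("eye_corner_left", "mouth_corner_left"),
    right_pair=("eye_corner_right", "mouth_corner_right"),
)
print(asymmetry)
```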

Another approach analyzes certain angular relationships at corresponding landmarks on both halves of the face. To calculate those angles, two lines are created by using the pivot landmark and two other suitable landmarks of the related face hemisphere. Fig. 8.4 illustrates two proposed angles.
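
The angle at a pivot landmark can be computed from the two lines described above with the dot product. The sketch below is illustrative only: the function names and the example coordinates are hypothetical, and which landmarks serve as pivot and end points (see Fig. 8.4) is left to the caller. The vector arithmetic works for 2D and 3D landmarks alike.

```python
import math


def angle_at(pivot, a, b):
    """Angle in degrees at `pivot` between the lines pivot->a and pivot->b."""
    v1 = [ai - pi for ai, pi in zip(a, pivot)]
    v2 = [bi - pi for bi, pi in zip(b, pivot)]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("pivot must differ from both other landmarks")
    cosine = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def angle_asymmetry(left_triplet, right_triplet):
    """Absolute difference (degrees) of corresponding angles on both sides."""
    return abs(angle_at(*left_triplet) - angle_at(*right_triplet))


# Hypothetical (pivot, landmark_a, landmark_b) triplets for each hemisphere.
left = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
right = ((0.0, 0.0), (-1.0, 0.0), (-1.0, 1.0))
print(angle_at(*left), angle_at(*right), angle_asymmetry(left, right))
```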

Both landmark distance relationships and landmark angle relationships of both facial hemispheres can be computed from 2D and 3D landmarks.

Automatic Facial Palsy Index Prediction Using 2D Images

One baseline for automatic facial palsy grading is face-describing 2D image features, which are defined in a previous section.

By applying a pretrained AAM model to an image, the resulting image-related AAM parameters can be exploited as image features. For facial palsy grading, an SVM or a random decision forest (RDF) can be trained using the AAM parameters and the facial palsy annotations of the related images.

As the amount and the variance of data from facial palsy patients are not sufficient for AAM training, separate AAM models of the two face hemispheres can initially be trained using public face datasets (see Table 8.2). Afterwards, these models trained on the left and the right face hemisphere are applied separately to images of facial palsy patients. The resulting AAM parameter vectors are concatenated into a new hemisphere-combined AAM parameter vector, which is used as a feature vector together with the image-related facial palsy annotations for SVM/RDF training.

Other suitable facial features are AUs. AUs can be derived from the AAM parameters using Gaussian process regression, which is described in detail by Haase et al.

In Modersohn and Denzler, the proposed features were applied to a dataset of 235 facial palsy patients to automatically predict the House-Brackmann, Stennert, and Sunnybrook Index.

Modern CNNs introduced by Krizhevsky et al. are strong classifiers, as 2D convolution operations are exploited to learn various filter masks in the early layers of the network. The final dense layers are then used for the task, e.g., facial palsy index prediction. However, CNN training needs a huge amount of data, which is rarely available in medical applications.

Instead, CNNs can provide powerful facial image features if the CNN is trained on facial image data and a face-related task, e.g., automatic facial landmark regression. These features are extracted from the activations of later network layers and can likewise be used for SVM training to automatically predict a facial palsy index. This method is extensively evaluated in Li et al.

If the amount of data allows for it, an entire CNN can also be trained. Given the complexity and the large number of parameters of a CNN, a lot of training data is necessary. For the CNN training, the pretrained network for automatic facial landmark localization can be used as a weight initialization because the image domain of both tasks is the same. For example, the VGG-16 CNN architecture trained for a facial recognition task can be used to extract powerful features.

Electromyography

Promising higher objectivity and reliability than, for example, the classic topodiagnostic testing, electrical testing has become the mainstay for prognostic testing of the facial nerve. Adour credits Duchenne in the 1800s as one of the earliest electrodiagnostic practitioners. In discussing “rheumatismal” facial palsy, Duchenne noted that the palsies that persisted had absent muscular contractility on nerve stimulation. He claimed his tests could reliably predict prognosis. In 2013, the American Academy of Otolaryngology−Head and Neck Surgery Foundation published a clinical practice guideline for the treatment of Bell’s palsy. This guideline recommends offering electrodiagnostic testing primarily in cases of complete paralysis as incomplete paralysis patients already, by and large, have a good prognosis. However, clinicians should refer patients to a facial nerve specialist when new or worsening neurologic findings occur at any point, ocular symptoms develop, or when, after 3 months, there is still an incomplete facial recovery. As a rule, an idiopathic facial palsy always shows some degree of recovery, at worst with synkinesis. Also in these cases, electrodiagnostic tests have their justification. In the following sections it will be shown that in the hands of a trained facial nerve specialist, electrodiagnostics support the diagnostics in cases of incomplete and complete facial palsy and also in detecting early reinnervation or synkinesis.

Needle Electromyography

The first part of the electrophysiologic investigation of patients is usually the needle electromyography (needle EMG). Using a small EMG needle (20 to 40 mm), all the facial muscles can easily be investigated (Fig. 8.5). There are several types of needle electrodes, of which bipolar concentric ones are the most versatile. A monopolar needle electrode is routinely used only for EMG-guided injection of botulinum toxin. It is a stainless steel needle fully insulated with a thin coating, except for the tip. The recording area of this electrode is spherical. The reference electrode is placed at a myoelectrically inactive location of the body and may be a surface electrode. For facial EMG, a suitable place for the reference electrode is the skin over the manubrium sterni due to the lack of interfering muscles close by and the symmetric position in the midline of the patient.