Year : 2016 | Volume : 53 | Issue : 1 | Page : 127-131
Perceptual evaluation of tracheoesophageal speech: Is it a reliable tool?
IS Suhail, RA Kazi, M Jagade
Department of Ear Nose and Throat and Head–Neck Surgery, Grant Medical College and J J Group of Hospitals, Byculla, Mumbai, India
Date of Web Publication: 28-Apr-2016
I S Suhail
Department of Ear Nose and Throat and Head–Neck Surgery, Grant Medical College and J J Group of Hospitals, Byculla, Mumbai
Source of Support: None, Conflict of Interest: None
Perceptual voice evaluation is a common clinical tool for rating the severity of vocal quality impairment. It has been used in research as a gold standard for comparison with acoustic and aerodynamic measurements. Nevertheless, it has disadvantages: it is time-consuming, it requires a group of raters, and, not least, it is a subjective method of evaluation. Intraobserver and interobserver reliability are important issues in perceptual evaluation. Different perceptual scales have been developed to describe the quality of a patient's voice, but none is internationally accepted. Although not entirely comprehensive, perceptual evaluations will be used as a standard against which other measures will be evaluated. Data were collected by a computer-aided search of the MEDLINE and PubMed databases, supplemented by hand searches of key journals. More than 50 articles published on the topic in the last three decades were reviewed, of which approximately 31 were found to be relevant to this article.
Keywords: Acoustic analysis, anchors, evaluation scales, perceptual evaluation, raters
How to cite this article:
Suhail IS, Kazi RA, Jagade M. Perceptual evaluation of tracheoesophageal speech: Is it a reliable tool? Indian J Cancer 2016;53:127-31
Introduction
Perceptual voice evaluation is a common clinical tool for rating the severity of vocal quality impairment. Human speech is essentially used for communication. This implies that the judgment of voice, a property of speech, is a perceptual matter. Raters' perceptions of the voice are based on perceptual aspects, and the patients themselves, their relatives, and clinicians (speech language pathologists, otolaryngologists) base their judgments of voice quality on perception as well. It is thus not surprising that perceptual measures are the standard against which other acoustic and clinical measures of voice are evaluated. Although voice evaluation is perceptual, there are many ways to measure voice objectively. Patients seek treatment for voice disorders because they do not sound normal, and they often decide whether treatment has been successful based on whether they sound better or not. Perceptual evaluation is essential for the assessment of vocal quality, the overall severity of defects, and their impact on communication skills. For this and other reasons, perceptual evaluation is used far more often than instrumental, objective measures. Furthermore, listeners' judgments are usually the standard against which other measures of voice (acoustic, aerodynamic) are evaluated. Perceptual evaluation has therefore maintained its place next to these more technical and objective evaluations. However, the usefulness of acoustic parameters essentially depends on their meaningful relationship to experienced listeners' impressions of voice quality. The power of perceptual scales lies in their accessibility to any clinician or researcher involved in the study of voice. Inevitably, the human factor plays a large role in the use of this subjective tool. Correlating objective voice parameters with perceptual indices is important not only to determine the crucial links between objective assessment and auditory perception but also to optimize intelligibility.
Materials and Methods
Different perceptual scales have been developed to describe the quality of a patient's voice, such as the GRBAS scale by the Japan Society of Logopaedics and Phoniatrics, the Hammarberg scale, the Buffalo Voice Profile, the Vocal Profile Analysis Scheme, the Visual Sort and Rate method, the Consensus Auditory–Perceptual Evaluation of Voice (CAPE-V), and the Voice Activity and Participation Profile (VAPP) [Table 1].
Table 1: Evaluation scales used in perceptual evaluation of tracheoesophageal speech
With regard to the GRBAS scale, Abe et al. found that reproducibility differs according to listeners and scales. High reproducibility is possible when listeners are selected prudently. Dejonckere et al. showed that the best correlation was found for the overall grade of hoarseness (G) parameter (P = 0.7). Impressions of asthenicity (A) and strained quality (S) were less consistent. In a study by De Bodt et al., test–retest reliability was moderate in a mixed group of otorhinolaryngologists and speech and language therapists. The best agreement between observers was obtained for the G parameter and the worst for the S parameter. Results showed no significant influence of the level of experience or professional background, but the group of experienced listeners scored higher than the inexperienced listeners. The same held for speech and language pathologists, who scored higher than otorhinolaryngologists. Differences between experienced and inexperienced listeners were smaller than those between otolaryngologists and speech and language pathologists, suggesting that professional background has a greater impact on perceptual rating than experience. In their rating, experienced otolaryngologists clearly approach best the overall. The influence of speech fragments has been studied by De Krom, who concluded that stimulus type had virtually no effect on either intra- or interlistener consistencies on the grade, roughness, or the ratings. Sakata et al. showed a difference between the perceptual evaluation of sustained vowels and that of running speech produced by different voices. G tends to be rated higher in running speech than in sustained vowels, whereas A and S tend to be rated higher in vowels than in running speech.
Naive listeners have performed perceptual evaluations to gain insight into the perception of tracheoesophageal speech in daily communication, whereas trained expert listeners (speech–language pathologists) have performed perceptual evaluations to gain insight into the clinical point of view, which could serve as a standard against which other clinical evaluation methods (acoustic analyses, videofluoroscopy, and digital high-speed imaging) could be evaluated. Results have shown that a minimal basic subset of four perceptual scales sufficiently covers the two underlying perceptual dimensions (voice quality and pitch) found for the naive raters, and that a minimal basic subset of eight perceptual scales is sufficient to cover the four underlying perceptual dimensions (voice quality, tonicity, pitch, and tempo) found for the speech and language pathologists (SLPs). The perceptual scales used consist of semantic bipolar 7-point scales. Listeners are instructed to choose an extreme end of the scale (1 or 7) when the term is very much applicable to the voice quality, one interval toward the middle (2 or 6) when the term is moderately applicable, one further interval toward the middle (3 or 5) when the term is a little applicable, and the middle (4) when both terms are equally applicable or neither term is applicable. In addition, a number of other perceptual scales, such as bubbly–not bubbly, unsteady–steady, weak–powerful, and monotonous–melodious, are suitable for perceptual evaluation of tracheoesophageal speech. These scales can be added to the minimal subset, depending on the purpose of the perceptual evaluations. Apart from the perceptual scale judgments, it is also advised to judge the overall voice quality as good, reasonable, or poor. A good voice is defined as “almost similar to normal voice,” a poor voice as “very deviant from normal voice,” and a reasonable voice as “somewhere in between both extremes.”
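The scoring instruction for the bipolar 7-point scales can be written out as a small lookup. The sketch below is illustrative only: the function name `scale_score`, the pole labels "left" and "right", and the applicability labels are assumptions introduced here and are not part of any published rating protocol.

```python
# Hypothetical encoding of the 7-point bipolar rating instruction.
# Offsets are counted from the scale midpoint (4).
APPLICABILITY_OFFSET = {
    "very": 3,        # extreme scale end: 1 or 7
    "moderately": 2,  # one interval toward the middle: 2 or 6
    "a little": 1,    # one further interval toward the middle: 3 or 5
    "neutral": 0,     # both terms equally applicable, or neither: 4
}


def scale_score(pole: str, applicability: str) -> int:
    """Return the 1-7 score for one bipolar scale.

    pole: "left" for the term at the 1 end, "right" for the term at the 7 end.
          Ignored when applicability is "neutral".
    applicability: one of the keys of APPLICABILITY_OFFSET.
    """
    if applicability not in APPLICABILITY_OFFSET:
        raise ValueError(f"unknown applicability: {applicability!r}")
    offset = APPLICABILITY_OFFSET[applicability]
    if offset == 0:
        return 4
    if pole == "left":
        return 4 - offset
    if pole == "right":
        return 4 + offset
    raise ValueError(f"unknown pole: {pole!r}")


# Example for a weak-powerful scale: "weak" is the left pole, "powerful" the right.
print(scale_score("left", "moderately"))  # voice judged moderately weak
```

Keeping the midpoint as a single value reflects the instruction that 4 covers both "equally applicable" and "neither applicable", which means the two cases cannot be told apart afterwards from the score alone.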
It has been shown that trained expert raters judged the tracheoesophageal voices more positively than naive raters did and that they made better use of the range of the perceptual scales. For the naive raters, only two perceptual dimensions could be extracted from the ratings, whereas four perceptual dimensions were extracted for the trained raters. This indicates that trained expert listeners differentiate more between the various perceptual aspects of tracheoesophageal voice quality. With respect to neoglottic characteristics and clinical practice, the judgments of the trained expert listeners are clinically more relevant than those of the naive listeners. The judgments of the trained raters are more useful for identifying the specific neoglottic characteristics that lead to specific perceptual aspects of tracheoesophageal voice quality, because trained raters differentiate more between the various perceptual aspects. Insight into the relationships between neoglottic characteristics and the anatomic and morphologic characteristics of the neoglottis is relevant to voice and speech training after total laryngectomy as well as to the effects of different surgical techniques. In addition, it is clinically more relevant to use the judgments of the speech–language pathologist who is giving the voice training, since the speech pathologist is monitoring the result of the training and is trying, together with the patient, to achieve an optimal voice quality.
Results
In a study by Kazi et al., two independent trained expert raters were given voice recordings of normal subjects and tracheoesophageal (TO) speakers and were asked to rate them according to the “GRBAS” scale and the simpler “Overall Voice Quality (OVQ)” scale. The results of this evaluation, expressed as median (range), are presented in [Table 2].
The raters were clearly able to distinguish normal speakers from TO speakers and ascribed higher scores (poorer values) to TO speakers than to normal subjects across the scale.
[Table 3] shows the results for inter-rater reliability and test–retest reliability using the intraclass correlation coefficient. When rating TO speakers, there appeared to be more reliability between raters when ascribing an overall grade (G) to speech quality on the GRBAS scale and when designating the OVQ grade than for the other parameters. There also appeared to be some concordance between the raters when evaluating the breathy, asthenic, and strained nature of TO speech, but less agreement in deciding on roughness. Data from Expert Rater I (ERI) were used to examine test–retest reliability. For this purpose, ERI rated the same voice recordings of the TO speakers 3 weeks apart. There appeared to be a high degree of test–retest reliability.
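To make the reliability statistic concrete, the following minimal sketch computes one common form of the intraclass correlation coefficient: the two-way random-effects, absolute-agreement, single-rater form often written ICC(2,1). The text above does not state which ICC form was used, so this choice, the function name `icc_2_1`, and the example ratings are assumptions made purely for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per voice sample and one column per rater
             (or per rating session, for a test-retest comparison).
    """
    n = len(ratings)       # number of rated voice samples
    k = len(ratings[0])    # number of raters (or sessions)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 samples and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


# Hypothetical G scores (0-3) given by two raters to five speakers.
example = [[2, 2], [3, 3], [1, 2], [3, 2], [2, 2]]
print(round(icc_2_1(example), 2))
```

The same function can serve for a test–retest check by treating the two rating sessions of a single rater as the two columns.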
Table 3: Interrater and test-retest reliability of perceptual evaluation using intra-class correlation coefficient
Collectively, these analyses suggest that the overall grade (G) of GRBAS, OVQ, and, to some extent, breathiness (B) of the GRBAS scale are robust perceptual parameters when assessing TO speakers. There was an excellent correlation between G and OVQ (Spearman rank, 0.97, P < 0.0001).
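For readers less familiar with the statistic, the Spearman rank correlation between two ordinal ratings can be computed as the Pearson correlation of their ranks, with tied scores sharing the average rank. The sketch below assumes this standard definition; the function names and the example scores are hypothetical and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant ratings; correlation is undefined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six speakers: G (0-3) and OVQ coded 1=good, 2=reasonable, 3=poor.
g_scores = [1, 2, 3, 2, 3, 1]
ovq_scores = [1, 2, 3, 2, 3, 2]
print(round(spearman_rho(g_scores, ovq_scores), 2))
```

Because perceptual scores take only a few distinct values, ties are the norm rather than the exception, which is why the average-rank handling matters here.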
A judgment of overall voice quality gives a concise and simple overall impression of the voice. The perceptual evaluation of the study cohort of TO speakers by trained raters using the GRBAS and OVQ scales establishes that the overall grade (G) and OVQ are clearly measurable and reliable measures. These two scales correlated significantly with average fundamental frequency, jitter, shimmer, and normalized noise energy (NNE) (from sustained vowels) and with irregularity (from connected speech), thus establishing the crucial links between objective and perceptual data. Both OVQ and GRBAS can be used reliably with expert raters to obtain an overall “impression” in TO speakers, but this should be supplemented with other methods of assessment.
The relationship between objective acoustic and perceptual measures of voice and their relative importance have long occupied voice researchers. Some authors have argued that perceptual measures of voice have greater content validity and that acoustic measures are useful only to the extent that they capture perceptual, aerodynamic, or physiologic information. Despite such arguments, some clinicians and researchers favor acoustic analysis over perceptual evaluation of pathologic voice. Studies in this tradition emphasize problems of listener unreliability and the lack of standardized terminology. In contrast, acoustic measures are considered well defined, objective, and reliable. Thus, much effort has been devoted to developing acoustic measures and to the search for acoustic correlates of various perceptual qualities or pathologic states. Such approaches apparently assume that acoustic measures may someday take the place of perceptual assessment, thus alleviating concerns about listener unreliability.
Several authors have investigated the relationship between perceptual judgment and acoustic measurement. Most studies have investigated the quantitative correlation of isolated acoustic variables (jitter, shimmer, or harmonics/noise ratio) with perceptual judgment. Other authors have applied multiple linear regression analysis to investigate the relationship between combinations of acoustic variables and perceptual ratings. Wolfe et al. indicated that none of the acoustic variables was strongly correlated with the dysphonia ratings and that no combination of variables was successful in predicting dysphonia. Eskenazi et al. showed, however, a fair prediction of the degree of dysphonia based on pitch amplitude and harmonics-to-noise ratio.
Rabinov et al. described the advantage of perceptual ratings and the disadvantage of acoustic measures in severe voice abnormality. They compared the reliability of perceptual ratings of roughness with acoustic measures of frequency perturbation produced by several voice analysis systems. They showed that, overall, listeners agreed as well as or better than objective algorithms. In a study comparing several systems for perturbation measurement, Bielamowicz et al. suggested that the reliability of measures produced by different systems might be worse than assumed. Listeners and analysis packages differ greatly in their measurement characteristics, and reliability alone is not a good enough reason for preferring acoustic to perceptual measures. Patterns of unreliability suggest that, for clinical purposes, perceptual measures are probably superior to current acoustic analysis systems, at least for perturbation-based measures. Standardization of acoustic measurement procedures may improve agreement among algorithms but will not by itself resolve the problem of measurement utility.
Correlating objective voice parameters with perceptual indices is important not only to determine the crucial links between objective assessment and auditory perception but also to help optimize intelligibility.
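The perturbation variables named above have simple working definitions. As a rough sketch, local jitter can be taken as the mean absolute difference between consecutive glottal periods divided by the mean period, and local shimmer as the same ratio computed on consecutive peak amplitudes. Analysis packages differ in their exact definitions, so the function name `local_perturbation` and the sample values below are assumptions for illustration rather than the method of any cited study.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value.

    Pass glottal period durations to obtain local jitter, or per-cycle peak
    amplitudes to obtain local shimmer. The result is a ratio; multiply by
    100 for a percentage.
    """
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value is zero; ratio is undefined")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical period durations in seconds (roughly a 100 Hz voice).
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
# Hypothetical peak amplitudes of the same cycles, in arbitrary units.
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

print(f"jitter  (local): {100 * local_perturbation(periods):.2f} %")
print(f"shimmer (local): {100 * local_perturbation(amplitudes):.2f} %")
```

Both measures presuppose that individual cycles can be identified, which is exactly what becomes difficult in severely aperiodic voices.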
Judgments of the perceived quality of a voice sample are affected by many variables: listener characteristics (experience and training), the phonetic content of the sample, pitch, loudness, and rate. Several authors emphasize the importance of training and experience. Anders et al. studied the effects of training (i.e., classes of auditors) and of culture (language) on the perception of hoarseness. They found small, but not significant, differences between classes of listeners in favor of trained listeners. They concluded that the types of training and professional and cultural background are not dominant factors. Hammarberg et al. found good consistency and reliability of judgment ratings in a group of experienced listeners. Experienced listeners (speech language pathologists) in Gelfer's study judged somewhat more consistently than untrained raters did. In contrast, Kreiman et al. showed that naive and expert listeners use different perceptual strategies. They found that expert listeners show less agreement than naive listeners about the relative importance of various aspects of voice quality and suggested that care must be taken when using data averaged across different clinicians. Bassich and Ludlow reported that 8 h of training were required for previously inexperienced listeners to attain 80% interrater reliability using a 13-dimension perceptual rating system. Yiu et al. studied the effect of anchors and training on the reliability of perceptual voice evaluation and concluded that anchors and training helped to improve reliability, especially in the rating of male voices. They also found that anchors of synthesized signals combined with training improved judgments of perceptual roughness and breathiness more than natural voice anchors did [Table 4].
Discussion and Conclusion
From the perceptual evaluations, it can be concluded that, although voice rehabilitation of laryngectomized patients has improved substantially over the past 25 years, the voice quality of tracheoesophageal speech is still rather deviant from that of normal speakers, something that can have implications for the psychosocial functioning of these patients in their daily life. This becomes especially clear from the judgments of the naive raters, who tend to judge these voices at the lower ends of the scales.
For research purposes in which perceptual evaluations will be used as a standard against which other measures are evaluated, the use of experienced raters is recommended because they have professional knowledge of the anatomy and morphology of the neoglottis, listen more analytically, and are capable of using tracheoesophageal voice as their internal standard. In this respect, it is also important to train these experienced raters in performing perceptual evaluations. The literature has shown the necessity of training, given the different internal standards that experienced raters use. For research goals regarding the communicative function of tracheoesophageal speech, the use of perceptual judgments by naive raters is recommended. The judgments of naive and trained raters differ from each other so much that they cannot serve as substitutes for each other.
According to the relationships between the scale judgments and judgment of overall voice quality given by the trained raters, it can be said that the use of a judgment of overall voice quality given by trained expert raters already gives a good impression of the quality of the voice. Specific scale judgments, however, remain necessary to gain insight into the more specific perceptual characteristics in relation to anatomic and morphologic aspects of the neoglottis.
The relationship between voice quality and judged intelligibility warrants further study to reveal more exact relationships between specific aspects of voice quality and intelligibility. Although not entirely comprehensive, perceptual evaluation systems with expert raters could be used in laryngectomees to good effect in conjunction with other modalities of assessment. Despite past controversies concerning terminology, methodology, and reliability, perceptual evaluation is still commonly applied today. Most researchers emphasize its importance, especially in evaluating the voice disorder as a whole, but recognize its limitations. Recent developments in the objective evaluation of voice have not replaced its function but have undoubtedly added greatly to our knowledge and have enabled more accurate measurements of vocal performance. However, it is unlikely that objective parameters will ever obviate the need for systematic perceptual assessment of voice quality because of the high cost, anxiety, and discomfort that such techniques may cause the subject. Perceptual evaluation is a valuable primary tool in differential diagnosis and clinical management. It is the most comprehensible form of evaluation for patients and plays an important role in the subjective appreciation of voice therapy.
References
Hammarberg B, Nord L. The alaryngeal voice source as analysed by videofluoroscopy, fibreendoscopy, and perceptual–acoustic assessment. Stockholm, Sweden: Proceedings of the International Congress of Phonetic Sciences; 1995.
Baken RJ. Clinical measurement of Speech and Voice. London: Taylor and Francis Ltd; 1987.
Hirano M. Clinical examination of voice. New York: Springer Verlag; 1981. p. 81-4.
Gerratt BR, Kreiman J, Antonanzas-Barroso N, Berke GS. Comparing internal and external standards in voice quality judgments. J Speech Hear Res 1993;36:14-20.
Fukazawa T, Blaugrund SM, El-Assuooty A, Gould WJ. Acoustic analysis of hoarse voice: A preliminary report. J Voice 1988;2:127-31.
Kreiman J, Gerratt BR, Kempster GB, Erman A, Berke GS. Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. J Speech Hear Res 1993;36:21-40.
Abe H, Yongkawa H, Orta E, Imaizumi S. Reproducibility of hoarse voice psychoacoustic evaluation. Jpn J Logoped Phoniat 1986;27:168-77.
Dejonckere PH, Ordens C, De Moor GM, Wieneke GH. Perceptual evaluation of dysphonia: Reliability and relevance. Folia Phoniatr (Basel) 1993;45:76-83.
Laver J. The phonetic description of voice quality. Cambridge: Cambridge University Press; 1980.
Kazi R, Singh A, De Cordova J, Clarke P, Harrington K, Rhys-Evans P. A new self-administered questionnaire to determine patient experience with Blom–Singer valves. J Postgrad Med 2005;51:253-8.
Granqvist S. The Visual Sort and Rate method for perceptual evaluation in listening tests. Logoped Phoniatr Vocol 2003;28:109-16.
Hammarberg B. Pathological voice qualities: Perceptual and acoustic characteristics of a set of Swedish “reference” voices. Bulletin d'Audiophonologie 1992;8:39-52.
De Bodt MS, Wuyts FL, Van De Heyning PH, Croux C. Test–retest study of the GRBAS Scale: The influence of experience and professional background on perceptual rating of voice quality. J Voice 1997;11:74-80.
De Krom G. Consistency and reliability of voice quality ratings for different types of speech fragments. J Speech Hear Res 1994;37:985-1000.
Sakata T, Kubota N, Yongkawa H, Imaizumi S, Numi S. GRBAS evaluation of running speech and sustained phonation. In: Proceedings of the world congress of the XXII IALP, 6-9 August, IALP, Cairo, 1995. p. 33-6.
Van As C. Tracheoesophageal speech: A multidimensional assessment of voice quality. Amsterdam: University of Amsterdam; 2001.
Jensen MP. Questionnaire validation: A brief guide for reader of the research literature. Clin J Pain 2003;19:345-52.
Hammarberg B, Fritzell B. Acoustic and perceptual analysis of vocal dysfunction. J Phonetics 1986;14:533-47.
Wolfe V, Martin D. Acoustic correlates of dysphonia: Type and severity. J Commun Disord 1997;30:403-16.
Rabinov CR, Kreiman J, Gerratt BR, Bielamowicz S. Comparing reliability of perceptual ratings of roughness and acoustic measures of jitter. J Speech Hear Res 1995;38:26-32.
Eskenazi L, Childers DG, Hicks DM. Acoustic correlates of voice quality. J Speech Hear Res 1990;33:298-306.
Wolfe V, Fitch J, Cornell R. Acoustic prediction of severity in commonly occurring voice problems. J Speech Hear Res 1995;38:273-9.
Bielamowicz S, Kreiman J, Gerratt BR, Dauer MS, Berke GS. Comparison of voice analysis systems for perturbation measurement. J Speech Hear Res 1996;39:126-34.
Bassich CJ, Ludlow C. The use of perceptual methods by new clinicians for assessing voice quality. J Speech Hear Disord 1986;51:125-33.
Anders LC, Hollien H, Hurme P, Sonninen A, Wendler J. Perception of hoarseness by several classes of listeners. Folia Phoniat 1988;40:91-100.
Gelfer MP. Fundamental frequency, intensity, and vowel selection: Effects on measures of phonatory stability. J Speech Hear Res 1995;38:1189-98.
Chan KM, Yiu EM. The effect of anchors and training on the reliability of perceptual voice evaluation. J Speech Lang Hear Res 2002;45:111-26.
Wilson DK. Voice problems of children, 3rd ed. Baltimore: Williams and Wilkins; 1987.