10.24425/aoa.2024.148818
Method for Vocal Fold Paralysis Detection Based on Perceptual and Acoustic Assessment
References
Aha D.W., Kibler D., Albert M.K. (1991), Instance-based learning algorithms, Machine Learning, 6: 37–66, https://doi.org/10.1007/bf00153759.
Airas M., Alku P. (2007), Comparison of multiple voice source parameters in different phonation types, [in:] Eighth Annual Conference of the International Speech Communication Association, https://doi.org/10.21437/interspeech.2007-28.
Alku P., Bäckström T., Vilkman E. (2002), Normalized amplitude quotient for parametrization of the glottal flow, The Journal of the Acoustical Society of America, 112(2): 701–710, https://doi.org/10.1121/1.1490365.
Alku P., Strik H., Vilkman E. (1997), Parabolic spectral parameter – A new method for quantification of the glottal flow, Speech Communication, 22(1): 67–79, https://doi.org/10.1016/s0167-6393(97)00020-4.
Alpaydin E. (2004), Introduction to Machine Learning, MIT Press.
Askenfelt A.G., Hammarberg B. (1986), Speech waveform perturbation analysis: A perceptual-acoustical comparison of seven measures, Journal of Speech, Language, and Hearing Research, 29(1): 50–64, https://doi.org/10.1044/jshr.2901.50.
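Barsties B., Maryn Y. (2012), The acoustic voice quality index in German: A measurement method for overall voice quality [in German: Der Acoustic Voice Quality Index in Deutsch: Ein Messverfahren zur allgemeinen Stimmqualität], HNO, 60(8): 715–720, https://doi.org/10.1007/s00106-012-2499-9.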
Behrbohm H., Kaschke O., Nawka T., Swift A.C. (2011), Ear, Nose and Throat Diseases with Head and Neck Surgery [in Polish: Choroby ucha, nosa i gardła z chirurgią głowy i szyi], 2nd ed., Edra Urban & Partner.
Boersma P. (2001), Praat, a system for doing phonetics by computer, Glot International, 5(9/10): 341–345.
Chen H.-C., Jen Y.-M., Wang C.-H., Lee J.-C., Lin Y.-S. (2007), Etiology of vocal cord paralysis, ORL, 69(3): 167–171, https://doi.org/10.1159/000099226.
Childers D.G., Lee C.K. (1991), Vocal quality factors: Analysis, synthesis, and perception, The Journal of the Acoustical Society of America, 90(5): 2394–2410, https://doi.org/10.1121/1.402044.
Compton E.C. et al. (2022), Developing an Artificial Intelligence tool to predict vocal cord pathology in primary care settings, The Laryngoscope, 133(8): 1531–4995, https://doi.org/10.1002/lary.30432.
Cooper W.E., Sorensen J.M. (1981), Fundamental Frequency in Sentence Production, Springer Science & Business Media.
Crowson M.G. et al. (2020), A contemporary review of machine learning in otolaryngology–head and neck surgery, The Laryngoscope, 130(1): 45–51, https://doi.org/10.1002/lary.27850.
Degottex G., Kane J., Drugman T., Raitio T., Scherer S. (2014), COVAREP – A collaborative voice analysis repository for speech technologies, [in:] 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 960–964, https://doi.org/10.1109/icassp.2014.6853739.
Dejonckere P.H. et al. (2001), A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques, European Archives of Oto-Rhino-Laryngology, 258: 77–82, https://doi.org/10.1007/s004050000299.
Deliyski D.D., Shaw H.S., Evans M.K. (2005), Adverse effects of environmental noise on acoustic voice quality measurements, Journal of Voice, 19(1): 15–28, https://doi.org/10.1016/j.jvoice.2004.07.003.
Dibazar A.A., Berger T.W., Narayanan S.S. (2006), Pathological voice assessment, [in:] 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, 2006: 1669–1673, https://doi.org/10.1109/IEMBS.2006.259835.
Friedman N., Geiger D., Goldszmidt M. (1997), Bayesian network classifiers, Machine Learning, 29: 131–163, https://doi.org/10.1023/A:1007465528199.
Godino-Llorente J.I., Gómez-Vilda P., Saenz-Lechón N., Blanco-Velasco M., Cruz-Roldan F., Ferrer-Ballester M.A. (2005), Support vector machines applied to the detection of voice disorders, [in:] Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, Faundez-Zanuy M., Janer L., Esposito A., Satue-Villar A., Roure J., Espinosa-Duro V. [Eds.], pp. 219–230, https://doi.org/10.1007/11613107_19.
Hacki T. (1989), Classification of glottal dysfunctions on the basis of electroglottography [in German: Klassifizierung von Glottisdysfunktionen mit Hilfe der Elektroglottographie], Folia Phoniatrica, 41(1): 43–48, https://doi.org/10.1159/000265931.
Hanson H.M. (1997), Glottal characteristics of female speakers: Acoustic correlates, The Journal of the Acoustical Society of America, 101(1): 466–481, https://doi.org/10.1121/1.417991.
Hillenbrand J., Houde R.A. (1996), Acoustic correlates of breathy vocal quality: Dysphonic voices and continuous speech, Journal of Speech, Language, and Hearing Research, 39(2): 311–321, https://doi.org/10.1044/jshr.3902.311.
Hirano M. (1981), Clinical Examination of Voice, Springer-Verlag, New York.
Hogikyan N.D. (2004), The voice-related quality of life (V-RQOL) measure: History and ongoing utility of a validated voice outcomes instrument, Perspectives on Voice and Voice Disorders, 14(1): 3–5, https://doi.org/10.1044/vvd14.1.3.
Hosokawa K. et al. (2017), Validation of the acoustic voice quality index in the Japanese language, Journal of Voice, 31(2): 260.e1–260.e9, https://doi.org/10.1016/j.jvoice.2016.05.010.
Ingrisano D.R., Perry C.K., Jepson K.R. (1998), Environmental noise: A threat to automatic voice analysis, American Journal of Speech-Language Pathology, 7(1): 91–96, https://doi.org/10.1044/1058-0360.0701.91.
Jeong G.-E. et al. (2022), Treatment efficacy of voice therapy following injection laryngoplasty for unilateral vocal fold paralysis, Journal of Voice, 36(2): 242–248, https://doi.org/10.1016/j.jvoice.2020.05.014.
Kane J., Gobl C. (2011), Identifying regions of nonmodal phonation using features of the wavelet transform, [in:] Twelfth Annual Conference of the International Speech Communication Association, pp. 177–180, https://doi.org/10.21437/interspeech.2011-76.
Kane J., Gobl C. (2013), Wavelet maxima dispersion for breathy to tense voice discrimination, IEEE Transactions on Audio, Speech, and Language Processing, 21(6): 1170–1179, https://doi.org/10.1109/tasl.2013.2245653.
Kankare E. et al. (2020), The acoustic voice quality index version 02.02 in the Finnish-speaking population, Logopedics Phoniatrics Vocology, 45(2): 49–56, https://doi.org/10.1080/14015439.2018.1556332.
Kosztyła-Hojna B., Moskal D., Kuryliszyn-Moskal A., Rutkowski R. (2014), Visual assessment of voice disorders in patients with occupational dysphonia, Annals of Agricultural and Environmental Medicine, 21(4): 898–902, https://doi.org/10.5604/12321966.1129955.
Landwehr N., Hall M., Frank E. (2005), Logistic model trees, Machine Learning, 59: 161–205, https://doi.org/10.1007/s10994-005-0466-3.
Laukkanen A.-M., Rantala L. (2022), Does the acoustic voice quality index (AVQI) correlate with perceived creak and strain in normophonic young adult Finnish females?, Folia Phoniatrica et Logopaedica, 74(1): 62–69, https://doi.org/10.1159/000514796.
Majkowska M. (2004), Basic issues of voice emission and hygiene [in Polish: Podstawowe zagadnienia emisji i higieny głosu], [in:] Prace Naukowe Akademii im. Jana Długosza w Częstochowie, 5: 93–101.
Maryn Y., Corthals P., Van Cauwenberge P., Roy N., De Bodt M. (2010), Toward improved ecological validity in the acoustic measurement of overall voice quality: Combining continuous speech and sustained vowels, Journal of Voice, 24(5): 540–555, https://doi.org/10.1016/j.jvoice.2008.12.014.
Maryn Y., De Bodt M., Barsties B., Roy N. (2014), The value of the acoustic voice quality index as a measure of dysphonia severity in subjects speaking different languages, European Archives of Oto-Rhino-Laryngology, 271: 1609–1619, https://doi.org/10.1007/s00405-013-2730-7.
Maryn Y., Roy N. (2012), Sustained vowels and continuous speech in the auditory-perceptual evaluation of dysphonia severity, Jornal da Sociedade Brasileira de Fonoaudiologia, 24: 107–112, https://doi.org/10.1590/s2179-64912012000200003.
Maryn Y., Roy N., De Bodt M., Van Cauwenberge P., Corthals P. (2009), Acoustic measurement of overall voice quality: A meta-analysis, The Journal of the Acoustical Society of America, 126(5): 2619–2634, https://doi.org/10.1121/1.3224706.
Maryn Y., Weenink D. (2015), Objective dysphonia measures in the program Praat: Smoothed cepstral peak prominence and acoustic voice quality index, Journal of Voice, 29(1): 35–43, https://doi.org/10.1016/j.jvoice.2014.06.015.
Montalbaron M.B. et al. (2023), Presumptive diagnosis in tele-health laryngology: A multi-center observational study, The Annals of Otology, Rhinology, and Laryngology, 132(12): 1511–1519, https://doi.org/10.1177/00034894231165811.
Nawka T., Anders L., Wendler J. (1994), The auditory assessment of hoarse voices according to the RBH system [in German], Sprache, Stimme, Gehör, 18: 130–133.
Nemr K. et al. (2012), GRBAS and Cape-V scales: High reliability and consensus when applied at different times, Journal of Voice, 26(6): 812.e17–812.e22, https://doi.org/10.1016/j.jvoice.2012.03.005.
Parsa V., Jamieson D.G. (2001), Acoustic discrimination of pathological voice: Sustained vowels versus continuous speech, Journal of Speech, Language, and Hearing Research, 44(2): 327–339, https://doi.org/10.1044/1092-4388(2001/027).
Patel R.R. et al. (2018), Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association expert panel to develop a protocol for instrumental assessment of vocal function, American Journal of Speech-Language Pathology, 27(3): 887–905, https://doi.org/10.1044/2018_ajslp-17-0009.
Portney L.G., Watkins M.P. (2009), Foundations of Clinical Research: Applications to Practice, 3rd ed., Pearson/Prentice Hall, Upper Saddle River, NJ.
Quinlan J.R. (1999), C4.5: Programs for Machine Learning, Morgan Kaufmann.
Reynolds V. et al. (2012), Objective assessment of pediatric voice disorders with the acoustic voice quality index, Journal of Voice, 26(5): 672.e1–672.e7, https://doi.org/10.1016/j.jvoice.2012.02.002.
Roper T.A. (2014), Clinical Skills, 2nd ed., Oxford University Press.
Rosłanowski A. (2008), Phoniatric database [in Polish: Baza nagrań foniatrycznych], B.Eng., Polish-Japanese Academy of Information Technology.
Speyer R. et al. (2010), Maximum phonation time: Variability and reliability, Journal of Voice, 24(3): 281–284, https://doi.org/10.1016/j.jvoice.2008.10.004.
Suvvari T.K. (2023), The role of Artificial Intelligence in diagnosis and management of laryngeal disorders, Ear, Nose & Throat Journal, https://doi.org/10.1177/01455613231175053.
Szklanny K. (2019), Acoustic parameters in the evaluation of voice quality of choral singers. Prototype of mobile application for voice quality evaluation, Archives of Acoustics, 44(3): 439–446, https://doi.org/10.24425/aoa.2019.129257.
Szklanny K., Wrzeciono P. (2019), Relation of RBH auditory-perceptual scale to acoustic and electroglottographic voice analysis in children with vocal nodules, IEEE Access, 7: 41647–41658, https://doi.org/10.1109/ACCESS.2019.2907397.
Tadeusiewicz R. (1988), Speech Signal [in Polish: Sygnał mowy], Wydawnictwa Komunikacji i Łączności, Warszawa.
Tirronen S., Javanmardi F., Kodali M., Reddy Kadiri S., Alku P. (2023), Utilizing Wav2Vec in database-independent voice disorder detection, [in:] ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5, https://doi.org/10.1109/ICASSP49357.2023.10094798.
Uloza V., Petrauskas T., Padervinskis E., Ulozaite N., Barsties B., Maryn Y. (2017), Validation of the acoustic voice quality index in the Lithuanian language, Journal of Voice, 31(2): 257.e1–257.e11, https://doi.org/10.1016/j.jvoice.2016.06.002.
Verde L., De Pietro G., Sannino G. (2018), Voice disorder identification by using machine learning techniques, IEEE Access, 6: 16246–16255, https://doi.org/10.1109/access.2018.2816338.
Verikas A., Gelzinis A., Bacauskiene M., Uloza V. (2006), Towards a computer-aided diagnosis system for vocal cord diseases, Artificial Intelligence in Medicine, 36(1): 71–84, https://doi.org/10.1016/j.artmed.2004.11.001.
Wilson J., Webb A., Carding P., Steen I., MacKenzie K., Deary I. (2004), The voice symptom scale (VoiSS) and the vocal handicap index (VHI): A comparison of structure and content, Clinical Otolaryngology & Allied Sciences, 29(2): 169–174, https://doi.org/10.1111/j.0307-7772.2004.00775.x.