A Robust Voice Pathology Detection System Based on the Combined BiLSTM–CNN Architecture

Rimah Amami; Rim Amami; Chiraz Trabelsi; Sherin Hassan Mabrouk; Hassan A. Khalil

doi:10.13164/mendel.2023.2.202

Rimah Amami: Computer Department, Deanship of Preparatory Year and Supporting Studies, Imam AbdulRahman bin Faisal University, Dammam, KSA
Rim Amami: Basic Science Department, Deanship of Preparatory Year and Supporting Studies, Imam AbdulRahman bin Faisal University, Dammam, KSA
Chiraz Trabelsi: Institut Montpellierain Alexander Grothendieck, UMR CNRS 5149, Place Eugene Bataillon, 34090, Montpellier, France
Sherin Hassan Mabrouk: Self-Development Department, Deanship of Preparatory Year and Supporting Studies, Imam AbdulRahman bin Faisal University, Dammam, KSA
Hassan A. Khalil: Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt

Keywords: Voice Pathology Detection, Convolutional Neural Network, BiLSTM, Hybrid Systems, MEEI Voice Disorders Database

Abstract

Voice recognition systems have become increasingly important in recent years because of the growing need for more efficient and intuitive human-machine interfaces. The use of hybrid LSTM networks and deep learning has been very successful in improving speech detection systems. This paper develops a novel approach for detecting voice pathologies using a hybrid deep learning model that combines the Bidirectional Long Short-Term Memory (BiLSTM) and Convolutional Neural Network (CNN) architectures. The proposed model uses a combination of temporal and spectral features extracted from speech signals to detect different types of voice pathologies. Its performance is evaluated on a publicly available dataset of speech signals from individuals with various voice pathologies (the MEEI database). The experimental results show that the hybrid BiLSTM-CNN model outperforms several classifiers, achieving an accuracy of 98.86%. The proposed model has the potential to assist health care professionals in the accurate diagnosis and treatment of voice pathologies and to improve the quality of life of affected individuals.
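To make the idea of combining temporal and spectral features concrete, the following is a minimal sketch of per-frame feature extraction from a speech signal. The abstract does not name the specific features used, so the zero-crossing rate, short-time energy, and spectral centroid chosen here are illustrative assumptions, as are all function names, the frame length, the hop size, and the synthetic tone that stands in for a recording. It is not the authors' pipeline.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (trailing partial frame dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Temporal feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency (Hz) from a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        mags.append(abs(coeff))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def extract_features(signal, sample_rate, frame_len=256, hop=128):
    """Return one [zcr, energy, centroid] vector per frame (hypothetical feature set)."""
    return [[zero_crossing_rate(f), short_time_energy(f),
             spectral_centroid(f, sample_rate)]
            for f in frame_signal(signal, frame_len, hop)]


# Synthetic stand-in for a recording (assumption): a 250 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 250 * t / SAMPLE_RATE) for t in range(1024)]
features = extract_features(tone, SAMPLE_RATE)
```

The resulting sequence of per-frame vectors is the kind of input a sequence model such as a BiLSTM could consume, while a CNN would more typically operate on a time-frequency representation. The 250 Hz tone was chosen because it falls exactly on a DFT bin for 256-sample frames at 8 kHz, so the centroid of each frame lands very close to 250 Hz; real speech would give far less tidy values.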
