Comparative Analysis of Various Transformation Techniques for Voiceless Consonants Modeling

Authors

  • Grazina Korvel, Institute of Data Science and Digital Technologies, Vilnius University
  • Bozena Kostek, Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology
  • Olga Kurasova, Institute of Data Science and Digital Technologies, Vilnius University

Keywords

DFT, DCT, DWHT, cepstrum coefficients

Abstract

In this paper, a comparison of various transformation techniques, namely the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Walsh-Hadamard Transform (DWHT), is performed in the context of their application to voiceless consonant modeling. Speech features based on these transformations are extracted: the mean and derivative values of the cepstrum coefficients derived from each transform. Feature extraction is performed on the speech signal divided into short-time segments. The kNN and Naive Bayes methods are used for phoneme classification. Experiments show that DFT and DCT yield better classification accuracy than DWHT. An ANOVA test confirmed that the DFT results did not differ significantly from the DCT results, whereas both the DFT and DCT results differed significantly from the DWHT results.
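
For illustration, the sketch below (not the authors' implementation) shows one way such transform-based cepstrum coefficients could be computed for a single short-time frame, assuming the cepstrum for each transform is taken as the (inverse) transform of the log-magnitude spectrum; per the abstract, the mean and derivative (delta) of these coefficients over a segment's frames would then form the feature vector fed to kNN or Naive Bayes.

```python
# Minimal sketch (not the authors' code): cepstrum-like coefficients derived
# from DFT, DCT and DWHT for one short-time speech frame. The cepstrum is
# assumed here to be the (inverse) transform of the log-magnitude spectrum.
import numpy as np
from scipy.fft import fft, ifft, dct, idct


def fwht(x):
    """Naive fast Walsh-Hadamard transform; len(x) must be a power of two."""
    a = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a / np.sqrt(len(a))  # orthonormal scaling makes fwht self-inverse


def cepstrum_dft(frame, n_coeff=13):
    log_mag = np.log(np.abs(fft(frame)) + 1e-10)
    return np.real(ifft(log_mag))[:n_coeff]


def cepstrum_dct(frame, n_coeff=13):
    log_mag = np.log(np.abs(dct(frame, norm="ortho")) + 1e-10)
    return idct(log_mag, norm="ortho")[:n_coeff]


def cepstrum_dwht(frame, n_coeff=13):
    log_mag = np.log(np.abs(fwht(frame)) + 1e-10)
    return fwht(log_mag)[:n_coeff]


# Example: a 512-sample Hamming-windowed frame (placeholder random signal).
frame = np.random.randn(512) * np.hamming(512)
coeffs = {
    "DFT": cepstrum_dft(frame),
    "DCT": cepstrum_dct(frame),
    "DWHT": cepstrum_dwht(frame),
}
# Per the paper, the mean and derivative (delta) of such coefficients over the
# frames of a segment would form the feature vector for kNN / Naive Bayes.
```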

Published

2018-09-29
