Comparative Analysis of Various Transformation Techniques for Voiceless Consonants Modeling
Keywords:
DFT, DCT, DWHT, cepstrum coefficients

Abstract
In this paper, a comparison of various transformation techniques, namely the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Discrete Walsh-Hadamard Transform (DWHT), is performed in the context of their application to voiceless consonant modeling. Speech features based on these transformation techniques are extracted: the mean and derivative values of the cepstrum coefficients derived from each transform. Feature extraction is performed on the speech signal divided into short-time segments. The kNN and Naive Bayes methods are used for phoneme classification. Experiments show that DFT and DCT give better classification accuracy than DWHT. An ANOVA test confirmed that the difference between the DFT and DCT results is not statistically significant, whereas the differences between DFT and DWHT, and between DCT and DWHT, are significant.
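The feature-extraction pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the log-magnitude cepstrum definition for each transform, the frame length, and all function names are assumptions made for the sake of the example.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import hadamard

EPS = 1e-12  # guard against log(0)

def cepstrum_dft(frame):
    # Real cepstrum: inverse DFT of the log magnitude spectrum.
    spectrum = np.abs(np.fft.fft(frame))
    return np.real(np.fft.ifft(np.log(spectrum + EPS)))

def cepstrum_dct(frame):
    # DCT-based analogue: inverse DCT (type III) of the log magnitude
    # of the DCT-II coefficients.
    coeffs = dct(frame, type=2, norm="ortho")
    return dct(np.log(np.abs(coeffs) + EPS), type=3, norm="ortho")

def cepstrum_dwht(frame):
    # Walsh-Hadamard analogue: the orthonormal Hadamard matrix is its
    # own inverse, so it is applied on both sides of the log.
    n = len(frame)  # must be a power of two
    H = hadamard(n) / np.sqrt(n)
    return H @ np.log(np.abs(H @ frame) + EPS)

def segment_features(cepstra):
    # Per-segment cepstra (rows) -> mean and averaged first-order
    # derivative (delta), concatenated into one feature vector.
    mean = cepstra.mean(axis=0)
    delta = np.diff(cepstra, axis=0).mean(axis=0)
    return np.concatenate([mean, delta])
```

For a frame length of 256 samples, each function returns a 256-dimensional cepstrum, and `segment_features` stacks the per-segment means and deltas into a single 512-dimensional vector that a kNN or Naive Bayes classifier can consume.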
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.