Comparative Analysis of Various Transformation Techniques for Voiceless Consonants Modeling

Grazina Korvel, Bozena Kostek, Olga Kurasova

Abstract


This paper compares three transformation techniques, namely the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT) and the Discrete Walsh-Hadamard Transform (DWHT), in the context of their application to voiceless consonant modeling. Speech features based on these transformations are extracted: the mean and derivative values of the cepstrum coefficients derived from each transformation. Feature extraction is performed on the speech signal divided into short-time segments. The kNN and Naive Bayes methods are used for phoneme classification. Experiments show that DFT and DCT give better classification accuracy than DWHT. The DFT results did not differ significantly from the DCT results, but they did differ significantly from the DWHT results. The same tendency was observed for DCT: an ANOVA test confirmed that the difference between the results obtained with DCT and DWHT is significant.
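The feature extraction outlined above can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption: the cepstrum is taken as the inverse transform of the log magnitude of the forward transform, the "derivative" feature is approximated by the mean absolute frame-to-frame difference, and the names (`cepstrum`, `cepstral_features`, `TRANSFORMS`), the frame length, the hop size and the toy noise signal are hypothetical. It is not the authors' implementation.

```python
import cmath
import math
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def dct(x):
    # Unnormalized DCT-II.
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def idct(X):
    # Inverse of the unnormalized DCT-II above (a scaled DCT-III).
    n = len(X)
    return [(X[0] + 2 * sum(X[k] * math.cos(math.pi * (t + 0.5) * k / n)
                            for k in range(1, n))) / n
            for t in range(n)]

def fwht(x):
    # Fast Walsh-Hadamard transform in natural (Hadamard) order.
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def ifwht(X):
    # The Hadamard matrix is its own inverse up to a factor of 1/n.
    return [v / len(X) for v in fwht(X)]

TRANSFORMS = {"DFT": (dft, idft), "DCT": (dct, idct), "DWHT": (fwht, ifwht)}

def cepstrum(frame, forward, inverse, eps=1e-12):
    # Assumed definition: inverse transform of the log magnitude spectrum.
    log_mag = [math.log(abs(c) + eps) for c in forward(frame)]
    return [v.real for v in inverse(log_mag)]

def cepstral_features(signal, forward, inverse, frame_len=16, hop=8, n_coeffs=4):
    # Split into short-time segments, then summarize each coefficient
    # by its mean and by its mean absolute frame-to-frame difference.
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    ceps = [cepstrum(f, forward, inverse)[:n_coeffs] for f in frames]
    means = [sum(c[i] for c in ceps) / len(ceps) for i in range(n_coeffs)]
    deltas = [sum(abs(ceps[m + 1][i] - ceps[m][i]) for m in range(len(ceps) - 1))
              / (len(ceps) - 1) for i in range(n_coeffs)]
    return means + deltas

# Toy noise-like signal standing in for a voiceless consonant segment.
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(64)]
features = {name: cepstral_features(signal, fwd, inv)
            for name, (fwd, inv) in TRANSFORMS.items()}
```

Each entry of `features` is a vector of four means followed by four difference values, which could then be passed to a kNN or Naive Bayes classifier. The direct O(n²) DFT and DCT are used only to keep the sketch dependency-free; a practical system would use a fast transform library.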

Keywords


DFT, DCT, DWHT, cepstrum coefficients






DOI: https://doi.org/10.15837/ijccc.2018.5.3310



Copyright (c) 2018 Grazina Korvel, Bozena Kostek, Olga Kurasova

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL (IJCCC), With Emphasis on the Integration of Three Technologies (C & C & C),  ISSN 1841-9836.
