Spectro-temporal analysis of speech for Spanish phoneme recognition

Sara Sharifzadeh, Javier Serrano, Jordi Carrabina

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding

3 Citations (Scopus)

Abstract

State-of-the-art automatic speech recognition (ASR) systems mostly use Mel-frequency cepstral coefficients (MFCCs) as acoustic features. In this paper, we propose a new discriminative analysis of acoustic features based on spectrogram analysis, in which both the spectral and the temporal variations of the speech signal are considered. This improves recognition performance, especially for noisy speech and for phonemes with time-domain modulations, such as stops. In this method, the 2D discrete cosine transform (DCT) is applied to small, overlapping, 2D Hamming-windowed patches of the spectrogram of Spanish phonemes and enhanced by means of bicubic interpolation. An adaptive strategy for the patch size over time is proposed so that the feature vectors constructed for different phonemes have the same length. These vectors are classified using K-nearest neighbor (KNN), linear discriminant analysis (LDA), and reduced-rank LDA (RLDA). Experimental results demonstrate improved recognition performance for noisy speech signals and stops. © 2012 Institute of Telecommunica.
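The patch-based feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dct_matrix` and `patch_dct_features`, the fixed 8x8 patch size, the hop of 4, and the choice to keep the 3x3 lowest-order coefficients are all assumptions made for illustration. The bicubic interpolation step and the adaptive patch sizing mentioned in the abstract are omitted, so this version only yields equal-length vectors for spectrograms of equal size.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)  # scale the DC row for orthonormality
    return basis * np.sqrt(2.0 / n)


def patch_dct_features(spectrogram, patch=(8, 8), hop=(4, 4), keep=3):
    """Concatenate low-order 2D DCT coefficients of overlapping patches.

    patch, hop and keep are illustrative values, not taken from the paper.
    """
    ph, pw = patch
    # Separable 2D Hamming window applied to every patch.
    window = np.outer(np.hamming(ph), np.hamming(pw))
    d_rows, d_cols = dct_matrix(ph), dct_matrix(pw)
    feats = []
    for r in range(0, spectrogram.shape[0] - ph + 1, hop[0]):
        for c in range(0, spectrogram.shape[1] - pw + 1, hop[1]):
            block = spectrogram[r:r + ph, c:c + pw] * window
            coeffs = d_rows @ block @ d_cols.T  # separable 2D DCT-II
            feats.append(coeffs[:keep, :keep].ravel())
    return np.concatenate(feats)


# Usage on a synthetic 16x16 "spectrogram" (random values, not real speech).
rng = np.random.default_rng(0)
features = patch_dct_features(rng.random((16, 16)))
print(features.shape)
```

With a hop of half the patch size, neighbouring patches overlap by 50% in both time and frequency. The resulting vector could then be passed to any standard KNN or LDA classifier.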
Original language: English
Title of host publication: 2012 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012)
Publisher: IEEE
Pages: 548-551
ISBN (Electronic): 978-320002328-4
Publication status: Published - 2012
Externally published: Yes
Event: 19th International Conference on Digital Signal Processing (DSP) 2012 - Vienna University of Technology, Vienna, Austria
Duration: 11 Apr 2012 to 13 Apr 2012
Conference number: 19

Conference

Conference: 19th International Conference on Digital Signal Processing (DSP) 2012
Abbreviated title: IWSSIP
Country: Austria
City: Vienna
Period: 11/04/12 to 13/04/12

Fingerprint

Acoustics
Discrete cosine transforms
Speech recognition
Interpolation
Modulation

Cite this

Sharifzadeh, S., Serrano, J., & Carrabina, J. (2012). Spectro-temporal analysis of speech for Spanish phoneme recognition. In 2012 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012) (pp. 548-551). [6208200] IEEE.

@inproceedings{acb4b00d580042fdaf739e7537cfe4a5,
title = "Spectro-temporal analysis of speech for spanish phoneme recognition",
author = "Sara Sharifzadeh and Javier Serrano and Jordi Carrabina",
year = "2012",
language = "English",
pages = "548--551",
booktitle = "2012 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012)",
publisher = "IEEE",

}
