Short Term Time Domain Processing of Speech

 

Short Term Energy Computation

 

 

The inputs needed to compute the short term energy (STE) are the speech signal, its sampling frequency, the frame size and the frame shift. The sampling frequency is used to convert the frame size and frame shift into numbers of samples. For instance, if the sampling frequency is 8 kHz and the frame size and frame shift are 20 msec and 10 msec, respectively, then a frame contains 160 samples and the frame shift is 80 samples.
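
The conversion from milliseconds to samples in the example above can be checked with a few lines of code. This is a minimal Python sketch, not the Scilab code referred to in the figures; the helper name `ms_to_samples` is hypothetical.

```python
def ms_to_samples(duration_ms, fs_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * fs_hz / 1000.0))

fs = 8000                            # sampling frequency in Hz (8 kHz example)
frame_len = ms_to_samples(20, fs)    # frame size of 20 msec
frame_shift = ms_to_samples(10, fs)  # frame shift of 10 msec
print(frame_len, frame_shift)        # 160 80
```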

 

To compute the short term energy, the input speech signal is divided into frames of 160 samples with a shift of 80 samples, and the energy is computed for each frame. The short term energy values are then plotted as a function of the time index.
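
The framing procedure described above can be sketched as follows. This is an illustrative Python version, not the Scilab code of Figure_1. It assumes the energy of a frame is the sum of its squared samples, and it uses a synthetic two-level tone in place of a recorded speech signal. The function name `short_term_energy` is hypothetical.

```python
import math

def short_term_energy(signal, frame_len, frame_shift):
    """Return one energy value (sum of squared samples) per frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

# Synthetic stand-in for speech (an assumption, not real data):
# 0.1 s of a loud 100 Hz tone followed by 0.1 s of a weak one, at 8 kHz.
fs = 8000
loud = [1.0 * math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
quiet = [0.1 * math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
signal = loud + quiet

ste = short_term_energy(signal, frame_len=160, frame_shift=80)
print(len(ste))          # 19 frames
print(ste[0] > ste[-1])  # True: the loud part has the higher energy
```

Plotting `ste` against the frame index would give a contour of the kind the text describes, high over the loud part and low over the quiet part.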

 

The Scilab code for computing the short term energy is given in Figure_1, and the short term energy computed using this code is given in Figure_2. The STE contour follows the general shape of the amplitude envelope of the speech signal. The STE of unvoiced regions is relatively small compared to that of voiced regions, so STE can be used for voiced/unvoiced classification of speech.

 

 

 

 

Figure_1: Scilab code for computing the short term energy of speech

 

 

 

 

Figure_2: Short term energy contour of the speech signal

 

The next issue to be understood is the effect of frame size. The Scilab code for computing STE with different window lengths, namely 30, 50 and 100 msec, is shown in Figure_3. The corresponding STE contours are given in Figure_4.

 

As the frame size increases, the energy contour becomes smoother, and as a result the distinction between voiced and unvoiced regions decreases. Hence, in speech processing, a frame size of about 10-30 msec is preferred for STE.
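
The smoothing effect can be illustrated on a toy signal. The Python sketch below is not the Scilab code of Figure_3. It builds a synthetic signal that alternates between 40 msec high-energy and 40 msec low-energy segments (an assumed stand-in for voiced and unvoiced regions) and measures how far apart the largest and smallest frame energies are for each frame size. The energy is divided by the frame length, a normalization added here so that different frame sizes can be compared.

```python
def mean_energy_contour(signal, frame_len, frame_shift):
    """Mean squared amplitude of each frame (STE divided by frame length)."""
    contour = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        contour.append(sum(x * x for x in frame) / frame_len)
    return contour

fs = 8000
segment = 320  # 40 msec per segment at 8 kHz

# Synthetic stand-in for speech (assumption): alternating high-energy and
# low-energy segments, loosely mimicking voiced and unvoiced regions.
signal = []
for k in range(10):
    amplitude = 1.0 if k % 2 == 0 else 0.1
    signal += [amplitude * (1 if n % 2 == 0 else -1) for n in range(segment)]

contrast = {}
for frame_ms in (30, 50, 100):
    frame_len = frame_ms * fs // 1000
    contour = mean_energy_contour(signal, frame_len, frame_shift=80)
    contrast[frame_ms] = max(contour) / min(contour)

for frame_ms, ratio in contrast.items():
    print(frame_ms, "msec: max/min energy ratio =", round(ratio, 2))
```

On this toy signal the ratio falls sharply once the frame is longer than a segment, because every frame then mixes high-energy and low-energy samples.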

 

 


 

 

 

Figure_3: Scilab code for the computation of STE using different window lengths, namely 30, 50 and 100 msec

 

 

 

  

Figure_4: Short term energy contours for different window lengths, namely 30, 50 and 100 msec

 

The last issue to be understood in STE computation is the effect of window shape. With a rectangular window, all samples contribute equally to the STE value. With Hamming and Hanning windows, the samples near the center of the window contribute more, and the nature of the STE contour changes accordingly. Figure_5 and Figure_6 show the Scilab code and the corresponding STE computed for different window functions. The Hamming and Hanning windows give relatively lower energy values on the STE contour, because they attenuate the samples towards the frame edges.
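
To make the effect of window shape concrete, the Python sketch below (not the Scilab code of Figure_5) computes the energy of one 160-sample frame under the three windows. It assumes the common symmetric definitions w[n] = 0.54 - 0.46 cos(2πn/(N-1)) for Hamming and w[n] = 0.5 - 0.5 cos(2πn/(N-1)) for Hanning, and a unit-magnitude test frame. The helper `window` is hypothetical.

```python
import math

def window(name, n_samples):
    """Rectangular, Hamming or Hanning window (symmetric definitions)."""
    if name == "rectangular":
        return [1.0] * n_samples
    a = {"hamming": 0.54, "hanning": 0.5}[name]
    return [a - (1 - a) * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

# Test frame (assumption): unit-magnitude samples with alternating sign,
# so every sample carries the same energy before windowing.
frame = [1.0 if n % 2 == 0 else -1.0 for n in range(160)]

energies = {}
for name in ("rectangular", "hamming", "hanning"):
    w = window(name, len(frame))
    energies[name] = sum((x * wn) ** 2 for x, wn in zip(frame, w))

for name, value in energies.items():
    print(name, round(value, 3))
```

The rectangular window gives the largest value and the two tapered windows noticeably smaller ones, which matches the lower STE contours reported for Hamming and Hanning windows.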

  

  

 

Figure_5: Scilab code for the computation of STE using different window functions, namely rectangular, Hamming and Hanning windows

 

 

 

  

Figure_6: Short term energy contours using rectangular, Hamming and Hanning window functions

 

 

Short Term Zero Crossing Rate (ZCR)

 

 

The input speech signal can be viewed in blocks of 10-30 msec for computing the ZCR. For each block of the speech signal, the ZCR is computed using the short term ZCR relation. Scilab code for computing the short term ZCR is shown in Figure_7, and the corresponding output of this program is given in Figure_8. The ZCR value is highest in unvoiced regions and lowest in voiced regions. In silence regions the value lies between those of the voiced and unvoiced cases.
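
A block-wise ZCR computation can be sketched as follows. This Python version is not the Scilab code of Figure_7. It assumes a commonly used form of the short term ZCR relation, Z = (1/(2N)) Σ |sgn(x[n]) - sgn(x[n-1])|, with sgn(x) taken as +1 for x >= 0 and -1 otherwise. Pure tones at 100 Hz and 3000 Hz are used as rough stand-ins for voiced and unvoiced speech.

```python
import math

def sign(x):
    return 1 if x >= 0 else -1

def short_term_zcr(signal, frame_len, frame_shift):
    """Zero crossings per sample for each frame."""
    rates = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        # Each sign change contributes |(+1) - (-1)| = 2, hence the division by 2.
        crossings = sum(abs(sign(frame[n]) - sign(frame[n - 1]))
                        for n in range(1, frame_len)) / 2
        rates.append(crossings / frame_len)
    return rates

fs = 8000
# Synthetic stand-ins (assumption): a low-frequency tone for voiced speech
# and a high-frequency tone for unvoiced speech. The 0.3 rad phase offset
# keeps samples away from exact zeros.
low = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(800)]
high = [math.sin(2 * math.pi * 3000 * n / fs + 0.3) for n in range(800)]

zcr_low = short_term_zcr(low, frame_len=160, frame_shift=80)
zcr_high = short_term_zcr(high, frame_len=160, frame_shift=80)
print(zcr_low[0], zcr_high[0])
```

The high-frequency stand-in gives a much larger rate than the low-frequency one, in line with the voiced/unvoiced behavior described above.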

 

 

 

Figure_7: Scilab code for the computation of the short term zero crossing rate of a speech signal

  

 

 

Figure_8: Short term zero crossing rate for the speech signal

 

The effect of window size can also be studied in ZCR computation. Figure_9 gives the Scilab code for computing ZCR with different frame sizes, and Figure_10 shows the resulting ZCR plots. As the frame size increases, the ZCR plot becomes smoother and more spread out. As a result, small transient regions with high ZCR values may be missed.
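
The loss of short transients can be illustrated with a toy signal. In the Python sketch below (not the Scilab code of Figure_9), a 20 msec burst of a 3000 Hz tone is embedded in a 100 Hz tone, and the peak ZCR is compared for the three frame sizes. The signal and the function names are assumptions made for illustration.

```python
import math

def sign(x):
    return 1 if x >= 0 else -1

def short_term_zcr(signal, frame_len, frame_shift):
    """Zero crossings per sample for each frame."""
    rates = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        crossings = sum(1 for n in range(1, frame_len)
                        if sign(frame[n]) != sign(frame[n - 1]))
        rates.append(crossings / frame_len)
    return rates

fs = 8000

def sample(n):
    # Assumption: a 20 msec high-frequency burst (samples 1600 to 1759)
    # inside an otherwise low-frequency signal.
    freq = 3000 if 1600 <= n < 1760 else 100
    return math.sin(2 * math.pi * freq * n / fs + 0.3)

signal = [sample(n) for n in range(3200)]

peak_zcr = {}
for frame_ms in (30, 50, 100):
    frame_len = frame_ms * fs // 1000
    peak_zcr[frame_ms] = max(short_term_zcr(signal, frame_len, 80))

for frame_ms, peak in peak_zcr.items():
    print(frame_ms, "msec: peak ZCR =", round(peak, 3))
```

The burst is the same in all three cases, but the longer the frame, the more its high crossing count is averaged with the surrounding low-ZCR samples, so the peak shrinks.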

  

 

Figure_9: Scilab code for the computation of short term ZCR using different frame sizes

 

      

 

Figure_10: Short term zero crossing rate using different frame sizes, namely 30, 50 and 100 msec

 

The ZCR value is independent of the window shape, since multiplying a frame by a positive window does not change the signs of its samples. Figure_11 gives the Scilab code for computing ZCR using different window functions, and the computed outputs are given in Figure_12.
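
A quick check of this independence: the Python sketch below (not the Scilab code of Figure_11) counts zero crossings in one frame after applying each window. It assumes the symmetric Hamming and Hanning definitions with an N-1 denominator. The Hamming window is strictly positive, so the count is unchanged. The symmetric Hanning window is exactly zero at its two end samples, which under the sign convention used here can shift the count by at most one at each end.

```python
import math

def count_zero_crossings(frame):
    """Number of sign changes between consecutive samples (zero counts as +)."""
    signs = [1 if x >= 0 else -1 for x in frame]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def window(name, n_samples):
    """Rectangular, Hamming or Hanning window (symmetric definitions)."""
    if name == "rectangular":
        return [1.0] * n_samples
    a = {"hamming": 0.54, "hanning": 0.5}[name]
    return [a - (1 - a) * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

fs = 8000
n_samples = 240  # one 30 msec frame
# Test frame (assumption): a 500 Hz tone standing in for a speech frame.
frame = [math.sin(2 * math.pi * 500 * n / fs + 0.3) for n in range(n_samples)]

counts = {}
for name in ("rectangular", "hamming", "hanning"):
    w = window(name, n_samples)
    counts[name] = count_zero_crossings([x * wn for x, wn in zip(frame, w)])

print(counts)
```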

 

 

 

Figure_11: Scilab code for the computation of short term zero crossing rate using different window functions

 

 

 

 

Figure_12: Short term zero crossing rate computed using different window functions

 

Computation of Short Term Autocorrelation

 

 

The Scilab code for computing the short term autocorrelation is given in Figure_13. This program takes a given frame of speech and computes its short term autocorrelation. Figure_14 shows the short term autocorrelation computed for a voiced segment of speech, and Figure_15 shows the short term autocorrelation computed for an unvoiced segment of speech. Both plots are computed using the Scilab code given in Figure_13.
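
The autocorrelation of a single frame can be sketched as below. This Python version is not the Scilab code of Figure_13. It assumes the usual short term definition r[k] = Σ x[n] x[n+k], summed over the samples available in the frame. The test frames are synthetic: a sum of four harmonics of 100 Hz as a voiced-like frame and seeded uniform noise as an unvoiced-like frame.

```python
import math
import random

def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

fs = 8000
frame_len = 240            # one 30 msec frame
min_lag, max_lag = 20, 160  # lag search range (assumed: 400 Hz down to 50 Hz)

# Synthetic frames (assumptions, not real speech).
voiced = [sum(math.sin(2 * math.pi * h * 100 * n / fs) for h in range(1, 5))
          for n in range(frame_len)]
random.seed(0)
unvoiced = [random.uniform(-1, 1) for _ in range(frame_len)]

def strongest_peak(frame):
    """Lag of the largest value in the search range, and its size relative to r[0]."""
    r = autocorrelation(frame, max_lag)
    lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return lag, r[lag] / r[0]

voiced_lag, voiced_ratio = strongest_peak(voiced)
unvoiced_lag, unvoiced_ratio = strongest_peak(unvoiced)
print(voiced_lag, round(voiced_ratio, 3))
print(unvoiced_lag, round(unvoiced_ratio, 3))
```

For the voiced-like frame the strongest peak sits at lag 80, the pitch period of a 100 Hz signal at 8 kHz, and is a large fraction of r[0]. For the noise frame no lag produces a comparably strong peak.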

 

 

 

 

Figure_13: Scilab code for the computation of short term autocorrelation

 

 


 

 

Figure_14: Short term autocorrelation computed for a voiced segment of speech

 

 

 

 

Figure_15: Short term autocorrelation computed for an unvoiced segment of speech

 

The short term autocorrelation for the entire speech signal can be computed by processing the speech in frames. Figure_16 gives the Scilab code for computing the short term autocorrelation sequences for the entire speech signal. The corresponding autocorrelation sequence plot is given in Figure_17. Since there is one autocorrelation sequence per frame, the result is a function of both frame index and lag, and is therefore shown as a 3-D plot.
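
Computing one autocorrelation sequence per frame gives a two-dimensional array indexed by frame and lag, which is what the 3-D plot displays. The Python sketch below is not the Scilab code of Figure_16. It uses a synthetic 100 Hz harmonic signal and hypothetical function names.

```python
import math

def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def short_term_autocorrelation(signal, frame_len, frame_shift, max_lag):
    """One autocorrelation sequence per frame: rows are frames, columns are lags."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        rows.append(autocorrelation(signal[start:start + frame_len], max_lag))
    return rows

fs = 8000
# Synthetic stand-in for speech (assumption): four harmonics of 100 Hz, 0.2 s.
signal = [sum(math.sin(2 * math.pi * h * 100 * n / fs) for h in range(1, 5))
          for n in range(1600)]

acf = short_term_autocorrelation(signal, frame_len=240, frame_shift=80, max_lag=160)
print(len(acf), len(acf[0]))   # 18 frames, 161 lags (0..160)
```

Plotting `acf` as a surface over frame index and lag would give a 3-D view of the kind described above.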

  

  

Figure_16: Scilab code for the computation of the short term autocorrelation sequence for an entire speech signal

  

 

 

Figure_17: Short term autocorrelation sequence for the entire speech signal

 

In the 3-D plot, picking the largest peak of each frame's autocorrelation sequence (other than the peak at zero lag) and plotting its location gives information about the pitch during voiced speech regions. The Scilab code for estimating the pitch information is given in Figure_18. The corresponding output plot is given in Figure_19.
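
The peak-picking step can be sketched as follows. This Python version is not the Scilab code of Figure_18. The lag search range (20 to 160 samples, that is 400 Hz down to 50 Hz at 8 kHz) and the voicing threshold of 0.3 on the normalized peak are assumptions added for illustration, as are the function names and the synthetic test signal.

```python
import math

def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def pitch_contour(signal, fs, frame_len, frame_shift,
                  min_lag, max_lag, voicing_threshold=0.3):
    """Pitch in Hz per frame from the largest autocorrelation peak (0.0 if unvoiced)."""
    pitch = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        r = autocorrelation(signal[start:start + frame_len], max_lag)
        lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
        # Assumed voicing rule: the peak must be a sizeable fraction of r[0].
        voiced = r[0] > 0 and r[lag] / r[0] >= voicing_threshold
        pitch.append(fs / lag if voiced else 0.0)
    return pitch

def harmonic_tone(f0, n_samples, fs):
    """Sum of the first four harmonics of f0 (synthetic voiced-like signal)."""
    return [sum(math.sin(2 * math.pi * h * f0 * n / fs) for h in range(1, 5))
            for n in range(n_samples)]

fs = 8000
# Synthetic stand-in for speech (assumption): 0.1 s at 100 Hz, then 0.1 s at 200 Hz.
signal = harmonic_tone(100, 800, fs) + harmonic_tone(200, 800, fs)

pitch = pitch_contour(signal, fs, frame_len=240, frame_shift=80,
                      min_lag=20, max_lag=160)
print(pitch[0], pitch[-1])   # 100.0 200.0
```

Frames that straddle the change from 100 Hz to 200 Hz contain a mixture of both periods, so their estimates are less reliable than those of the first and last frames.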

 

 

 

Figure_18: Scilab code for estimation of the pitch information of a speech signal

 

 

 

 

Figure_19: Pitch Contour estimated from the autocorrelation segments

 


Copyright @ 2024 Under the NME ICT initiative of MHRD

 Powered by AmritaVirtual Lab Collaborative Platform [ Ver 00.13. ]