Different Sounds In Language

 

Speech generated by the speech production system consists of a sequence of basic sound units of a particular language. We study the basic alphabet set (orthographic representation) of a language so that we can express messages in written form. Along similar lines, we need to study the basic set of sound units (acoustic representation) of a language to produce messages in oral form. Every language has its own unique alphabet set and sound unit set. Most Indian languages have about 40-50 distinct alphabets and nearly the same number of sound units.

 

The next question is whether this small set of 40-50 distinct sound units is sufficient to convey all messages. Indeed, it is. Just as we form written words and sentences by arranging alphabet units in different sequences, we form spoken words, phrases and sentences by arranging different sound units in different sequences. However, it must be emphasized that not every arrangement conveys a valid message.
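To get a feel for how quickly such arrangements grow, the small Python sketch below counts every possible sequence of sound units up to a given length. The inventory size of 45 is an assumption picked from the 40-50 range above, and the count includes every sequence, valid or not, so it is only an upper bound on the number of meaningful words.

```python
def count_sequences(inventory_size: int, max_length: int) -> int:
    """Count all sequences of length 1..max_length drawn, with repetition,
    from an inventory of inventory_size sound units."""
    return sum(inventory_size ** n for n in range(1, max_length + 1))


UNITS = 45  # assumed inventory size, taken from the 40-50 range in the text
for length in range(1, 4):
    print(length, UNITS ** length)        # 45, 2025, 91125 sequences
print("up to length 3:", count_sequences(UNITS, 3))  # 93195
```

Even at three units per sequence there are tens of thousands of candidates, which is why the language only needs to declare a small fraction of them valid.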

 

Learning to speak a particular language involves learning which combinations are valid and meaningful, and then using them in the proper order, resulting in what is called the speech signal. Whether the learning is formal or informal, over time we acquire the basic sound units of the language and enough words to produce speech in it, and our vocabulary in that language grows with time. Thus, one of the important steps in speech processing is to get a feel for the different sounds used in speech production.

 

From a signal processing point of view, we need to get a feel for the time domain, frequency domain and time-frequency representations of these sounds. Perceptually, we are exposed to the different sounds of our mother tongue day in and day out, and we can discriminate between them based on how differently they sound. Now it is time to look at them from a signal processing point of view. As will become evident, different sounds have different time, frequency and time-frequency representations, and these differences make them perceptually distinct.
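The three views can be computed with a few lines of NumPy. The sketch below is one minimal way to do it, not the method used to produce the figures on this page: the Hamming window, the 20 ms frame and the 10 ms hop are common but assumed choices, and the 500 Hz tone is a toy stand-in for a recorded sound unit.

```python
import numpy as np


def time_frequency_views(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Return the time axis, the magnitude spectrum (dB) and a spectrogram (dB)
    of a 1-D signal x sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs  # time axis for plotting the waveform

    # Frequency domain: magnitude spectrum of the Hamming-windowed segment.
    spectrum = np.abs(np.fft.rfft(x * np.hamming(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum_db = 20 * np.log10(spectrum + 1e-12)

    # Time-frequency domain: short-time Fourier transform over overlapping frames.
    frame_len = int(round(frame_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    starts = np.arange(0, len(x) - frame_len + 1, hop)
    window = np.hamming(frame_len)
    frames = np.stack([x[s:s + frame_len] * window for s in starts])
    spectrogram_db = 20 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)
    frame_freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    frame_times = (starts + frame_len / 2) / fs  # centre of each frame
    return t, (freqs, spectrum_db), (frame_times, frame_freqs, spectrogram_db)


fs = 16000
t = np.arange(int(0.2 * fs)) / fs
x = np.sin(2 * np.pi * 500 * t)  # toy 500 Hz tone standing in for a sound unit
_, (freqs, spec_db), (times, frame_freqs, sgram_db) = time_frequency_views(x, fs)
print(freqs[np.argmax(spec_db)])  # 500.0
print(sgram_db.shape)             # (19, 161): frames x frequency bins
```

The returned arrays can be handed to any plotting library to draw the waveform, spectrum and time-varying spectrum panels of the kind shown in the figures below.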

 

Classification of Sound units in Indian Languages:

 

 

Figure 1 shows the classification of sound units in Indian languages. Each category is described below.

 

Vowels and Consonants:

 

The sound units of most Indian languages are broadly classified into two categories: vowels and consonants. These two categories are distinguished mainly by the shape of the vocal tract. For vowels, the vocal tract is wide open, without any constriction along its length from the glottis to the lips, and it is excited by a voiced excitation source. For consonants, by contrast, there may be a constriction somewhere along the vocal tract, and the excitation may be voiced, unvoiced or a combination of both.

 

Figure 2 shows an example of a vowel sound unit (|a|) and a consonant sound unit (|s|). As can be observed, the time domain representations (waveforms) of the two signals are different, and so are their spectra. The time-varying spectral representations also differ. Although these two examples are extreme cases of a vowel and a consonant, they help us appreciate the differences between the two categories.

 

 Figure 2: Comparison of vowels and consonants
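Two simple frame-level measures capture much of the contrast in Figure 2: short-time energy and zero-crossing rate. Voiced vowels tend to have high energy and few zero crossings, while unvoiced fricatives such as |s| are noise-like with many zero crossings. The sketch below is illustrative only: the 120 Hz harmonic signal and the white noise are toy stand-ins, not the recordings used in the figure.

```python
import numpy as np


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(x)
    return float(np.mean(signs[1:] != signs[:-1]))


def energy_db(x):
    """Mean-square energy of the segment in dB."""
    return float(10 * np.log10(np.mean(np.square(x)) + 1e-12))


fs = 16000
t = np.arange(int(round(0.1 * fs))) / fs
rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not recordings): a 120 Hz harmonic signal for the
# voiced vowel |a| and low-level white noise for the unvoiced fricative |s|.
vowel = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 6))
fricative = 0.1 * rng.standard_normal(len(t))

for name, segment in [("vowel |a|", vowel), ("consonant |s|", fricative)]:
    print(name, round(zero_crossing_rate(segment), 3), round(energy_db(segment), 1))
```

On real speech both measures are normally computed frame by frame, and the thresholds that separate voiced from unvoiced frames have to be tuned to the recording conditions.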

 

Short vowels, Long vowels and Diphthongs

 

In most Indian languages, vowels are further classified into three categories: short vowels, long vowels and diphthongs. From the production point of view, there is no distinction between short and long vowels except duration: a long vowel typically lasts nearly twice as long as the corresponding short vowel. For instance, the short vowel /a/ may be about half the duration of the long vowel /A/. In a diphthong, as the name indicates, two vowel sounds are produced in succession without any pause. The vocal tract initially takes the shape for the first vowel and, midway through its production, changes shape to produce the second vowel.

 

Figure 3 shows the plots of the short vowel |a|, its long counterpart |A|, the short vowel |i| and the diphthong |ai|. The long vowel |A| lasts longer than the short vowel |a|; however, the time domain waveforms and spectra are the same in both cases, and so is the time-varying spectral information. In the diphthong |ai|, the initial portion matches |a| and the later portion matches |i|. The transition of the vocal tract shape from |a| to |i| can be clearly seen in the time-varying spectral plot.

 

 

Figure 3: Waveform, short-term spectra and time varying spectra of short vowels, long vowels and diphthongs
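Since short and long vowels differ mainly in duration, a simple energy-based duration estimate is enough to compare them. The sketch below counts the 10 ms frames whose energy lies within 30 dB of the loudest frame; the frame size, the threshold and the padded 150 Hz tones (toy stand-ins for /a/ and /A/) are all assumptions.

```python
import numpy as np


def active_duration_ms(x, fs, frame_ms=10.0, threshold_db=-30.0):
    """Duration of the non-silent part of x: the number of frames whose energy
    is within threshold_db of the loudest frame, times the frame length."""
    frame_len = int(round(frame_ms * 1e-3 * fs))
    n_frames = len(x) // frame_len
    frames = np.reshape(x[: n_frames * frame_len], (n_frames, frame_len))
    frame_energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = frame_energy_db > frame_energy_db.max() + threshold_db
    return float(np.count_nonzero(active) * frame_ms)


def toy_vowel(duration_s, fs, pad_s=0.1):
    """Hypothetical stand-in for a vowel: a 150 Hz tone padded with silence."""
    t = np.arange(int(round(duration_s * fs))) / fs
    pad = np.zeros(int(round(pad_s * fs)))
    return np.concatenate([pad, np.sin(2 * np.pi * 150 * t), pad])


fs = 16000
d_short = active_duration_ms(toy_vowel(0.08, fs), fs)  # toy short vowel /a/
d_long = active_duration_ms(toy_vowel(0.16, fs), fs)   # toy long vowel /A/
print(d_short, d_long, d_long / d_short)  # 80.0 160.0 2.0
```

In real recordings the vowel is embedded between consonants rather than silence, so a practical system would combine such an energy measure with segment labels or a voicing detector.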

 

Stop consonants

 

Stop consonants form the major category of consonants in Indian languages. During their production, the vocal tract is completely closed at some point along its length and then suddenly released; hence the name stop consonants. Stop consonants are further classified based on two criteria: place of articulation (POA) and manner of articulation (MOA). The POA is the point along the vocal tract where it is completely closed. The MOA is the manner in which the vocal tract system is excited, namely voicing and aspiration. Table 1 gives the classification of stop consonants in the majority of Indian languages.

 

 

 Table 1: classification of stop consonants in majority of the Indian languages

 

In the majority of Indian languages, voicing and aspiration are used as the manners of articulation. Accordingly, there are four possibilities: unvoiced unaspirated (UVUA), unvoiced aspirated (UVA), voiced unaspirated (VUA) and voiced aspirated (VA). Hence the stop consonants can be grouped into four categories based on MOA, as given in Table 1. Most Indian languages typically have five places of articulation: velar, palatal, alveolar, dental and bilabial. Among these, the palatal stop consonants are categorized separately as affricates, which are described later. This leaves four POA for stop consonants, so the stop consonants can also be grouped into four categories based on POA. In total, therefore, we have 4 × 4 = 16 stop consonants, each uniquely described by its MOA and POA. For instance, the stop consonant |k| is characterized by UVUA MOA and velar POA. Further, stop consonants that share the same MOA have the same excitation characteristics, and those that share the same POA have the same place of closure. For instance, all stop consonants in the UVUA category share the same MOA, i.e., they are unvoiced and unaspirated.
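The MOA × POA grid lends itself to a simple lookup table. The sketch below is a hypothetical encoding, not a standard data format: the consonant symbols are the ones used in the figure captions of this section, and each MOA is also expressed as a pair of (voiced, aspirated) flags.

```python
# Manner of articulation as (voiced, aspirated) flags.
MOA_FEATURES = {
    "UVUA": (False, False),
    "UVA": (False, True),
    "VUA": (True, False),
    "VA": (True, True),
}
MOA = list(MOA_FEATURES)
POA = ["velar", "alveolar", "dental", "bilabial"]

# Rows: POA; columns follow the MOA order above. Symbols as in the figure captions.
STOPS = {
    "velar": ["k", "kh", "g", "gh"],
    "alveolar": ["T", "Th", "D", "Dh"],
    "dental": ["t", "th", "d", "dh"],
    "bilabial": ["p", "ph", "b", "bh"],
}


def describe(symbol):
    """Return the (MOA, POA) pair that uniquely describes a stop consonant."""
    for poa, row in STOPS.items():
        if symbol in row:
            return MOA[row.index(symbol)], poa
    raise KeyError(f"{symbol!r} is not one of the 16 stop consonants")


def same_moa(moa):
    """All stop consonants sharing one MOA, i.e. the same excitation."""
    return [STOPS[poa][MOA.index(moa)] for poa in POA]


print(sum(len(row) for row in STOPS.values()))  # 16
print(describe("k"))                             # ('UVUA', 'velar')
print(same_moa("UVUA"))                          # ['k', 'T', 't', 'p']
```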

 

MOA

Vibration of the vocal folds is the major excitation source during speech production. However, some sound units, specifically consonants, may be produced using other, unvoiced types of excitation such as a burst or frication. Accordingly, the excitation can be either voiced or unvoiced. Further, aspiration is an important MOA in the majority of Indian languages. Sound units that differ only in these MOA convey different meanings.

 

POA

 

To produce stop consonants, the vocal tract is obstructed at different places along its length. These places are termed the POA, and stop consonants are therefore also classified by POA. As the POA names given in Table 1 indicate, the POA lies in the velar, alveolar, dental or bilabial region.

 

Velar Stop Consonants

 

In this category, the total constriction for the production of the stop consonant occurs in the velar region. In the UVUA velar stop consonant |k|, there is neither voicing nor aspiration. The only excitation is a burst of short duration, typically about 5-20 ms. The two events present in the UVUA velar stop consonant are a closure region, during which there is no speech activity, and a burst region, where the closure is suddenly released. The time domain waveform, spectrum and time-varying spectra are given in Figure 4(a). The closure region is silent and hence difficult to distinguish from a non-speech region; it is followed by the burst region.

 

 

Figure 4(a): Temporal, spectral and time varying spectra of UVUA velar stop consonant /k/
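The closure-then-burst structure of a UVUA stop can be located with frame energy and zero-crossing rate: the closure is the run of near-silent frames, and the burst is the run of loud, noise-like frames right after the release. The sketch below is a toy heuristic under stated assumptions (2 ms frames, a -40 dB silence threshold, a zero-crossing threshold of 0.2, and a synthetic 60 ms closure, 10 ms noise burst and 150 Hz "vowel"); real closures contain background noise, so the thresholds would need tuning.

```python
import numpy as np


def frame_features(x, fs, frame_ms):
    """Per-frame energy (dB) and zero-crossing rate over non-overlapping frames."""
    frame_len = int(round(frame_ms * 1e-3 * fs))
    n = len(x) // frame_len
    frames = x[: n * frame_len].reshape(n, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return energy_db, zcr


def closure_and_burst_ms(x, fs, frame_ms=2.0, silence_db=-40.0, noise_zcr=0.2):
    """Return (closure duration, burst duration) in ms for a toy UVUA stop."""
    energy_db, zcr = frame_features(x, fs, frame_ms)
    loud = energy_db > energy_db.max() + silence_db
    onset = int(np.argmax(loud))  # first loud frame: release of the closure
    end = onset
    while end < len(zcr) and loud[end] and zcr[end] > noise_zcr:
        end += 1  # the burst lasts while frames stay loud and noise-like
    return onset * frame_ms, (end - onset) * frame_ms


fs = 16000
rng = np.random.default_rng(1)
closure = np.zeros(int(round(0.060 * fs)))                  # toy 60 ms closure
burst = 0.3 * rng.standard_normal(int(round(0.010 * fs)))  # toy 10 ms burst
t = np.arange(int(round(0.100 * fs))) / fs
vowel = np.sin(2 * np.pi * 150 * t)                         # toy following vowel
closure_ms, burst_ms = closure_and_burst_ms(np.concatenate([closure, burst, vowel]), fs)
print(closure_ms, burst_ms)
```

A similar loop could be extended for the aspirated cases, where the noise-like region continues past the burst, but separating burst from aspiration would need additional cues beyond these two features.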

 

Compared to the UVUA velar stop consonant, the UVA stop consonant has three events: closure and burst, as in the UVUA case, plus an additional aspiration event. The closure and burst events have characteristics similar to those in the UVUA case. The aspiration is due to frication at the glottis. The aspiration region can be identified as the noise-like region after the burst region. Figure 4(b) shows the time domain, frequency domain and time-varying spectra of the velar UVA stop consonant.

 

 

Figure 4(b): Temporal, spectral and time varying spectra of UVA velar stop consonant /kh/

 

The VUA velar stop consonant is the same as the UVUA one, except that the unvoiced excitation is replaced by voiced excitation. As a result, there is a low-amplitude voicing bar in both the closure and burst regions. Figure 4(c) shows the time domain, frequency domain and time-varying spectra of the velar VUA stop consonant. The only difference with reference to UVUA is the presence of a low-amplitude, nearly periodic signal in the time domain. The low-level voicing bar manifests as low-frequency spectral energy in the frequency domain. The time-varying spectra show two different regions of spectral information, one representing the closure region and the other the burst region.

 

 

Figure 4(c): Temporal, spectral and time varying spectra of VUA velar stop consonant /g/
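Because the voicing bar shows up as low-frequency energy, the fraction of spectral energy below a low cutoff is one simple way to tell a voiced closure from a silent one. In the sketch below, the 300 Hz cutoff, the background noise level and the weak 120 Hz sinusoid standing in for the voicing bar are all assumptions chosen for illustration.

```python
import numpy as np


def low_band_energy_ratio(segment, fs, cutoff_hz=300.0):
    """Fraction of the segment's spectral energy below cutoff_hz (Hann-windowed)."""
    power = np.abs(np.fft.rfft(segment * np.hanning(len(segment)))) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    total = power.sum()
    return float(power[freqs < cutoff_hz].sum() / total) if total > 0 else 0.0


fs = 16000
rng = np.random.default_rng(2)
t = np.arange(int(round(0.05 * fs))) / fs
background = 0.001 * rng.standard_normal(len(t))  # toy recording noise in the closure
closure_k = background                                        # toy UVUA closure: no voicing
closure_g = background + 0.05 * np.sin(2 * np.pi * 120 * t)  # toy VUA closure: weak voicing bar
print(round(low_band_energy_ratio(closure_k, fs), 2),
      round(low_band_energy_ratio(closure_g, fs), 2))
```

For the silent closure the ratio stays near the share of the band occupied by the cutoff, while the voiced closure puts almost all of its energy below it.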

 

The VA velar stop consonant is the same as the UVA velar stop consonant, except that the unvoiced excitation is replaced by voicing. As a result, there is a voicing bar in the closure, burst and aspiration regions. Figure 4(d) shows the time domain, frequency domain and time-varying spectra of the velar VA stop consonant. The only difference in all three plots is the additional information due to voicing.

 

 

Figure 4(d): Temporal, spectral and time varying spectra of VA velar stop consonant /gh/

 

 

Alveolar Stop Consonants

 

In this case, the total constriction of the vocal tract occurs at the alveolar ridge. This category also has four different stop consonants based on the MOA. The production of the alveolar UVUA stop consonant is the same as that of the velar UVUA stop consonant, except that the place of constriction is at the alveolar ridge. The change in the place of constriction mainly affects how the burst region manifests. The energy associated with the burst region, and hence its prominence, depends on the POA: the POA determines how much of the oral cavity lies in front of the constriction, and the larger this front cavity at the release, the greater the energy associated with the burst and its prominence. Accordingly, the burst in alveolar stop consonants is relatively less prominent than the burst in velar stop consonants. In the time domain, the burst region has less energy, and correspondingly less spectral energy. Apart from the change in POA, there is no change in other aspects of the alveolar stop consonants. Accordingly, the alveolar UVA, VUA and VA stop consonants have characteristics similar to those of their velar counterparts.

 

Figures 4(e) to 4(h) show the time domain, frequency domain and time-varying spectra of the different alveolar stop consonants. At the gross level, their characteristics remain the same as those of the respective velar stop consonants. The lower prominence of the burst regions can be observed by comparing each consonant with its velar counterpart.

 

 

Figure 4(e): Temporal, spectral and time varying spectra of UVUA alveolar stop consonant /T/

 

 

Figure 4(f): Temporal, spectral and time varying spectra of UVA alveolar stop consonant /Th/

 

 

Figure 4(g): Temporal, spectral and time varying spectra of VUA alveolar stop consonant /D/

 

 

Figure 4(h): Temporal, spectral and time varying spectra of VA alveolar stop consonant /Dh/
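The burst-prominence comparison described above can be quantified as the energy of the burst segment relative to the following vowel, in dB. The sketch assumes the burst and vowel boundaries are already known; the three burst amplitudes are hypothetical values chosen only to mimic the ordering described in the text (larger front cavity, stronger burst) and are not measurements.

```python
import numpy as np


def burst_prominence_db(burst, vowel):
    """Mean-square energy of the burst relative to the following vowel, in dB.
    Values closer to 0 dB indicate a more prominent burst."""
    e_burst = np.mean(np.square(burst))
    e_vowel = np.mean(np.square(vowel))
    return float(10 * np.log10((e_burst + 1e-12) / (e_vowel + 1e-12)))


fs = 16000
rng = np.random.default_rng(3)
t = np.arange(int(round(0.1 * fs))) / fs
vowel = np.sin(2 * np.pi * 150 * t)                 # toy following vowel
noise = rng.standard_normal(int(round(0.01 * fs)))  # toy 10 ms burst shape

# Hypothetical burst amplitudes per POA (assumptions, not measured values).
for poa, amplitude in [("velar", 0.5), ("alveolar", 0.2), ("dental", 0.05)]:
    print(poa, round(burst_prominence_db(amplitude * noise, vowel), 1))
```

Normalizing by the vowel makes the measure insensitive to overall recording level, which is one reason to prefer it over the raw burst energy.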

 

 

Dental Stop Consonants

 

Here the POA is the dental region. The front cavity ahead of the constriction is even smaller than in the alveolar case, and nearly negligible. Hence the burst region manifests only very feebly in dental stop consonants. Apart from this, their production does not differ from that of the alveolar and velar stop consonants. Figures 4(i) to 4(l) show the different dental stop consonants in the time, frequency and time-frequency domains. The energy associated with the burst region is even lower than in the alveolar case, and this is reflected in the frequency domain as well. The energy in the time-varying spectra is concentrated in a different region compared to the alveolar and velar cases.

 

 

Figure 4(i): Temporal, spectral and time varying spectra of UVUA dental stop consonant /t/

 

 

Figure 4(j): Temporal, spectral and time varying spectra of UVA dental stop consonant /th/

 

 

Figure 4(k): Temporal, spectral and time varying spectra of VUA dental stop consonant /d/

 

 

Figure 4(l): Temporal, spectral and time varying spectra of VA dental stop consonant /dh/

 

Bilabial stop consonants

 

In bilabial stop consonants, the POA is at the two lips, hence the name. Since there is no front cavity ahead of the POA, the burst region does not manifest in bilabial stop consonants. However, the total closure and release of the lips still lead to the perception of a bilabial stop consonant. Figures 4(m) to 4(p) show the four bilabial stop consonants. As can be observed, there is no manifestation of the burst region in the time, frequency or time-varying spectral plots.

 

 

Figure 4(m): Temporal, spectral and time varying spectra of UVUA bilabial stop consonant /p/

 

 

Figure 4(n): Temporal, spectral and time varying spectra of UVA bilabial stop consonant /ph/

 

 

Figure 4(o): Temporal, spectral and time varying spectra of VUA bilabial stop consonant /b/

 

 

Figure 4(p): Temporal, spectral and time varying spectra of VA bilabial stop consonant /bh/

 

Nasals

 

Nasal sounds are similar to vowels but have lower formant energy. They are produced with air flowing through the nasal cavity. Examples of nasal sound units found in Indian languages are /ng/, /nj/, /n/, /N/ and /m/; for instance, /ng/ occurs in going, /n/ in naan and /N/ in ravaN.

 

 

Figure 4(q): Temporal, spectral and time varying spectra of nasal /ng/

 

 

Figure 4(r): Temporal, spectral and time varying spectra of nasal /nj/

 

 

Figure 4(s): Temporal, spectral and time varying spectra of nasal /n/

 

 

Figure 4(t): Temporal, spectral and time varying spectra of nasal /N/

 

 

Figure 4(u): Temporal, spectral and time varying spectra of nasal /m/

 

 

Semivowels

 

The set of semivowels in Indian languages includes /y/, /r/, /l/ and /v/. Semivowels are less strongly periodic than vowels and have lower energy. Among the semivowels, /y/ and /r/ are aspirated, while /l/ and /v/ are unaspirated.

 

 

Figure 4(v): Temporal, spectral and time varying spectra of semivowel /y/

 

 

Figure 4(w): Temporal, spectral and time varying spectra of semivowel /r/

 

 

Figure 4(x): Temporal, spectral and time varying spectra of semivowel /l/

 

 

Figure 4(y): Temporal, spectral and time varying spectra of semivowel /v/
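The weaker periodicity of semivowels can be quantified with the normalized autocorrelation: its largest value at lags corresponding to plausible pitch periods is close to 1 for strongly periodic segments and lower for weakly periodic ones. The sketch below assumes a 60-400 Hz pitch range; the clean 125 Hz "vowel" and the quieter, noisier "semivowel" are toy signals, not recordings.

```python
import numpy as np


def periodicity_strength(x, fs, f0_min=60.0, f0_max=400.0):
    """Largest normalized autocorrelation at lags in the pitch range
    [fs / f0_max, fs / f0_min]. Near 1: strongly periodic; lower: weakly periodic."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # r[k] for lags k >= 0
    if r[0] <= 0:
        return 0.0
    r = r / r[0]
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(x) - 1)
    return float(np.max(r[lag_min:lag_max + 1]))


fs = 16000
rng = np.random.default_rng(4)
t = np.arange(int(round(0.1 * fs))) / fs
vowel = np.sin(2 * np.pi * 125 * t)                           # toy vowel: clean, periodic
semivowel = 0.5 * vowel + 0.3 * rng.standard_normal(len(t))  # toy semivowel: weaker, noisier
print(round(periodicity_strength(vowel, fs), 2),
      round(periodicity_strength(semivowel, fs), 2))
```

This uses the biased autocorrelation, which shrinks slightly with lag even for a perfectly periodic signal, so the clean vowel scores a little below 1; the comparison between segments of equal length is what matters here.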

 

 

Fricatives

 

Fricatives are consonants produced by a narrow constriction somewhere along the length of the vocal tract. The basic difference between fricatives and stop consonants is that the closure is partial and narrow in fricatives, whereas it is complete in stop consonants. Different places of narrow constriction produce different fricatives. Most Indian languages have |s|, |sh|, |shh| and |h| as fricatives: |s| is a dental fricative, |sh| is an alveolar fricative, |shh| is also an alveolar fricative but with more stress, and |h| is a velar fricative.

 

Figures 5(a) to 5(d) show the four fricatives present in most Indian languages. The time domain nature is the same in all cases; they differ mainly in amplitude level and duration. Their spectral energy is concentrated in different regions of the spectrum, and hence they differ in perceptual quality. The time-varying spectra, however, do not clearly show this variation in the concentration of spectral energy.

 

 

Figure 5(a): Waveform, short term spectra and time varying spectra of fricative /s/

 

 

Figure 5(b): Waveform, short term spectra and time varying spectra of fricative /sh/

 

 

Figure 5(c): Waveform, short term spectra and time varying spectra of fricative /shh/

 

 

Figure 5(d): Waveform, short term spectra and time varying spectra of fricative /h/
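The region where a fricative's spectral energy is concentrated can be summarized by a single number, the spectral centroid (the power-weighted mean frequency). The sketch below computes it for two toy fricatives made by band-limiting white noise; the band edges are hypothetical and are not taken from the |s|, |sh|, |shh| or |h| recordings in Figure 5.

```python
import numpy as np


def spectral_centroid_hz(segment, fs):
    """Power-weighted mean frequency of the Hann-windowed segment."""
    power = np.abs(np.fft.rfft(segment * np.hanning(len(segment)))) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))


def band_noise(n, fs, lo_hz, hi_hz, rng):
    """Toy fricative: white noise band-limited to [lo_hz, hi_hz) by zeroing FFT bins."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs >= hi_hz)] = 0
    return np.fft.irfft(spectrum, n)


fs = 16000
rng = np.random.default_rng(5)
n = int(round(0.1 * fs))
high = band_noise(n, fs, 4000, 7000, rng)  # hypothetical: energy concentrated high
low = band_noise(n, fs, 2000, 4000, rng)   # hypothetical: energy concentrated lower
print(round(spectral_centroid_hz(high, fs)), round(spectral_centroid_hz(low, fs)))
```

Applied frame by frame, the same measure gives a centroid track over time, which is a compact way to read the energy concentration off a time-varying spectrum.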

 

Affricates

 

Affricates are consonants whose production combines stop and fricative production. Initially, the vocal tract is completely closed somewhere along its length to create a total constriction. The constriction is then partially released to create fricative excitation. Most Indian languages have |ch|, |chh|, |j| and |jh| as affricates. All these affricates are produced in the palatal region; they differ from one another only in MOA. Figures 5(e) to 5(h) show the four affricates present in most Indian languages. The time domain nature is the same in all cases; they differ mainly in amplitude level and duration. Their spectral energy is concentrated in different regions of the spectrum, hence their difference. The time-varying spectra, however, do not clearly show this variation in the concentration of spectral energy.

 

 

Figure 5(e): Waveform, short term spectra and time varying spectra of affricate /ch/

 

 

Figure 5(f): Waveform, short term spectra and time varying spectra of affricate /chh/

 

 

Figure 5(g): Waveform, short term spectra and time varying spectra of affricate /j/

 

 

Figure 5(h): Waveform, short term spectra and time varying spectra of affricate /jh/

 

 

 


Copyright @ 2024 Under the NME ICT initiative of MHRD

 Powered by AmritaVirtual Lab Collaborative Platform [ Ver 00.13. ]