Contribution of Temporal and Spectral Cues to Recognition of Persian
Noise-Vocoded Speech
Masoud Motasaddi-Zarandy1, Farnoush Jarollahi2, Yahya Modarresi3, Shohreh Jalaie4, Amir Homayun Jafari5, Zahra Shirezhiyan6, Mahnaz Ahmadi7, Amir Salar Jafarpisheh8
1 M.D. in ENT, Department of Otolaryngology, ENT Research Center, Amiralam Hospital, Tehran, Iran. motasaddi@yahoo.com
2 Ph.D. in Audiology, Faculty member of School of Rehabilitation Sciences, Iran University of Medical Sciences (IUMS), Tehran, Iran. jarollahi.f@iums.ac.ir
3 Ph.D. in Linguistics, Professor, Institute for Humanities and Cultural Studies, Tehran, Iran. ymodarresi@yahoo.com
4 Ph.D. in Biostatistics, Rehabilitation School, Tehran University of Medical Sciences, Tehran, Iran. jalaeish@tums.ac.ir
5 Ph.D. in Biomedical Engineering, Department of Biomedical Engineering and Biophysics, Research Center for Biomedical Technologies & Robotics (RCBTR), Tehran University of Medical Sciences, Tehran, Iran. amir_j73@yahoo.com
6 Ph.D. student in Biomedical Engineering, Research Center for Biomedical Technologies & Robotics (RCBTR), Tehran University of Medical Sciences, Tehran, Iran. z.shirjiyan@gmail.com
7 Ph.D. in Audiology, University of Southern California, US. mahnazahmadi@gmail.com
8 Ph.D. candidate in Biomedical Engineering, Research Center for Biomedical Technologies & Robotics (RCBTR), Tehran University of Medical Sciences, Tehran, Iran. jafarpisheh@gmail.com
Introduction: Spectral cues contribute to, but are not essential for, accurate speech recognition. In a vocoded speech signal, one or more carrier bands replace the speech signal, so the signal keeps its amplitude envelope but loses its spectral detail. Such signals therefore show how much the envelope contributes to speech recognition. Because each language has its own acoustic, phonological, and lexical characteristics, research on vocoded Persian provides distinctive information about the auditory transmission of Persian speech stimuli in normal-hearing and deaf subjects.
Materials and methods: Nine normal-hearing speakers of standard Persian (5 female, 4 male; mean age 29 ± 5.75 years) were presented with 28 Persian phonemes embedded in nonwords and vocoded under 80 different conditions in MATLAB. Each sound was presented in random order through earphones/headphones and was repeated six times. Participants recorded their responses by placing check marks on a computer screen, and recognition rates were described and analyzed as percentages.
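The vocoding step described above can be illustrated with a minimal noise-vocoder sketch. This is not the authors' MATLAB implementation. The filter types (a second-order biquad band-pass and a one-pole envelope smoother), full-wave rectification, the function names, and the band edges in the usage lines are all assumptions chosen for illustration.

```python
import math
import random


def bandpass(signal, fs, f_lo, f_hi):
    """Second-order band-pass (biquad, 0 dB peak gain); an assumed filter type."""
    f0 = math.sqrt(f_lo * f_hi)            # geometric centre of the band
    q = f0 / (f_hi - f_lo)                 # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0       # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def envelope(signal, fs, cutoff_hz):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for x in signal:
        state += k * (abs(x) - state)
        env.append(state)
    return env


def noise_vocode(signal, fs, edges, env_cutoff_hz=16.0, seed=0):
    """Replace each band of `signal` with band-limited noise that carries
    the band's amplitude envelope, then sum the bands."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    out = [0.0] * len(signal)
    for f_lo, f_hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(signal, fs, f_lo, f_hi), fs, env_cutoff_hz)
        carrier = bandpass(noise, fs, f_lo, f_hi)
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out


# Hypothetical usage: a 0.1 s, 440 Hz tone vocoded into three bands.
fs = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(800)]
vocoded = noise_vocode(tone, fs, edges=[100.0, 400.0, 1600.0, 3500.0])
```

Varying the number of entries in `edges` and the value of `env_cutoff_hz` gives one way to generate a grid of conditions; the actual 80 conditions used in the study are not specified here.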
Results: Native listeners could recognize vocoded Persian phonemes, though with difficulty, without training or prior familiarity with such signals. Phoneme recognition did not change for low-pass filter cutoff frequencies of up to 16 Hz. The type of frequency division of the bands had the greatest impact on the recognition rate: the chance of recognition with logarithmic division was 3.3 times that with linear division. With increasing bandwidth, 51% less than the lower band, recognition increased. Front vowels, as well as fricative and obstructive consonants, were easy to recognize with minimal spectral data available. The recognition rate of vowels was 1.26 times that of consonants.
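The difference between linear and logarithmic band division can be made concrete with a short sketch. The function name `band_edges` and the frequency ranges in the usage lines are hypothetical; the study's actual frequency range and band counts are not given here.

```python
import math


def band_edges(f_low, f_high, n_bands, spacing="log"):
    """Return n_bands + 1 edge frequencies between f_low and f_high.

    "linear" gives bands of equal width in Hz; "log" gives bands with an
    equal frequency ratio, so low-frequency bands are narrower.
    """
    if spacing == "linear":
        step = (f_high - f_low) / n_bands
        return [f_low + i * step for i in range(n_bands + 1)]
    if spacing == "log":
        ratio = (f_high / f_low) ** (1.0 / n_bands)
        return [f_low * ratio ** i for i in range(n_bands + 1)]
    raise ValueError("spacing must be 'linear' or 'log'")


# Hypothetical ranges, chosen only so the arithmetic is easy to follow.
linear_edges = band_edges(100.0, 8100.0, 4, "linear")  # equal 2000 Hz steps
log_edges = band_edges(100.0, 1600.0, 4, "log")        # each edge doubles
```

With logarithmic division, more of the bands fall in the low-frequency region, which is one plausible reason it could preserve more phonetic detail than linear division for the same number of bands.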
Conclusion: The type of frequency division and the number of bands were the most influential factors in the recognition of Persian phonemes. Front vowels, as well as fricative and obstructive consonants, could be recognized with minimal spectral data.