Exploiting Resources from Closely-related Languages for Automatic Speech Recognition in Low-resource Languages from Malaysia

Author : Sarah Flora Samson Juan
Release : 2015
OCLC : 949273828

Book Synopsis: Exploiting Resources from Closely-related Languages for Automatic Speech Recognition in Low-resource Languages from Malaysia, by Sarah Flora Samson Juan

Written by Sarah Flora Samson Juan and released in 2015. Book excerpt:

Languages in Malaysia are dying at an alarming rate. As of today, 15 languages are in danger while two languages are extinct. One way to save languages is to document them, but this is a tedious task when performed manually.

An Automatic Speech Recognition (ASR) system could help speed up the process of documenting speech from native speakers. However, building an ASR system for a target language requires a large amount of training data, as current state-of-the-art techniques are based on empirical approaches. Hence, there are many challenges in building ASR for languages that have limited data available.

The main aim of this thesis is to investigate the effects of using data from closely-related languages to build ASR for low-resource languages in Malaysia. Past studies have shown that cross-lingual and multilingual methods could improve the performance of low-resource ASR. In this thesis, we try to answer several questions concerning these approaches: How do we know which language is beneficial for our low-resource language? How does the relationship between source and target languages influence speech recognition performance? Is pooling language data an optimal approach for a multilingual strategy?

Our case study is Iban, an under-resourced language spoken on the island of Borneo. We study the effects of using data from Malay, a locally dominant language that is close to Iban, for developing Iban ASR under different resource constraints. We have proposed several approaches to adapt Malay data to obtain pronunciation and acoustic models for Iban speech.

Building a pronunciation dictionary from scratch is time-consuming, as one needs to properly define the sound units of each word in a vocabulary. We developed a semi-supervised approach to quickly build a pronunciation dictionary for Iban. It was based on bootstrapping techniques for improving Malay data to match Iban pronunciations.

To increase the performance of low-resource acoustic models, we explored two acoustic modelling techniques: Subspace Gaussian Mixture Models (SGMM) and Deep Neural Networks (DNN). We performed cross-lingual strategies using both frameworks for adapting out-of-language data to Iban speech. Results show that using Malay data is beneficial for increasing the performance of Iban ASR. We also tested SGMM and DNN to improve low-resource non-native ASR. We proposed a fine merging strategy for obtaining an optimal multi-accent SGMM. In addition, we developed an accent-specific DNN using native speech data. After applying both methods, we obtained significant improvements in ASR accuracy. From our study, we observe that using SGMM and DNN for a cross-lingual strategy is effective when training data is very limited.


Exploiting Resources from Closely-related Languages for Automatic Speech Recognition in Low-resource Languages from Malaysia Related Books

Advances in Electronics Engineering
Language: en
Pages: 332
Authors: Zahriladha Zakaria
Categories: Technology & Engineering
Type: BOOK - Published: 2019-12-16 - Publisher: Springer Nature

This book presents the proceedings of ICCEE 2019, held in Kuala Lumpur, Malaysia, on 29th–30th April 2019. It includes the latest advances in electrical engin
Advances in Computational Intelligence Techniques
Language: en
Pages: 271
Authors: Shruti Jain
Categories: Technology & Engineering
Type: BOOK - Published: 2020-02-20 - Publisher: Springer Nature

This book highlights recent advances in computational intelligence for signal processing, computing, imaging, artificial intelligence, and their applications. I
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
Language: en
Pages: 203
A Study on Reusing Resources of Speech Synthesis for Closely-related Languages
Language: en
Pages: 0
Authors: Nur Hana Samsudin
Categories:
Type: BOOK - Published: 2018 - Publisher:

This thesis describes research on building a text-to-speech (TTS) framework that can accommodate the lack of linguistic information of under-resource languages