Published on February 25, 2014
Acoustic Modeling using DBN For Bangla ASR Presented by Mahtab Ahmed Roll- 0907006 Kaidul islam Roll-0907016 Supervised by Md. Faijul Amin
Introduction • Speech recognition and understanding of spontaneous speech have been a goal of research since 1970. • Several researches have been undergone for English ASR, perhaps not that much for Bangla yet.
Acoustic Model • An acoustic model contains statistical representations of each of the distinct sounds that makes up a word.
Acoustic Model • Pronunciation dictionary is “HOUSE” can be– – – – HOUSAND HOUSDEN HOUSE HOUSE'S [HOUSAND] [HOUSDEN] [HOUSE] [HOUSE'S] hh aw s ax n d hh aw s d ax n hh aw s hh aw s ix z • Each Phoneme is associated with a HMM.
Why DBN? • GMMs has been used along with EM Algorithm in the context of speech recognition. • It can model at any accuracy except it can not model data that lie on the non linear state space. • DBN that have many hidden layers and trained by backpropagation can outperform GMM.
RBM • Stochastic neural nets. • Only one layer of hidden units • Bipartite graph
RBM • It is An energy based model. • Can be learned by gradient descent on the log likelihood of the training data.
Training RBM • If the data are binary data then the updating of units is as follows. • Speech data are Gaussian data. Then updating becomes
Stack Of RBM
Stack Of RBM • First a GRBM is trained to model a window of frames. • Then the binary units of GRBM is used as data for training the next RBM and the procedure is repeated as many time it required. • So after learning a DBN by training a stack of RBMs Reverse weight is used to form it as Feed forward DNN
Contrastive Divergence Algorithm • Very surprising short-cut • Start with a training vector on the visible units • Updates all hidden units in parallel. • Update visible units to get a reconstruction
Speech as Multinomial data
Generative Pretraining • The idea is to learn one layer of feature detectors at a time with the states of the feature detectors in one layer acting as the data for the next layer. • Multiple layers of feature detectors can be learned by undirected model. • It uses a joint distribution
Simulation by program Training Data & Reconstruction
Simulation by program Training Data (blue) & Reconstruction (red) Superimposed
Problem with Speech Recognition • difficult to identify phonemes perfectly in noisy speech. • ambiguous. WRECK A NICE BEACH Can be pronounced as RECOGNIZE SPEECH • Which words are likely to come next have to be known.
Overall Timeline and progress so far
... speech recognition ... Modeling of Bangla Words using Deep Belief Network ... proposed method to develop acoustic modeling for Bangla SR in ...
Speech Recognition Using Deep Learning ... acoustic modeling approach for speech recognition in the ... deep belief networks (DBNs) for speech recognition.
mance of automatic speech recognition systems can be ... Deep Belief Network Acoustic Modeling ... ular type of deep neural networks, for acoustic modeling. A
chine learning problems and this paper applies DBNs to acoustic modeling ... art Automatic Speech Recognition ... using Deep Belief Networks ...
UNDERSTANDING HOW DEEP BELIEF NETWORKS PERFORM ACOUSTIC MODELLING ... Deep Belief Networks ... Although automatic speech recognition ...
Acoustic Modeling Using Deep Belief Networks. ... training of deep belief networks for speech recognition. ... context for automatic speech recognition.
... Deep Learning for Automatic Speech Recognition ... of deep belief networks for speech recognition ... of deep neural network acoustic models using
Deep Recurrent Neural Networks for Acoustic Modelling ... model for acoustic modelling in Automatic Speech Recognition ... a Deep Belief Network ...
Speech Recognition and Deep ... to recognize your speech. Using neural networks for speech ... 2 Acoustic Modeling using Deep Belief Networks, ...
Large scale deep neural network acoustic modeling with ... Modeling for Automatic Speech Recognition ... Speech Processing and Speech ...