Published on April 22, 2008
Dynamic Match Lattice Spotting: Spoken Term Detection Evaluation. Queensland University of Technology. Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan. Presented by Roy Wallace.
Overview: Phonetic-based index; open-vocabulary; based on lattice-spotting technique; two-tier database; dynamic-match rules; algorithmic optimisations. NOTE: Patented technology.
Concept: [diagram] search term "greasy"; phone decomposition.
Concept: [diagram] target sequence; observed sequences; costs; dynamic matching (phone labels shown: ax, ih).
Indexing: [diagram labels] Feature Extraction; Segmentation; Speech Recognition; Sequence Generation; Lattices; Sequence DB; Hyper-Sequence Generation; Hyper-Sequence DB; Audio.
Hyper-sequence Mapping: Map individual phones to "parent" classes. We use Vowels, Fricatives, Glides, Stops and Nasals. Simple example: with parent classes Vowels and Consonants, map each phone to its parent class to create the hyper-sequence. [diagram labels] Sequence DB; Hyper-Sequence DB.
Hyper-sequence Mapping: [diagram labels] Hyper-sequence DB; search term; hyper-sequence; Sequence DB.
Searching: [diagram labels] Term; Sequence DB; Hyper-Sequence DB; Results; Dynamic Matching; Keyword Verification; Hyper-mapping; Phone decomp.; Split long terms; Merge long terms.
Dynamic Matching: Minimum Edit Distance (MED), i.e. Levenshtein Distance. Insertions, deletions, substitutions. Finds the minimum cost of transformation.
Dynamic Matching: Substitution costs are derived from phone confusion statistics.
Optimisations: Prefix sequence optimisation; early stopping optimisation; linearised MED search approximation.
Long Term Merging: [diagram labels] "olympic sites"; Search; Search; Merge; Results.
Keyword Verification: Acoustic: use the acoustic score from the lattice to boost occurrences with high confidence. Neural Network: produce a confidence score by fusing MED score and acoustic score; term phone length; term phone classes.
Results: Maximum Term-Weighted Value on EvalSet terms.
Conclusion: Open-vocabulary and phone-based. Patented technology utilises sequence and hyper-sequence databases; optimisations for rapid searches. Advantages: other languages; economy of scale.
Conclusion: Limitations: indexing speed and size; need to split long sequences. Future work: Keyword Verification; word-level information (e.g. LVCSR); acoustic features (e.g. prosody); indexing/searching frameworks; Spoken Document Retrieval and other semantic applications.
References:
A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication.
CMU Speech Group, The Carnegie Mellon Pronouncing Dictionary, 1998. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
S. J. Young, P. C. Woodland and W. J. Byrne, “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc., 2002.
V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.
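Illustration (not from the slides): the Hyper-sequence Mapping and Dynamic Matching slides can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented QUT implementation. The phone-to-class table, the substitution cost values, the unit insertion and deletion costs, and the names PHONE_CLASS, SUB_COST, to_hyper_sequence and min_edit_distance are all hypothetical; the slides only say that substitution costs are derived from phone confusion statistics and do not give values.

```python
# Hypothetical phone-to-parent-class table (tiny subset, for illustration only).
# The five class names follow the slides; the phone assignments are assumptions.
PHONE_CLASS = {
    "g": "Stop", "r": "Glide", "iy": "Vowel", "ih": "Vowel",
    "ax": "Vowel", "s": "Fricative", "z": "Fricative", "n": "Nasal",
}

# Hypothetical substitution costs. In the talk these are derived from phone
# confusion statistics; the numbers below are invented placeholders.
SUB_COST = {("iy", "ih"): 0.3, ("s", "z"): 0.4}
DEFAULT_SUB_COST = 1.0   # assumed cost for any pair not listed above
INS_COST = 1.0           # assumed insertion cost
DEL_COST = 1.0           # assumed deletion cost


def to_hyper_sequence(phones):
    """Map each phone to its parent class to create the hyper-sequence."""
    return [PHONE_CLASS[p] for p in phones]


def substitution_cost(a, b):
    """Return 0 for identical phones, else a symmetric table lookup."""
    if a == b:
        return 0.0
    return SUB_COST.get((a, b), SUB_COST.get((b, a), DEFAULT_SUB_COST))


def min_edit_distance(target, observed):
    """Weighted Levenshtein distance between two phone sequences."""
    rows, cols = len(target) + 1, len(observed) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):            # delete every target phone
        d[i][0] = d[i - 1][0] + DEL_COST
    for j in range(1, cols):            # insert every observed phone
        d[0][j] = d[0][j - 1] + INS_COST
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + DEL_COST,
                d[i][j - 1] + INS_COST,
                d[i - 1][j - 1] + substitution_cost(target[i - 1], observed[j - 1]),
            )
    return d[-1][-1]


target = ["g", "r", "iy", "s", "iy"]     # one pronunciation of "greasy"
observed = ["g", "r", "iy", "z", "iy"]   # a variant with "z" in place of "s"
hyper_target = to_hyper_sequence(target)
hyper_observed = to_hyper_sequence(observed)
distance = min_edit_distance(target, observed)
```

In this toy example both pronunciations map to the same hyper-sequence, and the weighted distance between them is the single assumed "s"/"z" substitution cost. A search would accept the observed sequence only if that distance fell below some threshold, which is likewise not specified in the slides.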