Speech Recognition Technology

Information about Speech Recognition Technology

Published on March 5, 2014

Author: asertseminar



Introduction
• Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a sequence of words.
• The recognized words can be an end in themselves, as in applications such as command and control, data entry, and document preparation.
• They can also serve as the input to further linguistic processing in order to achieve speech understanding.
• It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT).

History
• Around since the 1960s, ASR has seen steady, incremental improvement over the years.
• It has benefited greatly from the increased processing speed of computers over the last decade, entering the marketplace in the mid-2000s.
• Early systems were based on acoustic phonetics and worked with small vocabularies to identify isolated words.
• Over the years, vocabularies have grown and ASR systems have become statistics-based.
• They now have large vocabularies and can recognize continuous speech.

Basic Structure

Digital Sampling
• When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand.
• To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals.
• The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into different bands of frequency.
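
The sampling step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8 kHz sample rate, 16-bit depth, the pure 440 Hz test tone, and the function names `sample_and_quantize` and `moving_average` are all hypothetical choices, not taken from the slides, and a real system would use a far better noise filter than a moving average.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate_hz=8000, bits=16):
    """Measure a sine wave at regular intervals and round to integer levels."""
    max_level = 2 ** (bits - 1) - 1            # largest signed value, 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # time of the n-th measurement
        analog = math.sin(2 * math.pi * freq_hz * t)   # stand-in for the analog wave
        samples.append(round(analog * max_level))      # quantize to the nearest level
    return samples

def moving_average(samples, width=3):
    """Crude smoothing filter: average each sample with the ones just before it."""
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

pcm = sample_and_quantize(freq_hz=440, duration_s=0.01)  # 10 ms of a 440 Hz tone
smoothed = moving_average(pcm)
print(len(pcm), pcm[:5])
```

Higher sample rates and bit depths capture the wave more faithfully at the cost of more data to process.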

Acoustic model
• Next the signal is divided into small segments as short as a few hundredths of a second, or even thousandths in the case of plosive consonants (stops produced by obstructing airflow in the vocal tract) like "p" or "t."
• The program then matches these segments to known phonemes in the appropriate language.
• A phoneme is the smallest unit of sound in a language that can distinguish one word from another; phonemes are the sounds we make and put together to form meaningful expressions.
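
A minimal sketch of the segmentation step, assuming 25 ms frames taken every 10 ms (illustrative values that are common in practice, not figures from the slides). The helper names `split_into_frames` and `frame_energy` are hypothetical, and the per-frame energy is only a stand-in for the richer features a real acoustic model would match against phonemes.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=25, hop_ms=10):
    """Cut a list of samples into short, overlapping frames."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate_hz * hop_ms / 1000)       # samples between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames

def frame_energy(frame):
    """Average squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

# One second at 8 kHz: half a second of silence, then a loud alternating signal.
signal = [0] * 4000 + [1000 if n % 2 == 0 else -1000 for n in range(4000)]
frames = split_into_frames(signal, sample_rate_hz=8000)
energies = [frame_energy(f) for f in frames]
print(len(frames), energies[0], energies[-1])
```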

Language model
• The program examines phonemes in the context of the other phonemes around them.
• It runs the contextual phoneme plot through a complex statistical model and compares it to a large library of known words, phrases and sentences.
• The program then determines what the user was probably saying and either outputs it as text or issues a computer command.
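
As a rough illustration of how word context helps decide what the user was probably saying, the sketch below scores two sound-alike candidates with a tiny bigram model. Everything here is an assumption made for the example: the three-sentence corpus, the add-one smoothing, and the function names `train_bigrams` and `score`. Production language models are trained on vastly more text.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()   # <s> marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(sentence, unigrams, bigrams):
    """Product of add-one-smoothed bigram probabilities for the sentence."""
    words = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

corpus = [
    "their house is over there",
    "there is their house",
    "their dog is there",
]
unigrams, bigrams = train_bigrams(corpus)

# Two candidates that sound identical but use different spellings.
candidates = ["their house is there", "there house is their"]
best = max(candidates, key=lambda c: score(c, unigrams, bigrams))
print(best)
```

The acoustics alone cannot separate the two candidates; the word-pair statistics favor the one that matches how the words are normally used.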

Statistical Modeling Systems
• These systems use probability and mathematical functions to determine the most likely outcome.
• The two models that dominate the field today are the Hidden Markov Model and Neural Networks.
• These methods involve complex mathematical functions, but essentially, they take the information known to the system to figure out the information hidden from it.
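
One standard way to write "the most likely outcome" is the decoding rule below. The symbols are assumptions introduced here, not notation from the slides: X stands for the observed acoustic data (the known information) and W for a candidate word sequence (the hidden information).

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid X)
        \;=\; \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        \;=\; \arg\max_{W} P(X \mid W)\, P(W)
```

The second step is Bayes' rule, and P(X) can be dropped because it does not depend on W. In this reading, P(X | W) plays the role of the acoustic model and P(W) the role of the language model.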

Hidden Markov Model (HMM)
• In this model, each phoneme is like a link in a chain, and the completed chain is a word.
• The chain branches off in different directions as the program attempts to match the digital sound with the phoneme that's most likely to come next.
• During this process, the program assigns a probability score to each phoneme, based on its built-in dictionary and user training.
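
The chain search described above is commonly carried out with the Viterbi algorithm, which the slides do not name; the sketch below uses it as one plausible way to pick the most probable phoneme chain. The three phoneme states, the observation labels ("burst", "vowel", "click"), and every probability in the tables are made-up values for illustration only.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

states = ["k", "ae", "t"]                      # hypothetical phonemes of "cat"
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k":  {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.4, "t": 0.5},
    "t":  {"k": 0.2, "ae": 0.2, "t": 0.6},
}
emit_p = {
    "k":  {"burst": 0.7, "vowel": 0.1, "click": 0.2},
    "ae": {"burst": 0.1, "vowel": 0.8, "click": 0.1},
    "t":  {"burst": 0.3, "vowel": 0.1, "click": 0.6},
}

path, prob = viterbi(["burst", "vowel", "click"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The algorithm keeps only the best-scoring way of reaching each phoneme at each step, so it never has to enumerate every possible chain.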

Markov Model

Neural Networks
A class of statistical models may be called "neural" if they
• consist of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and
• are capable of approximating non-linear functions of their inputs.
The adaptive weights are conceptually connection strengths between neurons, which are activated during training and prediction.

Each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.
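
A small sketch of what the neurons and connections compute. The weights below are picked by hand so the network reproduces XOR, a simple non-linear function; in a real system a learning algorithm would tune them. The sigmoid activation, the layer sizes, and the function names are assumptions made for the example.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, network):
    """Feed the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hand-set weights (not learned): hidden neurons act like OR and NAND, output like AND.
xor_network = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),   # hidden layer, 2 neurons
    ([[20.0, 20.0]], [-30.0]),                         # output layer, 1 neuron
]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(forward([a, b], xor_network)[0]))
```

No single weighted sum can separate these four cases, which is why the hidden layer is needed.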

Program Training
• The process is more complicated for phrases and sentences, because the system has to figure out where each word starts and stops.
• Statistical systems need large amounts of example training data to reach their best performance, sometimes on the order of thousands of hours of human-transcribed speech and hundreds of megabytes of text.
• The training data are used to create acoustic models of words, word lists and multi-word probability networks.
• The details can make the difference between a well-performing system and a poorly performing one, even when both use the same basic algorithm.

Applications
• Transcription: dictation, information retrieval
• Command and control: data entry, device control, navigation, call routing
• Information access: airline schedules, stock quotes, directory assistance
• Problem solving: travel planning, logistics

Weaknesses and Flaws
• Low signal-to-noise ratio: The program needs to "hear" the words spoken distinctly, and any extra noise introduced into the sound will interfere with this.
• Overlapping speech: Current systems have difficulty separating simultaneous speech from multiple users.
• Intensive use of computer power.
• Homophones (words that sound alike but differ in spelling and meaning), e.g. "there" and "their," "air" and "heir," "be" and "bee."
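
Signal-to-noise ratio is usually expressed in decibels as ten times the base-10 logarithm of signal power over noise power. The sketch below computes it for two toy cases; the sample values and the function names `power` and `snr_db` are illustrative assumptions.

```python
import math

def power(samples):
    """Average squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))

speech = [1.0, -1.0, 1.0, -1.0]        # stand-in for the spoken signal
quiet_noise = [0.1, -0.1, 0.1, -0.1]   # noise at one tenth of the amplitude
loud_noise = [1.0, -1.0, -1.0, 1.0]    # noise as strong as the speech

clean_snr = snr_db(speech, quiet_noise)
noisy_snr = snr_db(speech, loud_noise)
print(clean_snr, noisy_snr)
```

A tenfold drop in noise amplitude is a hundredfold drop in noise power, or about 20 dB; at 0 dB the noise is as strong as the speech.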

Major Challenges
• Making a system that can flawlessly handle roadblocks like slang, dialects, accents and background noise.
• The different grammatical structures used by languages can also pose a problem. For example, Arabic sometimes uses single words to convey ideas that are entire sentences in English.

The Future of Speech Recognition
• The Defense Advanced Research Projects Agency (DARPA) has three teams of researchers working on Global Autonomous Language Exploitation (GALE), a program that will take in streams of information from foreign news broadcasts and newspapers and translate them.
• It hopes to create software that can instantly translate between two languages with at least 90 percent accuracy.
• DARPA is also funding an R&D effort called TRANSTAC to enable soldiers to communicate more effectively with civilian populations in non-English-speaking countries.

Conclusion
At some point in the future, speech recognition may become speech understanding. The statistical models that allow computers to decide what a person just said may someday allow them to grasp the meaning behind the words. Although it is a huge leap in terms of computational power and software sophistication, some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence.
