STKU5 Larson


Published on June 19, 2007

Author: Mentor

Source: authorstream.com

Tutorial: Developing and Deploying Multimodal Applications
James A. Larson, Larson Technical Services
jim@larson-tech.com
SpeechTEK West, February 23, 2007

Developing and Deploying Multimodal Applications
- What applications should be multimodal?
- What is the multimodal application development process?
- What standard languages can be used to develop multimodal applications?
- What standard platforms are available for multimodal applications?

Capturing Input from the User (Medium | Input Device | Mode)
- Acoustic | Microphone | Speech
- Tactile | Keypad, keyboard | Key
- Tactile | Pen | Ink
- Tactile | Joystick, mouse | GUI
- Visual | Scanner, still camera | Photograph
- Visual | Video camera | Gaze tracking, gesture recognition
- Electronic | RFID, biometric, GPS | Digital data

Presenting Output to the User (Medium | Output Device | Mode)
- Acoustic | Speaker | Speech
- Visual | Display | Text, photograph, movie
- Tactile | Joystick | Pressure
Together, these output media make an application multimedia.

Multimodal and Multimedia Application Benefits
- Provide a natural user interface by using multiple channels for user interaction
- Simplify interaction with small devices that have limited keyboards and displays, especially portable devices
- Leverage the advantages of different modes in different contexts
- Decrease error rates and the time required to perform tasks
- Increase the accessibility of applications for special users
- Enable new kinds of applications

Exercise 1
- What new multimodal applications would be useful for your work?
- What new multimodal applications would be entertaining to you, your family, or friends?
Voice as a "Third Hand"
- Game Commander 3: http://www.gamecommander.com/

Voice-Enabled Games
- ScanSoft's VoCon Games Speech SDK: http://www.scansoft.com/games/
- PlayStation® 2 and Nintendo® GameCube™: http://www.omnipage.com/games/poweredby/

Education
- Tucker Maxon School of Oral Education: http://www.tmos.org/
- Reading Tutor Project: http://cslr.colorado.edu/beginweb/reading/reading.html

Multimodal Applications Developed by PSU and OHSU Students
- Hands-busy: troubleshooting a car's motor; repairing a leaky faucet; tuning musical instruments
- Construction: complex origami artifact; project book for children
- Cooking: talking recipe book
- Entertainment: child's fairy-tale book; audio-controlled jukebox; games (Battleship, Go)
- Data collection: buy a car; collect health data; buy movie tickets; order meals from a restaurant; conduct banking business; locate a business; order a computer; choose homeless pets from an animal shelter
- Authoring: photo album tour
- Education: flash cards (addition tables)
To try them, download Opera and the speech plug-in, then go to www.larson-tech.com/mm-Projects/Demos.htm

New Application Classes
- Active listening: verbal VCR controls (start, stop, fast forward, rewind, etc.)
- Virtual assistants: listen for requests and immediately perform them (violin tuner, TV controller, environmental controller, family-activity coordinator)
- Synthetic experiences: synthetic interviews; speech-enabled games; education and training; authoring content

Two General Uses of Multiple Modes of Input
- Redundancy—one mode acts as a backup for another mode. In noisy environments, use keypad instead of speech input; in cold environments, use speech instead of keypad.
- Complementary—one mode supplements another mode. Voice as a third hand: "Move that (point) to there (point)" (late fusion); lip reading = video + speech (early fusion).

Potential Problems with Multimodal Applications
- Voice may make an application "noisy": privacy and security concerns; noise pollution.
- Speech and handwriting recognition systems sometimes fail.
- Users may falsely expect to be able to use full natural language. Full natural language processing requires knowledge of the outside world, a history of the user-computer interaction, and a sophisticated understanding of language structure; it is possible only on Star Trek and is incorrectly called "NLP." A "natural language-like" interface simulates natural language for a small domain, a short interaction history, and specialized language structures.

Adding a New Mode to an Application
Only if…
- The new mode enables new features not previously possible.
- The new mode dramatically improves usability.
Always…
- Redesign the application to take advantage of the new mode.
- Provide backup for the new mode.
- Test, test, and test some more.

Exercise 2
Where will multimodal applications be used?
A. At home
B. At work
C. "On the road"
D. Other?

Developing and Deploying Multimodal Applications
- What applications should be multimodal?
- What is the multimodal application development process?
- What standard languages can be used to develop multimodal applications?
- What standard platforms are available for multimodal applications?

The Playbill—Who's Who on the Team
- Users: their lives will be improved by using the multimodal application
- Interaction designer: designs the dialog, i.e., when and how the user and system interchange requests and information
- Multimodal programmer: implements the VUI
- Voice talent: records spoken prompts and messages
- Grammar writer: specifies the words and phrases the user may speak in response to a prompt
- TTS specialist: specifies verbal and audio sounds and inflections
- Quality assurance specialist: performs tests to validate that the application is both useful and usable
- Customer: pays the bills
- Program manager: organizes the work and makes sure it is completed on schedule and under budget

Development Process
Stages: Investigation, Design, Development, Testing, Sustaining
- Each stage involves users
- Iterative refinement

Development Process: Investigation Stage—Identify the Application
- Conduct ethnography studies
- Identify candidate applications
- Conduct focus groups
- Select the application

Exercise 3
What will be the "killer" consumer multimodal applications?
Development Process: Design Stage—Specify the Application
- Construct the conceptual model
- Construct scenarios
- Specify performance and preference requirements

Specify Performance and Preference Requirements
- Performance: Is the application useful? Measure what the users actually accomplished; validate that the users achieved success.
- Preference: Is the application enjoyable? Measure users' likes and dislikes; validate that the users enjoyed the application and will use it again.

Performance Metrics

Exercise 4
Specify performance metrics for the multimodal email application.

Preference Metrics

Exercise 5
Specify preference metrics for the multimodal email application.

Preference Metrics (Open-ended Questions)
- What did you like best about this voice-enabled application? (Do not change these features.)
- What did you like least about this voice-enabled application? (Consider changing these features.)
- What new features would you like to have added? (Consider adding these features in this or a later release.)
- What features do you think you will never use? (Consider deleting these features.)
- Do you have any other comments or suggestions? (Pay attention to these responses; callers frequently suggest very useful ideas.)

Development Process: Development Stage—Develop the Application
- Specify the persona
- Specify the modes and modalities
- Specify the dialog script

UI Design Guidelines
- Guidelines for voice user interfaces: Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition.
http://www.eiginc.com
- Guidelines for graphical user interfaces: Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
- Guidelines for multimodal user interfaces: Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/

Common-sense Suggestions 1: Satisfy Real-World Constraints
Task-oriented guidelines
1.1. For each task, use the easiest mode available on the device.
Physical guidelines
1.2. If the user's hands are busy, then use speech.
1.3. If the user's eyes are busy, then use speech.
1.4. If the user may be walking, use speech for input.
Environmental guidelines
1.5. If the user may be in a noisy environment, then use a pen, keys, or a mouse.
1.6. If the user's manual dexterity may be impaired, then use speech.

Exercise 6
What input mode(s) should be used for each of the following tasks?
A. Selecting objects
B. Entering text
C. Entering symbols
D. Entering sketches or illustrations

Common-sense Suggestions 2: Communicate Clearly, Concisely, and Consistently with Users
Consistency guidelines
2.1. Phrase all prompts consistently.
2.2. Enable the user to speak keyword utterances rather than natural language sentences.
2.3. Switch presentation modes only when the information is not easily presented in the current mode.
2.4. Make commands consistent.
2.5. Make the focus consistent across modes.
Organizational guidelines
2.6. Use audio to indicate the verbal structure.
2.7. Use pauses to divide information into natural "chunks."
2.8. Use animation and sound to show transitions.
2.9. Use voice navigation to reduce the number of screens.
2.10. Synchronize multiple modalities appropriately.
2.11. Keep the user interface as simple as possible.

Common-sense Suggestions 3: Help Users Recover Quickly and Efficiently from Errors
Conversational guidelines
3.1. Users tend to use the same mode that was used to prompt them.
3.2. If privacy is not a concern, use speech as output to provide commentary or help.
3.3. Use directed user interfaces unless the user is always knowledgeable and experienced in the domain.
3.4. Always provide context-sensitive help for every field and command.
Reliability guidelines (operational status)
3.5. The user should always be able to determine easily whether the device is listening.
3.6. For devices with batteries, users should always be able to determine easily how much longer the device will be operational.
3.7. Support at least two input modes so one input mode can be used when the other cannot.
Reliability guidelines (visual feedback)
3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.
3.9. Display the n-best list to enable easy correction of speech recognition errors.
3.10. Try to keep response times under 5 seconds; inform the user of longer response times.

Common-sense Suggestions 4: Make Users Comfortable
Listening mode
4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.
System status
4.2. Always present the current system status to the user.
Human-memory constraints
4.3. Use the screen to ease stress on the user's short-term memory.
Social guidelines
4.4. If the user may need privacy, use a display rather than rendering speech.
4.5. If the user may need privacy, use a pen or keys.
4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).
Advertising guidelines
4.7. Use animation and sound to attract the user's attention.
4.8. Use landmarks to help the user know where he or she is.
Ambience
4.9. Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.
Accessibility
4.10. For each traditional output technique, provide an alternative output technique.
4.11. Enable users to adjust the output presentation.

Books
- Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.
- Julie A. Jacko and Andrew Sears (editors). The Human-Computer Interaction Handbook—Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.

Development Process: Testing Stage—Test the Application
- Component test
- Usability test
- Stress test
- Field test

Testing Resources
- Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.
- Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005.
sales@vocalabs.com

Development Process: Sustaining Stage—Deploy and Monitor the Application
- User surveys
- Usage reports from log files
- User feedback and comments

Developing and Deploying Multimodal Applications
- What applications should be multimodal?
- What is the multimodal application development process?
- What standard languages can be used to develop multimodal applications?
- What standard platforms are available for multimodal applications?

W3C Multimodal Interaction Framework
A general description of speech application components and how they relate:
- Recognition grammar
- Semantic interpretation
- Extensible MultiModal Annotation (EMMA)
- Speech synthesis
- Interaction managers

(Framework diagram: user input via ASR and ink passes through semantic interpretation and information integration to the interaction manager, which drives application and telephony functions; output passes through language generation and media planning to TTS, audio, and the display.)

SRGS: describes what the user may say at each point in the dialog.

Speech Recognition Engines
- Switch vocabularies

Grammars
- Describe what the user may say or handwrite at a point in the dialog
- Enable the recognition engine to work faster and more accurately
- Two types of grammars: structured grammars and statistical grammars (N-grams)

Structured Grammars
Specify the words that a user may speak or write. Two representation formats:

1. Augmented Backus-Naur Form (ABNF) production rules:

   Single_digit  ::= zero | one | two | … | nine
   Zero_thru_ten ::= Single_digit | ten

2. XML format, which can be processed by an XML validator.

Example XML Grammar

<grammar mode = 'voice' type = 'application/srgs+xml' root = 'zero_to_ten'>
  <rule id = 'zero_to_ten'>
    <one-of>
      <ruleref uri = '#single_digit'/>
      <item> ten </item>
    </one-of>
  </rule>
  <rule id = 'single_digit'>
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>

Exercise 7
Write a grammar that recognizes the digits zero through nineteen. (Hint: modify the grammar above.)

Reusing Existing Grammars

<grammar type = 'application/srgs+xml' root = 'size'
         src = 'http://www.example.com/size.grxml'/>

Exercise 8
Write a grammar for positive responses to a yes/no question (i.e., "yes," "sure," "affirmative," and so forth).

When Is a Grammar Too Large?
(Chart relating word coverage and response omitted.)

W3C Multimodal Interaction Framework
SISR: a procedural, JavaScript-like language for interpreting the text strings returned by the speech recognition engine.

Semantic Interpretation
Semantic scripts employ ECMAScript. Advantages:
- Translate aliases to vocabulary words
- Perform calculations
- Produce a rich structure rather than a text string

Without semantic interpretation, the recognizer passes raw text such as "large white t-shirt" or "big white t-shirt" from the grammar to the conversation manager. With semantic interpretation scripts in the grammar, a semantic interpretation processor converts the recognized utterance into a structure for the conversation manager:

<rule id = 'action'>
  <one-of>
    <item> small  <tag> out.size = 'small';  </tag> </item>
    <item> medium <tag> out.size = 'medium'; </tag> </item>
    <item> large  <tag> out.size = 'large';  </tag> </item>
    <item> big    <tag> out.size = 'large';  </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> out.color = 'green'; </tag> </item>
    <item> blue  <tag> out.color = 'blue';  </tag> </item>
    <item> white <tag> out.color = 'white'; </tag> </item>
  </one-of>
</rule>

The utterance "big white t-shirt" produces { size: large, color: white }.

Exercise 9
Modify this rule to return only 'yes':

<grammar type = 'application/srgs+xml' root = 'yes' mode = 'voice'>
  <rule id = 'yes'>
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
      …
    </one-of>
  </rule>
</grammar>

W3C Multimodal Interaction Framework
EMMA: a language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices.

EMMA
Extensible MultiModal Annotation markup language: a canonical structure for semantic interpretations of a variety of inputs, including:
- Speech
- Natural language text
- GUI
- Ink

(Diagram: speech and keyboard inputs are each converted to EMMA, by speech recognition with a grammar plus semantic interpretation instructions and by keyboard interpretation with interpretation instructions; the EMMA documents are then merged/unified into a single EMMA document for applications.)

A speech interpretation with 'hook' placeholders for values expected from another mode:

<interpretation mode = 'speech'>
  <travel>
    <to hook='ink'/>
    <from hook='ink'/>
    <day> Tuesday </day>
  </travel>
</interpretation>

An ink interpretation supplying those values:

<interpretation mode = 'ink'>
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
  </travel>
</interpretation>
Merging/unification combines the speech interpretation (day: Tuesday, with 'hook' placeholders) and the ink interpretation (to: Las Vegas, from: Portland) into a single interpretation:

<interpretation mode = 'interp1'>
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
    <day> Tuesday </day>
  </travel>
</interpretation>

Exercise 10
Given the following two EMMA specifications, what is the unified EMMA specification?

<interpretation mode = 'speech'>
  <moneyTransfer>
    <sourceAcct hook='ink'/>
    <targetAcct hook='ink'/>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>

<interpretation mode = 'ink'>
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
  </moneyTransfer>
</interpretation>
Unified EMMA specification (fill in the blanks):

<interpretation mode = 'intp1'>
  <moneyTransfer>
    <sourceAcct> ______ </sourceAcct>
    <targetAcct> ______ </targetAcct>
    <amount> ______ </amount>
  </moneyTransfer>
</interpretation>

W3C Multimodal Interaction Framework
SSML: a language for rendering text as synthesized speech.

Speech Synthesis Markup Language
Pipeline stages: structure analysis, text normalization, text-to-phoneme conversion, prosody analysis, waveform production.
- Structure analysis. Markup support: paragraph, sentence. Non-markup behavior: infer structure by automated text analysis.
- Text normalization. Markup support: say-as for dates, times, etc. Non-markup behavior: automatically identify and convert constructs.
- Text-to-phoneme conversion. Markup support: phoneme, say-as. Non-markup behavior: look up in a pronunciation dictionary.
- Prosody analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.

Speech Synthesis Markup Language Examples

<phoneme alphabet='ipa' ph='wɪnɛfɛks'> WinFX </phoneme> is a great platform

<prosody pitch = 'x-low'> Who's been sleeping in my bed? </prosody> said papa bear.
<prosody pitch = 'medium'> Who's been sleeping in my bed? </prosody> said momma bear.
<prosody pitch = 'x-high'> Who's been sleeping in my bed? </prosody> said baby bear.
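The pipeline stages described above map directly onto SSML markup. The following fragment is a minimal illustrative sketch (the prompt text is invented) showing say-as driving text normalization and break and emphasis driving prosody analysis:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; the prompt text is invented.
     say-as normalizes the date; break inserts a pause;
     emphasis stresses a phrase. -->
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p>
    <s>Your tutorial begins on
       <say-as interpret-as="date" format="mdy">2/23/2007</say-as>.</s>
    <break time="500ms"/>
    <s>Please arrive <emphasis level="strong">early</emphasis>.</s>
  </p>
</speak>
```

Note that interpret-as values such as "date" are commonly supported but not fully standardized across synthesis engines.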
Popular Strategy
1. Develop dialogs using SSML.
2. Usability test the dialogs.
3. Extract the prompts.
4. Hire voice talent to record the prompts.
5. Replace <prompt> with <audio>.

W3C Multimodal Interaction Framework
VXML: a language for controlling the exchange of information and commands between the user and the system.

Developing and Deploying Multimodal Applications
- What applications should be multimodal?
- What is the multimodal application development process?
- What standard languages can be used to develop multimodal applications?
- What standard platforms are available for multimodal applications?

Speech APIs and SDKs
- JSAPI (Java Speech Application Program Interface): http://java.sun.com/products/java-media/speech/ and http://developer.mozilla.org/en/docs/JSAPI_Reference
- Nuance Mobile Speech Platform: http://www.nuance.com/speechplatform/components.asp
- VSAPI (Voice Signal API): http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm
- SALT: http://www.saltforum.org/

Interaction Manager Approaches
- X+V: the interaction manager is XHTML, with VoiceXML 2.0 modules
- Object-oriented: the interaction manager is C#, with SAPI 5.3
- W3C: the interaction manager is SCXML, coordinating XHTML, VoiceXML 3.0, and InkML

SAPI 5.3 & Windows Vista™ Speech Synthesis
W3C Speech Synthesis Markup Language 1.0:

<speak>
  <phoneme alphabet='ipa' ph='wɪnɛfɛks'> WinFX </phoneme>
  is a great platform
</speak>

Microsoft proprietary PromptBuilder:
myPrompt.AppendTextWithPronunciation("WinFX", "wɪnɛfɛks");
myPrompt.AppendText("is a great platform.");

SAPI 5.3 & Windows Vista™ Speech Recognition
W3C Speech Recognition Grammar Specification 1.0:

<grammar type='application/srgs+xml' root='city' mode='voice'>
  <rule id = 'city'>
    <one-of>
      <item> New York City </item>
      <item> New York </item>
      <item> Boston </item>
    </one-of>
  </rule>
</grammar>

Microsoft proprietary GrammarBuilder:

Choices cityChoices = new Choices();
cityChoices.AddPhrase("New York City");
cityChoices.AddPhrase("New York");
cityChoices.AddPhrase("Boston");
Grammar cityGrammar = new Grammar(new GrammarBuilder(cityChoices));

SAPI 5.3 & Windows Vista™ Semantic Interpretation
Augment an SRGS grammar with JScript® for semantic interpretation:

<grammar type='application/srgs+xml' root='city' mode='voice'>
  <rule id = 'city'>
    <one-of>
      <item> New York City <tag> city = 'JFK' </tag> </item>
      <item> New York <tag> city = 'JFK' </tag> </item>
      <item> Portland <tag> city = 'PDX' </tag> </item>
    </one-of>
  </rule>
</grammar>

User-specified "shortcuts": the recognizer replaces a shortcut word with an expanded string.
User says: my address
System: 1033 Smith Street, Apt. 7C, Bloggsville 00000

SAPI 5.3 & Windows Vista™ Dialog
1. Introduce the System.Speech.Recognition namespace.
2. Instantiate a SpeechRecognizer object.
3. Build a grammar.
4. Attach an event handler.
5. Load the grammar into the recognizer.
When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked, which accesses the Result object and works with the recognized text.

using System;
using System.Windows.Forms;
using System.ComponentModel;
using System.Collections.Generic;
using System.Speech.Recognition;

namespace Reco_Sample_1
{
    public partial class Form1 : Form
    {
        // Create a recognizer
        SpeechRecognizer _recognizer = new SpeechRecognizer();

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            // Create a pizza grammar
            Choices pizzaChoices = new Choices();
            pizzaChoices.AddPhrase("I'd like a cheese pizza");
            pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
            pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
            pizzaChoices.AddPhrase("I'd like a small thin crust vegetarian pizza");
            Grammar pizzaGrammar = new Grammar(new GrammarBuilder(pizzaChoices));

            // Attach an event handler
            pizzaGrammar.SpeechRecognized +=
                new EventHandler<RecognitionEventArgs>(PizzaGrammar_SpeechRecognized);
            _recognizer.LoadGrammar(pizzaGrammar);
        }

        void PizzaGrammar_SpeechRecognized(object sender, RecognitionEventArgs e)
        {
            MessageBox.Show(e.Result.Text);
        }
    }
}

SAPI 5.3 & Windows Vista™ References
- Speech API Overview: http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
- Microsoft Speech API (SAPI) 5.3: http://msdn2.microsoft.com/en-us/library/ms723627.aspx
- "Exploring New Speech Recognition And Synthesis APIs In Windows Vista" by Robert Brown: http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources
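The "my address" shortcut described above could also be sketched in plain SRGS, using a semantic interpretation tag to return the expanded string in place of the spoken shortcut. This is an illustrative sketch only, not the actual Vista shortcut mechanism; the rule name is invented, and the address is the example from the slides:

```xml
<!-- Sketch: shortcut expansion via a semantic interpretation tag.
     The rule id is invented; the address is the slide's example. -->
<grammar type='application/srgs+xml' root='shortcut' mode='voice'>
  <rule id='shortcut'>
    <item> my address
      <tag> out = '1033 Smith Street, Apt. 7C, Bloggsville 00000'; </tag>
    </item>
  </rule>
</grammar>
```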
Interaction Manager Approaches:
X+V: Interaction Manager (XHTML) coordinating VoiceXML 2.0 modules
W3C: Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML
Object-oriented: Interaction Manager (C#) coordinating SAPI 5.3

Step 1: Start with Standard VoiceXML and Standard XHTML:
VoiceXML (the grammar uses the W3C grammar language):

    <form id="topform">
      <field name="city">
        <prompt>Say a name</prompt>
        <grammar src="city.grxml"/>
      </field>
    </form>

XHTML:

    <form>
      Result: <input type="text" name="in1"/>
    </form>

Step 2: Combine:

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <form id="topform">
          <field name="city">
            <prompt>Say a name</prompt>
            <grammar src="city.grxml"/>
          </field>
        </form>
      </head>
      <body>
        <form>
          Result: <input type="text" name="in1"/>
        </form>
      </body>
    </html>

Step 3: Insert vxml Namespace:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml">
      <head>
        <vxml:form id="topform">
          <vxml:field name="city">
            <vxml:prompt>Say a name</vxml:prompt>
            <vxml:grammar src="city.grxml"/>
          </vxml:field>
        </vxml:form>
      </head>
      <body>
        <form>
          Result: <input type="text" name="in1"/>
        </form>
      </body>
    </html>

Step 4: Insert event:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events">
      <head>
        <vxml:form id="topform">
          <vxml:field name="city">
            <vxml:prompt>Say a name</vxml:prompt>
            <vxml:grammar src="city.grxml"/>
          </vxml:field>
        </vxml:form>
      </head>
      <body>
        <form ev:event="load" ev:handler="#topform">
          Result: <input type="text" name="in1"/>
        </form>
      </body>
    </html>

Step 5: Insert <sync>:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.w3.org/2002/xhtml+voice">
      <head>
        <xv:sync xv:input="in1" xv:field="#result"/>
        <vxml:form id="topform">
          <vxml:field name="city" xv:id="result">
            <vxml:prompt>Say a name</vxml:prompt>
            <vxml:grammar src="city.grxml"/>
          </vxml:field>
        </vxml:form>
      </head>
      <body>
        <form ev:event="load" ev:handler="#topform">
          Result: <input type="text" name="in1"/>
        </form>
      </body>
    </html>

XHTML plus Voice (X+V) References:
Available on ACCESS Systems' NetFront Multimodal Browser for PocketPC 2003
http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera Software Multimodal Browser for Sharp Zaurus
http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera 9 for Windows
http://www.opera.com/
Programmers Guide
ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
For a variety of small illustrative applications
http://www.larson-tech.com/MM-Projects/Demos.htm

Exercise 11:
Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page.
VoiceXML:

    <form id="stateForm">
      <field name="state">
        <prompt>Say a state name</prompt>
        <grammar src="state.grxml"/>
      </field>
    </form>

XHTML:

    <form>
      Result: <input type="text" name="in1"/>
    </form>

Exercise 11 (continued):

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.w3.org/2002/xhtml+voice">
      <head>
        <xv:sync xv:input="_______" xv:field="________"/>
        <vxml:form id="________">
          <vxml:field name="state" xv:id="________">
            <vxml:prompt>Say a state name</vxml:prompt>
            <vxml:grammar src="state.grxml"/>
          </vxml:field>
        </vxml:form>
      </head>
      <body>
        <form ev:event="load" ev:handler="#________">
          Result: <input type="text" name="_______"/>
        </form>
      </body>
    </html>

Interaction Manager Approaches:
X+V: Interaction Manager (XHTML) coordinating VoiceXML 2.0 modules
W3C: Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML
Object-oriented: Interaction Manager (C#) coordinating SAPI 5.3

MMI Architecture—4 Basic Components:
Runtime Framework or Browser—initializes the application and interprets the markup
Interaction Manager—coordinates modality components and provides application flow
Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
Data Model—handles shared data
[Diagram: an Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML modality components over a shared Data Model]

Multimodal Architecture and Interfaces:
A loosely coupled, event-based architecture for integrating multiple modalities into applications
All communication is event-based
Based on a set of standard life-cycle events
Components can also expose other events as required
Encapsulation protects component data
Encapsulation enhances extensibility to new modalities
Can be used outside a Web environment

Specify Interaction Manager Using Harel State Charts:
An extension of state transition systems with:
States
Transitions
Nested state-transition systems
Parallel state-transition systems
History
[Diagram: PrepareState, StartState, WaitState, EndState, and FailState, connected by prepareResponse (success/fail), startResponse, done, startFail, and doneFail transitions]

Example State Transition System:
State Chart XML (SCXML):

    ...
    <state id="PrepareState">
      <send event="prepare" contentURL="hello.vxml"/>
      <transition event="prepareResponse" cond="status='success'" target="StartState"/>
      <transition event="prepareResponse" cond="status='failure'" target="FailState"/>
    </state>
    ...

Example State Chart with Parallel States:
[Diagram: two parallel regions, Voice (PrepareVoice, StartVoice, WaitVoice, EndVoice, FailVoice) and GUI (PrepareGUI, StartGUI, WaitGUI, EndGUI, FailGUI), each with its own prepareResponse and startResponse success/fail transitions]

The Life Cycle Events / More Life Cycle Events:
[Diagram: event exchanges between the Interaction Manager and the GUI and VUI modalities: newContextRequest/newContextResponse, data, done, and clearContext]

Synchronization Using the Lifecycle Data Event:
Intent-based events:
Capture the underlying intent rather than the physical manifestation of user-interaction events
Independent of the physical characteristics of particular devices
Data events:
data/reset: reset one or more field values to null
data/focus: focus on another field
data/change: a field value has changed

Lifecycle Events between Interaction Manager and Modality:
[Diagram: the modality's PrepareState, StartState, WaitState, EndState, and FailState driven by prepare, prepare response (success/failure), start, start response (success/failure), data, and done events exchanged with the Interaction Manager]

MMI Architecture Principles:
The Runtime Framework communicates with Modality Components through asynchronous events
Modality Components don't communicate directly with each other, but indirectly through the Runtime Framework
Components must implement the basic life cycle events, and may expose other events
Modality components can be nested (e.g., a Voice Dialog component like a VoiceXML <form>)
Components need not be markup-based
EMMA communicates users' inputs to the Interaction Manager

Modalities:
GUI Modality (XHTML): an adapter converts lifecycle events to XHTML events, and XHTML events to lifecycle events
Voice Modality (VoiceXML 3.0): lifecycle events are embedded into VoiceXML 3.0

Exercise 12:
What should VoiceXML do when it receives each of the following events?
Reset
Change
Focus

Modalities:
VoiceXML 3.0 will support lifecycle events:

    <form>
      <catch name="change">
        <assign name="city" value="data"/>
      </catch>
      ...
      <field name="city">
        <prompt> Blah </prompt>
        <grammar src="city.grxml"/>
        <filled>
          <send event="data.change" data="city"/>
        </filled>
      </field>
    </form>

Exercise 13:
What should HTML do when it receives each of the following events?
Reset
Change
Focus

Modalities:
XHTML is extended to support lifecycle events sent to a modality:

    <head>
      ...
      <ev:listener ev:event="onChange" ev:observer="app1" ev:handler="onChangeHandler()"/>
      ...
      <script>
        function onChangeHandler() {
          post("data", data = "city")
        }
      </script>
    </head>
    ...
    <body id="app1">
      <input type="text" id="city" value=""/>
    </body>
    ...

Modalities:
XHTML is extended to support lifecycle events sent to the interaction manager:

    <head>
      ...
      <handler type="text/javascript" ev:event="data">
        if (event == "change") {
          document.app1.city.value = data.city
        }
      </handler>
      ...
    </head>
    ...
    <body id="app1">
      <input type="text" id="city" value=""/>
    </body>
    ...

References:
SCXML: second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/; open source available from http://jakarta.apache.org/commons/sandbox/scxml/
Multimodal Architecture and Interfaces: working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
Voice modality: first working draft of VoiceXML 3.0 scheduled for November 2007
XHTML: full recommendation; adapters must be hand-coded
Other modalities: TBD

Comparison:
Object-oriented — Standard languages: SRGS, SISR, SSML; Interaction manager: C#; Modes: GUI, Speech
X+V — Standard languages: VoiceXML, SRGS, SSML, SISR, XHTML; Interaction manager: XHTML; Modes: GUI, Speech
W3C — Standard languages: SCXML, VoiceXML, SRGS, SSML, SISR, XHTML, EMMA, CCXML; Interaction manager: SCXML; Modes: GUI, Speech, Ink, …

Availability:
SAPI 5.3
Microsoft Windows Vista®
X+V
ACCESS Systems' NetFront Multimodal Browser for PocketPC 2003
http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera Software Multimodal Browser for Sharp Zaurus
http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera 9 for Windows
http://www.opera.com/
W3C
First working draft of VoiceXML 3.0 not yet available
Working drafts of SCXML are available; some open-source implementations are available
Proprietary APIs
Available from the vendor

Discussion Question:
Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?

Conclusion:
Multimodal applications offer benefits over today's traditional GUIs. Only use multimodal if there is a clear benefit.
Standard languages are available today to develop multimodal applications. Don't reinvent the wheel.
Creativity and lots of usability testing are necessary to create world-class multimodal applications.

Web Resources:
http://www.w3.org/voice
Specifications of the grammar, semantic interpretation, and speech synthesis languages
http://www.w3.org/2002/mmi
Specifications of the EMMA and InkML languages
http://www.microsoft.com (and query SALT)
SALT specification and download instructions for adding SALT to Internet Explorer
http://www-306.ibm.com/software/pervasive/multimodal/
X+V specification; download the Opera and ACCESS browsers
http://www.larson-tech.com/SALT/ReadMeFirst.html
Student projects using SALT to develop multimodal applications
http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/
User interface guidelines for multimodal applications

Status of W3C Multimodal Interface Languages:
[Chart: languages positioned along the standardization track Requirements → Working Draft → Last Call Working Draft → Candidate Recommendation → Proposed Recommendation → Recommendation, covering VoiceXML 2.0, VoiceXML 2.1, SRGS 1.0, SSML 1.0, EMMA 1.0, SISR 1.0, SCXML 1.0, and InkML 1.0]

Questions:
?
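The "Example State Chart with Parallel States" slide above can be expressed with SCXML's parallel states. The fragment below is a sketch loosely following the 2006 SCXML working draft; the state names mirror the diagram, but the attribute spellings and event names are assumptions for illustration, not text from the specification:

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initialstate="run">
  <!-- Two regions run concurrently: voice and GUI each walk their own
       Prepare -> Start -> Wait sequence, failing independently. -->
  <parallel id="run">
    <state id="Voice" initial="PrepareVoice">
      <state id="PrepareVoice">
        <transition event="prepareResponse" cond="status='success'" target="StartVoice"/>
        <transition event="prepareResponse" cond="status='failure'" target="FailVoice"/>
      </state>
      <state id="StartVoice">
        <transition event="startResponse" cond="status='success'" target="WaitVoice"/>
        <transition event="startResponse" cond="status='failure'" target="FailVoice"/>
      </state>
      <state id="WaitVoice"/>
      <state id="FailVoice"/>
    </state>
    <state id="GUI" initial="PrepareGUI">
      <state id="PrepareGUI">
        <transition event="prepareResponse" cond="status='success'" target="StartGUI"/>
        <transition event="prepareResponse" cond="status='failure'" target="FailGUI"/>
      </state>
      <state id="StartGUI">
        <transition event="startResponse" cond="status='success'" target="WaitGUI"/>
        <transition event="startResponse" cond="status='failure'" target="FailGUI"/>
      </state>
      <state id="WaitGUI"/>
      <state id="FailGUI"/>
    </state>
  </parallel>
</scxml>
```

Because the two regions are siblings of one <parallel>, a failure in the voice region does not, by itself, take the GUI region out of its own sequence; coordinating the two is exactly the interaction manager's job.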
Answer to Exercise 5:

Answer to Exercise 7:
Write a grammar for zero to nineteen:

    <grammar type="application/srgs+xml" root="zero_to_19" mode="voice">
      <rule id="zero_to_19">
        <one-of>
          <ruleref uri="#single_digit"/>
          <ruleref uri="#teens"/>
        </one-of>
      </rule>
      <rule id="single_digit">
        <one-of>
          <item> zero </item>
          <item> one </item>
          <item> two </item>
          <item> three </item>
          <item> four </item>
          <item> five </item>
          <item> six </item>
          <item> seven </item>
          <item> eight </item>
          <item> nine </item>
        </one-of>
      </rule>
      <rule id="teens">
        <one-of>
          <item> ten </item>
          <item> eleven </item>
          <item> twelve </item>
          <item> thirteen </item>
          <item> fourteen </item>
          <item> fifteen </item>
          <item> sixteen </item>
          <item> seventeen </item>
          <item> eighteen </item>
          <item> nineteen </item>
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 8:

    <grammar type="application/srgs+xml" root="yes" mode="voice">
      <rule id="yes">
        <one-of>
          <item> yes </item>
          <item> sure </item>
          <item> affirmative </item>
          ...
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 9:

    <grammar type="application/srgs+xml" root="yes" mode="voice">
      <rule id="yes">
        <one-of>
          <item> yes </item>
          <item> sure <tag> out = "yes" </tag> </item>
          <item> affirmative <tag> out = "yes" </tag> </item>
          ...
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 10:
Given the following two EMMA specifications, what is the unified EMMA specification?

    <interpretation mode="speech">
      <moneyTransfer>
        <sourceAcct hook="ink"/>
        <targetAcct hook="ink"/>
        <amount> 300 </amount>
      </moneyTransfer>
    </interpretation>

    <interpretation mode="ink">
      <moneyTransfer>
        <sourceAcct> savings </sourceAcct>
        <targetAcct> checking </targetAcct>
      </moneyTransfer>
    </interpretation>

The unified EMMA specification:

    <interpretation mode="intp1">
      <moneyTransfer>
        <sourceAcct> savings </sourceAcct>
        <targetAcct> checking </targetAcct>
        <amount> 300 </amount>
      </moneyTransfer>
    </interpretation>

Answer to Exercise 11:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.w3.org/2002/xhtml+voice">
      <head>
        <xv:sync xv:input="in4" xv:field="#answer"/>
        <vxml:form id="stateForm">
          <vxml:field name="state" xv:id="answer">
            <vxml:prompt>Say a state name</vxml:prompt>
            <vxml:grammar src="state.grxml"/>
          </vxml:field>
        </vxml:form>
      </head>
      <body>
        <form ev:event="load" ev:handler="#stateForm">
          Result: <input type="text" name="in4"/>
        </form>
      </body>
    </html>

Answer to Exercise 12:
What should VoiceXML do when it receives each of the following events?
Reset: reset the value
Change: change the value
Focus: prompt for the value now in focus

Answer to Exercise 13:
What should HTML do when it receives each of the following events?
Reset: reset the value; the author decides whether the cursor should be moved to the reset value
Change: change the value; the author decides whether the cursor should be moved to the changed value
Focus: move the cursor to the item in focus
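Tying the exercises together: the data/change handling practiced in Exercises 12 and 13 can also be modeled on the interaction-manager side in SCXML, which receives a change from one modality and forwards it to the other. This is a sketch only; the event names (voice.data.change, gui.data.change) and target names (guiModality, voiceModality) are illustrative assumptions, not defined in the MMI working draft:

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initialstate="sync">
  <state id="sync">
    <!-- A data/change arriving from the voice modality is forwarded
         to the GUI modality so both views stay consistent. -->
    <transition event="voice.data.change" target="sync">
      <send event="data.change" target="guiModality"/>
    </transition>
    <!-- And symmetrically, a GUI change is forwarded to the voice modality. -->
    <transition event="gui.data.change" target="sync">
      <send event="data.change" target="voiceModality"/>
    </transition>
  </state>
</scxml>
```

Each transition loops back to the same sync state, so the manager simply relays intent-based data events for as long as the context is alive; the modalities never address each other directly, which is the core MMI architecture principle.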
