C102 Bergallo


Published on October 3, 2007

Author: Pravez

Source: authorstream.com

Applying the Pronunciation Lexicon Specification to ASR & TTS:
Monday, August 20, 2007
SpeechTEK ASTS - Advances in Text-to-Speech Processing
Patrizio Bergallo

Agenda:
- Loquendo Today
- Introduction to PLS
- Reference Scenario
- Pronunciation Lexicons
- International Phonetic Alphabet
- Overview of PLS
- How does TTS use PLS?
- How does ASR use PLS?
- Examples of Use
- Latest Improvements

Loquendo Today:
- Global company of the Telecom Italia group, leader in Europe and South America in the Speech Technologies market
- Company founded in 2001 from Telecom Italia Labs, benefiting from know-how gained from more than 30 years of research experience
- Complete set of multilingual speech technologies on a wide spectrum of devices; 25 patents, 50 voices and 20 languages
- Full support for international standards (MRCPv1/v2, VoiceXML 2.0/2.1, CCXML, SSML, SRGS, SISR)
- Company ready for challenging future scenarios: Multimodality, Security
- 100 employees, and displayed strong growth throughout 2007
- HQ in Turin, offices in US, Spain, Germany and France, and a worldwide network of partners

Reference Scenario:
- Many speech applications need to specify pronunciation for words and phrases
  - Surnames, locations, company names
  - Acronyms
  - Names in specific contexts (restaurants, sports, movie titles, etc.)
  - Foreign words, mixed languages
- Pronunciation is critical both for TTS and ASR
  - Improves reading of prompts by TTS
  - Improves ASR performance
- VoiceXML 2.0/2.1 applications are the reference scenario
  - Prompts are based on SSML 1.0 (or, in future, SSML 1.1)
  - Recognition grammars are based on SRGS 1.0

Pronunciation Lexicons:
- Pronunciation lexicon: a mapping between words (or short phrases), their written representations, and their pronunciations, suitable for use by an ASR engine or a TTS engine
- Pronunciation lexicons are not only useful for voice browsers
  - They have also proven effective mechanisms to support accessibility for the differently abled as well as greater usability for all users
  - They are used to good effect in screen readers and user agents supporting multimodal interfaces
- The W3C Pronunciation Lexicon Specification (PLS) Version 1.0 is designed to enable interoperable specification of pronunciation lexicons

Pronunciation Lexicon Specification:
- W3C specification status
  - Second Last Call Working Draft (26 October 2006)
  - Currently the Implementation Report Plan and the Disposition of Comments are under development (all public comments were addressed)
  - Candidate Recommendation expected 3Q07
- Part of the first version of the Speech Interface Framework (Larson, 2000)
- [Figure legend: W3C Recommendation; W3C Last Call Working Draft]

International Phonetic Alphabet:
- Pronunciation is represented by a phonetic alphabet
- Standard phonetic alphabets
  - International Phonetic Alphabet (IPA): well-known phonetic alphabet
  - SAMPA: ASCII based (simple to write)
  - Pinyin (Chinese Mandarin), JEITA (Japanese), etc.
- Proprietary phonetic alphabets
- International Phonetic Alphabet (IPA)
  - Created by the International Phonetic Association (active since 1886), a collaborative effort by all the major phoneticians around the world
  - Universally agreed system of notation for the sounds of languages
  - Covers all languages
  - Requires Unicode to write it
  - Normatively referenced by PLS

Overview of PLS:
- A PLS document is a container (<lexicon>) of several lexical entries (<lexeme>)
- Each lexical entry contains
  - One or more spellings (<grapheme>)
  - One or more pronunciations (<phoneme>) or substitutions (<alias>)
- Each PLS document is related to a single unique language (xml:lang)
- SSML 1.0 and SRGS 1.0 documents can reference one or more PLS documents
- The current version doesn't include morphological, syntactic and semantic information associated with pronunciations

PLS Example:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon2007@@@@/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>

How does TTS use PLS?:
SSML 1.0
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" … xml:lang="en-US">
  <lexicon uri="http://www.example.com/SSMLexample.pls"/>
  The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Benigni.
</speak>
PLS 1.0
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>

How does ASR use PLS?:
SRGS 1.0
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" … xml:lang="en-US" root="movies" mode="voice">
  <lexicon uri="http://www.example.com/SRGSexample.pls"/>
  <rule id="movies" scope="public">
    <one-of>
      <item>Terminator 2: Judgment Day</item>
      <item>Pluto's Judgement Day</item>
    </one-of>
  </rule>
</grammar>
PLS 1.0
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
</lexicon>

Examples of Use:
- Multiple pronunciations for the same orthography
- Multiple orthographies
- Homophones
- Homographs
- Acronyms, Abbreviations, etc.
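
As a rough illustration of how a processor might read the ASR lexicon above, the following Python sketch loads a PLS document into a lookup table from each spelling to the pronunciations of its lexeme. The function name `load_lexicon` and the variable `SAMPLE` are hypothetical, and the namespace that the slide elides with "…" is assumed to be the one shown in the PLS Example slide. It is a minimal sketch, not how any particular ASR or TTS engine works.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace taken from the "PLS Example" slide (assumed for the elided attributes).
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Hypothetical sample: the judgment/judgement lexeme from the ASR slide.
SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(pls_text):
    """Map every <grapheme> to the <phoneme> strings of its <lexeme>."""
    root = ET.fromstring(pls_text)
    table = defaultdict(list)
    for lexeme in root.findall(PLS_NS + "lexeme"):
        # Pronunciations are kept in document order.
        phonemes = [p.text for p in lexeme.findall(PLS_NS + "phoneme")]
        # Every pronunciation in the lexeme applies to each of its spellings.
        for grapheme in lexeme.findall(PLS_NS + "grapheme"):
            table[grapheme.text].extend(phonemes)
    return dict(table)


if __name__ == "__main__":
    for spelling, phonemes in load_lexicon(SAMPLE).items():
        print(spelling, phonemes)
```

Both spellings end up with the same pronunciation list, which is the point of putting two <grapheme> elements in one <lexeme>.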

Multiple pronunciations for the same orthography:
- Multiple pronunciations are represented by more than one <phoneme> or <alias> element
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>

Multiple orthographies:
- Alternative textual representations for the same word or phrase are represented by more than one <grapheme> inside the same <lexeme>
- All the pronunciations given within the <lexeme> apply to each and every <grapheme> within the <lexeme>
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="ja">
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋo</phoneme>
  </lexeme>
</lexicon>

Homophones:
- Words with the same pronunciation but different meanings are represented as different lexemes
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
</lexicon>

Homographs (1/2):
- Words with the same spelling but pronounced in different ways are represented using the role attribute of the <lexeme> element
- This mechanism allows for the referencing of defined taxonomies of word classes (part of speech, meaning, etc.)
<lexicon version="1.0"
    xmlns:claws="http://www.example.com/claws7tags"
    alphabet="x-myorganization-pinyin" xml:lang="zh-CN">
  <lexeme role="claws:VV0">
    <!-- base form of lexical verb -->
    <grapheme>处</grapheme>
    <phoneme>chu3</phoneme>
    <!-- pinyin string is: "chǔ" in 处罚 处置 -->
  </lexeme>
  <lexeme role="claws:NN">
    <!-- common noun, neutral for number -->
    <grapheme>处</grapheme>
    <phoneme>chu4</phoneme>
    <!-- pinyin string is: "chù" in 处所 妙处 -->
  </lexeme>
</lexicon>

Homographs (2/2):
<speak version="1.1"
    xmlns:claws="http://www.example.com/claws7tags" xml:lang="zh-CN">
  <lexicon uri="http://www.example.com/lexicon.pls"
      type="application/pls+xml" xml:id="mylex"/>
  <lookup ref="mylex">
    他这个人很不好相<w role="claws:VV0">处</w>。
    此<w role="claws:NN">处</w>不准照相。
  </lookup>
</speak>
- SSML 1.1 will support the role attribute
- Currently PLS doesn't define or mandate any taxonomy
- PLS generally defines role values as qualified names (QNames)

Acronyms, Abbreviations, etc.:
- Pronunciations expressed as a sequence of other orthographies (acronyms, abbreviations, etc.)
  are represented by the <alias> element
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
</lexicon>

Latest Improvements:
- W3C Last Call Working Draft stage allows public comments to be addressed
  - Large majority were clarifications
  - New functionalities were deferred to a future version of the PLS specification
- Major clarifications were about
  - <alias> recursion
  - Multiple pronunciations
- Changes are subject to formal approval by the Working Group
- Next steps
  - PLS 1.0 is very close to Candidate Recommendation stage
  - SSML 1.1 will provide more complete support of PLS 1.0

<alias> recursion:
- Pronunciations of the <alias> element contents MUST be generated by the processor, using pronunciations described by the <phoneme> element of any constituent graphemes in the PLS document, and without invoking recursive access to the PLS document on the <alias> elements of any constituent graphemes
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>GNU</grapheme>
    <alias>GNU is Not Unix</alias>
    <phoneme>gəˈnuː</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Unix</grapheme>
    <grapheme>UNIX</grapheme>
    <alias>a multiplexed information and computing service</alias>
    <phoneme>ˈjuːnɪks</phoneme>
  </lexeme>
</lexicon>
- GNU is pronounced: gəˈnuː is Not ˈjuːnɪks

Multiple pronunciations (1/2):
- ASR
  - If more than one pronunciation for a given <lexeme> is specified, an ASR processor MUST consider each of them as valid pronunciations for the <grapheme>
- TTS
  - If more than one pronunciation for a given <lexeme> is specified, a TTS processor MUST use the first one in document order that has the prefer attribute set to "true"
  - If none of the pronunciations has prefer set to "true", the
    TTS processor MUST use the first one in document order unless the TTS processor is documented as having a method of selecting pronunciations, in which case the processor MUST use any one of the pronunciations

Multiple pronunciations (2/2):
- An ASR processor will recognize both pronunciations, whereas a TTS processor will only use the first one (because it is the first in document order that has prefer set to "true").
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias prefer="true">led</alias>
    <phoneme prefer="true">liːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>led</grapheme>
    <phoneme>led</phoneme>
  </lexeme>
</lexicon>

References:
- PLS 1.0 Second Last Call Working Draft (26 October 2006)
  http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/
- Voice Browser Activity Page (VoiceXML, SSML, SRGS, …)
  http://www.w3.org/Voice/
- International Phonetic Association
  http://www.arts.gla.ac.uk/IPA/
- VoiceXML Forum
  http://www.voicexml.org/

Final Remarks:
THANK YOU
For more information please
- Visit Loquendo's booth #509
- Keep an eye on: www.loquendo.com
- Contact us: patrizio.bergallo@loquendo.com
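
The selection rule from the "Multiple pronunciations" slides can be sketched in Python as follows. The names `pronunciations`, `tts_choice`, `asr_choices` and `SAMPLE` are hypothetical, the elided lexicon attributes are assumed to carry the namespace from the PLS Example slide, and the sketch assumes a TTS processor with no documented selection method of its own, so it always falls back to document order.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the "PLS Example" slide (assumed for the elided attributes).
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Hypothetical sample: the "lead" lexeme from slide 2/2 plus "Newton" (no prefer).
SAMPLE = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias prefer="true">led</alias>
    <phoneme prefer="true">liːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>"""


def pronunciations(lexeme):
    """List (kind, text, preferred) for each <phoneme> or <alias>, in document order."""
    found = []
    for child in lexeme:
        if child.tag in (PLS_NS + "phoneme", PLS_NS + "alias"):
            kind = child.tag[len(PLS_NS):]
            found.append((kind, child.text, child.get("prefer") == "true"))
    return found


def asr_choices(lexeme):
    """ASR: every listed pronunciation is treated as valid."""
    return [(kind, text) for kind, text, _ in pronunciations(lexeme)]


def tts_choice(lexeme):
    """TTS: first entry with prefer="true", otherwise the first entry."""
    entries = pronunciations(lexeme)
    for kind, text, preferred in entries:
        if preferred:
            return (kind, text)
    return entries[0][:2] if entries else None


if __name__ == "__main__":
    for lexeme in ET.fromstring(SAMPLE).findall(PLS_NS + "lexeme"):
        print(tts_choice(lexeme), asr_choices(lexeme))
```

For the "lead" entry this picks the alias "led" for TTS, since it is the first entry in document order with prefer set to "true", while the ASR list keeps both entries.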
