Kevyn Collins-Thompson, June 2003


Published on January 10, 2008

Author: Soffia

Source: authorstream.com

Prosody Models for Automatically Derived Focus Words in Narrative Text:
Kevyn Collins-Thompson
Speech Seminar, June 13, 2003

Prosody Models for Automatically Derived Focus Words:
Story Text: ‘When the farmer saw the dark clouds in the eastern sky…’
Automatically extract focus features
Map word features to pitch and duration
‘…the farmer saw…’

What are ‘focus’ words?:
Focus words are words that are given special attention by a speaker.
Changes may be in pitch, timing, duration, energy
Why give a word special attention?
  Topic words that are important to the story
  Difficult, unexpected, or novel words
  Modifier words

Stories are centered around topic words:
‘… When the farmer saw the dark clouds in the eastern sky he knew that rain was coming. The fields badly needed more rain. There had been no rain all summer. Last summer, his crops had plenty of water. But this summer had been very dry…’

We can try to find topic words automatically using relative likelihoods:
Compare, for each word W in the story:
  log P_S(W) in the story
  log P_B(W) in general ‘background’ English corpus
Choose words such that: F(W) = log P_S(W) - log P_B(W) > T
  … where T = 5 in my code
Topic words are often uncommon words, but not always.

Word difficulty is estimated from a general English model:
Calculate P(W) in British Corpus of English
  100 million tokens from various genres
Word difficulty is estimated by log P(W).
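The relative-likelihood test on the topic-word slide can be sketched in a few lines of Python. This is only an illustration, not the talk's Perl code: the add-one smoothing, the natural logarithm, the function names, and the toy corpora are all assumptions, and the slide's threshold T = 5 presumes a large background corpus, so the toy example uses a smaller threshold.

```python
import math
from collections import Counter


def log_prob(counts, total, vocab_size, word):
    """Unigram log probability with add-one smoothing (smoothing is an assumption)."""
    return math.log((counts[word] + 1) / (total + vocab_size))


def focus_scores(story_tokens, background_tokens):
    """Return F(W) = log P_S(W) - log P_B(W) for every word W in the story."""
    story = Counter(story_tokens)
    background = Counter(background_tokens)
    vocab_size = len(set(story) | set(background))
    n_story, n_background = len(story_tokens), len(background_tokens)
    return {
        word: log_prob(story, n_story, vocab_size, word)
        - log_prob(background, n_background, vocab_size, word)
        for word in story
    }


def topic_words(scores, threshold):
    """Keep the words whose score exceeds the threshold T."""
    return sorted(word for word, score in scores.items() if score > threshold)


# Hypothetical toy data; a real background model would come from a large corpus.
story_tokens = ["the", "farmer", "saw", "the", "rain", "the", "rain"]
background_tokens = ["the"] * 50 + ["a"] * 30 + ["of"] * 15 + ["saw"] * 5

scores = focus_scores(story_tokens, background_tokens)
print(topic_words(scores, threshold=2.0))  # -> ['farmer', 'rain']
```

The common word ‘the’ gets a negative score here because it is relatively more frequent in the background data than in the story, which matches the intuition that topic words tend to be uncommon in general English.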
We can also use custom language models to estimate difficulty
  For example, words that most 4th graders know

Difficult or novel words become more familiar with time:
‘…our food is a product of photosynthesis, the process that converts energy in sunlight to chemical forms of energy. Photosynthesis is carried out by many different organisms. The best known form of photosynthesis …’

Change in difficulty & novelty over time is modeled with an ‘S’ curve:
[Figure: ‘S’ curve; axes labeled ‘Repetitions in Time’ and ‘Novelty Factor’]

Modifiers gain importance as a focus word is repeated:
‘…The fields badly needed more rain. There had been no rain all summer. Last summer, his crops had plenty of water. But this summer had been very dry…’
Derivation:
  1. First word of noun phrases containing focus words
  2. Very common words (‘a’, ‘the’, …) ignored
‘He picked some plants…’

Word duration is modeled as a mixture of several factors:
Word stretch T(W) combines focus F(W) and difficulty D(W)
Individual phoneme stretching
  For infant speech, vowels may be more extended
  Also ‘close’ consonants like M vs. N, R vs. L
  Table-based via customized Festival module in C.
Segment time: [equation not preserved]

Determining pitch contour for a single syllable:
Overall word pitch is highly correlated with focus properties
Focus words have highest pitch peaks
Several other factors to consider:
  Stressed / unstressed syllable
  First / last syllable
  Sentence type: Question, Exclamation
Source: Fernald & Mazzie (1991), ‘Prosody and Focus in Speech to Infants and Adults’. Developmental Psychology Vol. 27, No. 2, 209-221.

Final pitch contour is derived from syllable peaks:
An F0 value f(S) is calculated for each syllable S in word W: [equation not preserved]
  f_BASE is speaker’s base F0 level (e.g. 90 Hz)
  F(W) is focus level of word W
  R(S) = 1 if S is stressed
  Focus stretch α = 15, stress emphasis β = 0.5
Final F0 contour is piecewise linear
  This is OK, perceptually close to smooth
Source: J. ’t Hart et al. (1990), ‘A Perceptual Study of Intonation’. Cambridge University Press, Cambridge UK.

Sample Pitch Contour:
Word:        Is    that  a     dog   in    the   park  ?
Focus F(W):  0.01  0.58  0.21  4.50  0.12  1.26  4.15
[Figure: F0 contour for the sentence; axis ticks at 100 Hz, 150 Hz, and 200 Hz, with a labeled value of 245 Hz]

Phoneme duration trace: ‘in the park’:
word stretch 0.87 seg name: ih
word stretch 0.87 seg name: n
word stretch 0.84 seg name: dh
word stretch 0.84 seg name: ax
word stretch 1.12 seg name: p
word stretch 1.12 seg name: aa
word stretch 1.12 seg name: r
Phoneme: r has avg stretch 2.07, and this stretch is 1.97, so final local stretch = 2.20837
word stretch 1.12 seg name: k

Time for a story!:
Synthesis uses basic diphone voice to highlight duration and pitch changes

Slide 16:
Default synthesis
Focus-based prosody

How the project was implemented:
The story text is parsed with the Apple Pie Parser.
Vocabulary patterns are analyzed with unigram language models in Perl.
Perl script creates a Scheme list of Word utterances
  Features for topic words, difficulty, etc.
Festival is invoked with a customized duration module and Scheme intonation function

Ideas for improvement:
More accurate modeling of dialogue and other word interaction
Include variation in energy levels
Customize the language profile for each listener

Questions?:
The End
http://www.cs.cmu.edu/~kct/sounds/
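The ‘Final pitch contour’ slide says the F0 contour is piecewise linear through the per-syllable F0 values. The sketch below shows only that interpolation step, in Python rather than the Scheme intonation function used in the project. The anchor times and F0 values are hypothetical, the function name is invented for illustration, and holding the contour flat outside the anchor range is an assumption. The per-syllable formula for f(S) is not reproduced in this transcript, so it is not modeled here.

```python
from bisect import bisect_right


def f0_at(anchors, t):
    """Evaluate a piecewise-linear F0 contour at time t.

    anchors: list of (time_sec, f0_hz) pairs with strictly increasing times,
             for example one pair per syllable peak.
    Outside the anchor range the contour is held flat (an assumption).
    """
    times = [time for time, _ in anchors]
    if t <= times[0]:
        return anchors[0][1]
    if t >= times[-1]:
        return anchors[-1][1]
    i = bisect_right(times, t)  # first anchor strictly after t
    (t0, f0), (t1, f1) = anchors[i - 1], anchors[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


# Hypothetical syllable peaks: (time in seconds, F0 in Hz).
peaks = [(0.0, 100.0), (0.5, 200.0), (1.0, 150.0)]

print(f0_at(peaks, 0.25))  # -> 150.0
print(f0_at(peaks, 0.75))  # -> 175.0
```

Straight segments between peaks keep the contour simple to compute, and the slide notes that the result is perceptually close to a smooth curve.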
