Published on March 19, 2008
Minimally Supervised Learning of Semantic Knowledge from Query Logs: IJCNLP-08, Hyderabad, India, 2008/3/17. Mamoru Komachi (†) and Hisami Suzuki (‡). (†) Nara Institute of Science and Technology, Japan. (‡) Microsoft Research, USA.
Task: Learn semantic categories from web search query logs by bootstrapping with minimal supervision. Semantic category: a set of words which are interrelated (named entities, technical terms, paraphrases, …). Can be useful for search ads, etc. Example (figure): Darjeeling, Chai (Indian tea) and Kombucha (Japanese tea), linked as "similar".
Approach
Our Contribution
Table of Contents
Bootstrapping: Iteratively conduct pattern induction and instance extraction, starting from seed instances. Can expand a small set of seed instances. Example (figure): the seed instance "vaio" matches the query "Compare vaio laptop" in the query log (corpus), which yields the contextual pattern "Compare # laptop" (#: slot); that pattern matches "Compare toshiba satellite laptop" and "Compare HP xb3000 laptop", which yields the new instances "Toshiba satellite" and "HP xb3000".
Instance lookup and pattern induction: Example (figure): the instance "ANA" in the query log entry "ANA 予約" ("ANA reservation") gives the extracted pattern "# 予約" ("# reservation"). Restaurant reservation? Flight reservation? Generic patterns: broad coverage, but noisy.
Instance/Pattern Scoring Metrics: P: patterns in corpus. I: instances in corpus. PMI: pointwise mutual information. r: reliability score. The reliability of an instance and that of a pattern are mutually defined. PMI is normalized by its maximum over all P and I.
Problems of Espresso
The Tchai Algorithm
Comparison of methods
Experiments
Results (Travel, Finance): Due to the ambiguity of hand labeling (e.g. Tokyo Disney Land). Include common nouns related to Travel (e.g. rental car). High precision (92.1%). Learned 251 novel words.
Sample of Instances (Travel category): Able to learn several sub-categories for which no seed words were given.
Impact of Pattern Induction
Effect of each modification: The scaling factor has the most impact. Filtering consistently outperforms no filtering.
System Performance (Travel, Finance): Relative recall (Pantel et al., 2004). High precision and recall. High precision but low relative recall due to strict filtering.
Cumulative precision: Travel: Tchai achieved the best precision.
Cumulative precision: Finance: Both Basilisk and Espresso suffered from acquiring generic patterns in the early stages of iteration.
Sample Extracted Patterns: Basilisk and Espresso extracted location names as context patterns, which may be too generic for the Travel domain. Tchai found context patterns that are characteristic of the domain.
Conclusion and future work
Thank you for listening!
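The bootstrapping loop named in the slides (pattern induction followed by instance extraction, starting from seed instances) can be illustrated with a minimal sketch. This is not the Tchai, Espresso, or Basilisk algorithm: the function names are hypothetical, patterns are ranked by raw frequency rather than by the PMI-based reliability score, instances are matched as plain substrings, and no filtering of generic patterns is done. The toy query log reuses the "vaio" example from the Bootstrapping slide.

```python
from collections import Counter

SLOT = "#"  # slot marker, as in the slide example "Compare # laptop"


def induce_patterns(queries, instances):
    """Pattern induction: replace a known instance inside a query with the slot."""
    patterns = Counter()
    for query in queries:
        for inst in instances:
            # Skip queries that are just the instance: they carry no context.
            if inst in query and query != inst:
                patterns[query.replace(inst, SLOT, 1)] += 1
    return patterns


def extract_instances(queries, pattern):
    """Instance extraction: collect the strings that fill the pattern's slot."""
    prefix, suffix = pattern.split(SLOT, 1)
    found = set()
    for query in queries:
        long_enough = len(query) > len(prefix) + len(suffix)
        if long_enough and query.startswith(prefix) and query.endswith(suffix):
            found.add(query[len(prefix):len(query) - len(suffix)])
    return found


def bootstrap(queries, seeds, iterations=1, patterns_per_iter=1):
    """Alternate pattern induction and instance extraction.

    Assumption: the most frequent patterns are kept at each iteration;
    the methods compared in the slides use reliability scores instead.
    """
    instances = set(seeds)
    for _ in range(iterations):
        patterns = induce_patterns(queries, instances)
        for pattern, _count in patterns.most_common(patterns_per_iter):
            instances |= extract_instances(queries, pattern)
    return instances


query_log = [
    "vaio",
    "Compare vaio laptop",
    "Compare toshiba satellite laptop",
    "Compare HP xb3000 laptop",
]
learned = bootstrap(query_log, seeds={"vaio"})
print(sorted(learned))
```

Because nothing here penalizes generic patterns, a pattern such as "# 予約" would pull in restaurant and flight queries alike, which is the noise problem the slides attribute to generic patterns.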