Semantic Searchmonkey

Information about Semantic Searchmonkey
Technology

Published on March 5, 2009

Author: ptarjan

Source: slideshare.net

Description

Semantic Search + SearchMonkey talk given at a Yahoo! HackU event.

http://developer.yahoo.com/hacku
http://developer.yahoo.com/searchmonkey

Monkey with the Semantic Web

SearchMonkey Presentation by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com) Online at: http://www.slideshare.net/ptarjan/semantic-searchmonkey

The web was / is fragmented
•  Funny pictures
•  Super secret military site
•  Friend’s website
•  University
•  Cool event page
•  Bookmarks

So we added search to find stuff
•  Search engines: Google, Yahoo
•  Sites: funny pictures, super secret military site, friend’s website, university, cool event page, bookmarks

But there are many similar sites
•  Facebook Events, Evite Events, Upcoming Events
•  YouTube, Metacafe, Vimeo
•  Digg, Reddit, Technorati
•  Let’s treat these as “views” onto “objects”

Wouldn’t it be cool if you could do: •  object:video creator:"Paul Tarjan" length<=60s

Wouldn’t it be cool if you could do: •  object:video creator:http://paulisageek.com/ length<=60s

Wouldn’t it be cool if you could do: •  object:game name:"Desktop Tower Defense" version:1.5 publishdate:"May 2 2005"

Wouldn’t it be cool if you could do: •  object:video author:"The Escapist" game:"Left 4 Dead"

It gets even cooler

Aggregation: •  object:review type:camera make:canon model:D40

Aggregation: •  object:event date:"May 16, 2008" type:party price<$5

Aggregation: •  object:photo person:"Paul Tarjan"

Aggregation: •  object:photo person:http://paulisageek.com
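The query strings on these slides are wishful syntax, not a real search API. As a minimal sketch of how such a string could be broken into filters, the Python below assumes a grammar of whitespace-separated `key<op>value` terms where the operator is one of `:`, `<`, `<=`, `>`, `>=` and a value may be wrapped in straight double quotes. The grammar and the name `parse_query` are assumptions made for illustration.

```python
import re

# Assumed grammar: key, then an operator, then a quoted or unquoted value.
# "<=" and ">=" are listed before "<" and ">" so the longer operator wins.
TERM = re.compile(r'(\w+)(<=|>=|:|<|>)("[^"]*"|\S+)')


def parse_query(query):
    """Split a query string into (key, operator, value) tuples."""
    terms = []
    for key, op, value in TERM.findall(query):
        # Drop the surrounding double quotes from quoted values.
        terms.append((key, op, value.strip('"')))
    return terms


if __name__ == "__main__":
    for term in parse_query('object:video creator:"Paul Tarjan" length<=60s'):
        print(term)
```

Values are kept as strings (`60s`, `$5`); turning them into durations or prices would need per-field knowledge that the slides do not specify.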

The Semantic What?
•  Web pages are views of data for people to read
•  Search engines are a hack
•  They treat pages as a bucket of words
•  Let’s turn the web into a database
•  APIs are good, but there is no “web” of APIs
•  If you figure out a good way of doing that, let me know

Ok, I want to do it. Now what?

Recommendation: µF
•  If there is a microformat for your data, use it
   –  hcard
   –  hreview
   –  hresume
   –  hcalendar
   –  rel-tag
   –  rel-license
   –  xfn
   –  hatom
   –  geo

µF in a nutshell
•  Change your @class to something that is known
•  <div>
     <span class="name">Paul Tarjan</span>
     <span class="email">spam@paulisageek.com</span>
   </div>
•  BECOMES
•  <div class="vcard">
     <span class="fn">Paul Tarjan</span>
     <span class="email">spam@paulisageek.com</span>
   </div>
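To show why the known class names matter, here is a minimal Python sketch of a consumer reading the hCard markup from the slide. It uses only the standard library's `html.parser`; the class `HCardParser`, the helper `extract_hcards`, and the two-property subset (`fn`, `email`) are assumptions for illustration, and this is nowhere near a complete microformats parser.

```python
from html.parser import HTMLParser

# Small subset of hCard property names, chosen for illustration only.
HCARD_PROPERTIES = {"fn", "email"}


class HCardParser(HTMLParser):
    """Collect fn/email text found inside elements with class="vcard".

    Assumes every start tag has a matching end tag (no bare <br> or <img>).
    """

    def __init__(self):
        super().__init__()
        self.stack = []         # class sets of the currently open elements
        self.cards = []         # one dict per vcard element seen
        self.card_depth = None  # stack depth at which the open vcard started

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        if "vcard" in classes and self.card_depth is None:
            self.card_depth = len(self.stack)
            self.cards.append({})

    def handle_endtag(self, tag):
        if self.card_depth == len(self.stack):
            self.card_depth = None  # the vcard element itself just closed
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self.card_depth is None or not text or not self.stack:
            return
        for name in self.stack[-1] & HCARD_PROPERTIES:
            self.cards[-1][name] = text


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Fed the "before" markup (`class="name"`, no `vcard` wrapper), `extract_hcards` returns an empty list; fed the "after" markup, it returns one card with `fn` and `email` filled in.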

Recommendation: RDFa
•  If you have data that doesn’t really fit in a µF, use RDFa
•  Examples:
   –  Markup APIs (YUI, javadoc, etc)
   –  Media (audio, video, games, presentations)
   –  Job postings

RDFa in a nutshell
•  Make a namespace
•  Use @property, @rel and @resource
•  For DATA: @property makes the node contents into the value
•  For URLs: @rel makes the @resource into the value

Normal HTML
•  <html> …
   <div class="private">
     private static String <strong>_createCookieHash</strong> (hash)
   …

RDFa: example
•  <html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#"> …
   <div class="private" rel="yui:method" resource="#method__createCookieHash">
     private static String <strong property="yui:name">_createCookieHash</strong> (hash)
   …
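The two rules from the nutshell slide (@property takes the node contents, @rel takes the @resource) can be sketched in a few lines of Python. This is a deliberately naive illustration built on the standard library's `html.parser`: it ignores subjects, namespace resolution, and chaining, so it is not a conforming RDFa processor. The names `RDFaSketch` and `extract_pairs` are assumptions.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (predicate, value) pairs using the two rules from the slide."""

    def __init__(self):
        super().__init__()
        self.pairs = []       # (predicate, value) in document order
        self.open_props = []  # @property of each open element, or None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # For URLs: @rel makes the @resource into the value.
        if "rel" in attributes and "resource" in attributes:
            self.pairs.append((attributes["rel"], attributes["resource"]))
        self.open_props.append(attributes.get("property"))

    def handle_endtag(self, tag):
        if self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        # For DATA: @property makes the node contents into the value.
        text = data.strip()
        if text and self.open_props and self.open_props[-1]:
            self.pairs.append((self.open_props[-1], text))


def extract_pairs(html):
    parser = RDFaSketch()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

On the YUI fragment from the slide this yields the pair (`yui:method`, `#method__createCookieHash`) followed by (`yui:name`, `_createCookieHash`); the surrounding text "private static String" and "(hash)" is skipped because its enclosing element has no @property.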

That’s it!
•  Automatically picked up by semantic parsers / crawlers
•  Can build a SearchMonkey app on it
•  Can make a mashup way easier than screen scraping
•  Can get the data from Yahoo! BOSS

What is SearchMonkey?
•  An open platform for using structured data to build more useful and relevant search results
•  [Figure: a search result before and after]

Enhanced Result: Zagat Image Links Key/Value Pairs or Abstract

Infobar: Wikipedia Preview Summary Blob

Part of the puzzle
•  Semantic vocabularies
•  Semantic markup on web pages
•  SearchMonkey

Vocabularies
•  Need to speak the same language
•  I like to see girls of that... caliber.
•  English, French, Spanish, Esperanto?
•  URLs to the rescue
   –  Dublin Core (http://purl.org/dc/elements/1.1/)
   –  Friend of a Friend (http://xmlns.com/foaf/0.1/)
   –  XHTML Friends Network (http://gmpg.org/xfn/11/)
   –  … (many more)

Syntax
•  Nouns, Verbs, and Adjectives, oh my!
•  All phrases become lots of triples
•  (Subject, Verb / Adj. / Prep. / etc, Object)
•  Key / Value pairs ++
   –  Everything is a URL or String
   –  Subject doesn’t have to be the document

Syntax 2
•  Key / Value pair
   –  Title = Awesome SearchMonkey Presentation
   –  Homepage = http://search.yahoo.com/searchmonkey
•  Triples
   –  (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”)
   –  (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
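The step from key/value pairs to triples is mechanical once each key is mapped to a predicate URL. A minimal Python sketch, assuming a hand-written mapping `VOCAB` that reuses the slide's abbreviated predicate URLs verbatim (they are shorthand on the slide, not full vocabulary terms) and a hypothetical helper `to_triples`:

```python
# Key -> predicate URL, copied from the slide's shorthand.
VOCAB = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def to_triples(subject, pairs, vocabulary):
    """Turn key/value pairs about one subject into (subject, predicate, object) triples."""
    return [(subject, vocabulary[key], value) for key, value in pairs.items()]


if __name__ == "__main__":
    page = {
        "Title": "Awesome SearchMonkey Presentation",
        "Homepage": "http://search.yahoo.com/searchmonkey",
    }
    for triple in to_triples("self", page, VOCAB):
        print(triple)
```

The extra element is what makes this "key/value pairs ++": the subject is explicit, so the same list can hold statements about things other than the current document.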

Decompose to triples
•  My friend “Bob” is an idiot.
   –  (self, http://xmlns.com/foaf/0.1/knows, genid:Ui__152310312_366)
   –  (genid:Ui__152310312_366, http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”)
   –  (genid:Ui__152310312_366, http://example.org/ptarjan/isInstanceOf, http://example.org/ptarjan/idiot)
•  Unnamed nodes are O.K.

Writing URLs takes a lot of work!
•  xmlns:foaf=http://xmlns.com/foaf/0.1/
•  xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0#
•  xmlns:junk=http://example.org/ptarjan/
•  My friend “Bob” is an idiot.
   –  (self, foaf:knows, genid:Ui__152310312_366)
   –  (genid:Ui__152310312_366, vcard:fn, “Bob”)
   –  (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot)
•  Unnamed nodes are O.K.
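The prefixed form is only an abbreviation: each `prefix:name` expands back to the full URL by string concatenation. A minimal Python sketch of that expansion, using the three namespaces from the slide; the function names `expand` and `expand_triple` are assumptions for illustration.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}


def expand(term, prefixes=PREFIXES):
    """Expand prefix:name to a full URL; leave anything else untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Unknown prefixes (genid:, http:) and plain strings pass through as-is.
    return term


def expand_triple(triple, prefixes=PREFIXES):
    return tuple(expand(term, prefixes) for term in triple)


if __name__ == "__main__":
    node = "genid:Ui__152310312_366"
    for triple in [
        ("self", "foaf:knows", node),
        (node, "vcard:fn", "Bob"),
        (node, "junk:isInstanceOf", "junk:idiot"),
    ]:
        print(expand_triple(triple))
```

Because expansion is plain concatenation, the namespace URL must end in `/` or `#`; without it, `foaf:knows` would expand to `...0.1knows`. This sketch also cannot tell a literal string that happens to look like `foaf:x` from a real prefixed name, which a real parser avoids by tracking whether a value is a literal or a URL.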

RDFa
•  <html xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
         xmlns:junk="http://example.org/ptarjan/">
     <div rel="foaf:knows">
       <span property="vcard:fn">Bob</span>
       <span rel="junk:isInstanceOf" resource="junk:idiot" />
     </div>
   </html>

•  </SemanticWeb> •  Questions?

Innards of SearchMonkey
•  You build a web-service inside our framework
•  When a search page renders
   –  We check which SM apps are enabled
   –  We call them
      •  50ms for in-page
      •  Long time for AJAX
   –  They return data in our template
   –  We render them (and cache)
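The render loop on this slide (check enabled apps, call them under a time budget, cache what comes back) might look roughly like the Python sketch below. Everything here is hypothetical: the function `render_enhancements`, the shape of `enabled_apps` (a name-to-callable dict), and the cache keyed by (app name, URL) are assumptions for illustration, not SearchMonkey's actual internals. Only the 50 ms in-page figure comes from the slide.

```python
import time
from concurrent.futures import TimeoutError as FutureTimeout


def render_enhancements(url, enabled_apps, cache, executor, budget_s=0.050):
    """Call each enabled app for one result URL within a shared time budget.

    enabled_apps: dict of app name -> callable(url) returning display data.
    cache: dict keyed by (app name, url), filled with whatever returns in time.
    executor: a concurrent.futures executor used to run the apps in parallel.
    """
    results, pending = {}, {}
    for name, app in enabled_apps.items():
        key = (name, url)
        if key in cache:
            results[name] = cache[key]  # served from cache, no call needed
        else:
            pending[name] = executor.submit(app, url)

    # One deadline for all apps, so the page waits at most budget_s in total.
    deadline = time.monotonic() + budget_s
    for name, future in pending.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            data = future.result(timeout=remaining)
        except FutureTimeout:
            continue  # too slow for in-page rendering; dropped in this sketch
        cache[(name, url)] = data
        results[name] = data
    return results
```

A caller would pass something like a `concurrent.futures.ThreadPoolExecutor` as `executor`. Apps that miss the in-page budget are simply skipped here; the slide suggests slower apps get a longer allowance when loaded over AJAX, which this sketch does not model.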

Prototyping with XSLT
•  What if I don’t have structured data?
   –  I don’t own the site
   –  I do own the site, but I want to prototype first
•  Build an XSLT custom data service first
   –  Write some XSLT to extract the data and transform it into DataRSS
   –  Mostly about finding the right XPath (use Firebug or XPather)
   –  Quick to implement, but brittle
   –  Can’t do a good Enhanced Result
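The "find the right XPath" step can be tried out before writing any XSLT. The sketch below swaps XSLT for Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset, and returns a plain dict rather than DataRSS. The sample page, the field names, and the helper `extract` are all made up for illustration, and the input must be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, standing in for a site you do not control.
PAGE = """<html><body>
  <div id="listing">
    <h1 class="title">Desktop Tower Defense</h1>
    <span class="version">1.5</span>
  </div>
</body></html>"""

# Hypothetical field -> XPath mapping; finding these paths is most of the work.
FIELDS = {
    "name": ".//h1[@class='title']",
    "version": ".//span[@class='version']",
}


def extract(xml_text, fields):
    """Pull one text value per field out of a well-formed page."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, path in fields.items():
        node = root.find(path)
        if node is not None and node.text:
            record[field] = node.text.strip()
    return record


if __name__ == "__main__":
    print(extract(PAGE, FIELDS))
```

The brittleness the slide warns about shows up immediately: if the site renames `class="title"`, the `name` field silently disappears from the result.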

Do it for real •  Demo

Examples
•  Rubik’s Cube
•  VTA Bus
•  API Monkey
•  BugMeNot
•  RetailMeNot
•  Amazon

questions?
