SearchMonkey

Information about SearchMonkey
Technology

Published on October 6, 2008

Author: ptarjan

Source: slideshare.net

Description

This is the basis for our "Intro to SearchMonkey" talks

Monkey with Yahoo! Search

SearchMonkey Presentation by: Paul Tarjan, Chief Technical Monkey (ptarjan@yahoo-inc.com) Online at: http://www.slideshare.net/ptarjan/searchmonkey-presentation 2 | http://developer.yahoo.com/searchmonkey

What is SearchMonkey?
An open platform for using structured data to build more useful and relevant search results.
[Screenshots: Before / After]

Enhanced Result: Zagat
• Image
• Links
• Key/Value Pairs or Abstract

Infobar: Wikipedia Preview Summary Blob

Part of the puzzle

Vocabularies
• Need to speak the same language
• I like to see girls of that... caliber.
• English, French, Spanish, Esperanto?
• URLs to the rescue
  – Dublin Core (http://purl.org/dc/elements/1.1/)
  – Friend of a Friend (http://xmlns.com/foaf/0.1/)
  – XHTML Friends Network (http://gmpg.org/xfn/11/)
  – … (many more)

Syntax
• Nouns, Verbs, and Adjectives, oh my!
• All phrases become lots of triples
• (Subject, Verb / Adj. / Prep. / etc, Object)
• Key / Value pairs ++
  – Everything is a URL or String
  – Subject doesn’t have to be the document

Syntax 2
• Key / Value pair
  – Title = Awesome SearchMonkey Presentation
  – Homepage = http://search.yahoo.com/searchmonkey
• Triples
  – (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”)
  – (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
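The key/value-to-triple mapping on this slide can be sketched in a few lines of Python. This is only an illustration, not SearchMonkey code: the `PREDICATES` table and the `pairs_to_triples` function are hypothetical names, and only the two predicate URLs and the values come from the slide.

```python
# Hypothetical lookup from friendly keys to the vocabulary URLs on the slide.
PREDICATES = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def pairs_to_triples(subject, pairs):
    """Turn {key: value} pairs into (subject, predicate, object) triples."""
    return [(subject, PREDICATES[key], value) for key, value in pairs.items()]


triples = pairs_to_triples("self", {
    "Title": "Awesome SearchMonkey Presentation",
    "Homepage": "http://search.yahoo.com/searchmonkey",
})

for triple in triples:
    print(triple)
```

The only thing a triple adds over a key/value pair is an explicit subject and a globally unique key, which is why the slide calls triples "Key / Value pairs ++".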

Decompose to triples
• I like to eat red candy
  – (self, http://example.com/likeEating, http://example.org/temp/redcandy)
  – (http://example.org/temp/redcandy, http://example.com/isColored, http://example.org/colors/red)
  – (http://example.org/temp/redcandy, http://example.com/isInstanceOf, http://example.org/food/candy)
• Unnamed nodes are O.K.
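To see why the decomposition is useful, here is a minimal Python sketch that stores the three triples from the slide and follows them like a graph. The helper name `objects` is made up for this example; the URLs are the ones on the slide.

```python
EX = "http://example.com/"
CANDY = "http://example.org/temp/redcandy"  # intermediate node for "red candy"

triples = [
    ("self", EX + "likeEating", CANDY),
    (CANDY, EX + "isColored", "http://example.org/colors/red"),
    (CANDY, EX + "isInstanceOf", "http://example.org/food/candy"),
]


def objects(triples, subject, predicate):
    """Return every object stored for a given (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Two hops: what do I like eating, and what color is that thing?
liked = objects(triples, "self", EX + "likeEating")
colors = [c for item in liked for c in objects(triples, item, EX + "isColored")]
print(colors)
```

The intermediate node only exists to tie "red" and "candy" together, which is the role an unnamed node plays in the slide's last bullet.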

How to get data to SearchMonkey?
Humans see:
• name
• picture of a person
• current job
• industry, …
Computers see: an undifferentiated blob of HTML
Can we make computers smarter?

Artificial intelligence is hard. Plus …

How does it work?
1. Site owners/publishers share structured data with Yahoo!.
2. Site owners & third-party developers build SearchMonkey apps.
3. Consumers customize their search experience with Enhanced Results or Infobars.
[Diagram labels: Page Extraction, RDF/Microformat Markup, Acme.com’s Web Pages, Index, DataRSS feed, Web Services, Acme.com’s database]

Innards of SearchMonkey
• You build a web-service inside our framework
• When a search page renders
  – We check which SM apps are enabled
  – We call them
    • 50ms for in-page
    • Long time for AJAX
  – They return data in our template
  – We render them (and cache)

Inside SM
[Diagram labels: Developer, Developer, Publisher]

Data Sources: RDF and Microformats

Name          Cached  Open  Mode     Notes
Yahoo! Index  yes     yes   Passive  Old-School Y! Index data
RDFa, eRDF    yes     yes   Passive  Vocab + markup decoupled
Microformats  yes     yes   Passive  Vocab + markup coupled
DataRSS feed  yes     no    Active   Atom + metadata
XSLT          no      no    Active   Good for prototyping
Web Service   no      no    Active   Brings in remote data

Approach #1: Embedded RDF
• Cached data
  – allows Enhanced Results
  – but not for dynamic data
• Reuse existing markup
  – but requires site redesign
• Open approach
  – everyone can use
• Passive, crawled by Y!
  – less bureaucracy to set up

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
        "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:foaf="http://xmlns.com/foaf/0.1/"
          lang="en" xml:lang="en">
    <head>
      <title>The Amazing Home Page of Joe Smith</title>
    </head>
    <body>
      <h1 property="dc:title">Joe's Home Page</h1>
      <div rel="foaf:maker">
        <h2 property="foaf:name">Joe Smith</h2>
        <div rel="foaf:depiction"
             resource="http://joesmith.org/images/jsmith.png">
          <img src="/images/jsmith.png"
               alt="Smiling headshot of Joe" />
          <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
        </div>
      </div>
      …

Approach #2: Embedded Microformats
• Cached data
  – allows Enhanced Results
  – but not for dynamic data
• Reuse existing markup
  – but requires site redesign
• Open approach
  – everyone can use
• Passive, crawled by Y!
  – less bureaucracy to set up

    <div id="hcard-Joe-Smith" class="vcard">
      <span class="fn">Joe Smith</span>
      <div class="adr">
        <div class="street-address">123 Murphy Avenue</div>
        <span class="locality">Sunnyvale</span>,
        <span class="region">California</span>
        <span class="postal-code">94086</span>
      </div>
      <div class="tel">(408) 555-1234</div>
    </div>
    …
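Because a microformat couples vocabulary and markup, a consumer only needs to look at class names to recover the data. The Python sketch below illustrates that idea with the standard library's `html.parser`; it is not how Yahoo!'s crawler works, and `HCardParser` and `HCARD_PROPERTIES` are names invented for this example.

```python
from html.parser import HTMLParser

# The hCard class names used in the slide's markup.
HCARD_PROPERTIES = {"fn", "street-address", "locality", "region",
                    "postal-code", "tel"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property.

    Simplification: assumes every start tag has a matching end tag,
    which holds for the sample below but not for void tags like <img>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        if prop and data.strip():
            self.fields[prop] = self.fields.get(prop, "") + data.strip()


HTML = """
<div id="hcard-Joe-Smith" class="vcard">
  <span class="fn">Joe Smith</span>
  <div class="adr">
    <div class="street-address">123 Murphy Avenue</div>
    <span class="locality">Sunnyvale</span>,
    <span class="region">California</span>
    <span class="postal-code">94086</span>
  </div>
  <div class="tel">(408) 555-1234</div>
</div>
"""

parser = HCardParser()
parser.feed(HTML)
print(parser.fields)
```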

Approach #3: DataRSS Feed
• Cached data
  – allows Enhanced Results
  – but not for dynamic data
• Generate feed from DB
  – and maintain afterwards
• Closed approach
  – only Yahoo! gets data
• Actively provide a feed
  – coord w/Yahoo! to set up

    <?profile http://search.yahoo.com/searchmonkey-profile ?>
    <feed xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/2005/Atom ../xsd/datarss.xsd"
          xmlns:dc="http://purl.org/dc/terms/"
          xmlns="http://www.w3.org/2005/Atom"
          xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
          xmlns:y="http://search.yahoo.com/datarss/">
      <id>http://local.yahoo.com/datarss/</id>
      <author><name>Peter Mika (pmika@yahoo-inc.com)</name></author>
      <title>Example data feed for Local</title>
      <updated>2008-07-16T04:05:06+07:00</updated>
      <entry>
        <title>Parcel 104</title>
        <id>http://local.yahoo.com/info-21583016-parcel-104-santa-clara</id>
        <updated>2008-07-16T04:05:06+07:00</updated>
        <content type="application/xml">
          <y:adjunct version="1.0" name="com.yahoo.local">
            <y:item rel="dc:subject">
              <y:type typeof="vcard:VCard commerce:Restaurant">
                <y:meta property="commerce:hoursOfOperation">
                  Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.
                  …

Approach #4: Extract with XSLT
• Generally not cached
  – too slow, infobar only
  – but good for dynamic data
• Scrape page with XSLT
  – operates on cleaned up version of the DOM
  – watch out for template changes
• Easy to prototype

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/">
        <adjunctcontainer>
          <adjunct id="smid:{$smid}" version="1.0">
            <item rel="rel:Photo"
                  resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
            <item rel="rel:Card">
              <meta property="vcard:fn">
                <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
              </meta>
              <meta property="vcard:title">
                <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
              </meta>
            </item>
          </adjunct>
        </adjunctcontainer>
      </xsl:template>
    </xsl:stylesheet>

Prototyping with XSLT
• What if I don’t have structured data?
  – I don’t own the site
  – I do own the site, but I want to prototype first
• Build an XSLT custom data service first
  – Write some XSLT to extract the data and transform it into DataRSS
  – Mostly about finding the right XPath (use Firebug or XPather)
  – Quick to implement, but brittle
  – Can’t do a good Enhanced Result
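Finding the right XPath can also be tried out away from the SearchMonkey tools. The sketch below is a Python stand-in, not the XSLT custom data service itself: it uses the standard library's ElementTree, which supports only a subset of XPath, against a made-up, well-formed page snippet. `PAGE` and `extract_card` are hypothetical names, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page fragment with hResume-style class names.
PAGE = """
<html><body>
  <div class="hresume">
    <div class="image"><img src="/images/jsmith.png" /></div>
    <span class="fn">Joe Smith</span>
    <ul class="current"><li>Chief Technical Monkey</li></ul>
  </div>
</body></html>
""".strip()


def extract_card(root):
    """Pull photo, name, and current title out of the page, or {} if absent."""
    resume = root.find(".//div[@class='hresume']")
    if resume is None:  # template changed, or this is the wrong kind of page
        return {}
    img = resume.find(".//div[@class='image']/img")
    return {
        "photo": img.get("src", "") if img is not None else "",
        "fn": (resume.findtext(".//span[@class='fn']") or "").strip(),
        "title": (resume.findtext(".//ul[@class='current']/li") or "").strip(),
    }


card = extract_card(ET.fromstring(PAGE))
print(card)
```

The brittleness mentioned above shows up directly: rename one class in the page template and the matching field silently comes back empty.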

Approach #5: Call a Web Service
• Generally not cached
  – too slow, infobar only
  – but good for dynamic data
• Call a Remote Web Service
  – allows SearchMonkey apps to glue together
  – can handle OpenSearch XML natively

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
        xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:y="urn:yahoo:srch"
        xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/SiteExplorerService/V1/PageDataResponse.xsd">
      <xsl:template match="/">
        <adjunctcontainer xmlns:my="http://example.com/ns/1.0">
          <adjunct id="smid:{$smid}" version="1.0">
            <meta property="my:link1">
              <xsl:value-of select="//y:Result[1]/y:Url"/>
            </meta>
            <meta property="my:result1">
              <xsl:value-of select="//y:Result[1]/y:Title"/>
            </meta>
          </adjunct>
        </adjunctcontainer>
      </xsl:template>
    </xsl:stylesheet>

Creating an Infobar
• Infobar advantages
  – Annotate someone else’s site
  – Use links and images from other domains
    • Mash up info from multiple sites
    • Affiliate / coupon links? Hmmm…
  – Can act on *, all websites
• But these apps can be annoying if poorly designed
• Key design principles
  – Put something useful in the summary
  – Be creative with the HTML

Resources
• Main:
  – http://developer.yahoo.com/searchmonkey
• Lists and forums:
  – searchmonkey-developers@yahoogroups.com
  – http://suggestions.yahoo.com/searchmonkey
• RDF and Microformats:
  – http://microformats.org
  – http://www.w3.org/TR/xhtml-rdfa-primer/

Do it for real
• Demo

Ninja Coding Techniques: Enter the Monkey

Typical SearchMonkey PHP code

    $ret['title'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/vcard:title');

    // Image
    $ret['image']['src'] = Data::get('com.yahoo.uf.hcard/rel:Card/vcard:photo/@resource');
    $ret['image']['alt'] = SMDEFAULT;
    $ret['image']['title'] = SMDEFAULT;
    $ret['image']['allowResize'] = true;

    // Key Value pairs - up to 4
    $ret['dict'][0]['key'] = "Affiliation";
    $ret['dict'][0]['value'] = Data::get('com.yahoo.uf.hresume/resume:affiliation/vcard:org/vcard:organization-name');
    $ret['dict'][1]['key'] = "Contact";
    $ret['dict'][1]['value'] = Data::get('com.yahoo.uf.hresume/dc:subject/resume:contact/@resource');

Your first mistake may be your last!

True ninjas leave no room for error

    // Get the list of businesses. If we
    // get at least one, extract the
    // address and telephone number
    $appNodeList = Data::xpath("/*/adjunct/item[@rel='rel:Listing']");
    $yd = $appNodeList->item(0);
    $adr = $tel = "";
    $nodeList = Data::xpath("item[@rel='rel:Business']", $yd);
    if ($nodeList->length != 0) {
        $nd = $nodeList->item(0);
        $adr = Data::xpathString("meta[@property='vcard:adr']", $nd);
        $tel = Data::xpathString("meta[@property='vcard:tel']", $nd);
    }
    if ($r_rating != "") {
        $ratingstr = Data::getStarsFromNum($r_rating);
        if ($r_summary != "") {
            $ratingstr = $ratingstr . " " . $r_summary;
    …

Useful conditional tricks
• Check for empty data like this:
  – if ('' == trim($var))
• Watch out for $a.'-'.$b.'-'.$c
  – What happens if these variables are empty?
• You can create helper functions!
  – getOutput() must return an array, but there’s no reason not to create other functions
  – Call using self::function() instead of just function()
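The empty-check and separator pitfalls above are not specific to PHP. As a language-neutral sketch (written in Python here, not in the app's PHP), a small helper can drop empty parts before joining so no stray separators appear. `is_empty` and `join_nonempty` are hypothetical helper names, not part of the SearchMonkey framework.

```python
def is_empty(value):
    """True for None, empty strings, and whitespace-only strings."""
    return value is None or str(value).strip() == ""


def join_nonempty(parts, separator=" - "):
    """Join only the parts that carry data, so no dangling separators appear."""
    return separator.join(str(p).strip() for p in parts if not is_empty(p))


# Naive concatenation of ("Sunnyvale", "", "94086") would give "Sunnyvale--94086".
print(join_nonempty(["Sunnyvale", "", "94086"]))   # prints "Sunnyvale - 94086"
print(repr(join_nonempty(["", None, "  "])))       # prints ''
```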

Development (test, debug, collaborate)
• Your two best friends: input and output
• Collaborative development
  – Create a shared Y!ID for your organization
  – Export and import apps from the dashboard
• Bellwethers
  – Start with just one or two, for simplicity
  – Once app is working, hit “autofind” and look at all ten, see what breaks
  – Always set the #1 bellwether to something that’s high-ranking; that’s your Gallery preview

Image Helper Functions
• Data::getStars(string $data_get_path)
  – e.g. Data::getStars("smid:Jk8/review:rating")
• Data::getStarsFromNum(float $rating)
  – Must scale $rating to fall between 0-5 inclusive
• Data::getImage(string $name)
  – Adds icons to your app
    • Data::getImage("information")
    • Data::getImage("email")
    • Data::getImage("edit")
    • …
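Scaling a rating into the 0-5 range is simple arithmetic: multiply by 5, divide by the source scale's maximum, then clamp. A minimal Python sketch follows, assuming the source scale starts at 0; `scale_rating` is a hypothetical helper, not a SearchMonkey function.

```python
def scale_rating(value, source_max, target_max=5.0):
    """Map a rating on a 0..source_max scale onto 0..target_max, clamped."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    scaled = value * target_max / source_max
    return max(0.0, min(target_max, scaled))


print(scale_rating(8, 10))    # 10-point scale -> 4.0
print(scale_rating(27, 30))   # 30-point scale -> 4.5
print(scale_rating(12, 10))   # out-of-range input is clamped -> 5.0
```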

XML functions
• NodeList Data::xpath(string $query [, DOMNode $contextnode])
  – More complicated than Data::get()
  – Can count, iterate, find children
  – Can fetch all vcard:fn, regardless where they are
  – Can find a node and grab 1st four children
• string Data::xpathString(string $query [, DOMNode $contextnode])
  – Convenience function if you don’t need to do further DOM manipulation

Infobar Design: Party like it’s 1999
• Sadly, can’t use CSS
  – and the default stylesheet strips off most style
  – thus lists won’t even display bullets or numbers, you have to fake this
• Layout: use tables (remember tables?)
• Fonts: can use <font color>, <font face>, <big>, <small>
• Make good use of images and links
• PRO TIP: Use PHP HEREDOC (<<<)

Let Infobars be Infobars
• Make use of the real estate

Let Infobars be Infobars
• Or be minimal
• But don’t do an Infobar that’s really just an Enhanced Result in disguise
  – Use the blob and summary
  – Don’t use the thumbnail, key/value pairs, …

Triggering on *
• This can be annoying for general audiences
  – but it’s hard to abort an infobar before 50ms
  – and you can’t do this in the PHP layer if you depend on an extractor or web service
  – Data has to be provided by a feed or by structured markup
• For specialized audiences a “*” infobar might be ok

Triggering on *

Triggering on *
• Trigger on structured markup
  – Ex: Creative Commons Infobar
• Use feeds to annotate the URLs you want
• Instead of *, do a comma-separated list of sites:
  – www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*, …
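The effect of a comma-separated site list can be simulated offline. The deck does not say how SearchMonkey matched trigger patterns, so the Python sketch below assumes simple shell-style wildcards via the standard library's `fnmatch` and absolute URLs as input; `parse_patterns` and `should_trigger` are hypothetical names.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def parse_patterns(spec):
    """Split a comma-separated pattern list, ignoring blanks and whitespace."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def should_trigger(url, patterns):
    """Assumed semantics: match host + path against shell-style wildcards."""
    parts = urlsplit(url)
    target = parts.netloc.lower() + (parts.path or "/")
    return any(fnmatchcase(target, pattern) for pattern in patterns)


PATTERNS = parse_patterns(
    "www.uiuc.edu/*, www.stanford.edu/*, www.berkeley.edu/*, www.cmu.edu/*"
)

print(should_trigger("http://www.stanford.edu/admissions", PATTERNS))  # True
print(should_trigger("http://www.example.com/", PATTERNS))             # False
```

Restricting the list this way keeps the infobar off pages where it has nothing to say, which is the point of avoiding a bare *.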

XSLT Extractors
• Use the Firebug extension for Firefox
  – And XPather, an extension for Firefox
• Typical pattern: a skeleton of DataRSS, into which you plug some XPath
  – For more complex XSL:
    • Use <xsl:template>
    • <xsl:for-each> is clumsier
• Find a good ID to cling to
  – Compare arxiv.org (easy) to acm.org (harder)

Examples
• Rubik’s Cube
• VTA Bus
• API Monkey
• BugMeNot
• RetailMeNot
• Amazon

Questions?
