Information about SearchMonkey

Published on October 6, 2008

Author: ptarjan



This is the basis for our "Intro to SearchMonkey" talks

Monkey with Yahoo! Search

SearchMonkey Presentation by: Paul Tarjan, Chief Technical Monkey

What is SearchMonkey?
An open platform for using structured data to build more useful and relevant search results.
(Figure: a search result, before and after.)

Enhanced Result: Zagat
(Figure labels: Image, Links, Key/Value Pairs or Abstract)

Infobar: Wikipedia Preview
(Figure labels: Summary, Blob)

Part of the puzzle

Vocabularies
• Need to speak the same language
• I like to see girls of that... caliber.
• English, French, Spanish, Esperanto?
• URLs to the rescue
  – Dublin Core
  – Friend of a Friend
  – XHTML Friends Network
  – … (many more)

Syntax
• Nouns, Verbs, and Adjectives, oh my!
• All phrases become lots of triples
• (Subject, Verb / Adj. / Prep. / etc., Object)
• Key / Value pairs ++
  – Everything is a URL or String
  – Subject doesn’t have to be the document

Syntax 2
• Key / Value pair
  – Title = Awesome SearchMonkey Presentation
  – Homepage =
• Triples
  – (self,, “Awesome SearchMonkey Presentation”)
  – (self, http://vcard#url,

Decompose to triples
• I like to eat red candy
  – (self,,
  – (,,
  – (,,
• Unnamed nodes are O.K.
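The step from key/value pairs to triples described on the last three slides can be sketched in Python. Everything named here is an assumption made for illustration: the `Triple` type, the `to_triples` helper, the `ex:` predicate names and the `_:b0` label for the unnamed node are hypothetical and are not SearchMonkey vocabulary.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) statement."""
    subject: str
    predicate: str
    obj: str


def to_triples(subject: str, pairs: Dict[str, str]) -> List[Triple]:
    """Turn key/value pairs about one subject into triples."""
    return [Triple(subject, key, value) for key, value in pairs.items()]


# Key/value pairs implicitly describe the document itself ("self").
page = to_triples("self", {"ex:title": "Awesome SearchMonkey Presentation"})

# "I like to eat red candy": the candy has no name of its own, so it
# becomes an unnamed node that further triples can still describe.
sentence = [
    Triple("self", "ex:likesToEat", "_:b0"),
    Triple("_:b0", "ex:type", "ex:Candy"),
    Triple("_:b0", "ex:color", "red"),
]
```

The second list shows why triples are "key/value pairs ++": the subject of a statement can be something other than the document.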

How to get data to SearchMonkey?
Humans see:
• name
• picture of a person
• current job
• industry, …
Computers see: an undifferentiated blob of HTML
Can we make computers smarter?

Artificial intelligence is hard. Plus …

How does it work?
1. Site owners/publishers share structured data with Yahoo!.
2. Site owners & third-party developers build SearchMonkey apps.
3. Consumers customize their search experience with Enhanced Results or Infobars.
(Diagram labels: Page Extraction, RDF/Microformat Markup, Web Pages, Index, DataRSS feed, Web Services, database)

Innards of SearchMonkey
• You build a web service inside our framework
• When a search page renders
  – We check which SM apps are enabled
  – We call them
      • 50ms for in-page
      • Long time for AJAX
  – They return data in our template
  – We render them (and cache)

Inside SM
(Diagram labels: Developer, Developer, Publisher)

Data Sources: RDF and Microformats

Name           Cached  Open  Mode     Notes
Yahoo! Index   yes     yes   Passive  Old-School Y! Index data
RDFa, eRDF     yes     yes   Passive  Vocab + markup decoupled
Microformats   yes     yes   Passive  Vocab + markup coupled
DataRSS feed   yes     no    Active   Atom + metadata
XSLT           no      no    Active   Good for prototyping
Web Service    no      no    Active   Brings in remote data

Approach #1: Embedded RDF
• Cached data
  – allows Enhanced Results
  – but not for dynamic data
• Reuse existing markup
  – but requires site redesign
• Open approach
  – everyone can use
• Passive, crawled by Y!
  – less bureaucracy to set up

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "…">
    <html xmlns="…" xmlns:dc="…" xmlns:foaf="…"
          lang="en" xml:lang="en">
    <head>
      <title>The Amazing Home Page of Joe Smith</title>
    </head>
    <body>
      <h1 property="dc:title">Joe's Home Page</h1>
      <div rel="foaf:maker">
        <h2 property="foaf:name">Joe Smith</h2>
        <div rel="foaf:depiction" resource="…">
          <img src="/images/jsmith.png"
               alt="Smiling headshot of Joe" />
          <p property="dc:rights">Creative Commons Attribution 3.0 Unported</p>
        </div>
      </div>
    …

Approach #2: Embedded Microformats
• Cached data
  – allows Enhanced Results
  – but not for dynamic data
• Reuse existing markup
  – but requires site redesign
• Open approach
  – everyone can use
• Passive, crawled by Y!
  – less bureaucracy to set up

    <div id="hcard-Joe-Smith" class="vcard">
      <span class="fn">Joe Smith</span>
      <div class="adr">
        <div class="street-address">123 Murphy Avenue</div>
        <span class="locality">Sunnyvale</span>,
        <span class="region">California</span>
        <span class="postal-code">94086</span>
      </div>
      <div class="tel">(408) 555-1234</div>
    </div>…

Approach #3: DataRSS Feed
• Cached data
  – allows Enhanced Results
  – but not for dynamic data
• Generate feed from DB
  – and maintain afterwards
• Closed approach
  – only Yahoo! gets data
• Actively provide a feed
  – coord w/Yahoo! to set up

    <?profile ?>
    <feed xmlns:xsi="…"
          xsi:schemaLocation="… ../xsd/datarss.xsd"
          xmlns:dc="…" xmlns="…"
          xmlns:commerce="…"
          xmlns:y="…">
      <id></id>
      <author><name>Peter Mika</name></author>
      <title>Example data feed for Local</title>
      <updated>2008-07-16T04:05:06+07:00</updated>
      <entry>
        <title>Parcel 104</title>
        <id></id>
        <updated>2008-07-16T04:05:06+07:00</updated>
        <content type="application/xml">
          <y:adjunct version="1.0" name="…">
            <y:item rel="dc:subject">
              <y:type typeof="vcard:VCard commerce:Restaurant">
                <y:meta property="commerce:hoursOfOperation">
                  Breakfast daily, Lunch Mon.-Fri., Dinner Mon.-Sat.

Approach #4: Extract with XSLT
• Generally not cached
  – too slow, infobar only
  – but good for dynamic data
• Scrape page with XSLT
  – operates on cleaned up version of the DOM
  – watch out for template changes
• Easy to prototype

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="…" version="1.0">
      <xsl:template match="/">
        <adjunctcontainer>
          <adjunct id="smid:{$smid}" version="1.0">
            <item rel="rel:Photo"
                  resource="{//div[@class='hresume']//div[@class='image']/img/@src}"/>
            <item rel="rel:Card">
              <meta property="vcard:fn">
                <xsl:value-of select="//div[@class='hresume']//span[contains(@class,'fn')]"/>
              </meta>
              <meta property="vcard:title">
                <xsl:value-of select="//div[@class='hresume']//ul[@class='current']/li"/>
              </meta>
            </item>
          </adjunct>
        </adjunctcontainer>
      </xsl:template>
    </xsl:stylesheet>

Prototyping with XSLT
• What if I don’t have structured data?
  – I don’t own the site
  – I do own the site, but I want to prototype first
• Build an XSLT custom data service first
  – Write some XSLT to extract the data and transform it into DataRSS
  – Mostly about finding the right XPath (use Firebug or XPather)
  – Quick to implement, but brittle
  – Can’t do a good Enhanced Result
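To get a feel for "finding the right XPath" before writing any XSLT, the same lookups can be tried in plain Python. This is only a sketch under stated assumptions: the page fragment is made up, `extract_card` is a hypothetical helper, and the standard library's ElementTree supports just a subset of XPath (no `contains()`), so the class attributes are matched exactly. The result is a plain dictionary, not DataRSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed page fragment; real pages
# usually need cleaning up before an XML parser accepts them.
PAGE = """
<html><body>
  <div class="hresume">
    <div class="image"><img src="/images/jsmith.png"/></div>
    <span class="fn">Joe Smith</span>
    <ul class="current"><li>Software Engineer</li></ul>
  </div>
</body></html>
"""


def _text(node):
    """Trimmed text of a node, or an empty string if the node is missing."""
    return (node.text or "").strip() if node is not None else ""


def extract_card(page_xml):
    """Pull photo, name and current title out of an hresume block."""
    root = ET.fromstring(page_xml)
    resume = root.find(".//div[@class='hresume']")
    if resume is None:
        return {}
    photo = resume.find(".//div[@class='image']/img")
    return {
        "rel:Photo": photo.get("src", "") if photo is not None else "",
        "vcard:fn": _text(resume.find(".//span[@class='fn']")),
        "vcard:title": _text(resume.find(".//ul[@class='current']/li")),
    }


card = extract_card(PAGE)
```

The brittleness mentioned on the slide shows up here too: rename the `hresume` class in the page and `extract_card` returns an empty dictionary.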

Approach #5: Call a Web Service
• Generally not cached
  – too slow, infobar only
  – but good for dynamic data
• Call a Remote Web Service
  – allows SearchMonkey apps to glue together
  – can handle OpenSearch XML natively

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsi="…" xmlns:xsl="…" version="1.0"
                    xmlns:h="…" xmlns:y="urn:yahoo:srch"
                    xsi:schemaLocation="urn:yahoo:srch …">
      <xsl:template match="/">
        <adjunctcontainer xmlns:my="…">
          <adjunct id="smid:{$smid}" version="1.0">
            <meta property="my:link1">
              <xsl:value-of select="//y:Result[1]/y:Url"/>
            </meta>
            <meta property="my:result1">
              <xsl:value-of select="//y:Result[1]/y:Title"/>
            </meta>
          </adjunct>
        </adjunctcontainer>
      </xsl:template>
    </xsl:stylesheet>

Creating an Infobar
• Infobar advantages
  – Annotate someone else’s site
  – Use links and images from other domains
      • Mash up info from multiple sites
      • Affiliate / coupon links? Hmmm…
  – Can act on *, all websites
      • But these apps can be annoying if poorly designed
• Key design principles
  – Put something useful in the summary
  – Be creative with the HTML

Resources
• Main
• Lists and forums
• RDF and Microformats

Do it for real
• Demo

Ninja Coding Techniques: Enter the Monkey

Typical SearchMonkey PHP code

    $ret['title'] = Data::get('');

    // Image
    $ret['image']['src'] = Data::get('');
    $ret['image']['alt'] = SMDEFAULT;
    $ret['image']['title'] = SMDEFAULT;
    $ret['image']['allowResize'] = true;

    // Key/value pairs: up to 4
    $ret['dict'][0]['key'] = "Affiliation";
    $ret['dict'][0]['value'] = Data::get('');
    $ret['dict'][1]['key'] = "Contact";
    $ret['dict'][1]['value'] = Data::get('');

Your first mistake may be your last!

True ninjas leave no room for error

    // Get the list of businesses. If we get at least one,
    // extract the address and telephone number.
    $appNodeList = Data::xpath("/*/adjunct/item[@rel='rel:Listing']");
    $yd = $appNodeList->item(0);
    $adr = $tel = "";
    $nodeList = Data::xpath("item[@rel='rel:Business']", $yd);
    if ($nodeList->length != 0) {
        $nd = $nodeList->item(0);
        $adr = Data::xpathString("meta[@property='vcard:adr']", $nd);
        $tel = Data::xpathString("meta[@property='vcard:tel']", $nd);
    }

    // Only build the rating string when there is a rating to show.
    if ($r_rating != "") {
        $ratingstr = Data::getStarsFromNum($r_rating);
        if ($r_summary != "") {
            $ratingstr = $ratingstr . " " . $r_summary;
        }
    }

Useful conditional tricks
• Check for empty data like this:
  – if ('' == trim($var))
• Watch out for $a.'-'.$b.'-'.$c
  – What happens if these variables are empty?
• You can create helper functions!
  – getOutput() must return an array, but there’s no reason not to create other functions
  – Call using self::function() instead of just function()
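The separator problem above is easy to see in a few lines. The sketch below is in Python rather than the PHP that SearchMonkey apps are written in, and `join_nonempty` is a hypothetical helper name, not part of any SearchMonkey API; the same idea carries over to a PHP helper function.

```python
def join_nonempty(parts, sep=" - "):
    """Join only the parts that still have content after trimming whitespace."""
    cleaned = [p.strip() for p in parts if p is not None and p.strip() != ""]
    return sep.join(cleaned)


# Naive concatenation leaves a doubled separator when the middle value is empty.
naive = "Sunnyvale" + "-" + "" + "-" + "CA"

# Filtering first keeps the output clean.
clean = join_nonempty(["Sunnyvale", "", "CA"], sep="-")
```

Here `naive` ends up with two dashes in a row, while `clean` contains exactly one.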

Development (test, debug, collaborate)
• Your two best friends: input and output
• Collaborative development
  – Create a shared Y!ID for your organization
  – Export and import apps from the dashboard
• Bellwethers
  – Start with just one or two, for simplicity
  – Once app is working, hit “autofind” and look at all ten, see what breaks
  – Always set the #1 bellwether to something that’s high-ranking; that’s your Gallery preview

Image Helper Functions
• Data::getStars(string $data_get_path)
  – e.g. Data::getStars("smid:Jk8/review:rating")
• Data::getStarsFromNum(float $rating)
  – Must scale $rating to fall between 0 and 5 inclusive
• Data::getImage(string $name)
  – Adds icons to your app
      • Data::getImage("information")
      • Data::getImage("email")
      • Data::getImage("edit")
      • …
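Since the rating passed to Data::getStarsFromNum has to land between 0 and 5, a site that scores out of 10 or 100 needs a rescaling step first. A minimal sketch of that arithmetic, in Python for illustration; `scale_rating` and its parameters are hypothetical names, and clamping out-of-range input is one possible choice rather than documented SearchMonkey behavior.

```python
def scale_rating(value, source_max, target_max=5.0):
    """Map a rating on a 0..source_max scale onto 0..target_max, clamped to that range."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    scaled = value * target_max / source_max
    return max(0.0, min(target_max, scaled))


out_of_ten = scale_rating(8, 10)       # a site that rates out of 10
out_of_thirty = scale_rating(27, 30)   # a site that rates out of 30
too_high = scale_rating(12, 10)        # bad input is clamped, not passed through
```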

XML functions
• NodeList Data::xpath(string $query [, DOMNode $contextnode])
  – More complicated than Data::get()
  – Can count, iterate, find children
  – Can fetch all vcard:fn, regardless of where they are
  – Can find a node and grab the first four children
• string Data::xpathString(string $query [, DOMNode $contextnode])
  – Convenience function if you don’t need to do further DOM manipulation

Infobar Design: Party like it’s 1999
• Sadly, can’t use CSS
  – and the default stylesheet strips off most style
  – thus lists won’t even display bullets or numbers, you have to fake this
• Layout: use tables (remember tables?)
• Fonts: can use <font color>, <font face>, <big>, <small>
• Make good use of images and links
• PRO TIP: Use PHP HEREDOC (<<<)

Let Infobars be Infobars
• Make use of the real estate

Let Infobars be Infobars
• Or be minimal
• But don’t do an Infobar that’s really just an Enhanced Result in disguise
  – Use the blob and summary
  – Don’t use the thumbnail, key/value pairs, …

Triggering on *
• This can be annoying for general audiences
  – but it’s hard to abort an infobar before 50ms
  – and you can’t do this in the PHP layer if you depend on an extractor or web service
  – Data has to be provided by a feed or by structured markup
• For specialized audiences a “*” infobar might be ok

Triggering on *

Triggering on *
• Trigger on structured markup
  – Ex: Creative Commons Infobar
• Use feeds to annotate the URLs you want
• Instead of *, do a comma-separated list of sites:
  –*,*,*,*, …
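The difference between triggering on * and triggering on a short list of sites can be sketched as follows. The slides do not spell out SearchMonkey's actual URL-matching rules, so this uses Python's shell-style wildcards from `fnmatch` as a stand-in; the patterns, the `example.com` and `example.org` hosts, and both helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_patterns(spec):
    """Split a comma-separated pattern list, dropping blanks and padding."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def should_trigger(url, patterns):
    """True if the URL matches at least one wildcard pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


# A targeted app: only two made-up sites, instead of every page on the web.
PATTERNS = parse_patterns("*.example.com/*, *.example.org/reviews/*")

on_listed_site = should_trigger("www.example.com/page", PATTERNS)
on_other_page = should_trigger("www.example.org/about", PATTERNS)
catch_all = should_trigger("anything.at/all", ["*"])
```

With the explicit list, the app stays quiet on pages it has nothing to say about; with "*" it fires everywhere.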

XSLT Extractors
• Use the Firebug extension for Firefox
  – And XPather, an extension for Firefox
• Typical pattern: a skeleton of DataRSS, into which you plug some XPath
  – For more complex XSL:
      • Use <xsl:template>
      • <xsl:for-each> is clumsier
• Find a good ID to cling to
  – Compare (easy) to (harder)

Examples
• Rubik’s Cube
• VTA Bus
• API Monkey
• BugMeNot
• RetailMeNot
• Amazon

Questions?
