Implementing Site Search in CQ5 / AEM


Published on March 5, 2014

Author: rtpaem



Site search is one of the core features of any website. This talk provides an overview of the internal workings of CQ5 search and its limitations for implementing site search, and discusses design patterns and challenges for integrating various third-party search providers with CQ5/AEM.


Session Outline
• Importance of site search functionality
• CQ5 internal search workings & limitations
• Integrating CQ5 with external search engines & challenges
• Indexing patterns for integrating with external search engines
• Q&A

• Site search is one of the core features of most websites
• Browse vs. search: alternative ways of letting visitors find the information they need quickly and easily
• “90 percent of companies report that search is the No. 1 means of navigation on their site” -- Forrester Research
• “82 percent of visitors use site search to find the information they need” -- Juniper Research
• Advances in search features allow site visitors to:
  ▪ Auto-complete/auto-correct search terms
  ▪ Build advanced queries
  ▪ Filter results by facets
  ▪ Get results refined by location, preferences, previous history, etc.
• “Visitors who used site search were ‘more likely to convert from browsers to buyers’.” -- Juniper Research

• Jackrabbit internally uses Lucene to index repository content
• Whenever content is modified, the Lucene index is updated at the same time the content is stored in the repository
• Index location under <crx-quickstart>/repository:
  ▪ repository/index
  ▪ workspaces/crx.default/index
• Index configuration:
  ▪ <SearchIndex> block in repository.xml & workspace.xml
  ▪ tika-config.xml in the workspaces folder
• Changes in the new version of Jackrabbit (3.x / Oak)

• Jackrabbit
  ▪ JCR 1.0 spec: support for XPath & JCR-SQL (SQL1)
  ▪ JCR 2.0 spec: support for JCR-SQL2. XPath support was deprecated in JCR 2.0, but Jackrabbit still supports it
  ▪ Both SQL & XPath queries are translated into the same search tree
• Query Builder is an API for building queries for a query engine
• CQ provides several OOTB components & extensions that leverage the QueryBuilder API for full-text or predicate-based searches
• The OOTB Search Component supports full-text queries and enhanced search features: similar pages, facets, pagination, etc.
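To make the query options above concrete, here is a minimal sketch that builds the same full-text page search as an XPath query, a JCR-SQL2 query, and a QueryBuilder predicate map. It only produces strings and a map; actually running them would need a JCR `Session`, or CQ's `QueryBuilder` fed via `PredicateGroup.create(map)`, inside CQ. The class name, the `/content/mysite` root and the choice of `cq:Page` as the node type are illustrative assumptions; searching on `cq:Page` assumes the page's `jcr:content` is aggregated into the page's index entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds one full-text page search in the three forms discussed above.
// It only builds strings/maps; nothing here talks to a repository.
final class FullTextQueries {

    // JCR query string literals use single quotes; an embedded quote is doubled.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // XPath (JCR 1.0; deprecated in JCR 2.0 but still supported by Jackrabbit).
    // Assumes rootPath is a plain path that needs no ISO 9075 escaping.
    static String xpath(String rootPath, String term) {
        return "/jcr:root" + rootPath
                + "//element(*, cq:Page)[jcr:contains(., " + quote(term) + ")]";
    }

    // JCR-SQL2 (JCR 2.0).
    static String sql2(String rootPath, String term) {
        return "SELECT * FROM [cq:Page] AS p"
                + " WHERE ISDESCENDANTNODE(p, " + quote(rootPath) + ")"
                + " AND CONTAINS(p.*, " + quote(term) + ")";
    }

    // QueryBuilder predicates; inside CQ this map would be passed to
    // PredicateGroup.create(...) and then to QueryBuilder.createQuery(group, session).
    static Map<String, String> queryBuilderPredicates(String rootPath, String term, int limit) {
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("path", rootPath);
        predicates.put("type", "cq:Page");
        predicates.put("fulltext", term);
        predicates.put("p.limit", Integer.toString(limit));
        return predicates;
    }
}
```

For example, `FullTextQueries.sql2("/content/mysite", "o'neil")` writes the search term as the literal `'o''neil'`, so user input containing quotes does not break the query string.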

• Use case: non-CQ content sources
  ▪ Larger sites with more than one source of content and assets
  ▪ Difficult to index non-CQ content
• Use case: author vs. visitor search patterns
  ▪ CQ generates one index per server
  ▪ Author and visitor search patterns and requirements are typically different
• Performance & architecture considerations
  ▪ ‘n’ number of queries and search variations, making it difficult to utilize the CQ caching architecture
  ▪ The Jackrabbit layer on top of Lucene may slow down search and query performance
  ▪ Scaling of the search architecture depends on the CQ architecture
• Customizations
  ▪ Utilizing different content parsers, index tuning, etc. (mitigated in 5.6.1)
  ▪ Can I use a newer version of Lucene?
  ▪ How can I extend the Jackrabbit search implementation?

• External search platforms
  ▪ Search providers with crawlers (examples): Google Search Appliance, Microsoft FAST
  ▪ Non-crawler search providers (examples): Endeca, Lucene/Solr
• Enables independent scaling of the search platform
• Supports more than one content source
• Configuration & customization of the search application is decoupled from the CQ5 application
• May provide more advanced search features (faceted search, geospatial search, personalization, etc.)

• Challenges building & managing search indexes
  ▪ Building the site index: crawl, or query & inject?
  ▪ How often should the index be rebuilt?
  ▪ How do you ensure that content & metadata stay in sync between the content sources and the search index?
  ▪ With multiple data sources, how do you manage duplicates, index structure and a common metadata model?
• Challenges querying & building search results
  ▪ Should the search results page be hosted on the provider’s platform or within CQ?
  ▪ Does the search provider offer an extended API to query and build search results within the application?
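The duplicates and common-metadata-model question can be illustrated with a small sketch: every source is mapped into one shared document shape, and documents that resolve to the same canonical URL are collapsed, keeping the most recently modified one. The `SearchDocument` fields, the canonicalization rules and the "newest wins" policy are all assumptions chosen for illustration, not a prescribed model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common metadata model: every source (CQ, DAM, other CMSs) is mapped
// into SearchDocument before indexing, and duplicates are collapsed by canonical URL.
final class MetadataModel {

    static final class SearchDocument {
        final String source;       // e.g. "cq", "legacy-cms" (illustrative values)
        final String url;          // public URL of the item
        final String title;
        final long lastModified;   // epoch milliseconds

        SearchDocument(String source, String url, String title, long lastModified) {
            this.source = source;
            this.url = url;
            this.title = title;
            this.lastModified = lastModified;
        }
    }

    // Canonical key: strip the query string, the fragment, a trailing ".html" and trailing
    // slashes, so ".../en/about.html?ref=nav" and ".../en/about/" map to the same key.
    static String canonicalKey(String url) {
        String u = url.trim();
        int q = u.indexOf('?');
        if (q >= 0) u = u.substring(0, q);
        int h = u.indexOf('#');
        if (h >= 0) u = u.substring(0, h);
        if (u.endsWith(".html")) u = u.substring(0, u.length() - ".html".length());
        while (u.length() > 1 && u.endsWith("/")) u = u.substring(0, u.length() - 1);
        return u;
    }

    // Keep one document per canonical key: the most recently modified one wins.
    static List<SearchDocument> deduplicate(List<SearchDocument> docs) {
        Map<String, SearchDocument> byKey = new LinkedHashMap<>();
        for (SearchDocument d : docs) {
            byKey.merge(canonicalKey(d.url), d,
                    (kept, candidate) -> candidate.lastModified > kept.lastModified ? candidate : kept);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Whether "newest wins" is the right tie-breaker, or whether one source should always be authoritative, is a design decision each site has to make for its own content.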

• Integration notes:
  ▪ GSA, FAST Site Crawler, Endeca’s plugin for CRX indexing, Solr via open-source crawlers (Nutch, etc.)
  ▪ May require a custom service that returns data (for example, for Solr or Endeca)
• Pros:
  ▪ Ease of implementation
  ▪ Indexes the rendered version of the pages
• Cons:
  ▪ Lag between content publishing and the index update may result in out-of-sync search results. Also, what happens to deleted content?
  ▪ Longer index crawl and build times
  ▪ The search index doesn’t have the complete set of metadata
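The "what happens to deleted content?" question arises because a crawler never visits a page that no longer exists, so its index entry simply lingers. One way to handle this, sketched below under assumptions, is to compare the set of indexed URLs with the URLs reached by the latest complete crawl and purge the difference. The class name and the `minCoverage` guard (a heuristic against mistaking a failed crawl for mass deletion) are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reconciliation step for the crawl approach: a deleted page is never
// crawled again, so its index entry lingers unless it is compared with the last crawl.
final class CrawlReconciler {

    // Returns URLs that are indexed but were not reached by the latest complete crawl.
    // minCoverage guards against treating a failed or partial crawl as mass deletion.
    static Set<String> staleUrls(Set<String> indexed, Set<String> crawled, double minCoverage) {
        if (!indexed.isEmpty() && crawled.size() < minCoverage * indexed.size()) {
            throw new IllegalStateException(
                    "Crawl reached " + crawled.size() + " of " + indexed.size()
                    + " indexed URLs; refusing to purge");
        }
        Set<String> stale = new TreeSet<>(indexed);
        stale.removeAll(crawled);
        return stale;
    }
}
```

The returned URLs would then be handed to whatever delete mechanism the search provider exposes; that call is provider-specific and not shown.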

• Example: CQ / FAST connector (available via service pack)
• Pros:
  ▪ Search index always in sync with the content repository
  ▪ Ability to send metadata with content
  ▪ Customizable data formats; allows partial indexing of a page
• Cons:
  ▪ Requires custom development effort
  ▪ Indexes content instead of the rendered version of the pages
  ▪ System performance / event handling
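The push pattern behind a connector like this can be sketched as a handler that turns each replication event into an index operation immediately, sending metadata along with the content. The `Action` enum is only loosely modeled on CQ's replication action types, and `SearchIndexClient` and `InMemoryIndex` are hypothetical stand-ins for a real provider client; this is not the connector's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class ReplicationPush {

    // Hypothetical; loosely modeled on CQ's replication action types.
    enum Action { ACTIVATE, DEACTIVATE, DELETE }

    // Hypothetical client for the external search provider.
    interface SearchIndexClient {
        void upsert(String path, Map<String, String> fields);
        void delete(String path);
    }

    // In-memory stand-in for the provider, used only to demonstrate the flow.
    static final class InMemoryIndex implements SearchIndexClient {
        final Map<String, Map<String, String>> docs = new TreeMap<>();

        public void upsert(String path, Map<String, String> fields) {
            docs.put(path, new HashMap<>(fields));
        }

        public void delete(String path) {
            docs.remove(path);
        }
    }

    // Called once per replication event: the index is updated as part of publishing,
    // which keeps it in sync but puts indexing work on every event.
    static void onReplication(Action action, String path, Map<String, String> metadata,
                              SearchIndexClient index) {
        switch (action) {
            case ACTIVATE:
                index.upsert(path, metadata);
                break;
            case DEACTIVATE:
            case DELETE:
                index.delete(path);
                break;
        }
    }
}
```

Because the handler runs for every event, a slow or unavailable provider directly affects publishing, which is the "system performance / event handling" concern listed above.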

• Pros:
  ▪ Search index (mostly) in sync with the content repository
  ▪ Ability to send metadata with content
  ▪ Customizable data formats; allows partial indexing of a page
  ▪ Minimal replication event processing
• Cons:
  ▪ Requires custom development effort
  ▪ Search index may get out of sync with the content repository (but only for a shorter duration)
  ▪ Indexes content instead of the rendered version of the pages
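One way to get "minimal replication event processing" with an index that is only briefly out of sync is to record each change cheaply and let a periodic job push batches. The slide does not name the mechanism, so this queued variant is an assumption. In the sketch, repeated changes to the same path collapse into the latest operation, and the lag is bounded by how often `drain()` is called. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical coalescing queue: event handlers only record the latest operation per
// path; a scheduled job drains the batch and sends it to the search provider.
final class CoalescingIndexQueue {

    enum Op { UPSERT, DELETE }

    // Latest pending operation per path, in order of most recent change.
    private final Map<String, Op> pending = new LinkedHashMap<>();

    // Cheap work on the replication event path: overwrite any earlier op for this path.
    synchronized void enqueue(String path, Op op) {
        pending.remove(path);   // re-insert so ordering reflects the latest change
        pending.put(path, op);
    }

    // Run periodically: hand over the current batch and start a new one.
    // Until this runs, the index lags the repository by at most one interval.
    synchronized Map<String, Op> drain() {
        Map<String, Op> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

For example, activating `a`, then `b`, then deleting `a` before the next drain yields a batch of two operations, `b` upserted and `a` deleted, so the provider never indexes the soon-to-be-deleted version of `a`.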

• Handling initial content load & index creation
  ▪ With the content push approach, how will the initial index be generated? You may need to create an initial baseline via a site crawl or a custom service
  ▪ With the content pull approach, how will the index reflect deleted and moved site pages?
• Permission-sensitive site pages & assets
  ▪ Option 1: Export ACLs to the search provider (example: CQ/FAST Connector)
  ▪ Option 2: Check user permissions via CQ at run time (similar to how CQ handles delivery of content for closed user groups)
• Referenced assets, content pages and promos
  ▪ Option: Query referenced pages and index them. This may cause performance (& recursive indexing) issues, though.
  ▪ Option: Selective content indexing (index parts of a page instead of the entire page)
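The two permission options can be contrasted in a short sketch. With Option 1, each hit carries the principals exported from its ACL, and visibility reduces to checking whether they intersect the visitor's principals; with Option 2, the same filter takes a run-time check instead. The `Hit` shape, the "empty set means public" convention and the stand-in run-time predicate are hypothetical; a real Option 2 check would ask CQ whether the visitor may read the path, which is not shown here.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PermissionFilter {

    // Hypothetical search hit: the page path plus the principals exported with it (Option 1).
    // An empty principal set is treated as "public".
    static final class Hit {
        final String path;
        final Set<String> allowedPrincipals;

        Hit(String path, Set<String> allowedPrincipals) {
            this.path = path;
            this.allowedPrincipals = allowedPrincipals;
        }
    }

    // Option 1: ACLs were exported to the provider, so visibility is a set intersection
    // between the hit's principals and the current user's principals (user + groups).
    static Predicate<Hit> exportedAcl(Set<String> userPrincipals) {
        return hit -> hit.allowedPrincipals.isEmpty()
                || !Collections.disjoint(hit.allowedPrincipals, userPrincipals);
    }

    // Keeps only the hits the predicate allows. For Option 2, pass a predicate that
    // performs the run-time permission check against CQ instead.
    static List<String> visiblePaths(List<Hit> hits, Predicate<Hit> canRead) {
        return hits.stream().filter(canRead).map(hit -> hit.path).collect(Collectors.toList());
    }
}
```

Filtering after retrieval, as here, can leave a results page short and make total counts inaccurate; with exported ACLs the same principal check could usually be sent to the provider as a query filter instead, which keeps pagination consistent.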
