Published on February 18, 2014

Author: Jahia

Source: slideshare.net


Thomas Delerm (SIGMA) and Adrien Di Mascio (Logilab) will explain the value of web semantics in modern web applications for making the best use of your data.

They’ll give the recipes that make Jahia an appropriate CMS for the semantic and linked data web, a.k.a. "web 3.0".

SEMANTIC WEB WITH JAHIA February 2014 www.sigma.fr

SUMMARY
• WHY?
  – Background
  – Web 2.0 is not enough
• WHAT?
  – Definitions
  – It’s real
• HOW?
  – Jahia fits
  – Integration

WHY? • Background • Web 2.0 is not enough

Background: who are we?
• Adrien DI MASCIO – Semantic Web Director, Logilab
• Thomas DELERM – Web Architect, SIGMA; worked in cell phone and IPTV content startups

How the web evolved
• Web « 1 » was about documents and links
• Web « 2.0 » is about social interactions and users
(Yahoo! home page in 1999: https://web.archive.org/web/19991116151216/http://www4.yahoo.com/)


Failures of Web 2.0
• All the databases and APIs are in “silos”
• Searches are limited
• Results are documents, not objects
• Are my results up to date and reliable?
Example: Renault. When you want to buy a car, there are too many combinations: more than 10^20 [1]
[1] http://www.semweb.pro/talk/2474

Failures of Web 2.0
Web 2.0 is far from perfect:
• User tags
  – Different spellings for the same word
  – Different meanings for the same spelling (“Hollande”: François Hollande or Holland?)
  – No relationships between tags
• You cannot answer complex queries in one request, like “List on my website 10 products whose producer is Samsung and whose price is under $50”
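Once the same information is published as structured, linked data, the Samsung query becomes a simple pattern match rather than a text search. The following is a minimal sketch in plain Python (not SPARQL) over a hypothetical product catalogue; the ex: names, products and prices are invented for illustration, and a real site would send the equivalent SPARQL query to a triple store.

```python
# Hypothetical catalogue, written as (subject, predicate, object) triples.
# The "ex:" names are made up for illustration; they are not a real vocabulary.
TRIPLES = [
    ("ex:tv1", "ex:producer", "ex:Samsung"),
    ("ex:tv1", "ex:price", 399.0),
    ("ex:cable1", "ex:producer", "ex:Samsung"),
    ("ex:cable1", "ex:price", 12.5),
    ("ex:charger1", "ex:producer", "ex:Samsung"),
    ("ex:charger1", "ex:price", 35.0),
    ("ex:mouse1", "ex:producer", "ex:Logitech"),
    ("ex:mouse1", "ex:price", 20.0),
]


def objects(triples, subject, predicate):
    """All objects of the triples matching the pattern (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def products_under(triples, producer, max_price, limit=10):
    """Up to `limit` products made by `producer` with a price below `max_price`."""
    made_by = sorted({s for s, p, o in triples if p == "ex:producer" and o == producer})
    cheap = [s for s in made_by
             if any(price < max_price for price in objects(triples, s, "ex:price"))]
    return cheap[:limit]


print(products_under(TRIPLES, "ex:Samsung", 50))  # ['ex:cable1', 'ex:charger1']
```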

We have a solution
• There is always a technical evolution
  – From PC to Web: WWW and links
  – From Web to Web 2.0: AJAX (dynamic web sites)
  – From Web 2.0 to Web 3.0: semantic properties and linked data
So let’s learn what the Semantic Web is!

WHAT? • Definitions • It’s real

Semantic Web – (Anti)definitions
Today, the Semantic Web is not:
• Magic
• Natural Language Processing
• Automatic image processing
• A new protocol
It is a worldwide network of data built upon a set of interoperable standards that use URLs to identify data and link them together.

No Natural Language Processing
A human reads:
    <h1>Semantic Web</h1>
    <p>Semantic Web is a worldwide network of data invented by <a href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a> in 1994.</p>
A machine reads:
    <h1>????????????</h1>
    <p>?????????????????????????????????????????????????? ?????<a href="http://w3.org/People/Berners-Lee">???????????????</a> ????????</p>

If only... the machine could read:
    SemanticWeb is_a network
    SemanticWeb was_created_by TimBernersLee
    SemanticWeb was_created_in 1994
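Statements of this shape are already RDF triples (subject, predicate, object). A minimal sketch of writing them in N-Triples, the simplest RDF serialization, assuming a hypothetical http://mysite.com/ namespace for every name; a real publication would reuse standard vocabulary terms (such as dc:creator) instead of invented predicates, and this sketch does not escape special characters.

```python
# Hypothetical namespace, reusing the mysite.com host from the RDF example in this talk.
BASE = "http://mysite.com/"

# The three statements from the slide, as (subject, predicate, object).
STATEMENTS = [
    ("SemanticWeb", "is_a", "network"),
    ("SemanticWeb", "was_created_by", "TimBernersLee"),
    ("SemanticWeb", "was_created_in", 1994),
]


def term(value):
    """Names become IRIs under BASE; any other value becomes a quoted literal."""
    if isinstance(value, str):
        return f"<{BASE}{value}>"
    return f'"{value}"'


def to_ntriples(statements):
    """Serialize statements as N-Triples: one 'subject predicate object .' per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)


print(to_ntriples(STATEMENTS))
```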

Annotate your document
Use RDFa, or microdata with the schema.org vocabulary:
    <p itemscope itemtype="http://schema.org/CreativeWork">
      <span itemprop="name">Semantic Web</span> is a
      <span itemprop="description">worldwide network of data</span>
      invented by <a itemprop="creator" href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a>
      in <span itemprop="dateCreated">1994</span>.
    </p>
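To see what such annotations give a machine, here is a minimal sketch, using only Python's standard html.parser, of how a crawler might read the microdata back out of the snippet (with itemscope and a proper itemprop on the date). It only handles flat properties without nested itemscope, an assumption that holds for this example; real extractors follow the full microdata processing rules.

```python
from html.parser import HTMLParser


class MicrodataExtractor(HTMLParser):
    """Collects itemprop -> value pairs from flat (non-nested) microdata."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text content is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and attrs.get("href"):
            # In microdata, the value of an <a itemprop> element is its href URL
            self.props[prop] = attrs["href"]
        else:
            self._current, self._buffer = prop, []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Assumes the property element contains only text (no nested tags)
        if self._current is not None:
            self.props[self._current] = "".join(self._buffer).strip()
            self._current = None


SNIPPET = """<p itemscope itemtype="http://schema.org/CreativeWork">
  <span itemprop="name">Semantic Web</span> is a
  <span itemprop="description">worldwide network of data</span>
  invented by <a itemprop="creator" href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a>
  in <span itemprop="dateCreated">1994</span>.
</p>"""

extractor = MicrodataExtractor()
extractor.feed(SNIPPET)
extractor.close()
print(extractor.props)
```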

Publish another representation
Publish RDF and use HTTP content negotiation:
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    <http://mysite.com/SemanticWeb> a skos:Concept ;
        skos:closeMatch <http://data.bnf.fr/ark:/12148/cb119328992> ;
        dc:creator <http://w3.org/People/Berners-Lee/> ;
        dc:date "1994" .
More familiar with JSON? Take a look at JSON-LD.
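Content negotiation means the same URL answers with HTML for browsers and with RDF for machines, depending on the request's Accept header. A minimal sketch of the server-side choice, assuming a resource that can be served as HTML, Turtle or JSON-LD (that list and its order are hypothetical); media-type parameters other than q are ignored, and in practice a web server or framework usually does this for you.

```python
# Representations a hypothetical /SemanticWeb resource can be served as,
# in the server's own order of preference.
AVAILABLE = ["text/html", "text/turtle", "application/ld+json"]


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs; other parameters are ignored."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range):
    """Rank ranges: */* < type/* < type/subtype."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept_header, available=AVAILABLE):
    """Pick the available type with the highest q; ties go to the server's preference."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        # The most specific matching range decides the q-value
        q = max(matching)[1] if matching else 0.0
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With this sketch, negotiate("text/turtle") returns "text/turtle", while a typical browser header such as "text/html,application/xhtml+xml,*/*;q=0.8" gets "text/html".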

Vocabularies, ontologies
• An ontology is a structured set of terms and concepts
• Each term and concept is also identified by a URL
• There are quite a few standard ontologies for various domains (social interactions, libraries, music, events, etc.)
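Compact names such as skos:Concept or dc:creator are only abbreviations: each prefix stands for a namespace URL, and the term's real identifier is the full URL. A minimal Python sketch of that expansion, with a small prefix table chosen for illustration (schema.org is also served over https today).

```python
# Namespace URLs of a few widely used vocabularies (all mentioned in this talk).
PREFIXES = {
    "schema": "http://schema.org/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "og": "http://ogp.me/ns#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a compact name such as 'foaf:name' into the term's full URL."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


for name in ["skos:Concept", "dc:creator", "foaf:Person", "schema:Product"]:
    print(name, "->", expand(name))
```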

Make it happen now!
• RDF is nice
  – Some database engines store RDF graphs
  – You can query them with the SPARQL language
  – Standardized by the W3C
• You don't necessarily need to change your technology stack
  – If your data is structured, publishing RDF is easy
  – Choosing an ontology or a vocabulary can be hard
  – Making your relational database answer a SPARQL query is hard
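Publishing the structured data you already have is the easy direction. The following Python sketch, using the standard sqlite3 module and a hypothetical product table, turns each row into N-Triples: one subject URL per row and one triple per non-null column. It is loosely modelled on the idea behind the W3C Direct Mapping rather than a conforming implementation, and the mysite.com base URL is an assumption.

```python
import sqlite3

BASE = "http://mysite.com/"  # hypothetical base URL for the published resources


def ntriples_literal(value):
    """Quote a value as an N-Triples literal, escaping backslashes, quotes and newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def table_to_ntriples(conn, table, key):
    """Publish each row of `table` as triples: one subject per row, one triple per column.

    `table` must be a trusted name, since it is interpolated into the SQL.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column != key and value is not None:
                lines.append(f"{subject} <{BASE}{table}#{column}> {ntriples_literal(value)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'USB cable', 12.5)")
for line in table_to_ntriples(conn, "product", "id"):
    print(line)
```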


It's all about data
Publishing structured data:
• Helps search engines
  – Better indexing
  – Better page rank
• Eases external data integration
  – Importing a CSV file requires a preliminary agreement on its structure
  – Maintaining data is expensive: reuse published data (DBpedia, Freebase, GeoNames)

Examples: GoodRelations annotations, Schema.org annotations
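The same kind of product annotation can also be written as JSON-LD, the JSON-based RDF syntax mentioned earlier in the talk. A minimal Python sketch using schema.org's Product, Brand and Offer types; the product, price and currency are invented for illustration.

```python
import json


def product_jsonld(name, brand, price, currency):
    """Describe a product and its offer with schema.org terms, as a JSON-LD object."""
    return {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,  # ISO 4217 currency code
        },
    }


# Hypothetical product; the output would go inside <script type="application/ld+json">
doc = product_jsonld("USB cable", "Samsung", 12.5, "USD")
print(json.dumps(doc, indent=2))
```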

HOW? • Jahia fits • Integration

Client case: Bpi
• One goal: use the state-of-the-art Semantic Web, since Bpi is a library (Bibliothèque Publique d’information)
• 3 main needs:
  – Input data easily, for contents and within contents
  – Store data in a safe, RDF-friendly manner
  – Output data
    • On every page for SEO (RDFa)
    • In searches
    • In exports (RDF)
• Good news: Jahia fits!

The choice of Jahia
• Input:
  – Jahia lets you define clear content definitions (CND files) with inheritance
  – Jahia is content-centric
• Enrich within contents: CKEditor
• Enrich on contents: contribute or edit (GWT) modes

The choice of Jahia: storage and output
• Storage: you need a framework that can abstract different sources of data: enter JCR
  – Unique repository for all content
  – External data sources are abstracted: LDAP, files, other DBs…
• Output:
  – Graph structure + XML format → fit for metadata
  – JSP views can easily be tailored for special export formats


Input: CKEditor and categories
• Make sure text data is stored as plain HTML
  – Properties file to map schema.org properties to HTML code
  – In-content schema.org properties
• We created a CKEditor plugin
• Triple categorization of contents
  – Categories (closed list)
  – Tags (open)
  – Authorities (closed, linked with BnF)
• Next steps
  – Need for a triple store?
  – Categorization through automatic spider browsing?

Content structure
• Directories per category
• The semantic mapping is transparent: no additional field to fill in
• Properties files map a field to its semantic exports (Dublin Core, FOAF…)
• Challenges met
  – Where to store the metadata of a file → extend jnt:file
  – How to create a sub-content while creating its parent → edit the Spring GWT XML
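A minimal Python sketch of such a field mapping, assuming a hypothetical file format (a subset of Java .properties syntax, one content field per line, mapped to the vocabulary terms it is exported as) and RDFa output. This is not Jahia's actual configuration; it also assumes the dc:, schema: and foaf: prefixes are declared on the page, for instance with RDFa's prefix attribute.

```python
from html import escape

# Hypothetical mapping file in a subset of Java .properties syntax:
# each content field maps to the vocabulary terms it is exported as.
MAPPING = """\
# field = exported properties
title = dc:title, schema:name
author = dc:creator, foaf:name
"""


def parse_mapping(text):
    """Read 'field = term, term' lines, skipping blank lines and # or ! comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        field, _, value = line.partition("=")
        mapping[field.strip()] = [term.strip() for term in value.split(",") if term.strip()]
    return mapping


def render_rdfa(field, value, mapping):
    """Wrap a field value in a <span> whose RDFa property lists every mapped term."""
    terms = mapping.get(field, [])
    if not terms:
        return escape(value)
    props = " ".join(terms)
    return f'<span property="{props}">{escape(value)}</span>'


mapping = parse_mapping(MAPPING)
print(render_rdfa("title", "Semantic Web", mapping))
```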

Vocabularies used
Page | Schema.org | OpenGraph | Dublin Core | FOAF
Lists / Details on short and long contents | No Yes No Yes No Yes No Partial
Details: events, IT resource [file] | Yes | No | Yes | No
Authors / Place | No No Yes Yes
In HTML | Everywhere | Header | Header | Everywhere
Format in HTML | RDFa | Meta | Meta | RDFa
In RDF | Yes | Yes, one line per meta | Automatic (mapping) | Yes, native
Contributed by | Yes, one line per meta Automatic + Automatic Manual Bpi (mapping) Automatic (mapping)
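The table puts Open Graph and Dublin Core in the page header as meta tags. A minimal Python sketch of rendering such tags from (property, value) pairs; the page values are hypothetical, and using the property attribute for Dublin Core as well as Open Graph (the RDFa reading) is an assumption rather than what the Bpi site necessarily emits.

```python
from html import escape


def head_meta(pairs):
    """Render (property, value) pairs as <meta> tags for the page <head>."""
    return "\n".join(
        f'<meta property="{escape(prop)}" content="{escape(value)}" />'
        for prop, value in pairs
    )


# Hypothetical page metadata: Open Graph and Dublin Core, both emitted in the header
print(head_meta([
    ("og:title", "Semantic Web"),
    ("og:type", "article"),
    ("dc:creator", "Tim Berners-Lee"),
]))
```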

Output
• We chose RDFa because it is more widely used than microdata, for now
• Debate: should enrichment be done manually? Automatically? Through a mixed approach?
• The field → dc:xxx mapping will be used to improve search results
• “ARK” URIs are used to exchange objects between repositories (internal, Jahia, external ones like BnF)
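ARK identifiers stay meaningful whichever server resolves them, which is what makes them suitable for exchanging objects between repositories. A minimal Python sketch that extracts the parts of an ARK and treats two URIs as the same object when the assigning authority number (NAAN) and the name match; it deliberately ignores the normalization rules of the ARK specification, and the example.org host below is hypothetical.

```python
import re

# ARK identifiers look like [resolver/]ark:/NAAN/Name[qualifier];
# the NAAN identifies the assigning institution (12148 is the BnF's).
ARK_PATTERN = re.compile(r"ark:/?(?P<naan>[^/]+)/(?P<name>[^/?#]+)(?P<qualifier>[^?#]*)")


def parse_ark(uri):
    """Return (naan, name, qualifier) for the ARK found in `uri`."""
    match = ARK_PATTERN.search(uri)
    if match is None:
        raise ValueError(f"no ARK found in {uri!r}")
    return match.group("naan"), match.group("name"), match.group("qualifier")


def same_object(uri_a, uri_b):
    """Same NAAN and name means same object, whichever resolver host serves it."""
    return parse_ark(uri_a)[:2] == parse_ark(uri_b)[:2]


print(parse_ark("http://data.bnf.fr/ark:/12148/cb119328992"))
print(same_object("http://data.bnf.fr/ark:/12148/cb119328992",
                  "http://example.org/ark:/12148/cb119328992"))
```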

Future
• Free your data!
  – Put them together
  – Share them between applications and externally
• This forces you to organize your IT differently

Future: Facebook
• Facebook is gradually promoting posts that contain Open Graph data [1]
• « Facebook testing more uses for Open Graph » [2]
[1] http://newsroom.fb.com/News/787/News-Feed-FYI-WhatHappens-When-You-See-More-Updates-fromFriends (January 21, 2014)
[2] http://allfacebook.com/add-to-my-movies-link_b128387

Future: Web 3.0

Conclusion
• “If you’re not paying for it, you are the product” [1]
• The Semantic Web is going to be imposed by internet giants because they need it to know you better
• Take the first step to enrich your data, don’t miss the train!
• Jahia 7 catches the train:
  – External data provider
  – Quality, extendable editor
[1] http://blogs.law.harvard.edu/futureoftheinternet/2012/03/21/meme-patrol-when-something-online-is-free-youre-not-the-customer-youre-the-product/

Questions & Answers
Webography:
• New W3C blog on Semantic Web & linked data: http://www.w3.org/blog/data/
• http://fr.slideshare.net/AntidotNet/time2-market-lyon-13nov2013-slideshare#
• http://fr.slideshare.net/terraces/technologies-du-web-smantique-pour-lentreprise-20
• http://fr.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-dataquelques-repres-pour-sy-retrouver

