SemanticCampLondon, 16th February 2008


Published on February 16, 2008

Author: covert

Source: slideshare.net

Description

My presentation at SemanticCamp London, 16th February 2008

Automatically indexing science using natural-language processing, RDF and SPARQL

Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust

February 16, 2008

Outline:
• Gathering data
• Extracting (meta)data
• Using the data
• Thanks

Data sources

• Supplemental and experimental data
• Journals
• Self-archived papers (e.g. arXiv)
• Mainstream journalism
• Blogs

Supplemental data: CrystalEye

• http://wwmm.ch.cam.ac.uk/crystaleye/
• Repository for crystallographic data

Journals and arXiv

• “Traditional” journal articles
• Titles and abstracts...

Journalism and blogs

• Unstructured text with little semantics;
• ... hence Google Scholar, Web of Science, etc.

Semi-structured data: Golem

• We’ve got a lot of chemical data as CML
  • http://en.wikipedia.org/wiki/Chemical_Markup_Language
• ... but we still need to get data out of that and into a more useful form
• hence Golem: http://www.lexical.org.uk/science/golem/
• GRDDLish strategy for extracting data from CML files: identify dialect-specific concepts with XPath expressions and XSLT stylesheets
• upshot: we can extract JSON objects from CML files.
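The XPath half of that strategy can be sketched in a few lines of Python. This is not Golem's own code or API: the CML fragment, the `ex:` dictRef values and the `DIALECT` table are all made up for illustration, and the XSLT step is left out.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative CML fragment; the dictRef values are invented for this sketch.
CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <property dictRef="ex:totalEnergy">
    <scalar dataType="xsd:double" units="units:eV">-1234.5</scalar>
  </property>
  <property dictRef="ex:cellVolume">
    <scalar dataType="xsd:double" units="units:ang3">160.2</scalar>
  </property>
</cml>"""

NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical "dialect" table: concept name -> XPath expression locating it.
DIALECT = {
    "total_energy": ".//cml:property[@dictRef='ex:totalEnergy']/cml:scalar",
    "cell_volume": ".//cml:property[@dictRef='ex:cellVolume']/cml:scalar",
}


def extract(cml_text, dialect=DIALECT):
    """Return {concept: {"value": float, "units": str}} for each concept found."""
    root = ET.fromstring(cml_text)
    found = {}
    for concept, xpath in dialect.items():
        node = root.find(xpath, NS)
        if node is not None:
            found[concept] = {
                "value": float(node.text),
                "units": node.get("units"),
            }
    return found


if __name__ == "__main__":
    # Serialise the extracted observables as a JSON object.
    print(json.dumps(extract(CML), indent=2))
```

Keeping the XPath expressions in a table means a new CML dialect only needs a new table, not new code.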

Free text: OSCAR3

• http://oscar3-chem.sourceforge.net/
• Natural-language parser for documents about chemistry
• Dark magic: don’t ask me how it works!
• ... but it can be run as a Jetty webservice so as long as it does, I’m happy
• Author’s blog: http://wwmm.ch.cam.ac.uk/blogs/corbett/

Getting the data in

• Everything (more or less) talks RSS nowadays...
• RSS 0.91, RSS 1.0 (which one?), Atom, etc etc etc.
• Thankfully: feedparser (http://feedparser.org/)
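A minimal sketch of how feedparser hides the dialect differences. `feedparser.parse` and the normalised entry keys (`id`, `link`, `title`, `summary`) are part of the library; the two helper functions and the shape of the returned record are assumptions of this sketch, not anything from the talk.

```python
def fetch_entries(url):
    """Fetch a feed and return its entries, whichever RSS/Atom dialect it uses."""
    import feedparser  # third-party; imported here so the rest runs without it

    return feedparser.parse(url).entries


def entry_record(entry):
    """Pick out the fields the later steps need from one (dict-like) entry."""
    return {
        # Not every feed supplies an id, so fall back to the link.
        "id": entry.get("id") or entry.get("link"),
        "title": entry.get("title", ""),
        "text": entry.get("summary", ""),
    }
```

Usage would be along the lines of `[entry_record(e) for e in fetch_entries(feed_url)]`, with `feed_url` being whichever feed is being indexed.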

Serializing metadata

• RDF – using:
  • Dublin Core terms
  • A homebrew ontology based on the IUCr’s CIF data format
  • and another homebrew ontology for OSCAR annotations
  • (it’d be good to standardise these, but to be honest, not many people are doing this sort of thing)
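To make the Dublin Core part concrete, here is a small sketch that writes title and contributor triples as N-Triples using only the standard library. The subject URI and the choice of `title` and `contributor` are illustrative assumptions; the homebrew CIF and OSCAR ontologies are not reproduced here, and a library such as rdflib could do the same job.

```python
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"%s"' % escaped


def describe(subject, title, contributors):
    """Return N-Triples lines giving a resource's title and contributors."""
    lines = ["<%s> <%stitle> %s ." % (subject, DCTERMS, literal(title))]
    for name in contributors:
        lines.append(
            "<%s> <%scontributor> %s ." % (subject, DCTERMS, literal(name))
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # example.org URI and names are placeholders.
    print(describe("http://example.org/entry/1", "A structure", ["A. Author"]))
```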

The process

• For each feed in a list of feeds:
  • If it’s supplying CML data, set Golem on each entry, get the observables out, and turn them into triples; run OSCAR3 over the title and/or abstract
  • If it’s not, extract the free text from each entry, send it to the OSCAR web service, and assign triples based on the chemical entities OSCAR finds
• Upload the RDF to your triple store
• (I’m using the Talis platform, so that’s just curl)
• And...
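The loop above can be sketched as follows. Everything named here is hypothetical: `fetch`, `run_golem` and `run_oscar` are stand-in hooks for feedparser, Golem and the OSCAR web service, the `cml` key on an entry is an assumed convention, and the `ex:` predicates are placeholders for the homebrew ontologies. The upload step is left out.

```python
def index_feeds(feeds, fetch, run_golem, run_oscar):
    """Collect (subject, predicate, object) triples for every entry of every feed.

    Hypothetical hooks supplied by the caller:
      fetch(url)      -> list of dict-like entries
      run_golem(cml)  -> {observable_name: value}
      run_oscar(text) -> list of chemical entity names
    """
    triples = []
    for url in feeds:
        for entry in fetch(url):
            subject = entry["id"]
            # Free text is the title and/or abstract, whichever is present.
            text = " ".join(
                part for part in (entry.get("title"), entry.get("summary")) if part
            )
            # Entries carrying CML get their observables extracted first.
            if entry.get("cml"):
                for name, value in run_golem(entry["cml"]).items():
                    triples.append((subject, "ex:" + name, value))
            # Either way, OSCAR annotates the free text.
            for entity in run_oscar(text):
                triples.append((subject, "ex:mentions", entity))
    return triples
```

Passing the three services in as functions keeps the loop testable without a network connection or a running OSCAR server.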

SPARQL is great.

Just post queries at a SPARQL endpoint:

    authortemplate='''
    PREFIX dc: <http://purl.org/dc/terms/>
    PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
    DESCRIBE ?file WHERE { ?file dc:contributor some author . }
    '''
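A sketch of what "posting a query" might look like from Python, using the form-encoded POST defined by the SPARQL protocol. The endpoint URL is a placeholder, and the sketch assumes contributors are stored as plain string literals, which the slide does not say.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same shape as the slide's template, with %s where the author goes.
AUTHOR_TEMPLATE = """
PREFIX dc: <http://purl.org/dc/terms/>
DESCRIBE ?file WHERE { ?file dc:contributor %s . }
"""


def sparql_literal(text):
    """Quote a string as a SPARQL literal (assumes plain-literal contributors)."""
    return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')


def author_request(endpoint, author):
    """Build (but do not send) a POST request asking for one author's files."""
    query = AUTHOR_TEMPLATE % sparql_literal(author)
    body = urlencode({"query": query}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/rdf+xml",
        },
    )


if __name__ == "__main__":
    # Placeholder endpoint; pass the request to urllib.request.urlopen to send it.
    req = author_request("http://example.org/sparql", "A. Author")
    print(req.get_method(), req.full_url)
```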

SPARQL isn’t (entirely) great.

• Scientists shouldn’t have to know this stuff.
• So we need to build a front end which your average senior academic might be able to use...
• (i.e. it’s got to look like a website.)

What queries do we want?

• What experimental data is an author responsible for?
• What chemical entities are in some data?
• Where is a given chemical entity talked about?
• So we can build a web app around these queries.
• django + rdflib + sparql + Talis Platform

Demo!

And here it is.

Thanks to...

• Talis (http://n2.talis.com/) for access to their platform
• and to the RSC and IUCr for their support of CrystalEye.
