DCC Keynote 2007

Information about DCC Keynote 2007

Published on December 24, 2007

Author: carolegoble

Source: slideshare.net

Description

A keynote given on experiences in curating workflows and web services.

3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA

Curating Services and Workflows: The Good, the Bad and the Ugly. A Personal Story in the Small. Professor Carole Goble, The University of Manchester, UK [email_address]. Keynote: 3rd International Digital Curation Conference, Washington DC, 11-13 December 2007

 

ID   MURA_BACSU STANDARD; PRT; 429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT   CONFLICT 374 374 S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; 46016 MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
[GSK]

Programmatic Interfaces to Services (Web Services, not Web Sites). Diagram labels: Your Script, Your Workflow, Your Application; Service Registry; Web Service; SeqFetch Service, BLAT Service, BLAST Service, GO Service; Interface Description Document (WSDL, WADL). Adapted from Lincoln Stein. European Bioinformatics Institute API submissions have risen to 3,166,901 for 2007 (Sarah Hunter).

[Mark Wilkinson, 2006]

Viva la Workflows!

Workflows describe the scientist’s in silico experiment

Link together and cross reference data in different repositories

Mechanism for interoperating.

And that includes publications!

Remote, third party, external applications and services

Accessible to the workflow machinery

And that includes data and publications!

Results management

Semantic metadata annotation of data

Provenance tracking of results

Sharing and replicating know-how

Reuse of workflows

myGrid Taverna Workflow Workbench http://www.mygrid.org.uk


41000+ downloads

40 per day since June 2006.

Ranked 210th for SourceForge activity (06 06 07)

Open Source Development

Used throughout the world

Systems biology – SysMo Consortium

Proteomics

Gene/protein annotation, Microarray data analysis, Medical image analysis

Heart simulations, High throughput screening, Phenotypical studies, Phylogeny

Plants, Mouse, Human

Astronomy, Music, Geography

Text mining

And Curation….

Because software needs curating too. http://www.omii.ac.uk Manchester Southampton Edinburgh European Bioinformatics Institute

Automated Curation using Workflows

Bas Vroling, Gert Vriend, CMBI NCMLS UMC Nijmegen

Coordinating data mirroring refreshes

Refreshing Data warehouses

e-Fungi, ISPIDER

Rebuilding lost databases

tGRAP: when it collapsed, it was picked up by Nijmegen and rebuilt using workflows over two days.

Text mining

Very, very popular.

Workflows instead of data curation?

Data regenerated on demand.

Curate the workflow and not the data?

Workflows are reading publications. Workflows are processing the data. Workflows are part of curation pipelines. Workflows are another form of outcome to publish and curate alongside data and publications.

Workflows are….

… provenance of data

… a general technique for describing and enacting a process, like a script or a protocol or a method

… precise, unambiguous and transparent protocols and records.

… often complex, so they need explaining.

… often challenging and expensive to develop.

… know-how and best practice.

… collaborations.

… valuable first class scientific assets in their own right.

Services are steps in the workflow, and a workflow can be deployed as a service. They are “Social Networks” of services. More on this later….

“We need to curate methods as well as data. With the new large scale data sets process matters as much as content and we are rubbish at curating, capturing and reusing it. Much of what we now rely on is processed, not raw data. We have strategies for curating the raw data - indeed multiple standards.

Thus, in life sciences we have a gaping void in our curation. We need standards, need places to put methods, and places to allow re-use.”

Professor Andy Brass, Bioinformatics

Towards Reproducible Science (with Reproducible Scientific Objects)

Trypanosomiasis in Cattle

Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.

Systematic and comprehensive automation. Elimination of user bias.

Fisher P et al. A systematic strategy for large-scale analysis of genotype–phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Research, 2007, 1–9.

A PhD student: Paul Fisher.

Recycling, Reuse, Repurposing

A Trypanosomiasis in Cattle workflow (by Paul) reused without change for Trichuris muris Infection (by Jo).

Identified the biological pathways believed to be involved in the ability of mice to expel the parasite.

Workflows are memes. Scientific commodities. To be exchanged and traded and vetted and mashed. Users add value.

Scientific memes. Scientific viruses. Increasing numbers: Kepler, Triana, BPEL, Ptolemy II.

Aerospace Engine Design: 90% of design is variant design; 70% of information is taken from previous designs. Source: Silvia Wong, University of Southampton, UK

[Figure, adapted from the eBank project. Labels: Digital Library; Graduate Students; Undergraduate Students; e-Experimentation; e-Scientists; Certified Experimental Results & Analyses; Data, Metadata & Ontologies; Workflows; Institutional Archive; Local Web; Publisher Holdings; Virtual Learning Environment; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata]

If I had (well) curated services and workflows I could….

Browse around and see what is out there and stop reinventing the wheel.

Find a service based on what it does (or was meant to do), what it consumes as inputs and produces as outputs, and what it uses, or because it matches (somehow) something I already have.

Understand how it works and when it works

Know where there are exact copies or similar services I can use as alternates

Know whether I have permission to use it, or have the set up to use it.
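The kind of lookup wished for above, finding a service by what it does and by what it consumes and produces, can be pictured with a small in-memory registry. This is only a sketch under stated assumptions: the `ServiceDescription` fields, the `find_services` helper and type names such as `protein_sequence` are invented for illustration and do not come from any real registry or ontology, and the annotations attached to `emma`, `seqret` and `blast` are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical registry entry; field names are illustrative only."""
    name: str
    task: str             # what the service does, e.g. "alignment"
    inputs: frozenset     # semantic types the service consumes
    outputs: frozenset    # semantic types the service produces


def find_services(registry, task=None, consumes=None, produces=None):
    """Return the entries that match every criterion that was supplied."""
    matches = []
    for svc in registry:
        if task is not None and svc.task != task:
            continue
        if consumes is not None and consumes not in svc.inputs:
            continue
        if produces is not None and produces not in svc.outputs:
            continue
        matches.append(svc)
    return matches


# Made-up annotations, for illustration only.
REGISTRY = [
    ServiceDescription("emma", "alignment",
                       frozenset({"protein_sequence"}),
                       frozenset({"multiple_alignment"})),
    ServiceDescription("seqret", "retrieval",
                       frozenset({"sequence_id"}),
                       frozenset({"protein_sequence"})),
    ServiceDescription("blast", "similarity_search",
                       frozenset({"protein_sequence"}),
                       frozenset({"hit_list"})),
]
```

With such descriptions in place, `find_services(REGISTRY, task="alignment")` finds `emma` even though its name gives no hint of what it does, which is the point of annotating by function rather than by name.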


Understand how to operate it, configure it correctly with some examples and defaults, invoke it and handle all the error stuff, and predict performance properties

Know how expensive it might be to use (financially or performance)

Know when and by whom it was created, and track its version history

Know what other people think of it, how popular it is and who else uses it and how

Know how reliable it is, whether it still works and whether it keeps changing.


Get intelligent help with using it in my application, like when building workflows

Validate it

Know how it can be chained with others

Find services that can mediate the mismatches between other services.

Automagically match it up with others to automagically create new ones

Call it from an application or a web browser

A definition for me [based on Wikipedia: http://en.wikipedia.org/wiki/Digital_curation]

Digital curation is about maintaining and adding value to a trusted body of digital assets for current and future use by, and on behalf of, a community.

It is a long term process where those assets are managed, cleaned up and corrected, associated with metadata, annotated and discussed, and appropriately preserved or reliably disposed of.

Assets are used, we hope

By applications and scientists who had anticipated using them.

By applications and scientists that had not, or in ways that were unanticipated.

e-Scientists in the Cloud

Individual life scientists, in under-resourced labs, using other people’s applications, with little systems support.

Consumers are providers.

Exploratory.

A distributed, disconnected community of scientists.

Hypo Science © Virtual Laboratories Science in the Small by the Many © Peter Murray-Rust

Global Services in the Cloud

Independent third party world-wide service providers of applications, tools and data sets. In the Cloud. Hosted at the originator’s site.

Local applications, tools and datasets. My copies of third party services.

Special shim services.

Decoupled providers and consumers.

3500 service operations

But Surely ….

… Can’t I just Google (or Woogle) for a service?

The clustalw program from Emboss is called ‘emma’

… Can’t I look at its WSDL document?

Input0:string, Output0: string

What does SeqRet actually do?

Liberal use of polymorphic capabilities

What about the ones that are not Web Services?

… Can’t I look at its documentation?

Ahem.  We have to try them to find out what they do…

Writing Reusable stuff is HARD

Predicting the unknown required by the unknown.

Services in the Wild are frequently Rubbish.

Scientists and Developers are naughty.

Applications and Scientists need a Curated Registry of Services. Note: Registry, not repository. Services are hosted elsewhere. (Just having a workflow system isn’t enough.)

Service Curation (since 2002)

3500+ service operations

600+ annotated by a full-time curator.

myGrid Ontology

Annotation and curation pipeline

Curation tools

Feta and Find-O-Matic discovery tools

There are others:

DAS Registry

BioMOBY Central

Building Annotation Commodities
Object: Service, Endpoint, Workflow, file, etc.
Annotation Model: Functional, Operational, Provenance, Reputation
Descriptions: Ontologies, Controlled vocabulary, Tags, Folksonomy, Free text
Layered, Enrichment, Augmentation
Annotation model: uses Semantic Web technologies (OWL and RDFS); the perspective of the scientist; managed, centralised curation process
700+ class domain ontology; Service Ontology; 3500+ Services

Volatility and Decay. BioNanny

Services are not deposited and preserved.

They are referred to.

Constant, silent churn and flux.

No SLA to be stable or standard.

Constantly need tending or else they go bad and stale.

SeqHound, BioMART API

Rapid metadata heart-beat, especially on operational metadata. Like minutes.

(cf. IVOA service validation, DAS).

Workflow decay

Not Fix, File, Forget
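A rapid heart-beat on operational metadata can be sketched as a loop that probes each referenced service and records what it found. The sketch below is illustrative only: `OperationalRecord`, `heartbeat`, the status names and the failure threshold of three are all assumptions, and the actual probe is passed in as a function so that no particular monitoring tool or protocol is implied.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class OperationalRecord:
    """Hypothetical operational metadata kept for one service endpoint."""
    endpoint: str
    last_checked: Optional[float] = None
    last_ok: Optional[float] = None
    consecutive_failures: int = 0

    @property
    def status(self) -> str:
        if self.last_checked is None:
            return "unknown"
        if self.consecutive_failures == 0:
            return "alive"
        # A threshold of 3 is an arbitrary, illustrative choice.
        return "stale" if self.consecutive_failures < 3 else "dead"


def heartbeat(records: Dict[str, OperationalRecord],
              probe: Callable[[str], bool],
              now: Callable[[], float] = time.time) -> None:
    """Probe every endpoint once and update its operational metadata."""
    for rec in records.values():
        stamp = now()
        rec.last_checked = stamp
        try:
            ok = probe(rec.endpoint)
        except Exception:
            # A probe that raises is treated the same as a failed probe.
            ok = False
        if ok:
            rec.last_ok = stamp
            rec.consecutive_failures = 0
        else:
            rec.consecutive_failures += 1
```

Calling `heartbeat` every few minutes from a scheduler would keep the records fresh; because services are referred to rather than deposited, the record describes the service as last observed, not as originally registered.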

One size does not fit all…

Scientist - Finding

Simple classifications on a few properties. Smart tools. “Coarse grained”. Simple Ontology.

Decision Support

Automation – Validation and Execution “fine grained”

Rich metadata for automatic service configuration, invocation, debugging, repair, automated composition

Decision making.

[Figure. Axis labels: Increasing value; Increased automation; Better understanding; Investment (cost, effort); Folksonomy Tagging; Ontology Curation.
Annotation examples: ‘myalignscript.pl’; ‘A tool to compare multiple protein structures’; performs_task : alignment, input_type{seq_a} : sequence…, output_type{score} : d_value; output{score} is_distance_between pair {input{sequence a}, input{sequence b}}.
Capabilities shown: Scripted tool invocation; Guided workflow construction; Basic ‘discovery’ style service annotations; Knowledge driven visualization; Workflow validation; Semantically enriched data; Automated Workflow Construction; Guided workflow reuse; Dynamic Service Substitution; Manual use of tools, web pages; Naïve workflow systems; Service Configuration.]

Progressive Curation: just enough, just in time. Jam today and jam tomorrow. [Chart of Gain against Pain, with regions labelled: Very BAD; Good, but Unlikely; Just right]

Applications and Scientists needed a Curated Repository of Workflows. Find a workflow like this one that I can edit to do something else. That’s really hard.

Workflow Glass Boxes

Social Networks of Services

Is it dependent on a service that I don’t have access to, that is deprecated, or that is unreliable?

Nesting and fragments of workflows

Workflow networks

Service Diagnostics

Popularity, Co-use and clustering

Quality of Service

Service Curation

Automate service annotation

Debug service annotations
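Treating a workflow as a glass box means its service dependencies, including those of nested workflows, can be read off and checked against a registry. A minimal sketch follows, assuming a hypothetical `Workflow` structure and a registry that maps service names to a status string; neither is taken from Taverna or any real catalogue, and nesting is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Workflow:
    """Hypothetical workflow: named service steps plus nested sub-workflows."""
    name: str
    services: List[str] = field(default_factory=list)
    nested: List["Workflow"] = field(default_factory=list)


def required_services(wf: Workflow) -> Set[str]:
    """Collect every service a workflow uses, including nested workflows."""
    found = set(wf.services)
    for child in wf.nested:
        found |= required_services(child)
    return found


def diagnose(wf: Workflow, registry: Dict[str, str]) -> Dict[str, str]:
    """Map each risky service to the reason it is a risk.

    `registry` maps service name -> status string (an assumed vocabulary
    such as "ok", "deprecated", "unreliable", "no_access"). A service that
    is absent from the registry is reported as "unknown".
    """
    problems = {}
    for svc in sorted(required_services(wf)):
        status = registry.get(svc, "unknown")
        if status != "ok":
            problems[svc] = status
    return problems
```

The same traversal, run over many workflows, would also yield the co-use and popularity counts mentioned above, since each workflow contributes one set of services that are used together.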

Curation Sweatshop. Our hard-working (real) curators, Franck Tanoh and Katy Wolstencroft: notice how tired they look.

Steady increase in numbers of services and workflows

Time-consuming and expensive.

Annotation and the Ontologies

Choosing, Adding value. Monitoring.

Should we instead enable suppliers to add value?

Automated Curation. Needs lots of infrastructure. Needs lots of seeding and reviewing.

Operational:

Monitoring information services, dial home diagnostics from applications, customer reports

Reputation and Provenance:

Recommendations and ratings

Functional:

Text mining and parsing files and documents (if any)

Incidental metadata through use.

Annotation derivation from sound workflows and rich service descriptions of inputs and outputs

Not perfect, but a help!

Local Libraries and Warehouses of Workflows trapped in their enterprises or platforms

Tryps Twiki World Wikis are where data lives….

Picture of workflow in Flickr – evidence of social tagging and networking

 

myExperiment.org is… (since Feb 2007)

A bazaar for any and all kinds of workflows.

A community social network for community annotation and general gossip.

A gateway to other publishing environments.

A federated repository.

Publish self-describing encapsulated myExperiment Objects.

Not workflows; Scientific Objects!

e-Crystals, Social science, Astronomy, Geography, Music

(A platform for launching workflows.)

 

Encapsulated myExperiment Objects (EMOs). Virtual Exchange Format

A single workflow or a collection of workflows, with instructions and examples

A workflow with its inputs and the products of executing it (including logs), perhaps multiple times

Chemistry data from instruments, coupled with blogged log book entries

A collection of all the digital items associated with one experiment—including EMOs

A reproducible article with workflows and data

Encapsulated myExperiment Objects: Virtual Exchange Format

Open Archives Initiative – Object Reuse and Exchange (OAI-ORE)

compound object information and standardised and interoperable mechanisms

W3C Open Linked Data Initiative

Reproducible Scientific Objects

EMO Challenges

What happens when the parts are scattered across multiple stores?

What happens if someone updates a part?

How will my EMO be discovered on the Web?

How can I work with an EMO offline?

What is the provenance of the EMO and its parts?

What happens if a part is unavailable?

How do I send an EMO by email?

Can I turn an EMO into a tarball?

Can I archive an EMO to a CDROM?

If I delete this file will it break anyone’s EMOs?

How do I trust an EMO?

How do I handle an EMO RESTfully?

Can my EMO link to objects outside the EMO?
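Several of these questions become concrete once an EMO is pictured as a manifest that lists its parts by URI. The sketch below is a deliberately simple stand-in, not the OAI-ORE data model or the myExperiment implementation: the `Part` and `EMO` classes, their fields and the JSON layout are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Part:
    """One aggregated item; `uri` may point inside or outside the store."""
    uri: str
    role: str              # e.g. "workflow", "input", "log", "article"
    available: bool = True


@dataclass
class EMO:
    """Hypothetical manifest for an encapsulated myExperiment object."""
    identifier: str
    base_uri: str
    parts: List[Part] = field(default_factory=list)

    def external_parts(self) -> List[Part]:
        """Parts hosted outside this EMO's own store."""
        return [p for p in self.parts if not p.uri.startswith(self.base_uri)]

    def missing_parts(self) -> List[Part]:
        """Parts currently flagged as unavailable."""
        return [p for p in self.parts if not p.available]

    def to_json(self) -> str:
        """Serialise the manifest so it can be emailed or archived."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "EMO":
        data = json.loads(text)
        parts = [Part(**p) for p in data.pop("parts")]
        return cls(parts=parts, **data)
```

A manifest like this answers only the easy questions (what is outside, what is missing, can it travel as a file); trust, provenance and what happens when someone else updates a part need more than a list of links.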

Not just Workflows, Not just Biology. Chemistry (eCrystals), Social Science, Astronomy, Music. Files and Documents, Logs and Blogs, Ontologies, Data.

Why EMO?

Respect Cautious Collaboration…. [Diagram labels: cloud, enterprise, personal, laboratory, project]

Community web site, federated repository.

Multiple and My.

Publish what I want when I want within the group I want.

Mixed identity regimes: an identity authority

OAI-PMH.

Open Archives Initiative. http://www.openarchives.org/

The CombeChem project. http://www.combechem.org/

A Gateway + more User Participation. The Research Information Centre, British Library and Microsoft. Figure courtesy Savas Parastatidis, Microsoft.

Tryps team already has a wiki

Mash up with Facebook and workflow hosting apps.

Bring functionality to the user. Cooperate! Don’t Control.

 

Apologies to Larson

From me-Science to we-Science

Tribal bonding and sharing

Crossing Tribal Boundaries

Across communities and disciplines (MIT)

“Intellectual Fusion” & “Swarming”; breaking down silos

Understanding outside my expertise. E.g. sources of error

Metadata challenges.

Social challenges.

A Change in the World: the WS4LS BioCatalogue Project (Manchester & EBI). [Diagram: Curation by the Monks, Curation by the Masses, Curation by Developers and Automated Curation, connected by arrows labelled seed, refine and validate]

Challenges - where to start? If we thought about them hard we wouldn’t have done it. So we didn’t. It’s, er, my experiment. National Centre for e-Social Science

User Participation for Content and Functionality

Adoption depends on lots of shared services and workflows

and enabling Scientists to add value through applications and collaborative tagging

The Selfish Scientist –

e-Science is me-Science

Incentive models for Scientists to share?

Workflow Versioning and Sharing

We expect workflow versioning.

We encourage workflow evolution by the developers and others.

Versions to be re-pooled.

Ownership

Sharing

Permissions

Separate update of workflow from update of metadata.


Control in the hands of the developers.

Is this flexible enough?

Sense of Ownership. IP. Authorship attribution. Copyright.

Provenance propagation.

Validation, Safety, Trust.

When does a workflow get changed so much that it’s no longer the same workflow?

More Challenges

Privacy, Copyright, IP

Incentives to share, collaboratively curate and behave.

Altruism, mischief, self-interest

Credit, reputation, fame, impact. Me-Science.

Expectations – suppose it’s wrong? Will I get sued?

Scientists are naughty too.

Quality control.

Palpability, buyer beware, memes are tricky things. Community Trust models. Policing. Auto-checking? Shaming?

Sustainability leverages

The Open Source Development Model

On young people’s endless enthusiasm to share.

Better tooling.

Keep your Users Close: Web 2.0 style development. Parties, HackFests, Advocates, Guinea Pigs.

Perpetual Beta

Users Add Value

Do we still need curators? “Hell is other people’s metadata”

Yes!

Open tagging, folksonomies, blogging, profiles, recommendations, Social network analysis and e-tracking, workflow analytics.

Deafened by the Shouting

Overseeing but not Controlling. Review and add value.

Tagging -> Structured Pipeline

Reconcile Creative Freewheeling with need to Organise.

Impedance mismatch between research activities and the recording of research data. Dynamic Scientists vs Prescriptive Platform

Ontology dictatorship.

Reconciling managed ontologies with emergent folksonomies. Encourage Tagging with Ontologies.

Metadata Creep: multi-form, multiple-descriptions

Pay as you Go, Emergent Curation. [Chart of Gain against Pain, from Folksonomy Tagging to Hard Core Ontology Curation, with regions labelled: Very BAD; Good, but Unlikely; Just right]

Must be careful to avoid technology seduction. Computer people want to do interesting stuff; curators want stability and reliability; users want simplicity. Smart tools and good interfaces often outwit clever techniques. Bummer. However….

Model Flexibility

Semantic Web!

Flexibility of RDF

Incrementality of OWL

Self description

Reasoning when needed

Open Linked Data, SKOS

Open Archives Initiative – Object Reuse and Exchange (OAI-ORE)

compound object information and standardised and interoperable mechanisms

Metadata Middleware

Annotations are First Class Citizens

A technology independent metadata abstraction layer. Natively supported by the middleware infrastructure.

S-OGSA Framework from the Semantic Grid.

Semantic Bindings Management.

Curation Design Patterns

http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

The Long Tail

Data is the Next Intel Inside

Users Add Value

Network Effects by Default

Some Rights Reserved

The Perpetual Beta

Cooperate, Don't Control

Beyond a Single Device

SMARTER Curation

Selective – ROI

Mass community annotation – cooperate, don’t control. Harness people cycles and network effects.

Automate – Derive. Harness compute cycles and network effects.

React – to changes, automate responses

Timely – just in time

Expedient – just enough

Review – seed, oversee & refine rather than control

Changes in model support and infrastructure

Changes in work practice – if it’s a problem, it’s a people problem.

Credits

David De Roure

Matt Lee

David Withers

Don Cruickshank

Jiten Bhagat

David Newman

Mark Borkum

Danius Michaelides

Ed Zaluska

Jeremy Frey

Simon Coles

Marco Roos

Rob Procter

Alex Voss

Duncan Hull

Paul Fisher

Antoon Goderis

Katy Wolstencroft

Franck Tanoh

Robert Stevens

Martin Senger

Khalid Belhajjame

Andy Brass

Norman Paton

Rodrigo Lopez (EBI)

Tom Oinn (EBI)

Pinar Alper, Phil Lord, Chris Wroe

Mark Wilkinson (BioMOBY)

Savas Parastatidis (Microsoft)

Alan Williams, Stuart Owen, June Finch, Stian Soiland,

Kaixuan Wang, Oscar Corcho

And the rest of myGrid and OntoGrid

For More Information

myExperiment:

http://myexperiment.org

David De Roure dder@ecs.soton.ac.uk

myGrid: Taverna and WS4LS Catalogue

http://www.mygrid.org.uk

SoapLab:

http://soaplab.sourceforge.net/soaplab2/

OntoGrid: Semantic middleware

http://www.semanticgrid.org
