Shifting the Burden from the User to the Data Provider


Published on February 17, 2014

Author: HDFEOS



As the volume and complexity of data from myriad Earth Observing platforms, both remote sensing and in situ, increase, so does the demand for access to both the data and the information products derived from them. The audience is no longer restricted to an investigator team with specialist science credentials. Non-specialist users want access: scientists from other disciplines, the science-literate public, teachers, the general public and decision makers. What prevents them from accessing these resources? The very complexity of specialist-developed data formats, data set organizations and terminology. What can be done in response? We must shift the burden from the user to the data provider. To achieve this, our data infrastructures are likely to need greater internal code and data structure complexity in exchange for (relatively) lower end-user complexity. Evidence from numerous technical and consumer markets supports this scenario. We will cover the elements of modern data environments, what the new use cases are and how we can respond to them.

Shifting the Burden from the User to the Data Provider
Peter Fox, High Altitude Observatory, NCAR (***)
With thanks to eGY and various NSF, DoE and NASA projects

Outline
• Background, definitions
• Informatics -> e-Science
• Data has lots of uses
  – Virtual Observatories: use cases
  – Data Framework: Examples
  – Data ingest, integration, mining and …
• Discussion
Fox HDF: Semantic Data Burden Shift Oct 15, 2008

Background
Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed.
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, and inflexible and unsustainable implementation technology…
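As a minimal sketch of what "appears to be integrated" can require of a provider, assume two hypothetical providers that report the same quantity under different names and units. A mapping table maintained on the provider side translates each record into one shared vocabulary before the user ever sees it. The provider names, parameter names and the vocabulary itself are illustrative assumptions, not taken from any real archive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Observation:
    parameter: str  # term from the shared vocabulary
    unit: str       # unit used by the shared vocabulary
    value: float
    source: str     # provider the value came from


# Hypothetical mapping table kept by the data provider, not the user:
# (provider, local parameter name) -> (shared term, shared unit, converter)
MAPPINGS: Dict[Tuple[str, str], Tuple[str, str, Callable[[float], float]]] = {
    ("provider_a", "Tn"): ("neutral_temperature", "K", lambda v: v),
    ("provider_b", "T_neutral_C"): ("neutral_temperature", "K", lambda v: v + 273.15),
    ("provider_b", "alt_km"): ("altitude", "m", lambda v: v * 1000.0),
}


def harmonize(provider: str, local_name: str, value: float) -> Observation:
    """Translate one provider-specific value into the shared vocabulary."""
    try:
        term, unit, convert = MAPPINGS[(provider, local_name)]
    except KeyError:
        raise ValueError(f"no mapping for {local_name!r} from {provider!r}") from None
    return Observation(term, unit, convert(value), provider)


a = harmonize("provider_a", "Tn", 850.0)
b = harmonize("provider_b", "T_neutral_C", 576.85)
print(a.parameter == b.parameter, round(b.value, 2))  # True 850.0
```

The user compares `a` and `b` directly; the knowledge that one provider reports Celsius under a different name lives in the provider's table, which is exactly where the talk argues it belongs.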

Information: Lots of Audiences
[Figure: contrasts the audiences that information has ("Lots of Audiences") with those that data products have, on a scale from more strategic to less strategic; annotation: "Scientists too". From "Why EPO?", a NASA internal report on science education, 2005.]

The Information Era: Interoperability
Modern information and communications technologies are creating an "interoperable" information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries.

Shifting the Burden from the User to the Provider

Modern capabilities

Mind the Gap!
As a result - finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
• Informatics (information science) includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)

Progression after progression
[Figure: layers labeled IT, Cyberinfrastructure, Cyber Informatics, Core Informatics, Science Informatics (aka X-informatics), and Science, SBAs; the informatics layers are grouped under "Informatics".]

Virtual Observatories
• Conceptual examples:
• In-situ: Virtual measurements
  – Related measurements
• Remote sensing: Virtual, integrative measurements
  – Data integration
• Managing virtual data products/sets

Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. the resources appear to be local and integrated.
They are likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage and "smart" tools for evolution and maintenance.
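To make "appear to be local and integrated" concrete, here is a minimal sketch of a virtual observatory facade. It assumes each provider can be wrapped as a function that takes a parameter name and returns records. The facade queries all providers in parallel, tags each record with its source, and returns one time-ordered list. The class, the provider functions and their holdings are hypothetical stand-ins; a real VO would make network calls here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]
Provider = Callable[[str], List[Record]]


class VirtualObservatory:
    """Hypothetical facade: one query, many distributed holdings, one merged answer."""

    def __init__(self, providers: Dict[str, Provider]):
        self.providers = providers

    def search(self, parameter: str) -> List[Record]:
        # Fan the query out to every provider concurrently.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn, parameter) for name, fn in self.providers.items()}
            results: List[Record] = []
            for name, future in futures.items():
                for record in future.result():
                    # Keep attribution: every record remembers where it came from.
                    results.append({**record, "source": name})
        # Present a single time-ordered list, as if from one local store.
        return sorted(results, key=lambda r: str(r["time"]))


# Stand-in providers with placeholder holdings (ISO 8601 strings sort chronologically).
def archive_one(parameter: str) -> List[Record]:
    holdings = {"neutral_temperature": [{"time": "2005-01-01T21:00Z", "value": 850.0}]}
    return holdings.get(parameter, [])


def archive_two(parameter: str) -> List[Record]:
    holdings = {"neutral_temperature": [{"time": "2005-01-01T18:00Z", "value": 842.0}]}
    return holdings.get(parameter, [])


vo = VirtualObservatory({"archive_one": archive_one, "archive_two": archive_two})
for rec in vo.search("neutral_temperature"):
    print(rec["time"], rec["source"])
```

Keeping a `source` field on every merged record is a deliberate choice: it lets the facade hide the distribution without losing the branding and attribution information discussed later under issues for virtual observatories.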

Early days of discipline specific VOs
[Figure: separate discipline-specific VOs (VO1, VO2, VO3, marked "?") each sitting over their own databases DB1, DB2, DB3 … DBn.]

The Astronomy approach; datatypes as a service
[Figure: VO applications (VO App1, VO App2, VO App3) sit on a VO layer over DB1, DB2, DB3 … DBn. Astronomy uses VOTable plus the Simple Image Access Protocol, Simple Spectrum Access Protocol and Simple Time Access Protocol (under review). The Open Geospatial Consortium uses the same approach with Web {Feature, Coverage, Mapping} Service and, for Sensor Web Enablement, Sensor {Observation, Planning, Analysis} Service. Annotations: limited interoperability; lightweight semantics; limited meaning, hard coded; limited extensibility.]

Semantic mediation layer - VSTO
[Figure: layered architecture. Top: added value (education, clearinghouses, other services, disciplines, etc.) reached through a VO Portal, VO API and Web Services. Middle: semantic mediation layer (mid-upper-level), providing semantic interoperability and semantic query, hypothesis and inference. Bottom: metadata, schema and data in DB1, DB2, DB3 … DBn, with low-level query, access and use of data.]
Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
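The mediation layer's job of mapping a user-level query onto the underlying data can be sketched with a handful of subject-predicate-object triples. This is a minimal illustration, assuming a hypothetical ontology fragment in which instruments are linked to the parameters they measure and to the archive that holds their data. The triple set and the `mediate` helper are not VSTO's actual ontology or code. A query phrased only as "parameter plus time range" is turned into low-level access requests that the user never has to write.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology fragment stored as (subject, predicate, object) triples.
ONTOLOGY: Set[Triple] = {
    ("FabryPerotInterferometer", "measures", "NeutralTemperature"),
    ("FabryPerotInterferometer", "measures", "NeutralWind"),
    ("IncoherentScatterRadar", "measures", "ElectronDensity"),
    ("IncoherentScatterRadar", "measures", "IonTemperature"),
    ("FabryPerotInterferometer", "hasArchive", "db1"),
    ("IncoherentScatterRadar", "hasArchive", "db2"),
}


def subjects(predicate: str, obj: str) -> List[str]:
    return sorted(s for s, p, o in ONTOLOGY if p == predicate and o == obj)


def objects(subject: str, predicate: str) -> List[str]:
    return sorted(o for s, p, o in ONTOLOGY if s == subject and p == predicate)


def mediate(parameter: str, start: str, end: str) -> List[dict]:
    """Map a user-level query (parameter + time range) to low-level access requests."""
    requests = []
    for instrument in subjects("measures", parameter):
        for archive in objects(instrument, "hasArchive"):
            requests.append({"archive": archive, "instrument": instrument,
                             "parameter": parameter, "start": start, "end": end})
    return requests


print(mediate("NeutralTemperature", "2005-01-01", "2005-01-02"))
```

A production system would hold the ontology in OWL and use a reasoner. The plain set of tuples only stands in for that so the mapping step is visible.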

Content: Coupling, Energetics and Dynamics of Atmospheric Regions (CEDAR) WEB
Community data archive for observations and models of Earth's upper atmosphere, and the geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, instruments (> 310), models, and parameters (> 820)…

Content: Mauna Loa Solar Observatory
Near real-time data products from Hawaii from a variety of solar instruments. Source for space weather, solar variability, and basic solar physics.
Other content used too - Center for Integrated Space Weather Modeling

Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Adopt Technology Approach; Leverage Technology Infrastructure; Use Tools; Rapid Prototype; Science/Expert Review & Iteration; Evolve, Iterate, Redesign, Redeploy; annotated "Open World".]

Science and technical use cases
• Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
  – Extract information from the use case - encode knowledge
  – Translate this into a complete query for data - inference and integration of data from instruments, indices and models
• Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
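The requirement that constraints on Instrument, Date-Time and Parameter work "in any order and … in any combination" is easy to meet if each constraint is an independent predicate and a query is simply their conjunction. The sketch below encodes the neutral-atmosphere use case that way. It assumes a hypothetical catalogue record (`Granule`) that carries altitude, latitude and a Kp value, and it takes "high geomagnetic activity" to mean Kp >= 5 (the NOAA G1 storm threshold). Both are assumptions, not how VSTO represents the query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Granule:
    """One catalogued data granule (all fields are hypothetical)."""
    instrument: str
    parameter: str
    start: datetime
    altitude_km: float
    latitude_deg: float
    kp: float  # planetary geomagnetic index for the granule's time


Constraint = Callable[[Granule], bool]


def find(catalog: Iterable[Granule], *constraints: Constraint) -> List[Granule]:
    """Keep granules that satisfy every constraint; order and combination are free."""
    return [g for g in catalog if all(c(g) for c in constraints)]


def parameter_in(names: Set[str]) -> Constraint:
    return lambda g: g.parameter in names


def above_altitude(km: float) -> Constraint:
    return lambda g: g.altitude_km > km


def poleward_of(lat_deg: float) -> Constraint:
    return lambda g: g.latitude_deg > lat_deg


def geomagnetically_active(kp_min: float = 5.0) -> Constraint:
    # Assumption: "high geomagnetic activity" means Kp >= 5.
    return lambda g: g.kp >= kp_min


def during(t0: datetime, t1: datetime) -> Constraint:
    return lambda g: t0 <= g.start < t1


NEUTRAL = {"neutral_temperature", "neutral_wind", "neutral_density"}

# Placeholder catalogue entries, not real observations.
catalog = [
    Granule("fpi_site_1", "neutral_temperature", datetime(2005, 1, 1), 250.0, 67.0, 7.0),
    Granule("fpi_site_1", "neutral_temperature", datetime(2005, 1, 2), 250.0, 67.0, 2.0),
    Granule("isr_site_2", "electron_density", datetime(2005, 1, 1), 300.0, 42.6, 7.0),
]

use_case = [parameter_in(NEUTRAL), above_altitude(100.0), poleward_of(45.0), geomagnetically_active()]
hits = find(catalog, *use_case)
print([g.start.date().isoformat() for g in hits])  # ['2005-01-01']
```

Because `find` only takes the conjunction, passing the same constraints in reverse order, or passing only a time window, needs no extra code paths.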

VSTO - semantics and ontologies in an operational environment:, Web Service
Fox RPI: Semantic Data Frameworks May 14, 2008

Semantic filtering by domain or instrument hierarchy
Partial exposure of the Instrument class hierarchy - users seem to LIKE THIS
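Filtering by instrument hierarchy works because membership in a broad class is inferred rather than stored. Each instrument is tagged only with its most specific class, and the hierarchy supplies the rest. A minimal sketch follows, assuming a hypothetical single-parent class tree and hypothetical instrument names. OWL ontologies allow multiple parents via rdfs:subClassOf, which this simple parent map does not model.

```python
from typing import Dict, List

# Hypothetical slice of an instrument class hierarchy: class -> parent class.
SUBCLASS_OF: Dict[str, str] = {
    "OpticalInstrument": "Instrument",
    "RadioInstrument": "Instrument",
    "Interferometer": "OpticalInstrument",
    "FabryPerotInterferometer": "Interferometer",
    "Photometer": "OpticalInstrument",
    "IncoherentScatterRadar": "RadioInstrument",
}

# Hypothetical instrument instances, each tagged only with its most specific class.
INSTANCES: Dict[str, str] = {
    "fpi_site_1": "FabryPerotInterferometer",
    "photometer_site_2": "Photometer",
    "isr_site_3": "IncoherentScatterRadar",
}


def ancestors(cls: str) -> List[str]:
    """Walk up the hierarchy, nearest parent first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls: str, target: str) -> bool:
    return cls == target or target in ancestors(cls)


def instruments_under(target: str) -> List[str]:
    """All instances whose class is `target` or any subclass of it (inferred, not stored)."""
    return sorted(name for name, cls in INSTANCES.items() if is_a(cls, target))


print(instruments_under("OpticalInstrument"))  # ['fpi_site_1', 'photometer_site_2']
```

In a portal, each level of `SUBCLASS_OF` becomes a selectable node, so a user can narrow from "Instrument" to "OpticalInstrument" without knowing which concrete instruments exist.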


Inferred plot type and return formats for data products

Inferred plot type and return required axes data
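One simple way to infer a plot type and the axis data to return is to key rules on a product's independent variables. The sketch below assumes hypothetical rules: time only gives a line plot, time plus altitude gives a contour plot, latitude plus longitude gives a map. It is rule-based Python standing in for ontology reasoning, not the VSTO implementation.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical rules: independent variables of a product -> (plot type, axis order).
PLOT_RULES: Dict[FrozenSet[str], Tuple[str, List[str]]] = {
    frozenset({"time"}): ("line", ["time"]),
    frozenset({"time", "altitude"}): ("contour", ["time", "altitude"]),
    frozenset({"latitude", "longitude"}): ("map", ["longitude", "latitude"]),
}


def infer_plot(independent_vars: Iterable[str], dependent_var: str) -> dict:
    """Pick a plot type and list the axis data that must be returned with the product."""
    key = frozenset(independent_vars)
    if key not in PLOT_RULES:
        raise ValueError(f"no plot rule for independent variables {sorted(key)}")
    plot_type, axes = PLOT_RULES[key]
    # The dependent variable goes last: y for a line plot, colour for contour/map.
    return {"plot": plot_type, "axes": axes + [dependent_var]}


print(infer_plot(["altitude", "time"], "electron_density"))
```

Keying on a `frozenset` makes the rule lookup independent of the order in which a product lists its dimensions.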

Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries: this could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application can expose only coherent queries (portal and services)
• Semantic integration: in the past, users had to remember (and maintain code) to account for the numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
  – understanding of coordinate systems, relationships, data synthesis, transformations, etc.
  – returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)

What is a Non-Specialist Use Case?
A teacher accesses the Internet, goes to an Educational Virtual Observatory and enters a search for "Aurora". Someone should be able to query a virtual observatory without having specialist knowledge.

What should the User Receive?
The teacher receives four groupings of search results:
1) Educational materials
2) Research, data and tools: via VSTO, VSPO and VITMO; the system knows to search for brightness, or green/red line emission
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as the Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.
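The four result groups can come from one concept-map entry that the provider maintains. The entry holds a definition, narrower terms, the measurable quantities a specialist would search for, and the services that hold them. The sketch below assumes such a hypothetical entry, with wording taken from the slide and a placeholder for the educational resources. It naively pairs every listed service with every quantity, which a real concept map would refine.

```python
from typing import Dict, List

# Hypothetical concept-map entry; wording follows the slide, resources are placeholders.
CONCEPTS: Dict[str, dict] = {
    "aurora": {
        "definition": ("a phenomenon of the upper terrestrial atmosphere (ionosphere), "
                       "also known as the Northern Lights"),
        "did_you_mean": ["Aurora Borealis", "Aurora Australis"],
        "measured_as": ["brightness", "green line emission", "red line emission"],
        "data_services": ["VSTO", "VSPO", "VITMO"],
        "education": ["placeholder: aurora lesson plan"],
    },
}


def search(term: str) -> Dict[str, List[str]]:
    """Expand a lay search term into four groups of results."""
    concept = CONCEPTS.get(term.strip().lower())
    if concept is None:
        return {}
    return {
        "Educational materials": list(concept["education"]),
        # The user typed "Aurora"; the provider supplies the specialist search terms.
        "Research, data and tools": [
            f"{service}: {quantity}"
            for service in concept["data_services"]
            for quantity in concept["measured_as"]
        ],
        "Did you know?": [f"{term.strip().capitalize()} is {concept['definition']}."],
        "Did you mean?": list(concept["did_you_mean"]),
    }


for group, items in search("Aurora").items():
    print(group, items)
```

The teacher never needs to know that "green line emission" is the relevant parameter. That knowledge sits in the provider's concept map.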

Semantic Information Integration: Concept map for educational use of science data in a lesson plan


Issues for Virtual Observatories
• Scaling to large numbers of data providers and/or redefining the role(s)/relations with them
• Crossing discipline boundaries
• Security, access to resources, policies
• Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
• Data quality, preservation, stewardship
(Slide annotation: these are currently burdens for users)

Problem definition
• Data is coming in faster and in greater volumes, outstripping our ability to perform adequate quality control
• Data is being used in new ways, and we frequently do not have sufficient information on what happened to the data along the processing stages to determine whether it is suitable for a use we did not envision
• We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
• Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata, organized differently. It is then hard to use with previous projects
• The task of event determination and feature classification is onerous, and we don't do it until after we get the data

Use cases
• Determine which flat field calibration was applied to the image taken on January 26, 2005 around 2100UT by the ACOS Mark IV polarimeter.
• Which flat-field algorithm was applied to the set of images taken during the period November 1, 2004 to February 28, 2005?
• How many different data product types can be generated from the ACOS CHIP instrument?
• What images comprised the flat field calibration image used on January 26, 2007 for all ACOS CHIP images?
• What processing steps were completed to obtain the ACOS PICS limb image of the day for January 26, 2005?
• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005 1849:09UT taken by the ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21, 2008, 1900UT missing?
• Why does this image look bad?
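Several of these use cases reduce to walking a provenance graph backwards from a data product. Examples are which flat field and algorithm were applied, what images the flat field was built from, and what processing steps produced an image. The sketch below assumes a hypothetical provenance store, loosely in the spirit of the W3C PROV notions of entities generated by activities that used other entities. The artifact names, activities and algorithm name are placeholders, not ACOS or MLSO pipeline records.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical provenance store: artifact -> the activity that generated it,
# the agent that ran it, an optional algorithm name, and the artifacts it used.
PROVENANCE: Dict[str, dict] = {
    "raw_image": {"activity": "acquisition", "agent": "polarimeter", "used": []},
    "flat_exposure_1": {"activity": "acquisition", "agent": "polarimeter", "used": []},
    "flat_exposure_2": {"activity": "acquisition", "agent": "polarimeter", "used": []},
    "flat_field": {"activity": "flat_field_build", "agent": "calibration_pipeline",
                   "used": ["flat_exposure_1", "flat_exposure_2"]},
    "flat_corrected_image": {"activity": "flat_field_correction", "agent": "calibration_pipeline",
                             "algorithm": "flat_algorithm_v2",
                             "used": ["raw_image", "flat_field"]},
    "final_image": {"activity": "vignetting_correction", "agent": "calibration_pipeline",
                    "used": ["flat_corrected_image"]},
}


def lineage(artifact: str) -> List[Tuple[str, str]]:
    """Every (artifact, activity) upstream of `artifact`, inputs listed before outputs."""
    steps: List[Tuple[str, str]] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        record = PROVENANCE[name]
        for parent in record["used"]:
            visit(parent)
        steps.append((name, record["activity"]))

    visit(artifact)
    return steps


def flat_field_used(artifact: str) -> Optional[dict]:
    """Which flat field and algorithm were applied, and what was the flat built from?"""
    for name, activity in lineage(artifact):
        if activity == "flat_field_correction":
            record = PROVENANCE[name]
            flats = [u for u in record["used"] if PROVENANCE[u]["activity"] == "flat_field_build"]
            return {"algorithm": record.get("algorithm"),
                    "flat_fields": flats,
                    "built_from": [src for f in flats for src in PROVENANCE[f]["used"]]}
    return None


print([activity for _, activity in lineage("final_image")])
print(flat_field_used("final_image"))
```

These questions are only answerable if the ingest pipeline records the `used`, `agent` and `algorithm` links at processing time. That is the capture problem raised in the problem definition.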

Provenance
• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility




Visual browse



Discussion (1)
• Taken together, the collected experience manifests an emerging informatics core capability that is starting to take data-intensive science into a new realm of realizability and, potentially, sustainability
  – Use cases (i.e. real users)
  – X-informatics
  – Core Informatics
  – Cyber Informatics
• There are implications for data models

Progression after progression
[Figure: layers labeled IT, Cyberinfrastructure, Cyber Informatics, Core Informatics, Science Informatics, and Science, SBAs.]
Example:
• CI = OPeNDAP server running over HTTP/HTTPS
• Cyberinformatics = Data (product) and service ontologies, triple store
• Core informatics = Reasoning engine (Pellet), OWL
• Science (X) informatics = Use cases, science domain terms, concepts in an ontology

Discussion (2)
• Data and information science is becoming the 'fourth' column (along with theory, experiment and computation)
• Semantics (of the data) are a very key ingredient -> may imply richer data models

Summary
• Informatics is playing a key role in filling the gap between the use and generation of science data (by scientists and the spectrum of non-experts) and the underlying cyberinfrastructure, i.e. in shifting the burden
  – This is evident from the emergence of X-informatics (worldwide)
• Our experience is in implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments
  – VSTO is only one example of success
  – Data mining, data integration, smart search and provenance are close behind
• Informatics is a profession and a community activity; it requires efforts in all 3 sub-areas (science, core, cyber), and these must be synergistic

More Information
• Virtual Solar Terrestrial Observatory (VSTO):
• Semantically-Enabled Science Data Integration (SESDI):
• Semantic Provenance Capture in Data Ingest Systems (SPCDIS):
• Semantic Knowledge Integration Framework (SKIF/SAM):
• Semantic Web for Earth and Environmental Terminology (SWEET):
• Conferences: AGU 2008, EGU 2009, ISWC 2008, CIKM 2008, …
• Peter Fox
