Cyberinfrastructure and its Role in Science


Published on July 21, 2009

Author: kiddlec

Source: slideshare.net

Description

This presentation examines some of the challenges scientists face and describes various cyberinfrastructure technologies that help address these challenges. Example projects employing cyberinfrastructure technologies that we have worked on at the Grid Research Centre, including the GeoChronos project, are also presented. This presentation was given at the IAI International Wireless Sensor Networks Summer School held at the University of Alberta on July 6th, 2009.

Cyberinfrastructure and its Role in Science

Cameron Kiddle

Research Fellow, Grid Research Centre

Adjunct Assistant Professor, Department of Computer Science, University of Calgary

Distributed Systems Architect, WestGrid

Outline (IAI Summer School, July 6, 2009)

Challenges

Cyberinfrastructure

Cyberinfrastructure Technologies

Examples

ICE Force Project

Molecular Dynamics Simulations

GT4-based Grid for Canada

Fire Dynamics Simulator

Rendering on the Cloud

GeoChronos

Collaboration Challenges

Familiarity/awareness of collaboration tools

Keeping all interested parties in the loop

Finding related work and researchers

Keeping up to date with current research

Collaboration while working in the field

Data Challenges

Acquisition of data

Many different data sources

Large quantities of data

Different regulations/mechanisms for accessing data

Lack of automation

Finding the right data

Bandwidth constraints

Managing data

Scattered and unorganized data

Inadequate tools for recording/maintaining metadata

Data without metadata is meaningless

Lack of suitable metadata standards

Validation of metadata

Tracking provenance of data

Pre-processing of data

Raw data typically cannot be directly analyzed

Significant amount of time spent preparing data for analysis

Lack of automation

Application Challenges

Limited availability of computing resources

Access to and familiarity with heterogeneous computing resources

Fault tolerance and reliability

Access to software available in research lab while in field or other locations

Installing, configuring and updating software

System dependencies of software

Awareness and suitability of available software

Sharing applications and results

Cyberinfrastructure

“Like the physical infrastructure of roads, bridges, power grids, telephone lines, and water systems that support modern society, "cyberinfrastructure" refers to the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor.”

Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, 2003.

Cyberinfrastructure Technologies

Grid Computing

Cloud Computing

Virtualization

Web 2.0 / Social Networking

Web Portals / Scientific Gateways

Semantic Web



Grid Computing

[Figure: resources in Domain A, Domain B and Domain C shared by Virtual Organization X and Virtual Organization Y]

Many different definitions/uses

computational grids, data grids, desktop grids, campus grids, sensor grids, access grids

Coordinated sharing of heterogeneous resources across administrative domains

Grid Middleware

The layer between users/applications and grid resources that glues everything together

Example grid middleware

Globus Toolkit

GT2 – pre-standards

GT4 – Web Services based

UNICORE

gLite

ARC

NAREGI

Key Grid Middleware Services

Security Services

Concerned with authentication, authorization, secure communication, …

Information Services

Provide information about resources, policy, services and applications to tools and users

Data Management Services

Manage movement and replication of data as well as metadata about data

Execution Management Services

Handle placement, provisioning and lifetime management of jobs and workflows

Benefits of Grid Computing

Easier access to more resources

Users/organizations can share resources

Single sign-on

Common interface (hide heterogeneity)

Improved data management

Efficient file transfers

Abstraction of physical location of data

Automated execution of jobs and workflows

Example Grid Projects

LHC Computing Grid (http://lcg.web.cern.ch/): data storage and analysis infrastructure for the high energy physics community using the Large Hadron Collider (LHC) at CERN (ATLAS Tier-1 site at TRIUMF in British Columbia)

Network for Earthquake Engineering Simulation (NEES) (http://www.nees.org/): a US national network of 15 facilities to study the impact of earthquakes on buildings, bridges, etc.

Expanding GEOsciences on DEmand (EGEODE) (http://www.egeode.org/): a virtual organization (VO) associated with EGEE that is dedicated to research in geoscience for both public and private industrial R&D and academic laboratories

International Virtual Observatory Alliance (IVOA) (http://www.ivoa.net/): development of standards and infrastructure to share and analyze astronomical archives from around the world

Cloud Computing

Transparent access to scalable and dynamic services over the Internet

Key features:

Everything as a Service (EaaS)

Utility/On-demand

Accessibility/Transparency

Scalability

Virtualization

Cloud Computing Solutions

Benefits of Cloud Computing

Reduce capital, support and maintenance costs

Pay only for what you use

Get access to more/fewer resources when needed

Ready to use for users

No more downloads, installations or updates

Simplify and speed up software development

Don’t have to support multiple platforms

Application popularity and lifespan difficult to predict

Scale applications according to user demand

Cloud Computing Case Study: Application Popularity on Facebook

[Figure: Monthly Active Users vs. Rank of Facebook Applications (September 12, 2008)]

Difficult to predict popularity and lifespan of applications

Facebook Application Growth

Sep. 2007: ~3700

Sep. 2008: ~39000

Facebook Application Popularity (Sep. 12, 2008)

39181 applications

Active user data for 37155 apps

3 apps > 10 million active users

80% apps < 1000 active users

Cloud Computing Case Study: Shrek (DreamWorks)

Shrek (2001) – 5 million CPU render hours

Shrek 2 (2004) – 10 million CPU render hours

Shrek 3 (2007) – 20 million CPU render hours

Time to Render (1 CPU / 100 CPUs / 10000 CPUs)

Shrek: 571 years / 5.7 years / 21 days

Shrek 2: 1142 years / 11.4 years / 42 days

Shrek 3: 2283 years / 22.8 years / 83 days

(Source: R. Rowe. DreamWorks Animation "Shrek the Third": Linux Feeds an Ogre. Linux Journal. June 5, 2007. http://www.linuxjournal.com/article/9653)
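The time-to-render figures on this slide follow from dividing the CPU render hours by the number of CPUs. The sketch below reproduces that arithmetic; it assumes perfect parallel scaling and a 365-day year, and the function name `wall_clock` is chosen here for illustration only.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours; leap years are ignored


def wall_clock(cpu_hours: float, cpus: int) -> tuple[float, float]:
    """Return wall-clock time as (years, days), assuming perfect parallel scaling."""
    hours = cpu_hours / cpus
    return hours / HOURS_PER_YEAR, hours / 24


# CPU render hours as quoted on the slide.
films = {"Shrek": 5e6, "Shrek 2": 10e6, "Shrek 3": 20e6}

for title, cpu_hours in films.items():
    years_1, _ = wall_clock(cpu_hours, 1)
    years_100, _ = wall_clock(cpu_hours, 100)
    _, days_10k = wall_clock(cpu_hours, 10_000)
    print(f"{title}: {years_1:.0f} years, {years_100:.1f} years, {days_10k:.0f} days")
```

Real render farms will not scale perfectly, so these values are best read as lower bounds on elapsed time.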

Cloud Computing Case Study: Animoto

(Source: D. Barker. You Need 3,500 Servers by When?! On-demand Enterprise. 2008.07.07)

Animoto ( http://animoto.com )

Produces professional quality videos from images

Runs on Amazon EC2

Popularity soared when promoted on Facebook

During the course of 4 days:

Jumped from 8 to 450 renderings per minute

~20000 new users per hour

3500 instances running on Amazon EC2 at peak
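The Animoto numbers illustrate scaling a service with demand. A minimal sketch of that relationship is given below. The per-rendering time of 8 minutes is a made-up value for illustration, not a figure from the source, so the output is not expected to reproduce the 3500 instances quoted above; `instances_needed` is likewise a hypothetical helper.

```python
import math


def instances_needed(renders_per_minute: int, minutes_per_render: int) -> int:
    """Instances required to keep up, if each instance handles one rendering at a time."""
    if renders_per_minute < 0 or minutes_per_render <= 0:
        raise ValueError("rates must be non-negative and render time positive")
    return math.ceil(renders_per_minute * minutes_per_render)


ASSUMED_MINUTES_PER_RENDER = 8  # hypothetical, chosen only for illustration

for rate in (8, 450):  # renderings per minute before and after the surge
    print(rate, "renders/min ->", instances_needed(rate, ASSUMED_MINUTES_PER_RENDER), "instances")
```

On an on-demand platform such as Amazon EC2, a calculation of this kind can drive how many instances are started or released as the request rate changes.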

Virtualization

Can transform a single physical machine into multiple virtual machines (VMs) each with their own OS and software stack

Virtualization software

Xen, KVM, VMware

Support allocation, deallocation, checkpointing and migration of VMs

Benefits

Custom environments (root access)

More efficient use of resources (consolidation)

System maintenance without disruption
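Consolidation can be pictured as a packing problem: place VMs on as few physical hosts as possible without exceeding any host's capacity. The sketch below uses the first-fit decreasing heuristic, which is one common approach and not necessarily what Xen, KVM or VMware tooling does. Loads are expressed as integer percentages of one host, an assumption made here to keep the example simple.

```python
def consolidate(vm_loads: list[int], host_capacity: int) -> list[list[int]]:
    """Pack VM loads onto hosts using first-fit decreasing."""
    hosts: list[list[int]] = []
    for load in sorted(vm_loads, reverse=True):
        if load > host_capacity:
            raise ValueError(f"VM with load {load} does not fit on any host")
        for host in hosts:
            # Place the VM on the first host that still has room.
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            # No existing host had room, so start a new one.
            hosts.append([load])
    return hosts


placement = consolidate([60, 30, 50, 20, 40], host_capacity=100)
print(len(placement), "hosts:", placement)
```

Five lightly loaded machines fit on two hosts in this example, which is the kind of saving the consolidation bullet refers to.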

Web 2.0 – The “Social Web”

Aimed at:

Providing feature rich user environments

Making it easier for users to generate Web content

Improving online social connectivity

Example Web 2.0 technologies

Blogs (WordPress, TypePad)

Wikis (Wikipedia)

Mashups (HousingMaps, ChicagoCrime)

Widgets/Gadgets (iGoogle, Netvibes)

Social networks (Facebook, MySpace, YouTube)

Social Networking Sites/Platforms

Web Portals / Scientific Gateways

Aimed at providing a community of users access to computing resources through a common Web-based interface

Web portal development tools

GridSphere (portlet based)

Web 2.0/Social Networking

Examples

TeraGrid Scientific Gateways (over 30 of them)

nanoHUB

Semantic Web

Aimed at representing knowledge, not just information

Connecting and relating data in a way understandable by machines

Semantic Web standards

Resource Description Framework (RDF)

Web Ontology Language (OWL)
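RDF expresses statements as subject, predicate, object triples. The sketch below models triples as plain Python tuples and answers simple pattern queries. It does not use an RDF library, and every `ex:` name is a hypothetical identifier invented for illustration; only `rdf:type` is a standard RDF property.

```python
Triple = tuple[str, str, str]

# A tiny in-memory triple store; the ex: terms are made up for this example.
triples: list[Triple] = [
    ("ex:GeoChronos", "rdf:type", "ex:Platform"),
    ("ex:GeoChronos", "ex:servesCommunity", "ex:EarthObservationScientists"),
    ("ex:SpectralLibrary", "ex:hostedBy", "ex:GeoChronos"),
]


def match(pattern: tuple, store: list[Triple]) -> list[Triple]:
    """Return triples matching the pattern; None in the pattern is a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Everything stated about ex:GeoChronos as a subject.
print(match(("ex:GeoChronos", None, None), triples))
# Which resources are hosted by something?
print(match((None, "ex:hostedBy", None), triples))
```

Because each statement has the same machine-readable shape, data from different sources can be connected and queried uniformly, which is the point of the "understandable by machines" bullet.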

Confederation Bridge ICE Force Monitoring Project (http://www.confederationbridge.com)

Monitoring of forces on the Confederation Bridge

Data analyzed by civil engineering groups at University of Calgary and Carleton University

GRC developed solution to automate data management as part of a CANARIE AAP project

ICE Force - Technologies Used

Grid Middleware

GT4

Data Management

Proactive Data Management Service (PDMS)

Data Transfer - GridFTP, RFT

Replication Management – RLS

Metadata Management - MCS
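Replication management rests on one idea: a logical file name maps to one or more physical copies, so users do not need to know where the data lives. The sketch below illustrates that mapping with an in-memory class. It is not the RLS API; the class name, method names, host names and paths are all hypothetical.

```python
class ReplicaCatalog:
    """Toy mapping from logical file names to physical replica URLs."""

    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = {}

    def register(self, logical_name: str, physical_url: str) -> None:
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas.setdefault(logical_name, set()).add(physical_url)

    def locate(self, logical_name: str) -> list[str]:
        """Return all known replicas in sorted order, or an empty list."""
        return sorted(self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
# Hypothetical sites and paths, for illustration only.
catalog.register("bridge/2009-07-01.dat", "gsiftp://site-b.example.org/data/2009-07-01.dat")
catalog.register("bridge/2009-07-01.dat", "gsiftp://site-a.example.org/data/2009-07-01.dat")

print(catalog.locate("bridge/2009-07-01.dat"))
print(catalog.locate("bridge/missing.dat"))
```

A transfer service could then pick any returned URL, for example the closest site, without the analysis code changing.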

Molecular Dynamics Simulations (GROMACS)

(http://moose.bio.ucalgary.ca/)

GROMACS

Parallel molecular dynamics simulation application

Can simulate hundreds to millions of particles

Simulation runs can take days, weeks or months

Issues with long running jobs

Fault tolerance

Scheduler policy constraints

GROMACS - Grid Enabled Solution

Automated grid enabled solution developed by GRC to manage GROMACS simulations as part of a CANARIE AAP project

Long jobs split into a series of shorter jobs

Automates checkpointing, migration and reconfiguration of jobs
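Splitting a long run into a chain of shorter jobs, each restarting from the previous checkpoint, can be sketched as follows. This is an illustration of the idea only, not the GRC system: the step counts, the function names and the stand-in segment runner are assumptions, and a real segment would launch GROMACS from its checkpoint file.

```python
from typing import Callable, Optional


def split_job(total_steps: int, max_steps_per_job: int) -> list[tuple[int, int]]:
    """Return (start, end) step ranges, end exclusive, each within the per-job limit."""
    if total_steps < 0 or max_steps_per_job <= 0:
        raise ValueError("total_steps must be >= 0 and max_steps_per_job > 0")
    return [
        (start, min(start + max_steps_per_job, total_steps))
        for start in range(0, total_steps, max_steps_per_job)
    ]


def run_chain(
    segments: list[tuple[int, int]],
    run_segment: Callable[[int, int, Optional[int]], int],
) -> Optional[int]:
    """Run segments in order, passing each one the checkpoint left by the previous."""
    checkpoint = None
    for start, end in segments:
        checkpoint = run_segment(start, end, checkpoint)
    return checkpoint


def fake_segment(start: int, end: int, checkpoint: Optional[int]) -> int:
    # Stand-in for a real job: the checkpoint is just the last completed step.
    assert checkpoint in (None, start)
    return end


segments = split_job(total_steps=10, max_steps_per_job=4)
print(segments)
print(run_chain(segments, fake_segment))
```

Keeping each segment under the scheduler's limit addresses the policy constraint, and a failed segment only needs to be rerun from its own starting checkpoint.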

GROMACS - Portal

GROMACS - Technologies Used

Grid Middleware

GT4

Information Services

WS MDS

Data Management

PDMS (GridFTP, RFT, RLS, MCS)

Execution Management

Custom system (Condor-G, WS GRAM)

Portal

GridSphere

Web Service based Grid Environment for Canada

Established a GT4-based grid environment from resources across Canada (CANARIE CIIP)

GT4-based Grid - Model Schemas

[Figure: System Model Class Diagram]

Models developed to describe systems, applications and scheduler policy (GRC Model Schema)

GT4-based Grid – Viewing Resource Information

Used WebMDS, a customizable Web based interface for viewing resource information published by WS MDS

GT4-based Grid - Technologies Used

Grid Middleware

GT4

Data Management

GridFTP, RFT

Information Services

GRC Model Schema, WS MDS, WebMDS

Execution Management

Condor-G, WS GRAM

Example: Fire Simulation

Developed a comprehensive environment for the Fire Dynamics Simulator (FDS) as part of a collaborative project between GRC and HP Labs

Deployed on HP Labs Data Centre at University of Calgary

Initial focus of project

Leverage Web 2.0 technologies

Explore use of virtualization in a utility/cloud computing environment

Fire Simulation - Technologies Used

User level

Web 2.0/social networking technology (Facebook)

Service provider level

LAMP environment (Linux, Apache, MySQL, Perl/Python/PHP)

Simulation (FDS, Condor)

Visualization (Smokeview, VNC)

Resource (utility) provider level

Cloud computing technology (ASPEN)

Virtual machine technology (Xen)

Example: Rendering on the Cloud

GRC created an on-demand cloud rendering service for EDM Studio

Cybera Pilot Project

Technologies used:

Cloud computing technology (ASPEN)

Virtual machine technology (Xen)

Social networking technology (Ning/Elgg)

GeoChronos

An on-line platform

For:

Earth Observation Scientists

Facilitating:

Collaboration between scientists

Data access, management and sharing

Application access, management and sharing

Leveraging:

Web 2.0 / social networking technologies (Elgg)

Semantic Web technologies (RDF, OWL)

Cloud computing and virtualization technologies (ASPEN, Xen)

GeoChronos - Collaboration (http://geochronos.org/)

Social networking portal

Elgg-based (elgg.org)

Social networking services

Blogs

Tags

Media/document sharing

Wikis

Friends/contacts

Groups

Discussions

Message boards

Calendars

Status

News Feeds

GeoChronos - Data

Data Acquisition

Automated acquisition of data from sensors (ground, airborne, satellite) or third party

Data Storage

Store, share, browse and search data

e.g., spectral library

Data Processing

Automated data workflows

e.g., mosaic, reproject and subset MODIS data
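An automated data workflow of this kind is a fixed sequence of processing steps applied to a data product. The sketch below shows only the chaining. The steps are placeholders that record their name instead of processing imagery, and all function names and the record layout are hypothetical, not taken from GeoChronos.

```python
from typing import Callable

Step = Callable[[dict], dict]


def make_step(name: str) -> Step:
    """Build a placeholder step that appends its name to the product history."""
    def step(product: dict) -> dict:
        # Return a new record so the input product is left unchanged.
        return {**product, "history": product["history"] + [name]}
    return step


mosaic, reproject, subset = (make_step(n) for n in ("mosaic", "reproject", "subset"))


def run_workflow(product: dict, steps: list[Step]) -> dict:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        product = step(product)
    return product


raw = {"source": "MODIS", "history": []}
result = run_workflow(raw, [mosaic, reproject, subset])
print(result["history"])
```

Recording the applied steps alongside the data also gives a simple form of provenance for the processed product.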

GeoChronos - Applications

Interactive Application Service (IAS)

On-line, on-demand access to scientific applications

Share application sessions and data with other users

Access control to applications

Batch Processing Service

Batch processing environment for longer running data processing tasks or simulations

For use directly by individual users or as part of automated data workflows

GeoChronos - Project Team

Principal Investigators

Dr. Arturo Sanchez-Azofeifa, University of Alberta

Dr. John Gamon, University of Alberta

Dr. Benoit Rivard, University of Victoria

Dr. Rob Simmonds, University of Calgary

[Figure: team structure with groups for Project Coordination, Platform Development and Domain Scientists]

GeoChronos - Virtual Organization

Contact Information

Cameron Kiddle

[email_address]

http://pages.cspc.ucalgary.ca/~kiddlec/

http://grid.ucalgary.ca/
