OceanStore tahoe2


Published on October 7, 2007

Author: Melinda

Source: authorstream.com

OceanStore: Global-Scale Persistent Storage
John Kubiatowicz

Ubiquitous Devices → Ubiquitous Storage
  - Consumers of data move, change from one device to another, and work in cafes, cars, airplanes, the office, etc.
  - Properties REQUIRED for the Endeavour storage substrate:
      - Strong security: data must be encrypted whenever it is in the infrastructure; resistance to monitoring
      - Coherence: too much data for naïve users to keep coherent “by hand”
      - Automatic replica management and optimization: huge quantities of data cannot be managed manually
      - Simple and automatic recovery from disasters: probability of failure increases with size of system
      - Utility model: a world-scale system requires cooperation across administrative boundaries

Utility-based Infrastructure
  [Figure: provider networks labeled Pac Bell, Sprint, IBM, AT&T, and Canadian OceanStore]
  - Service provided by a confederation of companies
  - Monthly fee paid to one service provider
  - Companies buy and sell capacity from each other

State of the Art?
  - Widely deployed systems: NFS, AFS (/DFS)
      - Single “regions” of failure, caching only at endpoints
      - Cleartext exposed at various levels of system
      - Compromised server ⇒ all data on server compromised
  - Mobile computing community: Coda, Ficus, Bayou
      - Small scale, fixed coherence mechanism
      - Not optimized to take advantage of high-bandwidth connections between server components
      - Cleartext also exposed at various levels of system
  - Web caching community: Inktomi, Akamai
      - Specialized, incremental solutions
      - Caching along client/server path, various bottlenecks
  - Database community:
      - Interfaces not usable by legacy applications
      - ACID update semantics not always appropriate

OceanStore Assumptions
  - Untrusted infrastructure:
      - Infrastructure is composed of untrusted components
      - Only ciphertext within the infrastructure
      - Must be careful to avoid leaking information
  - Mostly well-connected:
      - Data producers and consumers are connected to a high-bandwidth network most of the time
      - Exploit mechanisms such as multicast for quicker consistency between replicas
  - Promiscuous caching:
      - Data may be cached anywhere, anytime
      - Global optimization through tacit information collection
  - Operations interface with conflict resolution:
      - Applications employ an operations-oriented interface rather than a file-system interface
      - Coherence is centered around conflict resolution

OceanStore Technologies I: Naming and Data Location
  - Requirements:
      - Find nearby data without global communication
      - Don’t get in the way of rapid relocation of data
      - Search should reflect locality and network efficiency
      - System-level names should help to authenticate data
  - OceanStore technology:
      - Underlying namespace is flat and built from cryptographic signatures (160-bit SHA-1)
      - Data location is a form of gradient search over local pools of data (use of attenuated Bloom filters)
      - Fallback to a global, “exact” indexing structure in case data is not found with local search

Bloom Filters (brief aside)
  - Use multiple hash functions to hash each item
  - Use hash values to generate bit offsets
  - Combine bits of all items together
  - Bit vector is the summary
  - To use the summary, hash the new value. The value is NOT in the pool if any bit = 0
  [Figure: a pool and its summary bit vector 1 0 0 1 1 1 0]

Cascaded-Pools Hierarchy
  [Figure: hierarchy of pools labeled with Local Summary and Downward Summary]
  - Every pool has a good randomized index structure (such as treaps)

Progress Last Term
  - Sean Rhea and Westley Weimer
      - Built data location facility on a simulated network
      - Uses attenuated Bloom filters
      - Performs search by passing messages from node to node. All state is kept in messages!
      - Updates filters through semi-chaotic passing of information between neighbors
      - Resembles a compiler dataflow algorithm
      - Can be shown to converge
  - Future?
      - Find other “holographic representations of location”
      - Whole new approach to data location?
      - Unified name service, data location, routing

OceanStore Technologies II: High-Availability and Disaster Recovery
  - Requirements:
      - Handle diverse, unstable participants in OceanStore
      - Eliminate backup as an independent (and fallible) technology
      - Flexible “disaster recovery” for everyone
  - OceanStore technologies:
      - Use of erasure codes (Tornado codes) to provide stable storage for archival copies and snapshots of live data
      - Mobile replicas are self-contained centers for logging and conflict resolution
      - Version-based update for painless recovery
      - Redundancy exploited to tolerate variation of performance from network servers (RIVERS)

Progress Last Term
  - Hakim Weatherspoon, Shelley Zhuang and Matthew Delco
      - Designed a storage system using erasure codes
      - Compared Reed-Solomon codes to Tornado codes: over 1000-to-1 performance advantage in favor of Tornado codes!
      - Explored different distribution and gathering techniques
  - Future?
      - Can this system be turned into a generic replacement for standard UNIX backup?
      - Transform into underlying archival piece of OceanStore
      - Use of Tornado codes for Rivers-like adaptation to variations in latency
      - Self-repairing data structures???

OceanStore Technologies III: Introspective Monitoring and Optimization
  - Requirements:
      - Reasonable job on a global-scale optimization problem
      - Take advantage of locality whenever possible
      - Sensitivity to limited storage and bandwidth at endpoints
      - Stability in chaotic environment
  - OceanStore technologies:
      - Introspective monitoring and analysis of relationships:
          - between different pieces of data
          - between users of a given piece of data
      - Rearrangement of data in response to monitoring:
          - Economic models with analogies to simulated annealing
      - Subproblem of Tacit Information Analysis (option 5)

Progress Last Term
  - Patrick R. Eaton, Dennis Geels and Greg Mori
      - Introspective monitoring of local file system
      - Clustering of related data together
      - Identification of patterns for prefetching
      - Built filesystem simulation system in which to explore techniques
  - Byung Hoon Kang, Sarika Sahni and H. Wilson So, in collaboration with Laurent El Ghaoui
      - Time-series extraction of patterns
      - Do people move predictably? Can we use this?
  - Future?
      - Kalman filters, hidden Markov models, and other statistical methods for automatically migrating data
      - More realistic traces (collaboration with Mary Baker?)
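
The Bloom filter aside earlier in the deck (hash each item several ways, turn the hashes into bit offsets, OR the bits together, and treat any zero bit as proof of absence) can be sketched in a few lines. This is a minimal illustration only, not the data location facility described in the slides: the class name, the 64-bit vector size, the choice of three hashes, and the use of salted SHA-1 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit-vector summary of a pool of items (illustrative only)."""

    def __init__(self, num_bits=64, num_hashes=3):
        # Sizes are arbitrary assumptions, not values from the slides.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _offsets(self, item):
        # Derive several bit offsets per item by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        # Combine the bits of all items into one vector.
        for offset in self._offsets(item):
            self.bits[offset] = 1

    def might_contain(self, item):
        # Any zero bit proves absence; all ones only means "possibly present".
        return all(self.bits[offset] for offset in self._offsets(item))


pool = ["doc-a", "doc-b", "doc-c"]
summary = BloomFilter()
for name in pool:
    summary.add(name)
print(all(summary.might_contain(n) for n in pool))  # True: no false negatives
```

A query that finds every bit set may still be a false positive, which is consistent with the slides' fallback to a global, “exact” index when local search fails.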

OceanStore Technologies IV: Rapid Update in an Untrusted Infrastructure
  - Requirements:
      - Scalable coherence mechanism that provides performance even though replicas are widely separated
      - Operate directly on encrypted data
      - Updates should not reveal information to untrusted servers
  - OceanStore technologies:
      - Operations-based interface using conflict resolution
      - Use of incremental cryptographic techniques: no time to decrypt/update/re-encrypt
      - Use of oblivious function techniques to perform this update (fallback to secure hardware in general case)
      - Use of automatic techniques to verify security protocols

Progress Last Term
  - Monica Chew, Chris Wells and David Bindel
      - Designed ECFS, the extended cryptographic filesystem
      - Explored metadata in an untrusted infrastructure
      - Uses encryption and signatures to provide protection against substitution attacks
  - Dawn Song, David Wagner, Doug Tygar
      - New technique for encrypting data in a way that is searchable
      - Could perform general “grep” functionality at server without revealing what you are searching for
      - Use in conflict resolution seems plausible
  - Future?
      - Key problem: denial of service
      - Conflict resolution interfaces
      - Computation on encrypted data?

Grab Bag
  - Use of archival system to handle portions of the Berkeley backup?
      - To get “same level of service” need 12TB of spinning storage
      - Want it to be off site for disaster recovery
      - New opportunity: 100TB of spinning storage (Brewster Kahle)
  - OceanStore as a software distribution technology:
      - Microsoft Windows in the net?
      - Versioning mechanism for handling software upgrades

Two-Phase Implementation
  - Year I: Read-Mostly Prototype
      - Construction of data location facility
      - Initial introspective gathering of tacit info and adaptation
      - Initial archival techniques (use of erasure codes)
      - Unix file-system interface under Linux (“legacy apps”)
  - Year III: Full Prototype
      - Final conflict resolution and encryption techniques
      - More sophisticated tacit info gathering and rearrangement
      - Final object interface and integration with Endeavour applications
      - Wide-scale deployment via NTON and Internet-2
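
The naming slide (OceanStore Technologies I) asks that system-level names help to authenticate data and mentions a flat namespace built on 160-bit SHA-1. The slides do not say exactly what is hashed, so the sketch below assumes the simplest possible rule, that a name is the SHA-1 digest of the object's bytes. Under that assumption a client can check data returned by an untrusted server by rehashing it. The function names `name_for` and `verify` are hypothetical.

```python
import hashlib


def name_for(data: bytes) -> str:
    """Hypothetical naming rule: the name is the SHA-1 digest of the bytes."""
    return hashlib.sha1(data).hexdigest()


def verify(name: str, data: bytes) -> bool:
    """Recompute the digest of data returned by an untrusted server."""
    return name_for(data) == name


original = b"OceanStore block contents"
name = name_for(original)
print(len(name) * 4)              # 160 (bits in a SHA-1 digest)
print(verify(name, original))     # True
print(verify(name, b"tampered"))  # False
```

Because the name depends only on the content, it carries no location information, which fits the requirement that names not get in the way of rapid relocation of data.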
