Apache jclouds at Maginatics

Published on March 6, 2014

Author: Maginatics

Source: slideshare.net


“Experiences with billions of blobs across many blobstore providers.”

How Maginatics used Apache jclouds and architected MagFS to achieve broad blobstore portability and high scalability for the Maginatics Cloud Storage Platform, a cloud-optimized NAS filer.

Presented by Andrew Gaul at the Apache jclouds meetup on March 4, 2014.

Apache jclouds at Maginatics
Experiences with billions of blobs across many blobstore providers.
Andrew Gaul, jclouds PMC

Agenda
• What is blob storage
• Blobstore compatibility
• Case study: Maginatics Cloud Storage Platform (MCSP)
• Scaling
• Lessons learned
• Future directions
• Conclusion
http://jclouds.apache.org
https://maginatics.com

What is blob storage?
Blobstores offer key-value storage that is:
• Scalable: from tens of TB on a few nodes to hundreds of PB on thousands of nodes
• Inexpensive: built on commodity hardware
• Available/durable: tolerates hardware failures
Blobstores do not offer the guarantees that block storage and file systems provide:
• Limited interface: get, put, delete
• Eventual consistency: blob reads may return stale or no data for some limited time

jclouds supports many providers
Multiple public and private implementations allow customer trade-offs.
[Figure: provider logos, grouped as Public Object Storage and Private Object Storage]

Blobstore compatibility
jclouds abstracts differences between APIs, but semantic differences remain:
• Atmos: cannot overwrite a blob
• AWS-S3: cannot mutate or append to a blob, cannot put a blob without an explicit size
• Swift: eventually consistent
Portable applications must use the lowest-common-denominator functionality:
• Write to blobs exactly once; never mutate or append
• Read from blobs at any time, but retry because of eventual consistency
• After deleting a blob, never reuse its name
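The three portability rules can be illustrated with a small sketch. This is not jclouds code: `EventuallyConsistentStore` is a hypothetical in-memory stand-in invented for the example, and the retry counts and delays are arbitrary assumptions. It shows write-once puts, fresh names per write, and reads that retry until the blob becomes visible.

```python
import time
import uuid


class EventuallyConsistentStore:
    """Hypothetical in-memory stand-in for a blobstore (not a jclouds API).

    A new blob stays invisible for the first `lag` read attempts, which
    imitates reads that return no data shortly after a write.
    """

    def __init__(self, lag=2):
        self._blobs = {}
        self._hidden_reads = {}
        self._lag = lag

    def put(self, name, data):
        # Write-once rule: refuse to overwrite an existing blob.
        if name in self._blobs:
            raise KeyError(f"blob {name!r} already exists")
        self._blobs[name] = data
        self._hidden_reads[name] = self._lag

    def get(self, name):
        # Return None while the write is still "propagating".
        if self._hidden_reads.get(name, 0) > 0:
            self._hidden_reads[name] -= 1
            return None
        return self._blobs.get(name)


def new_blob_name():
    # A fresh random name per write means a name is never reused,
    # even after the blob it referred to has been deleted.
    return uuid.uuid4().hex


def get_with_retry(store, name, attempts=5, delay=0.01):
    # Retry with a growing delay to ride out eventual consistency.
    for attempt in range(attempts):
        data = store.get(name)
        if data is not None:
            return data
        time.sleep(delay * (2 ** attempt))
    raise TimeoutError(f"blob {name!r} not visible after {attempts} attempts")


store = EventuallyConsistentStore(lag=2)
name = new_blob_name()
store.put(name, b"hello")
result = get_with_retry(store, name)
```

Because every write gets a new name, a reader can never confuse a stale copy of an old blob with a newer one; the application's own metadata decides which name is current.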

Maginatics Cloud Storage Platform (MCSP)
• Virtualized, cloud-based storage system
• Layers network file system semantics on top of blob storage
• Run any application on a variety of platforms, including multiple-client file sharing
• MCSP is a cloud-optimized NAS filer
• Smart client gives LAN performance over WANs
• Flexible deployment options: public, private, hybrid cloud
• Refer to SNIA SDC 2013 slides for technical background

Scaling Throughput
MCSP supports thousands of clients reading and writing simultaneously. A single server could become a bottleneck, especially on smaller instance sizes. Instead, MCSP vends signed URLs to clients to give them direct access to the blobstore:
• Cryptographically signed URLs allow read or write access to a specific blob for a specified time
• Can embed other properties like content length and hash
This technique allows a single MCSP server to mediate many Gbit/s of throughput!
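The general idea behind signed URLs can be sketched with an HMAC over the method, path, expiry, and content length. This is a generic illustration only: the parameter names (`expires`, `length`, `signature`), the secret, and the message layout are assumptions made for the example, and each real provider defines its own signing scheme.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret-key"  # hypothetical key shared by server and blobstore


def _message(method, path, params):
    # Canonical form: method, path, then the sorted query parameters.
    canonical = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return f"{method}\n{path}\n{canonical}".encode()


def _sign(method, path, params):
    return hmac.new(SECRET, _message(method, path, params), hashlib.sha256).hexdigest()


def sign_url(base_url, method, expires_at, content_length=None):
    # The server embeds the expiry (and optionally the length) in the URL,
    # then signs them so a client cannot alter either one.
    params = {"expires": str(int(expires_at))}
    if content_length is not None:
        params["length"] = str(content_length)
    params["signature"] = _sign(method, urlsplit(base_url).path, params)
    return f"{base_url}?{urlencode(params)}"


def verify_url(url, method, now=None):
    # The blobstore recomputes the signature and checks the expiry.
    now = time.time() if now is None else now
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    signature = params.pop("signature", "")
    expected = _sign(method, parts.path, params)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    return now <= int(params.get("expires", "0"))


url = sign_url("https://blobstore.example.com/container/blob-1", "PUT",
               expires_at=2000, content_length=1024)
```

The server only computes a signature per request; the blob bytes travel directly between client and blobstore, which is why one server can mediate far more throughput than it could proxy.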

Scaling Number of Blobs
MCSP manages 100 TB of blob data across 1 billion blobs. Some providers require specific naming or sharding for best performance:
• Atmos: no more than 100,000 blobs per directory; shard across directories
• AWS-S3: name blobs with unique prefixes
• Swift: no more than 1 million blobs per container; shard across containers
• GCS & HPC: remove Expect: 100-continue
• Other quirks: Cleversafe performs better with container listing disabled
A surprisingly challenging workload: removing all blobs from a large container.
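A rough sketch of the sharding arithmetic and of hash-based placement follows. The limits come from the slide; the function names, the `blobs` base name, and the choice of SHA-256 are assumptions for illustration, not how MCSP or jclouds names blobs. With 1 billion blobs, a 100,000-per-directory limit needs at least 10,000 directories, and a 1-million-per-container limit needs at least 1,000 containers.

```python
import hashlib


def min_shards(total_blobs, limit_per_shard):
    # Ceiling division: the fewest shards that keep each one under the limit.
    return -(-total_blobs // limit_per_shard)


def shard_index(blob_id, num_shards):
    # Hash the identifier so blobs spread evenly, whatever the ids look like.
    digest = hashlib.sha256(blob_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_location(blob_id, num_shards, base="blobs"):
    # Returns (container or directory name, blob name); `base` is hypothetical.
    return f"{base}-{shard_index(blob_id, num_shards):05d}", blob_id


def prefixed_name(blob_id, width=4):
    # A short hash prefix gives otherwise sequential names distinct prefixes.
    prefix = hashlib.sha256(blob_id.encode()).hexdigest()[:width]
    return f"{prefix}-{blob_id}"


directories_needed = min_shards(10**9, 100_000)
containers_needed = min_shards(10**9, 1_000_000)
container, blob_name = sharded_location("file-000042", containers_needed)
```

Because the location is derived from the identifier alone, any client can compute where a blob lives without a lookup table. In practice one would provision more shards than the minimum, since hashing spreads blobs evenly only on average.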

Scaling Blob Sizes
Most MCSP blobs are small, but some use cases require larger ones.
jclouds supports blobs of up to 2 GB across all blobstores:
• Could support 5 GB with Java 7
AWS-S3, Azure, and Swift support multi-part upload, tested with 40 GB blobs.
Large blobs increase the chance of transient network errors and failures:
• Use a repeatable Payload like ByteSource to allow jclouds to retry
• Always include an MD5 checksum to guarantee data integrity
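The reason a repeatable payload matters can be shown without jclouds: a retry needs to reread the data from the start, so the uploader takes a function that opens a fresh stream rather than a stream that has already been consumed. In this sketch, `upload_with_retry` and `FlakySender` are hypothetical names, and the sender simulates a transport that fails a set number of times before verifying the base64-encoded MD5.

```python
import base64
import hashlib
import io


def content_md5(open_stream, chunk_size=64 * 1024):
    # Hash in chunks so a large blob never has to fit in memory.
    digest = hashlib.md5()
    with open_stream() as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def upload_with_retry(open_stream, send, attempts=3):
    # `open_stream` is the repeatable payload: each call yields a fresh stream.
    checksum = content_md5(open_stream)
    last_error = None
    for _ in range(attempts):
        try:
            with open_stream() as stream:
                return send(stream, checksum)
        except ConnectionError as error:
            last_error = error  # transient failure: reopen and try again
    raise last_error


class FlakySender:
    """Hypothetical transport that fails `failures` times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.received = None

    def __call__(self, stream, checksum):
        data = stream.read()
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated transient failure")
        # The receiving side recomputes the MD5 and rejects corrupt data.
        actual = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
        if actual != checksum:
            raise ValueError("checksum mismatch")
        self.received = data
        return checksum


payload = b"x" * 200_000
sender = FlakySender(failures=2)
checksum = upload_with_retry(lambda: io.BytesIO(payload), sender)
```

Passing a one-shot stream instead would leave the second attempt with nothing to read, so the retry would upload an empty or truncated blob.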

Lessons Learned
Cross-provider support required substantial effort:
• Long tail of issues with authentication, configuration, error codes, timeouts, etc.
• S3- and Swift-compatible clones are like snowflakes: no two are alike
Measuring performance is difficult:
• Blob naming and sharding are important
• Public providers will reshard very active containers for better performance
• Private blobstores require configuration and tuning
Mock blobstores (filesystem and transient) helped testing.

Future Directions
More diagnostic tools, especially for private blobstores:
• Maginatics will contribute a benchmark tool and compatibility tester
Modernize with Guava additions, e.g., ByteSource, Hashing, MediaType.
Simplify implementation:
• De-async?
• Remove annotations?
New providers:
• Modernized Swift (in progress)
• Google Cloud Storage (GSoC 2014?)
• Amazon Glacier
• Joyent Manta?

Recap
• jclouds can provide portability between blobstore providers if your application does not strongly depend on blobstore semantics
• Applications can scale with the correct architecture and implementation choices
• More work remains to make jclouds an inviting platform for all Java developers
• The jclouds community has helped Maginatics over the last three years, and we look forward to continuing to contribute

http://jclouds.apache.org
https://maginatics.com
Thank you.
