Building AuroraObjects - Ceph Day Frankfurt

Published on March 11, 2014

Author: Inktank_Ceph

Source: slideshare.net

Description

Wido den Hollander, 42on.com

Building AuroraObjects

Who am I? ● Wido den Hollander (1986) ● Co-owner and CTO of PCextreme B.V., a Dutch hosting company ● Ceph trainer and consultant at 42on B.V. ● Part of the Ceph community since late 2009 – Wrote the Apache CloudStack integration – libvirt RBD storage pool support – PHP and Java bindings for librados

PCextreme? ● Founded in 2004 ● Medium-sized ISP in the Netherlands ● 45,000 customers ● Started as a shared hosting company ● Datacenter in Amsterdam

What is AuroraObjects? ● Under the name “Aurora” my hosting company PCextreme B.V. has two services: – AuroraCompute, a CloudStack based public cloud backed by Ceph's RBD – AuroraObjects, a public object store using Ceph's RADOS Gateway ● AuroraObjects is a public RADOS Gateway service (S3 only) running in production

The RADOS Gateway (RGW) ● Serves objects using either Amazon's S3 or OpenStack's Swift protocol ● All objects are stored in RADOS; the gateway is just an abstraction between HTTP/S3 and RADOS
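
As a sketch of what this looks like from a client's point of view: an anonymous read of a public object through the S3 API is just a plain HTTP GET, which is exactly what makes it cacheable later on. The endpoint, bucket and object names below are made-up placeholders, not the real AuroraObjects addresses:

# Hypothetical endpoint and object: an anonymous GET of a public object
# through the S3 API; the gateway translates this HTTP request into RADOS operations.
curl -s http://mybucket.objects.example.com/photos/cat.jpg -o cat.jpg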

The RADOS Gateway

Our ideas ● We wanted to cache frequently accessed objects using Varnish – Only possible with anonymous clients ● SSL should be supported ● Storage shared between the Compute and Objects services ● 3x replication

Varnish ● A caching reverse HTTP proxy – Very fast ● Up to 100k requests/s – Configurable using the Varnish Configuration Language (VCL) – Used by Facebook and eBay ● Not a part of Ceph, but can be used with the RADOS Gateway
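
A minimal VCL sketch of the "anonymous clients only" idea, assuming Varnish 3.x syntax; the actual production configuration is not shown in the slides:

sub vcl_recv {
    # Only anonymous read requests are cacheable; anything signed
    # with an Authorization header must always reach the RADOS Gateway.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    if (req.http.Authorization) {
        return (pass);
    }
    return (lookup);
}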

The Gateways ● SuperMicro 1U – AMD Opteron 6200 series CPU – 128GB RAM ● 20Gbit LACP trunk ● 4 nodes ● Varnish runs locally with RGW on each node – Uses the RAM to cache objects
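
How the RAM is handed to Varnish is not detailed in the slides; a hypothetical startup line using an in-memory (malloc) storage backend would look like this, with the VCL path, cache size and admin port being assumptions:

# malloc storage keeps cached objects in RAM; size, paths and ports are made up.
varnishd -a :80 -f /etc/varnish/rgw.vcl -s malloc,96G -T localhost:6082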

The Ceph cluster ● SuperMicro 2U chassis – AMD Opteron 4334 CPU – 32GB RAM – Intel S3500 80GB SSD for OS – Intel S3700 200GB SSD for Journaling – 6x Seagate 3TB 7200RPM drives for OSDs ● 2Gbit LACP trunk ● 18 nodes ● ~320TB of raw storage (18 nodes × 6 drives × 3TB = 324TB)

Our problems ● When we cache objects in Varnish, they don't show up in the usage accounting of the RGW – The HTTP request never reaches RGW ● When an object changes we have to purge all caches to maintain cache consistency – A user might change an ACL or modify an object with a PUT request ● We wanted to make cached requests cheaper than non-cached requests

Our solution: Logstash ● All requests go from Varnish into Logstash and into ElasticSearch – From ElasticSearch we do the usage accounting ● When Logstash sees a PUT, POST or DELETE request it makes a local request which sends out a multicast to all other RGW nodes to purge that specific object ● We also store bucket storage usage in ElasticSearch so we have an average over the month
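
The multicast purge mechanism itself is custom, but accepting a PURGE for a specific object on the Varnish side is a standard pattern; a sketch in Varnish 3.x syntax, with a made-up ACL for the internal network of the gateway nodes:

acl purgers {
    "127.0.0.1";
    "192.168.0.0"/24;    # hypothetical internal network of the RGW/Varnish nodes
}

sub vcl_recv {
    if (req.request == "PURGE") {
        # Only the gateway nodes themselves may evict objects from the cache.
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 404 "Not in cache";
    }
}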

LogStash and ElasticSearch ● varnishncsa → logstash → redis → elasticsearch

input {
  pipe {
    command => "/usr/local/bin/varnishncsa.logstash"
    type => "http"
  }
}

● And we simply execute varnishncsa:

varnishncsa -F '%{VCL_Log:client}x %{VCL_Log:proto}x %{VCL_Log:authorization}x %{Bucket}o %m %{Host}i %U %b %s %{Varnish:time_firstbyte}x %{Varnish:hitmiss}x'
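
Only the input side is shown above; the Redis leg of the pipeline would live in the Logstash output section. A hypothetical sketch (host name and key are assumptions, and a second step is assumed to index the entries from Redis into ElasticSearch):

output {
  # Ship parsed requests to Redis; from there they are indexed into
  # ElasticSearch, which is what the usage accounting queries run against.
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
  }
}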

%{Bucket}o? ● With %{<header>}o you can display the value of the response header <header>: – %{Server}o: Apache 2 – %{Content-Type}o: text/html ● We patched RGW (the patch is in master) so that it can optionally return the bucket name in the response:

200 OK
Connection: close
Date: Tue, 25 Feb 2014 14:42:31 GMT
Server: AuroraObjects
Content-Length: 1412
Content-Type: application/xml
Bucket: "ceph"
X-Cache-Hit: No

● 'rgw expose bucket = true' in ceph.conf makes RGW return the Bucket header
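
In context the option sits in the gateway's client section of ceph.conf; the section name below is the conventional one and may differ from the actual deployment:

[client.radosgw.gateway]
    rgw expose bucket = true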

Usage accounting ● We only query RGW for storage usage and also store that in ElasticSearch ● ElasticSearch is used for all traffic accounting – Allows us to differentiate between cached and non-cached traffic
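
The storage side of the accounting can be pulled from the gateway's standard admin CLI; a sketch, with the bucket and user names being placeholders:

# Per-bucket size and object counts, to be indexed into ElasticSearch
radosgw-admin bucket stats --bucket=mybucket

# Aggregated usage per user over a period
radosgw-admin usage show --uid=johndoe --show-log-entries=false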

Back to Ceph: CRUSHMap ● A good CRUSHMap design should reflect the physical topology of your Ceph cluster – All machines have a single power supply – The datacenter has an A and a B power circuit ● We use an STS (Static Transfer Switch) to create a third power circuit ● With CRUSH we store each replica on a different power circuit – When a circuit fails, we lose only 1/3 of the Ceph cluster and two of the three replicas stay available – Each power circuit has its own switching / network

The CRUSHMap

type 7 powerfeed

host ceph03 {
    alg straw
    hash 0
    item osd.12 weight 1.000
    item osd.13 weight 1.000
    ..
}

powerfeed powerfeed-a {
    alg straw
    hash 0
    item ceph03 weight 6.000
    item ceph04 weight 6.000
    ..
}

root ams02 {
    alg straw
    hash 0
    item powerfeed-a
    item powerfeed-b
    item powerfeed-c
}

rule powerfeed {
    ruleset 4
    type replicated
    min_size 1
    max_size 3
    step take ams02
    step chooseleaf firstn 0 type powerfeed
    step emit
}

The CRUSHMap

Testing the CRUSHMap ● With crushtool you can test your CRUSHMap

$ crushtool -c ceph.zone01.ams02.crushmap.txt -o /tmp/crushmap
$ crushtool -i /tmp/crushmap --test --rule 4 --num-rep 3 --show-statistics

● This shows you the result of the CRUSHMap:

rule 4 (powerfeed), x = 0..1023, numrep = 3..3
CRUSH rule 4 x 0 [36,68,18]
CRUSH rule 4 x 1 [21,52,67]
..
CRUSH rule 4 x 1023 [30,41,68]
rule 4 (powerfeed) num_rep 3 result size == 3: 1024/1024

● Manually verify those locations are correct
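
Once the compiled map tests cleanly it still has to be loaded into the cluster; a sketch of the usual round-trip with the standard ceph and crushtool commands (file names are arbitrary):

# Extract and decompile the live map, edit it, then recompile and inject it
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt ...
crushtool -c crushmap.txt -o /tmp/crushmap
ceph osd setcrushmap -i /tmp/crushmap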

A summary ● We cache anonymously accessed objects with Varnish – Allows us to process thousands of requests per second – Saves us I/O on the OSDs ● We use LogStash and ElasticSearch to store all requests and do usage accounting ● With CRUSH we store each replica on a different power circuit

Resources ● LogStash: http://www.logstash.net/ ● ElasticSearch: http://www.elasticsearch.net/ ● Varnish: http://www.varnish-cache.org/ ● CRUSH: http://ceph.com/docs/master/ ● E-Mail: wido@42on.com ● Twitter: @widodh
