Update on Crimson - the Seastarized Ceph - Seastar Summit

Published on March 26, 2020

Author: ScyllaDB

Source: slideshare.net

1. Update on Crimson: The Seastarized Ceph - kchai@redhat.com - Seastar Summit 2019

2. A unified storage system
● RGW (OBJECT): S3 and Swift object storage
● RBD (BLOCK): virtual block device
● CEPHFS (FILE): distributed network file system
● LIBRADOS: low-level storage API
● RADOS: reliable, elastic, distributed storage layer with replication and erasure coding

3. RADOS -- The Cluster (diagram: an application talks to the RADOS cluster through LIBRADOS; the cluster includes monitor daemons, marked M)

4. OSD (Object Storage Daemon) -- ceph-osd (crimson)
● Stores data on an HDD or SSD
● Services client IO requests
● Cooperatively peers, replicates, and rebalances data
● Reports stats to manager daemons
● 10s-1000s per cluster

5. PG (placement groups)
● pgid = hash(obj_name) % pg_num (sketched below)
● N replicas of each PG
● 10s of PGs per OSD
● many GiB of data per PG
(diagram: a client's objects, e.g. 1532.000 ... 1532.005, hash into placement groups 1.0 through 1.fff, which map onto the OSDs)
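The slide's mapping rule is simply a hash of the object name modulo the PG count. Below is a toy illustration of that idea only; Ceph itself uses an rjenkins-based hash and a "stable mod", and the pg_num value and object name here are made-up assumptions.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

// Toy illustration of the slide's rule: pgid = hash(obj_name) % pg_num.
// Real Ceph uses its own hash, so actual pgids will differ.
int main() {
    const std::uint32_t pg_num = 128;                    // PGs in the pool (assumed)
    const std::string obj_name = "rbd_data.1532.000";    // hypothetical object name
    const std::uint32_t pgid =
        static_cast<std::uint32_t>(std::hash<std::string>{}(obj_name)) % pg_num;
    std::cout << "object " << obj_name << " -> pg " << pgid << '\n';
    // CRUSH then maps each PG to the N OSDs that hold its replicas.
}
```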

6. A closer look (diagram: an OSD frontend in front of PGs 1.0, 1.2, 1.4, 1.6, 1.8)

7. Crimson - a faster OSD
● Less overhead
● Bypass the kernel
● Zero memcpy
● Fewer context switches
● Understands modern storage devices

8. share something => share nothing
What we imagined:
● Multi-reactor OSD (see the sharded-service sketch below)
● Shared connections (to manager daemons, to peer OSDs, to clients)
● Shared io queue
● Shared metadata (knowledge about the cluster)
What we have now:
● Single-threaded OSD
● Fully connected network
● Monitor's load increases
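The slide contrasts a shared-everything design with Seastar's shared-nothing model, where each reactor (shard) owns its own state. A minimal sketch of that per-shard pattern using seastar::sharded follows; the pg_shard_service class and the pgid-to-shard modulo mapping are illustrative assumptions, not Crimson's actual code.

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/smp.hh>
#include <iostream>

// Hypothetical per-shard PG service: each reactor owns a disjoint set of PGs,
// so an op for a given PG is always handled on the same core, with no locks.
class pg_shard_service {
    unsigned _ops = 0;                         // per-shard state, never shared
public:
    void handle_op(unsigned pgid) {
        ++_ops;
        std::cout << "handled op for pg " << pgid << "\n";
    }
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

int main(int argc, char** argv) {
    seastar::app_template app;
    seastar::sharded<pg_shard_service> service;
    return app.run(argc, argv, [&service] {
        return service.start().then([&service] {
            // Route the op to the shard that owns this PG (a simple modulo here).
            unsigned pgid = 7;
            return service.invoke_on(pgid % seastar::smp::count,
                                     [pgid] (pg_shard_service& s) { s.handle_op(pgid); });
        }).then([&service] {
            return service.stop();
        });
    });
}
```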

9. Average IOPS

10. Average latency

11. Instructions per cycle

12. CPU utilization

13. bluestore (diagram: metadata goes into RocksDB, which runs on BlueRocksEnv on top of BlueFS; data goes through the Allocator)
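The point of this stack is that RocksDB does all of its I/O through a pluggable Env, which is how BlueRocksEnv routes metadata onto BlueFS. A minimal sketch of that plug-in point, assuming the standard RocksDB Env/EnvWrapper interface; demo_bluerocks_env and the paths are invented, and unlike the real BlueRocksEnv it just forwards to the default POSIX Env.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <rocksdb/db.h>
#include <rocksdb/env.h>
#include <rocksdb/options.h>

// Stand-in for BlueRocksEnv: an Env that forwards everything to the default
// POSIX Env. The real BlueRocksEnv instead overrides the file-creation calls
// so RocksDB files live inside BlueFS rather than on a local filesystem.
class demo_bluerocks_env : public rocksdb::EnvWrapper {
public:
    demo_bluerocks_env() : rocksdb::EnvWrapper(rocksdb::Env::Default()) {}
};

int main() {
    demo_bluerocks_env env;
    rocksdb::Options options;
    options.create_if_missing = true;
    options.env = &env;                        // all RocksDB I/O goes through this Env
    rocksdb::DB* db = nullptr;
    rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/demo_bluestore_meta", &db);
    assert(s.ok());
    db->Put(rocksdb::WriteOptions(), "object_key", "object_metadata");
    std::string value;
    db->Get(rocksdb::ReadOptions(), "object_key", &value);
    std::cout << value << '\n';
    delete db;
}
```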

14. Seastarized bluestore (diagram: metadata goes into RocksDB, which runs on a SeastarEnv on top of BlueFS*, backed by Seastar AIO; data goes through the Allocator)
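Here the POSIX-style I/O path under RocksDB is replaced by a SeastarEnv and a BlueFS variant driven by Seastar's asynchronous I/O. A minimal sketch of that style of I/O using Seastar's DMA file API; the file path and the single 4 KiB read are placeholder assumptions, and the file must already exist.

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/file.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/temporary_buffer.hh>
#include <iostream>

// Open a file for DMA access and read one 4 KiB block without ever blocking
// the reactor: every step returns a future and runs as a continuation.
int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        return seastar::open_file_dma("/tmp/blob.bin", seastar::open_flags::ro)
            .then([] (seastar::file f) {
                // Offset and length are kept block-aligned for DMA.
                return f.dma_read<char>(0, 4096).then(
                    [f] (seastar::temporary_buffer<char> buf) mutable {
                        std::cout << "read " << buf.size() << " bytes\n";
                        return f.close();
                    });
            });
    });
}
```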

15. Seastar + RocksDB

16. Alienized bluestore (diagram: the OSD and its PGs 1.0, 1.2, 1.4, 1.6, 1.8 run on the seastar reactor, while bluestore runs on alien threads)
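In the alienized design the PGs run on the Seastar reactor, while the existing, blocking BlueStore code runs on ordinary threads that are "alien" to Seastar, with results bridged back to the reactor. The sketch below illustrates only that offloading pattern, using a plain std::thread pool and std::future in place of the Seastar alien machinery; alien_pool and the fake blocking read are invented for illustration.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// A tiny pool of ordinary ("alien") threads. Blocking calls are queued here so
// the event loop submitting them never blocks; the caller gets a std::future.
class alien_pool {
    std::queue<std::function<void()>> _q;
    std::mutex _m;
    std::condition_variable _cv;
    std::vector<std::thread> _threads;
    bool _stop = false;
public:
    explicit alien_pool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) {
            _threads.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock lk(_m);
                        _cv.wait(lk, [this] { return _stop || !_q.empty(); });
                        if (_stop && _q.empty()) return;
                        task = std::move(_q.front());
                        _q.pop();
                    }
                    task();                    // run the blocking work off-reactor
                }
            });
        }
    }
    ~alien_pool() {
        { std::lock_guard lk(_m); _stop = true; }
        _cv.notify_all();
        for (auto& t : _threads) t.join();
    }
    // Queue a blocking call on an alien thread and return its future result.
    template <typename F>
    auto submit(F f) {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto fut = task->get_future();
        { std::lock_guard lk(_m); _q.emplace([task] { (*task)(); }); }
        _cv.notify_one();
        return fut;
    }
};

int main() {
    alien_pool pool(2);
    // Pretend this is a blocking BlueStore read issued on behalf of a PG.
    auto fut = pool.submit([] { return std::string("object data"); });
    std::cout << "alien thread returned: " << fut.get() << '\n';
}
```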

17. SeaStore ???

18. Q & A
