Boltdb - an embedded key value database


Published on November 28, 2016

Author: awmanoj


1. Boltdb: an embedded key value database. Manoj Awasthi, Tech Architect @Tokopedia

2. Structure of this talk..

3. A bit of history.. tokopedia image router: Image Server 1, Image Server 2, ... Image Server N. Mapping of image to server: 123.jpg : server_03; 246.jpg : server_02; 345.jpg : server_17; … .. as time passed

4. gradually, we started keeping newer images in s3:// ..
 • All images uploaded from that point onwards could be served from a single server.
 • No such mapping (mongodb) was required for them.
 • Old images were still served the same way and did need the mapping.
 • But now the database was “read only” and of fixed size.
 Also: we suffered frequent memory spikes, and the mongodb process kept getting killed by the linux “out of memory killer”, which led to both latency and downtime.

5. Search for alternative.. Requirements boiled down to:
 • Fast retrieval - needed all across
 • Scalable - to tens of thousands of queries per second
 • Persistent - don’t have to recompute everything from scratch on each bootup (in case!)
 Also, we can do with: Read only usage - not a constraint, but this could help in the “trade off”

6. Why not redis? Redis! Well, it could work well given our fixed data size and read only usage. In fact, we did try it and saw scale problems with redis (high cpu load). Also $$.
 We needed a lightweight embedded database .. “BoltDB”, an embedded key value database written in golang, looked interesting.

7. Why boltdb?
 • Compact, fast. Based on LMDB [0]. Both use a B+ tree for storage, maintain ACID semantics with fully serializable transactions, and support many other database features.
 • Simple. While LMDB focuses on raw performance, Boltdb focuses on ease of use.
 • Fits better for “read heavy” usage (read more, write less).
 • Written in golang, so it fits well with the rest of the stack at Tokopedia.
 [0]

8. Why boltdb? In the traditional sense, boltdb is not really a database but simply a memory mapped file. It does, however, provide ACID semantics and other properties associated with databases, so calling it a DB is not a misnomer. No installation required:
 ● It comes as a library
 ● Installation is as simple as importing it in your go program

9. Opening the database.. Add a key value. Fetch a value by key.

10. bolt - command line utility. Bolt is a tool for inspecting bolt databases.
 Things to use it for:
 • Check the integrity of a bolt database
 • Run synthetic benchmarks against a bolt database for gauging read and write performance
 • Print basic info about a database
 • Generate useful statistics on all pages in the database
 Available under cmd/bolt in the github repository.

11. Caveat: random writes slow down as the db grows! Let’s get back to the problem we were solving.
 The raw data exported from mongodb using the mongoexport utility was ~ 4G. This translated to a ~ 13G boltdb database file.
 The export tool that we wrote to load the mongo output into boltdb became much slower as the size of the database grew. Hence we used sharding to horizontally partition the data from mongo into many small files, with a smaller boltdb file for each of them.

12. The result! Following is the output of `free -m` on one of the servers we use:
 Snippet of `top` output from the same server:

13. Limitations
 • Bolt is good for read intensive workloads. Random writes can be slow.
 • Bolt uses a B+ tree internally, so there can be a lot of random page access. SSDs provide a significant performance boost over spinning disks.
 • Bolt can handle databases much larger than available physical RAM, provided its memory map fits in the process address space. This may be problematic on 32 bit systems.
 • The data structures used by bolt are memory mapped and hence endian specific. This means that you cannot copy a bolt file from a little endian machine to a big endian machine and have it work. (Most modern CPUs are little endian.)

14. Conclusion Boltdb worked pretty well for our use case. The service handles many thousands of queries per second, is not limited by physical RAM and is doing well! :D Do give it a try if it fits your use case. References: [1]

15. Connect with me over: { “Email”: “”, 
 “Twitter”: “”, 
 “Linkedin”: “”, 
 “Github”: “”, 
 “Blog”: [ “”, “”]
 } Thank you!
