advertisement

The InnoDB Storage Engine for MySQL

50 %
50 %
advertisement
Information about The InnoDB Storage Engine for MySQL

Published on February 26, 2014

Author: morgo

Source: slideshare.net

advertisement

<Insert Picture Here> The InnoDB Storage Engine for MySQL
 Morgan Tocker, MySQL Community Manager
 http://www.tocker.ca/


Safe Harbor Statement The  following  is  intended  to  outline  our  general  product  direction.   It  is  intended  for  information  purposes  only,  and  may  not  be   incorporated  into  any  contract.  It  is  not  a  commitment  to  deliver   any  material,  code,  or  functionality,  and  should  not  be  relied  upon   in  making  purchasing  decisions.   
 The  development,  release,  and  timing  of  any  features  or   functionality  described  for  Oracle’s  products  remains  at  the  sole   discretion  of  Oracle.

4 Years of MySQL Innovation MySQL Cluster 7.3 MySQL Workbench 6.0 MySQL Migration Wizard <Insert Picture Here> MySQL 5.5 MySQL 5.6 Windows installer & Tools MySQL 5.7 M y S Q L Cluster MySQL MySQL Enterprise Monitor 2.3 & e r M a n a g 3.0 Applier for Hadoop MySQL Enterprise Backup Security MySQL Utilities MySQL Workbench 5.2 & 6.0 Scalability MySQL Cluster 7.2 HA MySQL Enterprise MySQL Cluster 7.1 O r a c l e C e r t i f i c a t i o n s Audit

Hello and Welcome! • I will be talking about InnoDB’s internal behaviour. • Not talking (much) about MySQL. • Aim of this talk is to give you X-ray vision. • i.e. not so many direct takeaways, but one day it will help you debug a problem.

Prerequisites Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

MySQL Architecture

IO Performance L1 cache reference Branch mispredict L2 cache reference Mutex lock/unlock Main memory reference Compress 1K bytes with Zippy Send 2K bytes over 1 Gbps network Read 1 MB sequentially from memory Round trip within same datacenter Disk seek Read 1 MB sequentially from disk Send packet CA->Netherlands->CA 0.5 ns! 5 ns! 7 ns! 25 ns! 100 ns! 3,000 ns! 20,000 ns! 250,000 ns! 500,000 ns! 10,000,000 ns! 20,000,000 ns! 150,000,000 ns See: http://www.linux-mag.com/cache/7589/1.html and Google http:// www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

IO Performance (cont.) • 5-10ms per disk IO. • Maybe 50us for a high end SSD. • Still not “memory speed”.

Buffered IO • Operating Systems compensate well already. • Reads are cached with free memory. • Writes don’t happen instantly. • A step is introduced to rewrite and merge. Block 9, 10, 1, 4, 200, 5. Block 1, 4, 5, 9, 10, 200

fsync Synopsis #include <unistd.h> int fsync(int fd); int fdatasync(int fd); ! Description fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).

Basic  Operation Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB High Level Overview IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd In Memory On Disk

Query (pages not in buffer pool) mysqld SELECT * FROM a WHERE id = 10; Buffer Pool Buffer Pool LRU Adaptive Hash Indexes Data Dictionary Cache Buffer Pool Flush List Page Cache Log Group ibdata1 space 0 Storage InnoDB Additional Mem Pool Log Buffer Transaction System Caching Not Found IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

Query (pages in buffer pool) mysqld SELECT * FROM a WHERE id = 10; Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

Update Query in a Transaction (simplified) mysqld UPDATE a SET col1 = ‘new’ WHERE id = 10; commit; Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

Log files • Provide recovery. • Only written to in regular operation. • Read only required if there is a crash. • Are rewritten over-and-over again. • Think of it like a tank tread.

Log files (cont.) • Are an optimization! • 512B aligned sequential writes. • Tablespace writes are 16KiB random writes. • Tablespace writes to same pages in close time window can be merged. • Just need a large enough log file.

Checkpoint (Background Activity) mysqld (nothing) Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

FAQ • Q: What do we write to the log - is it committed data only, or can we write uncommitted data as well? • A: Both.

FAQ • Q: How do you unapply transactions? • A: UNDO space.
 Think of it like a hidden table internally stored in ibdata1.

Update Query (More Accurate*) mysqld UPDATE a SET col1 = ‘new’ WHERE id = 10; commit; Page Cache Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Additional Mem Pool Redirect older version Update any indexes Modify row in place of row to undo space to hold both versions. Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

Update Query (cont.) • Background purge process is able to clean old rows from UNDO as soon as oldest transaction advances forward.

Summarized Performance Characteristics • Log Files: • Are short sequential writes. • They permit InnoDB to delay tablespace writes enabling more merging/optimization. • Buffer Pool: • “In memory version of the tablespace”. • Loading/unloading via modified LRU algorithm.

Index Structure • Indexes and “data” in InnoDB are B+Trees. • Clustered Index design means that data itself is stored in an index.

Index Structure (cont.) Empty root Infimum Level 0 Root Page 3 Next Record Supremum

Index Structure (cont.) Insert: 1 Infimum Level 0 Root Page size is 16KB B* is 2KB Page 3 1 B* Supremum

Index Structure (cont.) Insert: 1 to 7 I Level 0 Root S Page 3 1 B* 2 B* 3 B* 4 B* 5 B* 6 B* 7 B*

Index Structure (cont.) Insert: 8 Infimum Page 3 Level 1 Root 1 4 I Level 0 Leaf Allocate new page and link in root Move records to new page Split new page Supremum S Page 4 1 B* 2 B* 3 B* 4 B* 5 B* 6 B* 7 B*

Index Structure (cont.) Insert: 8 (Cont.) Infimum Level 1 Root 1 4 I Level 0 Leaf S Page 4 1 B* Split at the middle of original page 2 B* 3 B* Supremum Page 3 4 5 I S Page 5 4 B* 5 B* 6 B* 7 B*

Index Structure (cont.) Insert: 9 and 10 Infimum Level 1 Root 1 4 I Level 0 Leaf S Page 4 1 B* 2 B* 3 B* Supremum Page 3 4 5 I S Page 5 4 B* 5 B* 6 B* 7 B* 8 B* 9 B* 10 B*

Index Structure (cont.) Insert: 11 Infimum Level 1 Root 4 5 1 4 I Level 0 Leaf Page 3 S Page 4 I Supremum 11 6 S Page 5 I S Page 6 1 B* 2 B* 3 B* Insert leads to a split at the insertion point 4 B* 5 B* 6 B* 7 B* 8 B* 9 B* 10 B* 11 B*

Index Structure Level 2 Root Infimum Level 1 Internal Page 6 0 A 1 B S Page 4 ≥0 → 6 I Level 0 Leaf Next Record ≥0 → 4 I S Supremum Page 3 ≥4 → 5 I Next Page ≥2 → 7 ≥4 → 8 Prev Page I Page 7 2 C 3 D S I Page 8 4 E 5 F S Page 5 S ≥6 → 9 I Page 9 6 G 7 H S

Page Format 0 38 FIL Header (38) Other headers and page data, depending on page type. Total usable space: 16,338 bytes. 16376 16384 FIL Trailer (8)

Row Format N-5 N-4 N-2 N N+k N+k+6 N+k+13 N+k+13+j Variable field lengths (1-2 bytes per var. field) Info Flags (4 bits) Number of Records Owned (4 bits) Order (13 bits) Record Type (3 bits) Next Record Offset (2) Cluster Key Fields (k) Transaction ID (6) Roll Pointer (7) Non-Key Fields (j)

Conclusion • Page is basic unit of storage. • Default is 16KiB • Rows of variable length.

Two more useful features • Adaptive hash - Partial hash index to accelerate secondary key lookups. • Change buffering - when non-unique indexes are not in memory, changes can be temporarily buffered until they are.

Query (by secondary key) mysqld SELECT * FROM a WHERE b_key = 10; Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

Update Query (large table) mysqld UPDATE a SET col1 = ‘new’ WHERE id = 10; commit; Buffer Pool Caching InnoDB Buffer Pool LRU Adaptive Hash Indexes Data Dictionary Cache Buffer Pool Flush List Page Cache Additional Mem Pool Not Required Log Group ibdata1 space 0 Storage Transaction System Log Buffer IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd

New  Features Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

MySQL 5.5+ • • • • • • • • • IO Scalability Async IO Multiple Buffer Pools Adaptive Flushing Scan Resistant LRU Compressed Pages CPU Scalability Improved Atomics Spin Loops with PAUSE

MySQL 5.6+ • LRU Dump and Restore • Improved Group Commit • Fulltext Search • Fast Read-only Transactions • Memcached Interface • Information Schema metadata tables • • • • Persistent Statistics Variable Page Size Online DDL Transportable Tablespace • Transactional Replication
 Using InnoDB

MySQL 5.7+ • Faster Temporary Tables • Index Lock Contention Reduction • More Online DDL • Extend VARCHAR • Rename Index • Improved Read-Only Transactions • Improved CPU Scalability

Configuration Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.

The Top 3 1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb_flush_log_at_trx_commit

innodb-buffer-pool-size • Really only one major buffer/cache settings to set. • Responsible for all pages types (data, indexes, undo, insert buffer..)

innodb-buffer-pool-size (cont.) • Recommendation is 50-80% of RAM. • Default is 128M of RAM. • Please allow 5-10% on top for other meta data to grow.

innodb-log-file-size • Log files are on disk, but this contributes to how many unflushed (dirty) pages you can hold in memory. • In theory larger log files = longer crash recovery. • In MySQL 5.5 -2G max. • In MySQL 5.6 - 4G is usually safe. • Early versions should be much smaller. • Default is 48M Log Files.

innodb_flush_log_at_trx_commit • Default is full ACID Compliance (=1) • Can be set to 0/2 if you do not mind some data loss.

innodb_flush_log_at_trx_commit • 0 = Log buffer written + synced once per second. Nothing done at commit. • 1 = Log buffer written + synced once per second + written and synced on commit. • 2 = Log buffer written + synced once per second + written (not synced) on commit. ! 2 is a slightly safer version of 0.

Requires about 5-10% of buffer pool size as overhead (not directly configurable). Basic Configuration Buffer Pool InnoDB Buffer Pool LRU Adaptive Hash Indexes Additional Mem Pool innodb_log_file_size. Typical values 256M+. Default of 2 files (innodb_log_files_in_group). Log Group iblogfile0 ibdata1 space 0 innodb_log_buffer_size. Typical Doublewrite Buffer values 1-8M. Flushing IBUF_HEADER IBUF_TREE Block 1 (64 pages) behaviour influenced by TRX_SYS Block 2 (64 pages) FIRST_RSEG innodb_flush_log_at_trx_commit. DICT_HDR Storage Data Dictionary Cache Buffer Pool Flush List Page Cache Log Buffer Transaction System Caching Innodb_buffer_pool_size. Recommendation is 50-80% RAM. iblogfile1 iblogfile2 Tables with file_per_table Data Dict. SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS A.ibd B.ibd C.ibd innodb_file_per_table (Default: ON in 5.6+)

The ~Top 10 1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb-log-buffer-size 4. innodb_flush_log_at_trx_commit 5. innodb_flush_method 6. innodb_flush_neighbors 7. innodb_io_capacity, innodb_io_capacity_max, innodb_lru_scan_depth 8. innodb-buffer-pool-instances 9. innodb_read_io_threads and innodb_write_io_threads

Less-likely to need configuration • • • • • innodb_thread_concurrency innodb_concurrency_tickets innodb_max_pct_dirty_pages innodb_use_native_aio (always on) innodb_old_blocks_time (5.6 default: 1000)

Deprecated Settings • Typically “remove on sight” from config files: • innodb_additional_mempool_size • innodb_use_sys_malloc

Credits • InnoDB Architecture Diagrams via https://github.com/ jeremycole/innodb_diagrams • Available under (3-clause) BSD license
 Copyright (c) 2013, Twitter, Inc.
 Copyright (c) 2013, Jeremy Cole <jeremy@jcole.us>
 Copyright (c) 2013, Davi Arnaut <darnaut@gmail.com>

Add a comment

Related pages

Chapter 15 The InnoDB Storage Engine - MySQL :: Developer Zone

The world's most popular open source database MySQL.com; Downloads; Documentation; Developer Zone; Developer Zone Downloads MySQL.com
Read more

Chapter 14 The InnoDB Storage Engine - MySQL :: Developer Zone

The world's most popular open source database MySQL.com; Downloads; Documentation; Developer Zone; Developer Zone Downloads MySQL.com
Read more

MySQL 5.5: Storage Engine Performance Benchmark for MyISAM ...

With the release of MySQL 5.5, InnoDB has become the default storage engine. InnoDB is designed to handle transactional applications that require crash ...
Read more

InnoDB - Wikipedia, the free encyclopedia

The InnoDB Storage Engine, in the MySQL manual. MySQL. Editions: MySQL; MySQL Enterprise; MySQL Cluster; MySQL tools: MySQL Workbench; SQLyog; phpMyAdmin ...
Read more

Video: The InnoDB Storage Engine for MySQL - MySQL ...

Webinars. Percona offers free webinars about MySQL®, OpenStack, NoSQL, Percona Toolkit, DBA best practices and more. Read More; Ebooks. Important ...
Read more

Comparison of MySQL database engines - Wikipedia, the free ...

Comparison of MySQL database engines ... (or "storage engine") ... InnoDB recovers from a crash or other unexpected shutdown by replaying its logs.
Read more

windows - Howto: Clean a mysql InnoDB storage engine ...

The InnoDB engine does not store deleted data. As you insert and delete rows, unused space is left allocated within the InnoDB storage files.
Read more

Percona XtraDB Storage Engine for MySQL - Percona XtraDB ...

Percona XtraDB is an enhanced version of the InnoDB storage engine for MySQL® and MariaDB®. With more features, faster performance and better scalability ...
Read more

InnoDB 1.1 for MySQL 5.5 User's Guide

1.1 Features of the InnoDB Storage Engine InnoDB in MySQL 5.5 contains several important new features: • Fast index creation: ...
Read more