SSTable format Block Based¶
Note
Block-Based format SSTable is the default format of RocksDB SSTable.
The full name of SSTable is Static Sorted Table which the basic file to
store the data in RocksDB.
Top Level Structure¶
| label | type | note |
| data blocks | data_block[N] | sorted order |
| meta blocks | meta_block[K] | filter, stats, compression dictionary, range deletion |
| metaindex blocks | metaindex_block[K] | |
| index blocks | index_block[N] | N for one-level index, Not N for two-level index |
| footer | footer | fixed size |
0. Data Block¶
Note
This Block is built by block_builder.cc in sources of RocksDB.
The data block format as below:
| label | type |
| groups | group[groups_count] |
| groups_offset | fixed32[groups_count] |
| groups_count | fixed32 |
| compress_type | char |
| CRC32 | fixed32 |
The data group format as below:
| label | type | note |
| shared_bytes | varint32 | compress prefix of key, 0 for restart point |
| unshared_bytes | varint32 | unshared key bytes length |
| value_length | varint32 | value length |
| key_delta | char[unshared_bytes] | unshared key bytes |
| value | char[value_length] | value bytes |
1. Meta Block¶
Note
The meta block is built by block_builder.cc in sources of RocksDB too.
1.0. Filter Meta Block¶
1. Full filter block for entire SSTable.
2. Partitioned filter for too big filter block.In this case, the are two-level
of filter, the one is the top-level index for 2nd filter block, the other is
the real of filter meta block.
1.1. Properties Block¶
| label | type |
| props | K/V[P] |
Default properties as below:
- data size
- index size
- filter size
- raw key size ; size of key before any process(such as compress etc.)
- raw value size ; size of value before any process(such as compress etc.)
- number of entries
- number of data block
1.2. Compression Dictionary¶
Note
This only apply to bottommost level.
1.3. Range Deletion¶
Note
Can only be obsoleted during compaction to the bottommost level.
| label | note |
| User Key | the range’s begin key |
| Sequence Number | the sequence number at which range-deletion was inserted to the DB |
| Value Type | kTypeRangeDeletion |
| Value | the range’s end key |
2. Meta Index¶
The entry of each metaindex block.
K : name of the metablock
V : BlockHandle point to corresponding metablock.
| label | type |
| metaindex blocks | metaindex_block[K] |